Search Engine XML Sitemap Improvements

In November 2006, Google, Yahoo! and Microsoft announced that they would all support the XML Sitemaps protocol that Google had originally released as a beta in 2005.

Implementing an XML sitemap is a simple way for a webmaster to tell the search engines which content on a site they most want indexed. The sitemap does not need to list every page you want indexed; however, the URLs it does contain are treated as a priority for indexing.
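
For reference, a minimal sitemap in the shared XML format looks like the following sketch; the URL and date are placeholders, and the lastmod, changefreq and priority tags are optional:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.mydomain.com/</loc>
        <!-- Optional hints for the crawler -->
        <lastmod>2007-01-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

Each page you want flagged for indexing gets its own url entry inside the urlset element.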

When Google initially released the XML sitemap protocol as a beta, webmasters needed to inform Google of a sitemap's existence through the Google Webmaster Tools utility. When Yahoo! and Microsoft joined the party, all three vendors accepted a standard HTTP request to a given URL as notification of the XML sitemap's location. These methods have worked fine, but they required a little extra work for each search engine. It was recently announced that you can now specify the location of the XML sitemap within a standard robots.txt file.
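
The HTTP notification works by requesting a ping URL published by each engine, with the sitemap's address URL-encoded as a query parameter. Google's endpoint, for example, takes this form (mydomain.com is a placeholder):

    http://www.google.com/ping?sitemap=http%3A%2F%2Fwww.mydomain.com%2Fsitemap.xml

A similar request has to be repeated for each engine, which is exactly the per-engine overhead the robots.txt method removes.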

It’s a small change to the robots.txt file, but it’s an improvement that makes a great deal of sense, since the robots.txt file exists specifically for search engine crawlers. If you want to use this new notification method, simply add the following line to your existing robots.txt file (a complete example appears below):

  • Sitemap: <sitemap_location>

It is possible to list more than one sitemap using this mechanism; however, if you are already providing a sitemap index file, a single reference to the index file is all that is required. The sitemap_location should be the fully qualified URL of the sitemap, such as http://www.mydomain.com/sitemap.xml.
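
Putting it all together, a robots.txt file using the new directive might look like the following sketch; the user-agent rules are placeholders, and the Sitemap line stands on its own rather than belonging to any particular user-agent section:

    User-agent: *
    Disallow: /private/

    Sitemap: http://www.mydomain.com/sitemap.xml

If you maintain multiple sitemaps under a sitemap index file, the Sitemap line would simply point at the index file instead.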