An XML sitemap is a file that lists a website’s important URLs to help search engines discover, crawl, and index content more efficiently. Search engines like Google, Bing, and Yahoo! use XML sitemaps to find pages they might otherwise miss, especially on large or complex sites. The main benefits include faster indexing of new content, better crawl budget management, and improved visibility for pages with weak internal linking. XML sitemaps serve as a direct communication channel between webmasters and search engines, ensuring important URLs are submitted for consideration. The key components include the URL location (loc tag), last modification date (lastmod tag), change frequency (changefreq tag), and priority score (priority tag). This guide covers everything from XML sitemap structure and creation to submission, testing, and SEO best practices.
What Is an XML Sitemap?
An XML sitemap is an XML file that contains a structured list of URLs designed specifically for search engines. Google introduced XML sitemaps in June 2005 to solve the problem of URL discovery across the internet. The principle was straightforward: webmasters create and maintain a definitive list of important URLs on their websites so search engines can quickly discover when pages are created, updated, or removed.
XML stands for Extensible Markup Language, which uses tags to structure data in a machine-readable format. Unlike HTML sitemaps designed for human visitors, XML sitemaps communicate directly with search engine crawlers like Googlebot. They act as a safety net, helping crawlers find pages that aren’t easily reached through normal site crawling, such as orphaned pages with no internal links pointing to them.
Advantages and SEO Importance
XML sitemaps provide 5 main advantages for SEO:
- Faster content discovery: New pages get indexed quicker when listed in a sitemap rather than waiting for crawlers to find them through links.
- Crawl budget optimization: Search engines allocate a limited crawl budget to each site. XML sitemaps help crawlers prioritize important pages over less significant ones.
- Improved indexing for large sites: Websites with thousands of pages benefit significantly because crawlers can systematically access all URLs without navigating complex site structures.
- Support for media content: Image and video sitemaps help search engines index visual content that standard crawling might miss.
- Better handling of dynamic content: On frequently updated sites like news portals or ecommerce stores, a sitemap helps search engines stay current with changes.
For new websites with few backlinks, XML sitemaps are especially valuable because they compensate for the lack of external signals that typically help search engines discover content.
Difference Between XML Sitemap and HTML Sitemap
XML and HTML sitemaps serve different audiences and purposes. XML sitemaps are designed for search engine bots, while HTML sitemaps help human visitors navigate a website.
| Feature | XML Sitemap | HTML Sitemap |
| --- | --- | --- |
| Audience | Search engine crawlers | Website visitors |
| Format | Machine-readable XML | Visual HTML page |
| Purpose | Facilitate indexing | Improve navigation |
| Visibility | Not viewed by users | Accessible in footer or dedicated page |
| SEO contribution | Direct crawl optimization | Indirect via user experience |
| Update method | Often dynamic/automatic | Usually manual |
Using both sitemap types together creates an ideal setup. XML sitemaps provide the technical foundation for search engines, while HTML sitemaps help users find content when site navigation is insufficient.
XML Sitemap File Structure and Tags (loc, lastmod, changefreq, priority)
An XML sitemap follows a specific format that search engines can parse quickly. Here’s a standard example:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <lastmod>2023-12-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
The 4 core tags in an XML sitemap file serve specific functions:
- `<loc>`: Specifies the full absolute URL of the page. Must include the protocol (https://) and complete path.
- `<lastmod>`: Indicates when the page was last modified. Uses the W3C Datetime format, a profile of ISO 8601 (for example, YYYY-MM-DD). Google uses this value as a content freshness signal when it is consistently accurate.
- `<changefreq>`: Suggests how often the page content changes. Valid values are always, hourly, daily, weekly, monthly, yearly, and never. This is a hint rather than a command, and Google has said it ignores this value.
- `<priority>`: Expresses the page’s importance relative to other URLs on the same site. Values range from 0.0 to 1.0, with 1.0 being the most important. Google has said it ignores this value as well.
The `<urlset>` wrapper contains all URL entries, and each `<url>` tag encloses an individual page’s metadata.
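As a rough illustration of how these tags fit together, here is a minimal Python sketch that writes a sitemap with the standard library. The `build_sitemap` helper and the shape of the entry dictionaries are assumptions made for this example, not part of any particular tool. It uses the namespace defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq", and "priority"
    are optional and are written only when present. (Hypothetical helper.)
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # ElementTree escapes characters such as & automatically.
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2024-01-01",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "https://www.example.com/about", "lastmod": "2023-12-15"},
])
print(sitemap_xml)
```

Only `<loc>` is required for each entry, so the sketch leaves the optional tags out when no value is supplied.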
When to Use an XML Sitemap and When It’s Not Necessary
Use an XML sitemap in these 5 situations:
- Large websites with hundreds or thousands of pages where crawlers might miss content
- New websites with few external backlinks and limited discoverability
- Frequently updated sites like news portals, blogs, or ecommerce stores
- Sites with poor internal linking where some pages remain orphaned
- Sites with rich media content including images and videos that need specialized indexing
An XML sitemap is not necessary in these 3 cases:
- Small static sites with fewer than 20 pages and strong internal linking
- Sites where all pages are already well-indexed through natural crawling
- Single-page websites with no additional content to discover
Even when not strictly necessary, having an XML sitemap remains advantageous because it provides search engines with a clear roadmap of your content.
Where Do XML Sitemaps Live?
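XML sitemaps usually live in the root directory of a website, immediately after the domain name. The standard naming convention is `domain.com/sitemap.xml`. This location matters because, under the sitemaps.org protocol, a sitemap can only list URLs at or below its own directory, so a root-level file can cover the whole site. It is also the conventional path where crawlers and SEO tools expect to find a sitemap.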
The sitemap URL should also be declared in the `robots.txt` file so crawlers can discover it easily. Add this line to your robots.txt:
```
Sitemap: https://www.example.com/sitemap.xml
```
This declaration ensures that when Googlebot crawls the robots.txt file, it immediately knows where to find the sitemap without needing additional discovery steps.
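To confirm that a robots.txt file really declares a sitemap, a short check like the following can help. This is a sketch using Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). The robots.txt content is an invented sample, parsed from a string so that nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (an assumption for this example).
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of declared sitemap URLs, or None if there are none.
declared = parser.site_maps()
print(declared)
```

For a live site, `parser.set_url(...)` followed by `parser.read()` would load the real file instead of the sample string.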
What URLs Should Be Included In An XML Sitemap?
Include only URLs that are intended to be crawled, indexed, and ranked in search results. URLs must meet 4 specific criteria:
- Return a clean 200 status code
- Not contain a noindex tag
- Not be canonicalized elsewhere
- Not be blocked by robots.txt
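The four criteria can be expressed as a simple filter. The sketch below assumes you already have crawl data for each URL. The `PageInfo` record and its field names are hypothetical and would need to be mapped to whatever your crawler exports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageInfo:
    """Hypothetical record of what a crawl found for one URL."""
    url: str
    status: int
    noindex: bool
    canonical: Optional[str]  # canonical URL declared on the page, if any
    blocked_by_robots: bool


def is_sitemap_eligible(page: PageInfo) -> bool:
    """Apply the four inclusion criteria to one crawled page."""
    if page.status != 200:
        return False
    if page.noindex:
        return False
    # A canonical pointing at a different URL means this one is non-canonical.
    if page.canonical is not None and page.canonical != page.url:
        return False
    if page.blocked_by_robots:
        return False
    return True


pages = [
    PageInfo("https://www.example.com/", 200, False, "https://www.example.com/", False),
    PageInfo("https://www.example.com/old", 301, False, None, False),
    PageInfo("https://www.example.com/thanks", 200, True, None, False),
    PageInfo("https://www.example.com/shoes?sort=price", 200, False,
             "https://www.example.com/shoes", False),
]
eligible = [p.url for p in pages if is_sitemap_eligible(p)]
print(eligible)
```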
HTTP Status Codes
Sitemap URLs should return clean 200 status codes. Do not include URLs that return 301 or 302 redirects, 404 errors, or 410 errors. Google won’t index pages returning 404 errors. If Googlebot encounters a 301 redirect, it typically follows the redirect to the destination URL and indexes that instead.
If a URL has been changed, apply a 301 redirect from the old URL to the new one, then replace the old URL in the sitemap with the new destination URL.
Noindexed & Disallowed Pages
Pages with a noindex tag should not appear in your XML sitemap. These pages are clearly not intended for indexing. Similarly, exclude URLs blocked by robots.txt from being crawled.
If you find noindexed or disallowed pages in your sitemap, investigate whether those restrictions should be removed. A rogue robots.txt rule or misplaced noindex tag might be blocking pages you actually want indexed.
Non-Canonical URLs
If a page in the sitemap has a canonical tag pointing to another page, remove that URL and replace it with the canonicalized one. Including non-canonical URLs confuses search engines about which version of the content to index.
Does Every Clean 200 Status URL Need To Be Included?
No. Especially on very large sites, prioritize the most important pages and include those in the XML sitemap. Lower priority, less important pages can be omitted. A page not included in the XML sitemap can still get crawled and indexed through internal links and other discovery methods.
Sitemap Limits & Index Files
An XML sitemap can contain a maximum of 50,000 URLs and cannot exceed 50MB in file size when uncompressed. Sitemaps exceeding these limits may get partially crawled or ignored completely.
For websites with more than 50,000 URLs, create multiple sitemaps and use a sitemap index file. A sitemap index file is a sitemap that links to other sitemaps:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
    <lastmod>2024-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2024-01-01</lastmod>
  </sitemap>
</sitemapindex>
```
Structure URLs methodically in hierarchical sitemaps by grouping site sections or subfolders together. This helps Google understand how URLs interrelate and improves crawl efficiency.
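Splitting a long URL list and building the index can be automated. The following Python sketch shows one way to do it. The helper names and the `sitemap-products-N.xml` file naming are assumptions for illustration, and the sketch only handles the URL-count limit, not the file-size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls, lastmod):
    """Return sitemap index XML listing each child sitemap URL."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        ET.SubElement(sitemap, "lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# 120,000 made-up product URLs need three child sitemaps.
urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
child_locs = [
    f"https://www.example.com/sitemap-products-{i}.xml"
    for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_locs, "2024-01-01")
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own sitemap file, with the index file being the one you submit.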
Types of XML Sitemaps
XML sitemaps come in 4 main types, each serving a different purpose:
- Standard sitemaps: List regular web pages for indexing.
- Image sitemaps: Help search engines index visual content and allow adding details like locations and captions. Useful for sites with significant image content.
- Video sitemaps: Help videos appear in search results and provide metadata about video content, including title, description, and duration.
- News sitemaps: Highlight recently published articles less than two days old. Designed for news websites, these can list up to 1,000 URLs at a time.
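For the news sitemap case, the two constraints (articles under two days old, at most 1,000 URLs) can be applied with a small filter. This is a sketch. `select_news_urls` and the `(url, published_at)` tuple format are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

NEWS_MAX_AGE = timedelta(days=2)
NEWS_MAX_URLS = 1000


def select_news_urls(articles, now):
    """Pick article URLs recent enough for a news sitemap, newest first.

    `articles` is a list of (url, published_at) tuples whose datetimes
    are timezone-aware. (Hypothetical input format.)
    """
    recent = [(url, ts) for url, ts in articles if now - ts < NEWS_MAX_AGE]
    recent.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in recent[:NEWS_MAX_URLS]]


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://www.example.com/news/a", now - timedelta(hours=3)),
    ("https://www.example.com/news/b", now - timedelta(days=1, hours=12)),
    ("https://www.example.com/news/c", now - timedelta(days=5)),
]
news_urls = select_news_urls(articles, now)
print(news_urls)
```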
Dynamic vs. Static
Dynamic sitemaps automatically update to include new URLs and exclude URLs that shouldn’t be included. Most CMS platforms generate dynamic sitemaps. When configured correctly, they require minimal maintenance.
Static sitemaps are created manually or with tools like Screaming Frog SEO Spider. They offer greater control over which URLs are included but don’t update automatically. For sites where dynamic sitemaps cannot be configured to meet sitemap criteria, static sitemaps work well as long as they are regenerated on a regular schedule, quarterly or more frequently.
How to Create an XML Sitemap?
There are 5 ways to create an XML sitemap:
- Use automatic plugins: In WordPress, plugins like Yoast SEO or Rank Math create sitemaps automatically.
- Online sitemap tools: Enter your site URL into tools like XML-sitemaps.com to generate a sitemap file.
- Manual writing: Write the XML structure by hand using `<urlset>`, `<url>`, and `<loc>` tags if you have technical knowledge.
- Framework support: Modern frameworks like Laravel, Next.js, and Nuxt have modules for generating sitemaps.
- SEO spider tools: Screaming Frog can crawl your site and export a properly formatted XML sitemap.
Place the sitemap file in the site’s root directory with the name sitemap.xml. Add the sitemap path to robots.txt for easier bot access.
Automatic Sitemap Generation in WordPress and Other Platforms
For sites using WordPress, Shopify, Wix or Squarespace
Check your XML sitemap by typing yourdomain.com/sitemap.xml in your browser. These platforms automatically generate sitemaps:
- WordPress: WordPress 5.5+ includes basic sitemap support. Plugins like Yoast SEO, Rank Math, or All in One SEO provide enhanced sitemap functionality.
- Shopify: Sitemaps generate automatically and live in the root directory.
- Wix and Squarespace: Sitemaps are generated in the background with no setup required.
For sites using other CMS providers
Contact the platform’s support team to find your site’s XML sitemap. Most CMS providers automatically generate one. Check documentation for the specific sitemap URL format.
For sites whose platform doesn’t provide an XML sitemap
Use a sitemap generator tool where you enter your website’s URL. The tool scans your site and creates a sitemap file in XML format. Download the file and upload it to your site’s root directory.
XML Sitemaps SEO Best Practice
Keep URLs up to date
Remove old pages from your sitemap when they’re deleted from the site. Add new pages you want visible in search results. Automated solutions eliminate the need for manual tweaks.
Don’t include redirects
If you apply a 301 redirect to a URL, remove the origin URL from your XML sitemap and replace it with the target URL. Automatically generated sitemaps handle this, but manual updates require attention to avoid confusing search engines.
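If you maintain a static sitemap, the swap can be scripted. The sketch below assumes you have a mapping of redirected URLs to their targets, for example exported from a crawl. `resolve_redirects` is a hypothetical helper, not a feature of any sitemap tool.

```python
def resolve_redirects(urls, redirects, max_hops=10):
    """Replace each URL with its final redirect target and drop duplicates.

    `redirects` maps an old URL to the URL it redirects to. The hop cap
    guards against redirect loops in the mapping.
    """
    resolved = []
    seen = set()
    for url in urls:
        hops = 0
        while url in redirects and hops < max_hops:
            url = redirects[url]
            hops += 1
        if url not in seen:
            seen.add(url)
            resolved.append(url)
    return resolved


sitemap_urls = [
    "https://www.example.com/old-pricing",
    "https://www.example.com/about",
    "https://www.example.com/pricing",
]
redirects = {"https://www.example.com/old-pricing": "https://www.example.com/pricing"}
cleaned = resolve_redirects(sitemap_urls, redirects)
print(cleaned)
```

The old URL is replaced by its target, and because the target was already listed, the duplicate entry is dropped.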
Include your sitemap in your robots.txt
Signposting the URL of your XML sitemap via robots.txt is best practice. It clearly shows search engines you want them to access and index certain URLs.
Create individual XML sitemaps for different areas of your site
Create separate sitemaps for different site sections. For example, blogs should have their own sitemap separate from other page types. Ecommerce sites benefit from splitting category and product pages into distinct groups. List these individual sitemaps on an XML sitemap index page.
This approach helps search engines understand site architecture and makes it easier to spot where indexation issues are occurring. If a specific sitemap shows many pages not making it into the index, investigate problems specific to that area.
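One simple way to split URLs by site area is to group them on the first path segment. The sketch below does that. The `group_by_section` helper and the choice to put top-level URLs in a "pages" group are assumptions for illustration, and real sites may need a different rule.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by their first path segment.

    URLs with zero or one path segment (such as / or /about) go into a
    catch-all "pages" group.
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = segments[0] if len(segments) > 1 else "pages"
        groups[section].append(url)
    return dict(groups)


urls = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/blog/xml-sitemaps",
    "https://www.example.com/products/blue-shoes",
    "https://www.example.com/blog/robots-txt",
]
groups = group_by_section(urls)
for section, members in groups.items():
    print(section, len(members))
```

Each group can then be written to its own file (for example a blog sitemap and a products sitemap) and listed in the sitemap index.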
Common Errors in XML Sitemaps and How to Fix Them
Whitespace errors
Plugin conflicts can create lines of whitespace at the top of your XML sitemap, preventing crawling. Look for these errors in Google Search Console under the Sitemaps tab:
- “XML or text declaration not at start of entity”
- “XML declaration allowed only at the start of the document”
Check for conflicts between your sitemap plugin and other plugins. In some cases, the website theme also creates whitespace errors.
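A quick way to confirm this problem is to look at the raw bytes of the sitemap before the XML declaration. The following sketch assumes you have already downloaded the file. `leading_junk` is a hypothetical helper name.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def leading_junk(raw: bytes) -> bytes:
    """Return any bytes that appear before the XML declaration.

    A UTF-8 byte order mark is permitted at the start of an XML file,
    so it is skipped rather than reported.
    """
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    start = raw.find(b"<?xml")
    if start == -1:
        raise ValueError("no XML declaration found")
    return raw[:start]


clean = b'<?xml version="1.0" encoding="UTF-8"?>\n<urlset/>'
broken = b"\n\n" + clean  # blank lines injected above the declaration

print(repr(leading_junk(clean)))
print(repr(leading_junk(broken)))
```

An empty result means the declaration is at the very start of the file. Anything else is output that a plugin or theme has written ahead of the sitemap.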
Incorrect URL formatting
URLs must be absolute, include the protocol, and have special characters such as `&` entity-escaped. Entries that don’t conform can prevent the sitemap from being processed. Automated sitemap solutions usually avoid this problem, but manual updates can introduce errors.
Google Search Console and Bing Webmaster Tools highlight the location of URL formatting errors and the problem that needs addressing.
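When writing `<loc>` values by hand or with string templates, two checks catch most formatting mistakes: the URL must be absolute, and characters such as `&` must be entity-escaped. The sketch below shows both using Python's standard library. `format_loc` is a hypothetical helper.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def format_loc(url: str) -> str:
    """Validate a URL and return it escaped for use inside <loc>."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # escape() handles &, <, and >; the extra mapping covers both quote types.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


print(format_loc("https://www.example.com/search?q=shoes&page=2"))

try:
    format_loc("/about")
except ValueError as error:
    print(error)
```

Libraries that build XML for you, such as `xml.etree.ElementTree`, perform the escaping automatically, so this step matters mainly for hand-assembled files.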
Sitemaps exceeding limits
Split URLs into groups of no more than 50,000 per file. Individual sitemaps must not exceed 50MB uncompressed. Search engines may not read URLs beyond the limit in a single sitemap.
Missing or incorrect metadata
Include accurate metadata like the `<lastmod>` element. Google uses this data to understand content changes. Only update `<lastmod>` when noteworthy content changes occur. Inaccurate updates can cause search engines to ignore this element going forward.
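One way to keep `<lastmod>` honest is to change it only when the page content itself changes. The sketch below compares a hash of the content against the previously stored hash. It assumes `content` is the page's main text with templates and timestamps already stripped, and `updated_lastmod` is a hypothetical helper. A hash detects any change, so deciding what counts as noteworthy is still up to you.

```python
import hashlib


def updated_lastmod(content, previous_hash, previous_lastmod, today):
    """Return (content_hash, lastmod), bumping lastmod only on a change."""
    current_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if current_hash == previous_hash:
        return current_hash, previous_lastmod
    return current_hash, today


# First publication: no previous hash, so lastmod is set to today.
hash_1, lastmod_1 = updated_lastmod("Our pricing starts at $10.", "", "", "2024-01-01")

# Rebuild with identical content: lastmod stays the same.
hash_2, lastmod_2 = updated_lastmod("Our pricing starts at $10.", hash_1, lastmod_1, "2024-02-01")

# Content edited: lastmod moves forward.
hash_3, lastmod_3 = updated_lastmod("Our pricing starts at $12.", hash_2, lastmod_2, "2024-03-01")

print(lastmod_1, lastmod_2, lastmod_3)
```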
How to Submit Your XML Sitemap to Google
Step 1: Log into Google Search Console
Log into your Google Search Console account. Create an account and set up your website if you don’t have one.
Step 2: Go to ‘Sitemaps’
Navigate to the ‘Sitemaps’ section under Indexing. You’ll find “Add a new sitemap” and “Submitted sitemaps” on this page.
Step 3: Submit your sitemap
Enter your sitemap URL (for example, sitemap.xml) and click Submit. You’ll receive confirmation that the submission was successful.
Your sitemap appears in the “Submitted sitemaps” section showing when it was submitted and how many pages Google discovered. A green “Success” status indicates no issues.
Submission to Webmaster Tools
In Bing Webmaster Tools
Navigate to the sitemaps section from the main dashboard. Click “Submit a Sitemap” at the bottom right and enter your sitemap’s URL.
Submit to both Google Search Console and Bing Webmaster Tools to ensure all major search engines can access your sitemap.
How to Test Your XML Sitemap
Test your XML sitemap regularly using Google Search Console and Bing Webmaster Tools. Two common status messages indicate problems:
“Couldn’t fetch”
This status means Google couldn’t access your sitemap. This usually happens if your sitemap URL is incorrect or there are connectivity issues. Check that your server isn’t blocking Googlebot, your sitemap is in the right location, and the URL is correct.
“Sitemap has X errors”
Google could read some of the sitemap but found specific errors. These errors typically result from broken URLs, improper formatting, redirects, or blocked pages. Google specifies exactly which URLs have errors and what the problems are, making identification and fixing straightforward.
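Before relying on the search engine reports, you can run a basic local check that the sitemap parses and lists the URLs you expect. This is a sketch using Python's standard library. It assumes the sitemap uses the sitemaps.org namespace `http://www.sitemaps.org/schemas/sitemap/0.9`, and `extract_locs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml: bytes):
    """Parse sitemap XML and return the list of <loc> values.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(sitemap_xml)
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

locs = extract_locs(sample)
print(len(locs), locs)
```

From here, the list can be checked for duplicates, for URLs on the wrong host, or against the 50,000-URL limit.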
XML Sitemap FAQs
Do I need an XML sitemap?
Yes, it’s highly recommended. An XML sitemap helps search engines discover your content, stay current with changes, and understand your site architecture. By splitting out sitemaps, you can also spot where common indexation issues occur across different areas of your site.
How often should you update your sitemap?
Update your sitemap every time new content is added, existing content is updated, or pages are removed. This ensures search engines are aware of changes and can crawl and index pages efficiently.
What happens if my sitemap has errors?
Search engines may struggle to crawl and index your pages properly if your sitemap has errors. Investigate and fix any errors that appear in Google Search Console or Bing Webmaster Tools promptly.
How many URLs can an XML sitemap have?
An XML sitemap can have up to 50,000 URLs. The total file size cannot exceed 50MB uncompressed. For websites exceeding these limits, create multiple sitemaps and use a sitemap index to link them together.
Should I include noindex pages in my sitemap?
No. Pages with noindex applied should not be included in your sitemap. Adding these pages confuses search engines and makes your sitemap less effective. Focus on URLs you want to be indexed.
Can a sitemap improve my SEO rankings?
Sitemaps don’t directly boost SEO rankings, but they play a crucial role in helping search engines discover your pages. A sitemap makes it more likely that all important pages are crawled, which can lead to better visibility and improved SEO performance over time.


