What Is an XML Sitemap?
An XML sitemap is a website file that lists pages you want search engines to index. It also provides extra information, like when the page was last updated.
XML stands for Extensible Markup Language. It is a format that allows easy storage of URL data and makes it easier for search engines to parse the data.
The XML sitemap looks something like this:
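Here is a minimal sketch of that structure in Python, using only the standard library (Python 3.9+). It builds a `<urlset>` in the sitemaps.org namespace, with one `<url>` entry per page. Each entry holds a `<loc>` and a `<lastmod>`. The function name `build_sitemap` and the example.com URLs are illustrative placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> bytes:
    """Return a UTF-8 encoded <urlset> document for (url, lastmod) pairs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as xmlns="..." with no prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs and dates (YYYY-MM-DD is a valid W3C Datetime format).
    sitemap = build_sitemap([
        ("https://www.example.com/", "2024-01-15"),
        ("https://www.example.com/blog/", "2024-01-10"),
    ])
    print(sitemap.decode("utf-8"))
```

ElementTree handles escaping of special characters such as "&" in URLs. This is one reason to prefer it over building the XML with string concatenation.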
Having an XML sitemap is not a requirement. But it can help search engines discover your new and updated pages, which can indirectly benefit your SEO.
Google introduced XML sitemaps in 2005. Other search engines like Yahoo and Microsoft joined shortly after.
In this post, we’ll take a look at whether you need a sitemap, as well as various sitemap types and best practices. Then, we’ll cover how to create, check, and submit your XML sitemap.
Do I Need an XML Sitemap?
Google describes cases in which a sitemap may not be necessary.
These include a site with about 500 pages or fewer, or a site whose pages are comprehensively linked internally.
So you may ask: Do I need a sitemap if I have a small website with strong internal linking?
Strictly speaking, you don’t. But you should have one anyway, for three reasons:
- The effort it takes to create a sitemap is minimal
- Having a sitemap can only be beneficial for your site (and will never hurt it)
- Having a sitemap can speed up the process of Google discovering your pages
As Gary Illyes from Google confirmed, XML sitemaps are the second most important source for Google to discover new URLs.
Why ignore it?
And, of course, having an XML sitemap is an absolute must if you own a large website (thousands of pages) or if your website is new and has few external links pointing to it.
XML Sitemap Types
There are various types of XML sitemaps.
Google supports sitemap extensions for several types of content, namely images, videos, and news articles.
You can either integrate these special media types into a regular sitemap or create dedicated sitemaps for them.
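Take Google’s image extension as an example. It adds `<image:image>` and `<image:loc>` tags inside a regular `<url>` entry. The sketch below uses Python’s standard library and assumes one list of image URLs per page. The function name and all URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"  # Google's image extension

# Register prefixes so the output uses xmlns="..." and xmlns:image="...".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages: dict[str, list[str]]) -> bytes:
    """Map each page URL to the image URLs that appear on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # Each image gets its own <image:image> wrapper with an <image:loc>.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Because the image tags sit inside ordinary `<url>` entries, the same file works as a regular sitemap. That makes this the "integrate into a regular sitemap" option.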
Although most websites only need one simple sitemap, there are cases when you may need multiple sitemaps or sitemaps for different file types.
Say your site has a huge number of indexable URLs. A single XML sitemap is limited to 50,000 URLs or 50MB (uncompressed). So you’d need to use multiple sitemaps if you exceed either limit.
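Staying under the 50,000-URL limit comes down to splitting your URL list into slices. Below is a minimal sketch with a hypothetical helper, `chunk_urls`. It only enforces the URL count, so you still need to check each resulting file against the 50MB limit.

```python
from collections.abc import Iterator

# Per-file URL limit from the sitemaps protocol.
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, one slice per sitemap file."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    urls = [f"https://www.example.com/product/{n}" for n in range(120_001)]
    print([len(chunk) for chunk in chunk_urls(urls)])  # [50000, 50000, 20001]
```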
Finally, you can use separate sitemaps for various types of pages, such as blog posts or author pages.
If you use more than one XML sitemap, use a sitemap index. It’s a sitemap that lists all your other sitemaps.
Here’s what a sitemap index can look like:
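A sitemap index uses the same namespace as a regular sitemap. The difference is the tags: a `<sitemapindex>` root with one `<sitemap><loc>` entry per child sitemap. The sketch below builds one with Python’s standard library. The child file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Return a <sitemapindex> with one <sitemap><loc> entry per child sitemap."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical child sitemap file names.
    print(build_sitemap_index([
        "https://www.example.com/post-sitemap.xml",
        "https://www.example.com/page-sitemap.xml",
    ]).decode("utf-8"))
```

Each `<sitemap>` entry may also carry its own `<lastmod>`. This sketch leaves it out for brevity.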
XML Sitemap Best Practices
Now, let’s take a look at Google’s technical guidelines and best practices for XML sitemaps and how to set up your XML sitemap correctly:
- Only include URLs you want to have indexed by search engines. (For example, if you have multiple versions of the same page, include only the canonical URL.)
- Only include URLs that return a 200 status code. (No redirects or 4xx/5xx error status codes.)
- Make sure a single sitemap file is no larger than 50MB (uncompressed) and contains no more than 50,000 URLs. Use multiple sitemaps if needed.
- If you use multiple sitemaps, create an index sitemap that will list all of them.
- Make sure your sitemap is UTF-8 encoded.
- Include links to localized version(s) of each URL. (See documentation by Google.)
- Update your sitemap every time there’s a new URL or an old URL has been updated.
- Include information about when the page was last updated (the “lastmod” attribute).
- Link to your sitemap from your robots.txt file. (Read about the sitemap directive in robots.txt.)
- Submit your sitemap to Google. (You’ll learn how to do it later in this post.)
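You can check several of the guidelines above offline before publishing. The sketch below is a hypothetical checker, not Google’s own validation. It flags files over 50MB or 50,000 URLs, bytes that aren’t valid UTF-8, XML that doesn’t parse, `<url>` entries without a `<loc>`, duplicate URLs, and missing `<lastmod>` values. It does not check status codes or canonical URLs, because those require fetching each page.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB (52,428,800 bytes), uncompressed


def check_sitemap(data: bytes) -> list[str]:
    """Return human-readable problems found in a raw (uncompressed) sitemap file."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data):,} bytes; the limit is {MAX_BYTES:,}")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        problems.append(f"XML parse error: {err}")
        return problems
    urls = root.findall(f"{NS}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls):,} URLs; the limit is {MAX_URLS:,} per file")
    seen = set()
    for url in urls:
        loc = (url.findtext(f"{NS}loc") or "").strip()
        if not loc:
            problems.append("a <url> entry has no <loc>")
            continue
        if loc in seen:
            problems.append(f"duplicate URL: {loc}")
        seen.add(loc)
        if url.find(f"{NS}lastmod") is None:
            problems.append(f"missing <lastmod>: {loc}")
    return problems
```

An empty list means none of these particular checks failed. It does not mean the sitemap is fully compliant.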
The good news is that if you use a CMS (content management system), plugin, or sitemap generator to create your sitemap.xml file, the tool will typically ensure the file meets most of the requirements listed above.
Now, you’re ready to learn how to create an XML sitemap.
How to Generate an XML Sitemap
Many CMSs create and update your XML sitemap automatically. These include WordPress (version 5.5 and later), Shopify, Wix, and Squarespace.
If you use one of these, you don’t have to do anything.
There’s typically little to no room for manual edits of your sitemap in a CMS, but that’s not a problem for most users.
Note: If you don’t use a CMS, jump to the XML Sitemap Generators section.
XML Sitemap Plugins in WordPress
If you’re a more advanced WordPress user who wants to have full control over the sitemap.xml file, you have the option to replace the default one.
For example, there is no simple way to exclude certain pages from a WordPress-generated XML sitemap (other than editing the PHP code).
This is where plugins come in handy.
In general, you can use two types of WordPress plugins to create an XML sitemap: SEO plugins that include a sitemap feature, and dedicated sitemap plugins.
We’ll take a closer look at Yoast SEO, one of the most popular WordPress SEO plugins.
Note: If you don’t have an SEO plugin yet, follow our detailed instructions on how to set one up in our WordPress SEO guide.
Once installed, Yoast SEO both creates a new sitemap and replaces the native WordPress sitemap automatically.
You can go to “Yoast SEO” settings in the left menu of the WordPress dashboard.
Go to “General” settings and click the “Features” tab. Here, you’ll find the “XML Sitemap” section.
The feature will be “On” automatically.
To view your actual XML sitemap, click the question mark symbol. Then, click the “See the XML sitemap” link.
Yoast automatically creates an index sitemap that consists of individual sitemaps for posts, pages, categories, authors, etc.
It will look like this:
After clicking the link to an individual sitemap, you’ll see a list of URLs it includes:
If you want to exclude some pages from your sitemap, you can simply disable their indexing through the Yoast plugin.
This means search engines won’t show these pages in search results. And Yoast will remove them from the sitemap.
To exclude an individual URL, you need to go to the editor of that specific page or post.
At the bottom, you’ll find the Yoast SEO settings. Expand the “Advanced” section.
Then, in the “Allow search engines to show this Post in search results?” section, select “No.”
To exclude whole content types, archives, and taxonomies, you’ll need to go to the Yoast SEO “Search Appearance” settings.
Say you don’t want Google to index your author archives.
Go to the “Archives” tab.
Then, switch the “Show author archives in search results?” button to “Off.”
This setting will also remove the author sitemap from your sitemap index.
To learn more about customizing your sitemap index, read this guide by Yoast.
XML Sitemap Generators
If you don’t use a CMS, you have two options when it comes to creating an XML sitemap:
- Creating the sitemap manually
- Using a sitemap generator
Creating a sitemap manually can be OK if you have a static website with a couple of pages. But this option is tedious for larger sites with content that changes frequently.
It’s generally more practical to use a sitemap generator—a tool that creates the sitemap for you automatically.
We recommend using a downloadable desktop tool (like Inspyder Sitemap Creator or Sitemap Writer Pro) that crawls your site and lets you regenerate your sitemap whenever pages are created or changed.
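For a small static site, a short script can stand in for a generator. The sketch below rests on two assumptions. First, every .html file under a folder is served at the base URL plus its relative path, with index.html mapped to its directory. Second, each file’s modification time is a reasonable stand-in for `<lastmod>`. The function name and paths are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_for_static_site(root_dir: str, base_url: str) -> str:
    """Build a sitemap for every .html file under root_dir.

    Assumes each file is served at base_url + its path relative to root_dir,
    and that a file's modification time is a fair stand-in for <lastmod>.
    """
    root = Path(root_dir)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in sorted(root.rglob("*.html")):
        rel = path.relative_to(root).as_posix()
        if path.name == "index.html":  # publish /blog/index.html as /blog/
            rel = rel[: -len("index.html")]
        loc = base_url.rstrip("/") + "/" + quote(rel)  # percent-encode spaces etc.
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        lines.append(
            f"  <url><loc>{escape(loc)}</loc>"
            f"<lastmod>{modified.date().isoformat()}</lastmod></url>"
        )
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

You would rerun the script (for example, as part of your deploy step) whenever pages change. Unlike a CMS, it does not update the sitemap on its own.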
Tip: Read our post about the best sitemap generators to learn more about your options.
How to Check Your XML Sitemap
To check the functionality of your sitemap, you can use Semrush’s Site Audit tool.
All you need to do is to create a free account (no credit card needed) and set up your first crawl. (This setup guide will help you get started.)
The overview of your audit will look something like this:
To find any XML sitemap issues, head to the “Issues” tab.
Then, search for “sitemap” to only see issues related to your sitemap.xml file.
The issues will be divided into three categories—errors, warnings, and notices—based on their severity.
If an issue affects multiple URLs listed in your sitemap, click the link for that issue to see all affected URLs.
To learn more about the issue, you can always click the “Why and how to fix it” link. A modal window will appear, describing the issue and proposing ways to fix it.
Here’s an example of this modal for the “Sitemap.xml not indicated in robots.txt” warning:
If you want to learn more about the Site Audit tool and the various technical SEO aspects of your website it can check for you, check out our detailed Site Audit guide.
Also, consider performing an overall technical SEO audit for your website.
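The “Sitemap.xml not indicated in robots.txt” warning concerns whether robots.txt contains a `Sitemap:` line. The sketch below checks this locally with Python’s built-in `urllib.robotparser`, whose `site_maps()` method (Python 3.8+) returns the declared sitemap URLs. The helper name and the example robots.txt content are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared with `Sitemap:` lines in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None (not an empty list) when no Sitemap lines exist.
    return parser.site_maps() or []


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "\n"
        "Sitemap: https://www.example.com/sitemap_index.xml\n"
    )
    print(sitemaps_in_robots(robots))
```

To check a live file instead of a string, you can call `set_url()` with your robots.txt URL followed by `read()` on the same `RobotFileParser` object.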
How to Submit Your XML Sitemap to Google
Although Google’s crawlers may eventually pick up your sitemap on their own, it’s best practice to submit the sitemap’s URL to Google, for two reasons:
- It will speed up the process of discovering your sitemap
- It will help you identify issues Google might have with your sitemap
You can do this in Google Search Console.
Note: If you don’t have a Google Search Console account, read our article about the tool and how to set it up.
First, open the “Sitemaps” dashboard in Google Search Console. You’ll find it in the left menu, under the “Indexing” section.
Copy and paste the URL of your sitemap into the “Add a new sitemap” field and click “Submit.”
If you have multiple sitemaps, you don’t need to submit each one separately. Just submit your index sitemap that lists all of your other sitemaps.
Your sitemap won’t be processed immediately. (In fact, it can take up to a couple of days.)
Once Google processes your sitemap, a green “Success” status will appear next to it in the “Submitted sitemaps” section.
If there are issues with your sitemap, the status will be “Has errors” or “Couldn’t fetch.”
To view the detected issues, click the row containing your sitemap. Google also provides detailed instructions for each of the possible errors.
Do All Pages Need to Be in an XML Sitemap?
Your sitemap should include only the pages you want search engines to index and show in their search results.
So it shouldn’t include pages you want to keep out of search results, such as non-canonical duplicate pages, admin pages, pages hidden behind a paywall, and thank-you pages.
Moreover, an XML sitemap should only list valid pages (pages that return a 200 status code). Make sure it doesn’t include pages with any other status codes, such as:
- 3xx: redirected pages (e.g., 301 permanent redirect)
- 4xx: pages that are unavailable (e.g., 404 page not found)
- 5xx: pages returning server errors (e.g., 502 bad gateway)
What’s the Difference Between HTML and XML Sitemaps?
The main difference between HTML and XML sitemaps is their purpose. HTML sitemaps help visitors with navigation, while XML sitemaps help search engines discover pages.
Google recommends using XML sitemaps for SEO purposes.
When it comes to HTML sitemaps, Google Search Advocate John Mueller recommends focusing on having clear navigation and good site architecture instead of using HTML sitemaps. He states that “they should never be needed.”
Should I Use the “Priority” and “Changefreq” Attributes in My XML Sitemap?
“Priority” and “changefreq” are two optional attributes that can be added to each URL listed in an XML sitemap.
Google has repeatedly stated (e.g., in this Webmaster Central hangout) that these attributes carry little weight, and its sitemap documentation now says that Google ignores both values.
Here’s what they do:
- Priority attribute: Gives each URL a priority value from 0 to 1. For example, giving a page priority of 0.8 informs search engines that you deem this page fairly important.
- Changefreq attribute: Informs search engines how often a page is updated (can include values like “always,” “daily,” “weekly,” etc.)
Although these may seem like useful attributes to include in a sitemap, the URL of a page and the “lastmod” attribute are really the only two things that are important for Google, as John Mueller confirmed on Twitter:
The URL + last modification date is what we care about for websearch.
— John Mueller is mostly not here 🐀 (@JohnMu) August 17, 2017
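Since the URL and “lastmod” are what count, the main detail to get right is the date format. The protocol expects W3C Datetime, and Python’s `isoformat()` produces a compatible string. The sketch below assumes that naive timestamps are already in UTC. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def w3c_lastmod(modified: datetime) -> str:
    """Format a timestamp as W3C Datetime, e.g. 2024-01-15T09:30:00+00:00."""
    if modified.tzinfo is None:
        # Assumption of this sketch: naive timestamps are already in UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified.isoformat(timespec="seconds")


def url_entry(loc: str, modified: datetime) -> str:
    """One <url> entry with only <loc> and <lastmod>; no <priority> or <changefreq>."""
    return f"<url><loc>{escape(loc)}</loc><lastmod>{w3c_lastmod(modified)}</lastmod></url>"
```

For example, `url_entry("https://www.example.com/?a=1&b=2", datetime(2024, 1, 15, tzinfo=timezone.utc))` escapes the ampersand as `&amp;` and emits a lastmod of `2024-01-15T00:00:00+00:00`.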
How Do I Know if My XML Sitemap Is Set Up Correctly?
With Semrush’s Site Audit tool, you can audit any website and check for various issues related to XML sitemaps.
All you need to do is to create a free account (no credit card needed).
The tool will check whether a sitemap.xml file is present. Then, it will list any formatting errors and pages that should not appear in a sitemap.
It will also check whether your sitemap meets the technical requirements (e.g., the size limit) and best practices (e.g., being linked to from your robots.txt file).