When was the last time you used a map?
I can’t even remember, to be honest.
But sitemaps—sitemaps are as relevant as ever when it comes to SEO.
For years I simply submitted sitemaps to Google Search Console because I’d heard it was best practice, checked the box and moved on.
I didn’t fully understand why I did so and was full of misconceptions.
But XML sitemaps aren’t just a box to be checked.
Sitemaps are a powerful tool, and as an SEO, it’s crucial that you understand their background, the ins and outs, and best practices.
XML Sitemaps: Everything You Need to Know for SEO
What Is an XML Sitemap?
Put simply, an XML sitemap is a directory or guide containing the most important pages of a website. They’re intended to help crawlers understand your website and how it’s structured.
A sitemap is an XML (Extensible Markup Language) file that’s easily digested by search engines. It looks something like this:
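Here’s a sketch of a single entry (the example.com URL and dates below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2021-06-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```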
This is just a single URL and showcases all available tags according to sitemaps.org, but the only required tag is the location—the URL of the page.
Yoast, a popular SEO tool for WordPress websites, generates sitemaps that look like this:
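Roughly speaking, Yoast produces a sitemap index that points to separate sub-sitemaps per content type (the URLs below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2021-06-01T09:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2021-05-15T12:30:00+00:00</lastmod>
  </sitemap>
</sitemapindex>
```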
As mentioned, a sitemap’s primary function is to help search engines understand websites. XML sitemaps do this by indicating three important things:
- The most important pages on a website.
- Site structure and architecture.
- How recently pages have been updated.
2 Common Misconceptions About XML Sitemaps
Before we dive into how to generate a sitemap, what to include and what to exclude, let’s address two very common misconceptions.
1. Every Page Should Be Included
As noted above, a sitemap should showcase a website’s most important pages. The theory is only the pages of your site that you want found on search engines should be included in your sitemap.
Do your best to split your site’s pages into two categories: those you want users to land on from search engines, and those you don’t.
Pages you want to be accessed from search engines should not be blocked by robots.txt and should be included in your XML sitemap.
Pages you don’t want surfaced in search engines should carry a noindex directive and should not be included in your XML sitemap. (A robots.txt disallow can also keep them from being crawled, but on its own it won’t keep them out of the index.)
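As a sketch, a robots.txt for a hypothetical example.com might block utility pages from crawling while pointing crawlers at the sitemap:

```text
# Hypothetical robots.txt for example.com
User-agent: *
Disallow: /cart/
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
```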
2. Every Page Included Will Be Indexed
You read that right:
Just because you’ve included a URL in your sitemap does not guarantee it will be indexed.
And vice versa: even if you’ve excluded a URL from your sitemap, search engine crawlers may still index the page.
XML sitemaps are merely a recommendation to crawlers, and your site sends plenty of other signals (internal links, backlinks and so on) that can lead to a page being indexed. If you really don’t want a page indexed, use a noindex directive. A robots.txt disallow only blocks crawling, and a disallowed page can still end up in the index if other sites link to it.
Probably the most tried-and-true way to see which pages Google is actually indexing is to perform a site: search, e.g. site:example.com.
You can also search for a specific URL (site:example.com/some-page/) to check whether that individual page is indexed.
XML Sitemap Best Practices
So we have a good idea of what an XML sitemap is, and we’ve looked at a couple of common sitemap misconceptions. Let’s dive into best practices.
1. Use a Tool to Generate Your Sitemap
The first step to properly utilizing sitemaps is to generate one. Unfortunately, they aren’t just magically created. You have a couple of options to do so:
If you don’t already have the Yoast SEO plugin installed on your site, here’s what to do:
A. Within your WordPress admin dashboard, navigate to Plugins and click “Add New.”
B. In the search bar, search for “Yoast.”
C. Click “Install Now,” and then “Activate.”
D. Navigate to [your-domain.com]/sitemap_index.xml, where Yoast places its sitemap index. Voilà!
This is another great tool for XML sitemap generation, especially if you don’t utilize WordPress. Here’s what to do:
A. Navigate to xml-sitemaps.com and type your domain into the entry bar.
B. Allow the site to crawl your domain (it might take a minute or so). Download your sitemap.
C. Using an FTP client or file manager, upload the sitemap file into the root folder of your website.
If you have a Windows computer, we’d recommend FileZilla; if you’re on a Mac, Transmit is a great choice. Both make it easy to upload the file to your server’s root directory.
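If neither tool fits (say, a small hand-built site), a short script can also do the job. Here’s a minimal sketch using only Python’s standard library; the `build_sitemap` helper is hypothetical and the URLs and dates are placeholders for your own pages:

```python
# Minimal sketch: generate a sitemap.xml with Python's standard library.
# The URLs and dates below are placeholders -- swap in your own pages.
from xml.etree import ElementTree as ET

def build_sitemap(urls):
    """Return a sitemap XML string for the given (loc, lastmod) pairs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc        # required tag
        ET.SubElement(url, "lastmod").text = lastmod  # optional tag
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

if __name__ == "__main__":
    pages = [
        ("https://example.com/", "2021-06-01"),
        ("https://example.com/about/", "2021-05-15"),
    ]
    # Write the result to the web root so it's served at /sitemap.xml.
    with open("sitemap.xml", "w") as f:
        f.write(build_sitemap(pages))
```

You’d then upload the resulting file to your site’s root folder, just as with the generator tool above.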
2. Submit to Google Search Console
Now that you have a sitemap, you may be wondering what to do with it.
The first step is to submit it to Google Search Console. This helps Google crawl and index your website, though as we mentioned, it doesn’t guarantee that every included page will be indexed, nor that every excluded page will stay out of the index.
Here’s what to do:
A. Sign in to Google Search Console.
B. Enter your domain and click “Add Property.”
Google requires verification that you indeed own the site. If you have Google Analytics set up, it’ll be done for you automatically. If not, there are a few other options for verifying.
C. In the left-hand menu, navigate to “Sitemaps.”
D. Insert your sitemap URL and click “Submit.”
And you’re all set! Give Google some time to read the sitemap and check back periodically to see if Google has encountered any errors.
3. Prioritize Highest Quality Pages
When it comes to ranking, it would seem Google not only considers the value of the page in question, but also the overall quality of a website.
Let’s say your website has 500 pages, but only 10 are pages containing fantastic content that’s useful to users. The rest are either old and irrelevant blog posts or “utility” pages (log-ins, shopping carts, places to retrieve lost passwords, etc.).
It’s very possible Google would take this as a signal that the vast majority of your website contains low-quality content, thus hurting your chances of ranking your most important pages well.
So keep this in mind as you decide what pages you want included in your sitemap. As we mentioned above, it’s pretty simple:
- Include and index pages you want found through search engines
- Exclude and no-index pages you wouldn’t want found through search engines
4. Use Noindex
Speaking of including and excluding pages, it’s really important to be consistent. Including a page on your sitemap yet instructing search engines not to crawl it is not a good idea.
So if you don’t want a page included in your sitemap, leave it off, and make sure it’s not being indexed.
You have a few options for keeping pages out of search results. You can use a meta robots tag (an instruction in the <head> of a page) or your robots.txt file (a single file containing crawler directives).
A meta robots noindex is probably the safe bet for an older blog post or a utility page, since crawlers can still read the tag and will drop the page from the index. A robots.txt disallow makes more sense if you’re looking to conserve crawl budget, because it stops the page from being crawled at all.
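As a sketch, the meta robots version is a single tag placed in the page’s head:

```html
<!-- In the <head> of a page you want kept out of the index -->
<meta name="robots" content="noindex">
```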
5. Consider Crawl Budget
Speaking of your crawl budget, it’s important to keep this in mind when considering what pages to include or exclude in your XML sitemap.
Put simply, crawl budget relates to what Google calls a “crawl rate limit.” Googlebot can’t crawl every single page on the web whenever it wants; there are limits on both Google’s resources and your server’s capacity.
If you have a really large site (we’re talking thousands of URLs), you need to be far more deliberate about which pages you include than the average site with far fewer pages.
And there you have it—everything you need to know about XML sitemaps, how to generate them, submit them and use them to boost your SEO.
Once you’ve followed all these steps, make sure you’re keeping a close eye on your website performance with Monitor Backlinks. It’ll track your keywords for you so you always know which of your pages are crawled and ranking on Google.
Now go and help Google better crawl your website!