Why some articles are excluded from Google's web crawler
A sitemap is a file that lists the webpages of your site and tells Google and other search engines how your site's content is organized. Search engine web crawlers like Googlebot read this file so they can crawl your site more intelligently.
If you're on RebelMouse, you might notice that new content sometimes does not appear in your sitemap. We follow Google's guidelines when rendering sitemaps, which means that we include URLs for articles published in the last two days. However, articles will be excluded from your sitemap if:
- The article is published to a private section.
- The article is excluded from search results.
- The article is unpublished.
- The article is set to link out to a source URL.
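The rules above can be sketched as a simple filter. This is only an illustration, not RebelMouse's actual implementation: the `Article` record, its field names, and the `include_in_sitemap` helper are all hypothetical, and the two-day window is taken from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical article record; field names are illustrative only,
# not RebelMouse's real data model.
@dataclass
class Article:
    url: str
    published_at: Optional[datetime]   # None means the article is unpublished
    in_private_section: bool = False   # published to a private section
    excluded_from_search: bool = False # "Exclude from search engines" is ticked
    source_url: Optional[str] = None   # set when the article links out

# Assumed window: articles published in the last two days.
SITEMAP_WINDOW = timedelta(days=2)

def include_in_sitemap(article: Article, now: datetime) -> bool:
    """Return True if the article passes every rule listed above."""
    if article.published_at is None:
        return False  # unpublished
    if article.in_private_section:
        return False  # private section
    if article.excluded_from_search:
        return False  # excluded from search results
    if article.source_url:
        return False  # links out to a source URL
    age = now - article.published_at
    return timedelta(0) <= age <= SITEMAP_WINDOW

# Usage: keep only the articles that qualify.
now = datetime.now(timezone.utc)
articles = [
    Article("https://example.com/fresh", now - timedelta(hours=3)),
    Article("https://example.com/link-out", now - timedelta(hours=3),
            source_url="https://example.org/original"),
]
sitemap_urls = [a.url for a in articles if include_in_sitemap(a, now)]
```

In this sketch only the first article would be listed, because the second is set to link out to a source URL.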
For example, an article is excluded from search engine results when the Exclude from search engines checkbox is ticked in Entry Editor.
Likewise, an article is set as a link out when an original source is selected for it in Entry Editor.
If you have any questions about sitemaps, feel free to reach out to firstname.lastname@example.org or contact your account manager today.