Google Search Central has launched a new series called “Crawling December” to provide insights into how Googlebot crawls and indexes webpages.
Google will publish a new article each week this month exploring various aspects of the crawling process that are not often discussed but can significantly impact website crawling.
The first post in the series covers the basics of crawling and sheds light on essential yet lesser-known details about how Googlebot handles page resources and manages crawl budgets.
Today’s websites are complex due to advanced JavaScript and CSS, making them harder to crawl than old HTML-only pages. Googlebot works like a web browser but on a different schedule.
When Googlebot visits a webpage, it first downloads the HTML from the main URL, which may link to JavaScript, CSS, images, and videos. Then, Google’s Web Rendering Service (WRS) uses Googlebot to download these resources to create the final page view.
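Here are the steps in order:
1. Googlebot downloads the HTML from the main URL.
2. WRS uses Googlebot to download the JavaScript, CSS, images, and videos that the HTML references.
3. WRS builds the final page view from the downloaded resources.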
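To get a rough sense of which extra resources a page asks a crawler to fetch, the HTML can be scanned for referenced files. The following is a minimal sketch using only Python's standard library; it is not how Googlebot or WRS is implemented, and the sample HTML, the `example.com` URLs, and the names `ResourceCollector` and `list_resources` are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects URLs of sub-resources referenced by a page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Scripts, images, and videos point at their file via "src".
        if tag in ("script", "img", "video", "source") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        # Stylesheets are referenced via <link rel="stylesheet" href="...">.
        elif tag == "link" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def list_resources(html, base_url):
    """Return absolute URLs of the resources referenced in `html`."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources


# Hypothetical page used only to demonstrate the function.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://cdn.example.com/app.js"></script>
</head><body><img src="img/logo.png"></body></html>
"""

resources = list_resources(SAMPLE_HTML, "https://example.com/blog/post")
for url in resources:
    print(url)
```

Each URL in the output is one more request a renderer would need to make beyond the HTML itself, which is why a long resource list can matter for crawl budget.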
Crawling extra resources can reduce the main website’s crawl budget. To help with this, Google says that “WRS tries to cache every resource (JavaScript and CSS) used in the pages it renders.”
It’s important to note that the WRS cache lasts up to 30 days and is not influenced by the HTTP caching rules set by developers.
This caching strategy helps to save a site’s crawl budget.
The post also offers site owners tips for optimizing their crawl budget.
Also, Google warns that blocking resource crawling with robots.txt can be risky.
If Google can’t access a necessary resource for rendering, it may have trouble getting the page content and ranking it properly.
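One way to spot this risk before it causes trouble is to test a page's resource URLs against the site's robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content and URLs are hypothetical, `blocked_resources` is an illustrative helper, and Python's parser does not match Google's robots.txt handling in every detail, so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a script directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
"""


def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the resource URLs that the given rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
]
blocked = blocked_resources(ROBOTS_TXT, urls)
print(blocked)
```

If a script that builds the page's main content shows up in the blocked list, that is the kind of rule worth revisiting.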
The Search Central team says the best way to see what resources Googlebot is crawling is by checking a site’s raw access logs.
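As a rough illustration of that log check, the sketch below counts which paths were requested by clients whose user agent mentions Googlebot. It assumes the common "combined" access log format; the log lines are invented, the IP addresses come from reserved documentation ranges, and `googlebot_paths` is a hypothetical helper. A real server's log format may differ, and the user agent string alone can be spoofed.

```python
import re
from collections import Counter

# Matches the "combined" log format used by many web servers (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def googlebot_paths(log_lines):
    """Count requested paths for lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("path")] += 1
    return counts


# Invented sample lines for demonstration only.
SAMPLE_LOG = [
    f'192.0.2.10 - - [03/Dec/2024:10:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [03/Dec/2024:10:00:01 +0000] "GET /css/site.css HTTP/1.1" 200 830 "-" "{GOOGLEBOT_UA}"',
    '198.51.100.7 - - [03/Dec/2024:10:00:02 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    f'192.0.2.10 - - [03/Dec/2024:10:00:05 +0000] "GET /css/site.css HTTP/1.1" 304 0 "-" "{GOOGLEBOT_UA}"',
]

counts = googlebot_paths(SAMPLE_LOG)
for path, hits in counts.most_common():
    print(hits, path)
```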
You can identify Googlebot by its IP address using the ranges published in Google’s developer documentation.
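A minimal sketch of that IP check, using Python's standard `ipaddress` module, is shown below. The ranges in `PLACEHOLDER_RANGES` are reserved documentation blocks, not Google's real ranges; in practice they would be replaced with the ranges from Google's published list. The helper names are illustrative.

```python
import ipaddress

# Placeholder CIDR ranges from reserved documentation blocks.
# These are NOT Googlebot's ranges; substitute the published ones.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def build_networks(cidr_ranges):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidr_ranges]


def is_in_ranges(ip_string, networks):
    """Return True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip_string)
    return any(
        address.version == network.version and address in network
        for network in networks
    )


networks = build_networks(PLACEHOLDER_RANGES)
for candidate in ["192.0.2.10", "198.51.100.7", "2001:db8::1"]:
    print(candidate, is_in_ranges(candidate, networks))
```

Combining this with a user agent filter on the access logs gives more confidence that a request came from Googlebot rather than from a client that merely claims to be it.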
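This post clarifies three key points that impact how Google finds and processes your site’s content:
- Crawling a page’s extra resources draws on the site’s crawl budget.
- WRS caches JavaScript and CSS for up to 30 days, regardless of the HTTP caching rules developers set.
- Blocking resources with robots.txt can keep Google from rendering a page and ranking it properly.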
Understanding these mechanics helps SEOs and developers make better decisions about resource hosting and accessibility – choices that directly impact how well Google can crawl and index their sites.