Google announced an update to its crawler documentation, adding more information about caching that should help SEOs and publishers better understand how to optimize for Google’s crawler. By following the new guidelines on implementing proper HTTP caching headers, publishers can improve crawling efficiency and conserve server resources.
The crawler documentation now includes a section explaining how Google’s crawlers use HTTP caching mechanisms, which conserve computing resources for both publishers and Google during crawling.
Additions to the documentation significantly expand on the prior version.
Google recommends enabling caching with headers like ETag and If-None-Match, as well as optionally Last-Modified and If-Modified-Since, to signal whether content has changed. This can help reduce unnecessary crawling and save server resources, which is a win for both publishers and Google’s crawlers.
The new documentation states:
“Google’s crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response- and If-None-Match request header, and the Last-Modified response- and If-Modified-Since request header.”
Google recommends using ETag over Last-Modified because ETag is less prone to errors like date formatting issues and provides more precise content validation. It also explains what happens if both ETag and Last-Modified response headers are served:
“If both ETag and Last-Modified response header fields are present in the HTTP response, Google’s crawlers use the ETag value as required by the HTTP standard.”
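The validation logic described above can be sketched in a few lines. This is a minimal illustration, not Google’s actual implementation: the function names are hypothetical, the ETag scheme (hashing the response body) is just one common approach, and the `If-Modified-Since` check is simplified to a string comparison where a real server would parse and compare HTTP dates.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Hypothetical ETag scheme: a strong validator derived from the
    # response body. Any scheme works as long as the value changes
    # whenever the content changes.
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def should_return_304(request_headers: dict, etag: str, last_modified: str) -> bool:
    """Decide whether a conditional request can be answered with
    304 Not Modified instead of re-sending the full body.

    Per the HTTP standard (and Google's documentation), If-None-Match
    takes precedence over If-Modified-Since when both are present.
    """
    if "If-None-Match" in request_headers:
        # ETag comparison wins, even if If-Modified-Since is also sent.
        return request_headers["If-None-Match"] == etag
    if "If-Modified-Since" in request_headers:
        # Simplified: a real server parses both values as HTTP dates.
        return request_headers["If-Modified-Since"] == last_modified
    return False
```

For example, if a crawler sends back the same `If-None-Match` value the server previously served as `ETag`, the server can respond with an empty 304 and skip regenerating the page.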
The new documentation also states that other HTTP caching directives are not supported.
The new documentation explains that support for caching differs among Google’s crawlers. For example, Googlebot supports caching for re-crawling, while Storebot-Google has limited caching support.
Google explains:
“Individual Google crawlers and fetchers may or may not make use of caching, depending on the needs of the product they’re associated with. For example, Googlebot supports caching when re-crawling URLs for Google Search, and Storebot-Google only supports caching in certain conditions”
Google’s new documentation recommends contacting hosting or CMS providers for help implementing these headers. It also suggests (but doesn’t require) that publishers set the max-age field of the Cache-Control response header to help crawlers know when to crawl specific URLs.
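Putting the pieces together, a server response that supports this kind of caching would carry an ETag, a Last-Modified date, and optionally a Cache-Control max-age. The sketch below shows one way to build such headers; the helper name is hypothetical, and the one-hour max-age is an illustrative value, not a recommendation from Google.

```python
import hashlib
from email.utils import formatdate

def caching_headers(body: bytes, max_age_seconds: int = 3600) -> dict:
    """Build response headers that let crawlers validate cached copies.

    ETag and Last-Modified enable conditional requests; the optional
    Cache-Control max-age hints how long the response stays fresh.
    """
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    return {
        "ETag": etag,
        # formatdate(usegmt=True) produces an RFC-compliant HTTP date.
        "Last-Modified": formatdate(usegmt=True),
        "Cache-Control": f"max-age={max_age_seconds}",
    }
```

In practice these headers are usually set by the web server or CMS rather than application code, which is why Google points publishers to their hosting providers first.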
Entirely New Blog Post
Google has also published a brand new blog post:
Crawling December: HTTP caching
Read the updated documentation:
HTTP Caching
Featured Image by Shutterstock/Asier Romero