Google announced an update to its crawler documentation, adding more information about caching that should help SEOs and publishers better understand how to optimize for Google’s crawler. By following the new guidelines on implementing proper HTTP caching headers, publishers can improve crawling efficiency and conserve server resources.
The crawler documentation now includes a section explaining how Google’s crawlers use HTTP caching mechanisms, which conserve computing resources for both publishers and Google during crawling.
Additions to the documentation significantly expand on the prior version.
Google recommends enabling caching with headers like ETag and If-None-Match, as well as optionally Last-Modified and If-Modified-Since, to signal whether content has changed. This can help reduce unnecessary crawling and save server resources, which is a win for both publishers and Google’s crawlers.
The new documentation states:
“Google’s crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response- and If-None-Match request header, and the Last-Modified response- and If-Modified-Since request header.”
Google recommends using ETag over Last-Modified because ETag is less prone to errors like date formatting issues and provides more precise content validation. It also explains what happens if both ETag and Last-Modified response headers are served:
“If both ETag and Last-Modified response header fields are present in the HTTP response, Google’s crawlers use the ETag value as required by the HTTP standard.”
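The validation logic described above can be sketched in a few lines. This is a minimal illustration, not Google’s actual implementation: the function names are hypothetical, the ETag scheme (hashing the response body) is just one common approach, and the `If-Modified-Since` check is simplified to a string comparison where a real server would parse and compare HTTP dates.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Hypothetical ETag scheme: a strong validator derived from the
    # response body. Any scheme works as long as the value changes
    # whenever the content changes.
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def should_return_304(request_headers: dict, etag: str, last_modified: str) -> bool:
    """Decide whether a conditional request can be answered with
    304 Not Modified instead of re-sending the full body.

    Per the HTTP standard (and Google's documentation), If-None-Match
    takes precedence over If-Modified-Since when both are present.
    """
    if "If-None-Match" in request_headers:
        # ETag comparison wins, even if If-Modified-Since is also sent.
        return request_headers["If-None-Match"] == etag
    if "If-Modified-Since" in request_headers:
        # Simplified: a real server parses both values as HTTP dates.
        return request_headers["If-Modified-Since"] == last_modified
    return False
```

For example, if a crawler sends back the same `If-None-Match` value the server previously served as `ETag`, the server can respond with an empty 304 and skip regenerating the page.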
The new documentation also states that other HTTP caching directives are not supported.
The new documentation explains that support for caching differs among Google’s crawlers. For example, Googlebot supports caching for re-crawling, while Storebot-Google has limited caching support.
Google explains:
“Individual Google crawlers and fetchers may or may not make use of caching, depending on the needs of the product they’re associated with. For example, Googlebot supports caching when re-crawling URLs for Google Search, and Storebot-Google only supports caching in certain conditions”
Google’s new documentation recommends contacting hosting or CMS providers for help implementing these headers. It also suggests (but doesn’t require) that publishers set the max-age field of the Cache-Control response header to help crawlers know when to crawl specific URLs.
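Putting the pieces together, a server response that supports this kind of caching would carry an ETag, a Last-Modified date, and optionally a Cache-Control max-age. The sketch below shows one way to build such headers; the helper name is hypothetical, and the one-hour max-age is an illustrative value, not a recommendation from Google.

```python
import hashlib
from email.utils import formatdate

def caching_headers(body: bytes, max_age_seconds: int = 3600) -> dict:
    """Build response headers that let crawlers validate cached copies.

    ETag and Last-Modified enable conditional requests; the optional
    Cache-Control max-age hints how long the response stays fresh.
    """
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    return {
        "ETag": etag,
        # formatdate(usegmt=True) produces an RFC-compliant HTTP date.
        "Last-Modified": formatdate(usegmt=True),
        "Cache-Control": f"max-age={max_age_seconds}",
    }
```

In practice these headers are usually set by the web server or CMS rather than application code, which is why Google points publishers to their hosting providers first.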
Entirely New Blog Post
Google has also published a brand new blog post:
Crawling December: HTTP caching
Read the updated documentation:
HTTP Caching
Featured Image by Shutterstock/Asier Romero