Can the Date Header Be Used to Detect Caching?

No. But actually, not really.

Web Caches

Let’s start with some background. Web caches are intermediary components located between a client and an origin web server. They reduce the latency of HTTP requests by storing the server's responses and serving them to the client when the same request is made again. They also reduce the load on the origin web servers. Web caches can be placed anywhere on the path between the client and the origin web server. Content Delivery Networks (CDNs) are a type of web cache that is geographically distributed and usually placed as close as possible to the client, which further reduces request latency.

Detecting whether a response is cached or not (i.e., whether it comes from the web cache or from the origin server) is fundamental to detecting web cache vulnerabilities, such as Web Cache Deception (WCD), Cache Poisoning, and HTTP request smuggling (HRS).

Cache Header Heuristics

In our research paper “Web Cache Deception Escalates!”, we introduced the Cache Header Heuristics to detect the cache status of HTTP responses. This technique is based on the analysis of cache status headers, which web caches use to communicate the status of the cached response to the client. Even though the major web caching technologies use similar headers, these headers are not standardized and each web cache can use different ones. The following table lists the cache status headers used by the most popular web caches (from our paper “Web Cache Deception Escalates!”):

| CDN / Cache | Header name(s) | Hit value(s) | Miss value(s) |
| --- | --- | --- | --- |
| Akamai | server-timing, X-Cache, X-Cache-Remote | desc=HIT, TCP\_HIT | desc=MISS, TCP\_MISS |
| CDN77 | X-Cache | HIT | MISS |
| Cloudflare | cf-cache-status | HIT | MISS |
| CloudFront | x-cache | Hit from cloudfront | Miss from cloudfront |
| Fastly | X-Cache | HIT | MISS |
| Google Cloud | cdn\_cache\_status | hit | miss |
| KeyCDN | X-Cache | HIT | MISS |
| Azure | X-cache | TCP\_HIT, TCP\_REMOTE\_HIT | TCP\_MISS |
| Apache, ATS | X-Cache | HIT | MISS |
| NGINX | X-Proxy-Cache | HIT | MISS |
| Rack Cache | X-Rack-Cache | hit | miss |
| Squid | X-Cache | HIT from * | MISS from * |
| Varnish | X-Cache | HIT | MISS |
| Unknown | x-cache-info | cached | caching |
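
To make the heuristic concrete, here is a minimal Python sketch that classifies a response from its headers using the values in the table above. The names `CACHE_STATUS_MARKERS` and `cache_status` are hypothetical, and the case-insensitive substring matching is a simplifying assumption of this sketch, not necessarily how the paper's tooling works.

```python
from typing import Mapping, Optional

# Header name -> (hit marker, miss marker), transcribed from the table above.
# Everything is lowercased: header names are case-insensitive, and the
# substring comparison on values is an assumption made for this sketch.
CACHE_STATUS_MARKERS = {
    "server-timing": ("desc=hit", "desc=miss"),
    "x-cache": ("hit", "miss"),
    "x-cache-remote": ("hit", "miss"),
    "cf-cache-status": ("hit", "miss"),
    "cdn_cache_status": ("hit", "miss"),
    "x-proxy-cache": ("hit", "miss"),
    "x-rack-cache": ("hit", "miss"),
    "x-cache-info": ("cached", "caching"),
}


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return "hit", "miss", or None when no known cache status header is found."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    saw_miss = False
    for name, (hit_marker, miss_marker) in CACHE_STATUS_MARKERS.items():
        value = lowered.get(name)
        if value is None:
            continue
        if hit_marker in value:
            # Any hit marker wins, even if another header reports a miss.
            return "hit"
        if miss_marker in value:
            saw_miss = True
    return "miss" if saw_miss else None
```

For example, `cache_status({"X-Cache": "Hit from cloudfront"})` yields `"hit"`, while a response without any of the listed headers yields `None`. Responses that carry several cache layers in a single header value would need more careful handling than this sketch provides.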

Date Header

The Date header is a standard HTTP header that is used to communicate the date and time at which the response was generated by the origin web server.

According to RFC 7231: “The ‘Date’ header field represents the date and time at which the message was originated”. Based on this, should web caches and proxies change the Date header when they serve a cached response? This was discussed in this Hacker News thread, and I agree with the original poster that, based on the wording of the RFC, the Date header should not be changed by web caches and proxies (the message was originated: the origin web server is the one that originated the message, not the web cache or proxy).

Consequently, if the Date header is not changed by web caches and proxies, it can be used to detect whether a response is cached or not. If the Date header of a response is the same as the Date header of a previous response, then the response is cached. Otherwise, the response is not cached.
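
As an illustration of this idea, the comparison could be sketched in Python as follows. The function name `date_header_unchanged` is hypothetical; it takes the raw Date header values of two responses to the same URL and parses them with the standard library instead of comparing strings.

```python
from email.utils import parsedate_to_datetime


def date_header_unchanged(first_date: str, second_date: str) -> bool:
    """Return True when two Date header values denote the same instant.

    Under the assumption that caches never rewrite Date, an unchanged
    value on the second response would suggest it was served from cache.
    """
    # Parse the HTTP-date strings so that the comparison is on the
    # timestamps themselves rather than on their textual form.
    return parsedate_to_datetime(first_date) == parsedate_to_datetime(second_date)
```

Since the Date header has a resolution of one second, two responses generated by the origin within the same second would also compare equal, so the two requests should be spaced at least a second apart.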

Unfortunately, the reality is not that simple, and the vast majority of caching technologies do in fact change the value of the Date header each time they send a stored copy.

To test which technologies change the Date header, I developed a simple web crawler and analyzed the documentation of the most popular web caches. The results are summarized in the following table:

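| CDN / Cache | Changes the Date header |
| --- | --- |
| Akamai | Yes |
| CDN77 | Yes |
| Cloudflare | Yes |
| CloudFront | No |
| Fastly | Yes |
| Google Cloud | Yes |
| KeyCDN | Yes |
| Azure | Yes |
| Apache, ATS | No |
| NGINX | Yes |
| Rack Cache | Yes |
| Squid | No |
| Varnish | No |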

Methodology

To check whether a web cache changes the Date header, my crawler performs the following steps:

  1. Find a cached response (using the Cache Header Heuristics).
  2. Issue a request to the same URL and check whether the Date header is the same as the one of the cached response.
  3. Cache-bust the request (e.g., by adding a random query parameter), check that the response is not cached, and check whether the Date header is now different from the one of the cached response.
  4. Identify the web cache technology used by the website using a secret algorithm that I developed.
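
Steps 2 and 3 can be sketched in Python as below. This is not the crawler's actual code: `cache_bust`, `date_behaviour`, and the `cb` parameter name are hypothetical, and the caller is assumed to have already confirmed (for example, via the cache status headers) that the repeated response was a hit and the cache-busted one was a miss.

```python
import secrets
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_bust(url: str, param: str = "cb") -> str:
    """Append a random query parameter so the cache sees a new URL (step 3)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def date_behaviour(cached_date: str, repeat_date: str, busted_date: str) -> str:
    """Classify how a cache treats Date from three observed header values.

    cached_date: Date of the cached response found in step 1.
    repeat_date: Date of the repeated request, assumed to be a cache hit.
    busted_date: Date of the cache-busted request, assumed to be a miss.
    """
    cached = parsedate_to_datetime(cached_date)
    repeat = parsedate_to_datetime(repeat_date)
    busted = parsedate_to_datetime(busted_date)
    if repeat != cached:
        # The Date moved even though the response came from the cache.
        return "changes"
    if busted != cached:
        # Hits keep the stored Date, while a fresh origin response differs.
        return "preserves"
    # All three values are equal, e.g. the requests fell within one second.
    return "inconclusive"
```

The third outcome exists because equal Date values on a miss say nothing about the cache: the origin may simply have answered within the same second.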

Documentation

The majority of web caches do not document this behavior. The only web caches for which I was able to find some documentation about this are the following:

  • Fastly: changes the Date header (Date: “If a Date header is present on a response when served by Fastly, we will update the value to the current time”).
  • Apache, Apache Traffic Server (ATS): the value of the Date header is cached and is not changed (HTTP Proxy Caching: “where date is the date in the object’s server response header”).

If you find some more documentation about this behavior, please let me know!

Conclusion

The Date header cannot be used to detect whether a response is cached or not on the vast majority of web caches since they change the value of the Date header each time they send a stored copy. However, there are some web caches that do not change the Date header (CloudFront, Apache and ATS, Squid, and Varnish), and this might be useful to detect whether a response is cached or not.

Matteo Golinelli
CyberSecurity PhD Student

My research interests include web security, with special focus on web caches.