Can the Date Header Be Used to Detect Caching?
No. But actually, not really.
Web Caches
Let’s start with some background. Web caches are intermediary components located between a client and an origin web server. They reduce the latency of HTTP requests by storing the server's responses and serving them to the client when the same request is made again. They also reduce the load on the origin web servers. Web caches can be placed anywhere on the path between the client and the origin web server. Content Delivery Networks (CDNs) are a type of web cache that is geographically distributed and usually placed as close as possible to the client, which reduces request latency.
Detecting whether a response is cached (i.e., whether it comes from the web cache or from the origin server) is fundamental to detecting web cache vulnerabilities, such as Web Cache Deception (WCD), Cache Poisoning, and HTTP request smuggling (HRS).
Cache Header Heuristics
In our research Web Cache Deception Escalates!, we introduced the Cache Header Heuristics to detect the cache status of HTTP responses. This technique is based on the analysis of cache status headers, which web caches use to communicate the status of the cached response to the client. Even though the major web caching technologies use similar headers, these headers are not standardized, and each web cache can use different ones. The following table lists the cache status headers used by the most popular web caches (from our paper Web Cache Deception Escalates!):
CDN / Cache | Header Name(s) | Hit value(s) | Miss value(s) |
---|---|---|---|
Akamai | server-timing, X-Cache, X-Cache-Remote | desc=HIT, TCP\_HIT | desc=MISS, TCP\_MISS |
CDN77 | X-Cache | HIT | MISS |
Cloudflare | cf-cache-status | HIT | MISS |
CloudFront | x-cache | Hit from cloudfront | Miss from cloudfront |
Fastly | X-Cache | HIT | MISS |
Google Cloud | cdn\_cache\_status | hit | miss |
KeyCDN | X-Cache | HIT | MISS |
Azure | X-cache | TCP\_HIT, TCP\_REMOTE\_HIT | TCP\_MISS |
Apache, ATS | X-Cache | HIT | MISS |
NGINX | X-Proxy-Cache | HIT | MISS |
Rack Cache | X-Rack-Cache | hit | miss |
Squid | X-Cache | HIT from * | MISS from * |
Varnish | X-Cache | HIT | MISS |
Unknown | x-cache-info | cached | caching |
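
As a rough illustration of how such a heuristic could be applied, here is a minimal Python sketch that classifies a response from its headers. It is not the implementation from the paper: the `RULES` mapping and the `cache_status` function are hypothetical names, the tokens are transcribed from the table above, and matching by case-insensitive substring is a simplifying assumption. When several layers report different results, the sketch assumes that any hit means the response came from a cache.

```python
from typing import Mapping

# Hypothetical rule table: header name -> (hit token, miss token).
# Tokens are lowercased substrings taken from the table above.
RULES = {
    "server-timing": ("desc=hit", "desc=miss"),
    "x-cache": ("hit", "miss"),
    "x-cache-remote": ("hit", "miss"),
    "cf-cache-status": ("hit", "miss"),
    "cdn_cache_status": ("hit", "miss"),
    "x-proxy-cache": ("hit", "miss"),
    "x-rack-cache": ("hit", "miss"),
    "x-cache-info": ("cached", "caching"),
}


def cache_status(headers: Mapping[str, str]) -> str:
    """Return "hit", "miss", or "unknown" for one response's headers."""
    # Header names are case-insensitive, so normalize both names and values.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    verdicts = set()
    for name, (hit_token, miss_token) in RULES.items():
        value = lowered.get(name)
        if value is None:
            continue
        if hit_token in value:
            verdicts.add("hit")
        elif miss_token in value:
            verdicts.add("miss")
    # Assumption: a hit reported by any layer counts as a cached response.
    if "hit" in verdicts:
        return "hit"
    if "miss" in verdicts:
        return "miss"
    return "unknown"
```

Values that match neither token, or responses without any of the listed headers, are reported as `"unknown"` rather than guessed.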
Date Header
The `Date` header is a standard HTTP header that is used to communicate the date and time at which the response was generated by the origin web server.
According to RFC 7231: “The “Date” header field represents the date and time at which the message was originated”. Based on this, should web caches and proxies change the `Date` header when they serve a cached response? This was discussed in this Hacker News thread, and I agree with the original poster that, based on the wording of the RFC, the `Date` header should not be changed by web caches and proxies (the message was originated: the origin web server is the one that originated the message, not the web cache or proxy).
Consequently, if the `Date` header is not changed by web caches and proxies, it can be used to detect whether a response is cached or not. If the `Date` header of a response is the same as the `Date` header of a previous response, then the response is cached. Otherwise, the response is not cached.
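
A minimal Python sketch of this comparison, assuming the two header values have already been collected from consecutive responses to the same URL. The function name `same_date` is hypothetical. Note that HTTP dates have one-second resolution, so two fresh responses generated within the same second would also compare equal; the requests should be spaced more than a second apart for the check to mean anything.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def same_date(first: Optional[str], second: Optional[str]) -> bool:
    """Return True if both Date header values denote the same instant."""
    if not first or not second:
        # A missing Date header gives no evidence either way.
        return False
    try:
        return parsedate_to_datetime(first) == parsedate_to_datetime(second)
    except (TypeError, ValueError):
        # Unparseable values raise TypeError or ValueError depending on
        # the Python version; treat them as "not the same".
        return False
```

Under the idealized assumption of this section, `same_date(...)` returning `True` for two spaced requests would suggest that the second response was served from a cache.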
Unfortunately, the reality is not that simple, and the vast majority of caching technologies do in fact change the value of the `Date` header each time they send a stored copy.
To test which technologies change the `Date` header, I developed a simple web crawler and analyzed the documentation of the most popular web caches. The results are summarized in the following table:
CDN / Cache | Changes the Date header |
---|---|
Akamai | Yes |
CDN77 | Yes |
Cloudflare | Yes |
CloudFront | No |
Fastly | Yes |
Google Cloud | Yes |
KeyCDN | Yes |
Azure | Yes |
Apache, ATS | No |
NGINX | Yes |
Rack Cache | Yes |
Squid | No |
Varnish | No |
Methodology
To check whether a web cache changes the `Date` header, my crawler performs the following steps:
- Find a cached response (using the Cache Header Heuristics).
- Issue a request to the same URL and check whether the `Date` header is the same as that of the cached response.
- Cache-bust the request (e.g., by adding a random query parameter), check that the response is not cached, and check whether the `Date` header is now different from that of the cached response.
- Identify the web cache technology used by the website using a secret algorithm that I developed.
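
The first three steps could be sketched in Python roughly as follows. This is not the crawler's actual code: `cache_bust`, `date_of`, and `cache_rewrites_date` are hypothetical names, the `cb` query parameter is an arbitrary choice, and the HTTP client and the hit detector are passed in as callables (`fetch` and `is_hit`) so that the logic stays independent of any particular library. The sketch also assumes that `fetch` spaces requests by more than a second, since the `Date` header has one-second resolution. The technology identification step is not covered.

```python
import uuid
from typing import Callable, Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

Headers = Mapping[str, str]


def cache_bust(url: str) -> str:
    """Append a random query parameter so the cache key is (likely) new."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("cb", uuid.uuid4().hex))  # "cb" is an arbitrary name
    return urlunsplit(parts._replace(query=urlencode(query)))


def date_of(headers: Headers) -> Optional[str]:
    """Look up the Date header case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "date":
            return value
    return None


def cache_rewrites_date(
    url: str,
    fetch: Callable[[str], Headers],
    is_hit: Callable[[Headers], bool],
) -> Optional[bool]:
    """True if the cache rewrites Date, False if it replays the stored
    value, None if the run is inconclusive."""
    cached = fetch(url)                # step 1: find a cached response
    if not is_hit(cached):
        return None
    repeat = fetch(url)                # step 2: same URL again
    if not is_hit(repeat):
        return None
    busted = fetch(cache_bust(url))    # step 3: force a non-cached response
    if is_hit(busted):
        return None
    d1, d2, d3 = date_of(cached), date_of(repeat), date_of(busted)
    if d1 is None or d2 is None or d3 is None:
        return None
    if d1 != d2:
        return True                    # Date changed between two hits
    if d3 != d1:
        return False                   # hits share a Date, the miss differs
    return None                        # all equal: cannot tell
```

Returning `None` for inconclusive runs keeps ambiguous sites out of the results instead of forcing a yes/no answer.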
Documentation
The majority of web caches do not document this behavior. The only web caches for which I was able to find documentation are the following:
- Fastly: changes the `Date` header (Date: “If a Date header is present on a response when served by Fastly, we will update the value to the current time”).
- Apache, Apache Traffic Server (ATS): the value of the `Date` header is cached and is not changed (HTTP Proxy Caching: “where date is the date in the object’s server response header”).
If you find some more documentation about this behavior, please let me know!
Conclusion
The `Date` header cannot be used to detect whether a response is cached on the vast majority of web caches, since they change its value each time they send a stored copy. However, some web caches do not change the `Date` header (CloudFront, Apache and ATS, Squid, and Varnish), and for these it might be useful for detecting whether a response is cached.