Cache-Status: A new header for diagnosing CDN caching behaviour

Website Performance Learning

Diagnosing CDN caching can be a complex beast, with multiple levels of caching, origin shielding and local caching. Understanding how all these interact can require many weeks of poring over RFCs to understand the finer details of the ETag, Cache-Control and Last-Modified headers, not to mention advanced controls that override any of the above.

When you have a problem, or are trying to optimise caching, all you want is some visibility as to how a request was handled.

This is where the new Cache-Status header is useful. The draft specification, draft-ietf-httpbis-cache-header-08, aims to:

  aid debugging by standardising the format of various non-standard debug headers
  used by major CDN providers. The semantics of these headers are often unclear and vary
  between implementations.

The draft proposes a new header, Cache-Status, with a uniform format that gives visibility into how multiple caching providers interact and handle a request. The Cache-Status header is a list in which each member represents a cache that has handled the request. The first member represents the cache closest to the origin server, and the last member represents the cache closest to the user. The header is only applicable to responses that were generated by an origin server. Each member can add parameters indicating how it handled the request.

A Cache-Status header looks like this:

  Cache-Status: OriginCache; hit; ttl=1100, "CDN Company Here"; fwd=uri-miss

It is clearly formatted and labelled: each cache is a separate list member, members are separated by ',' and the parameters of each member are separated by ';'.
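To make the list structure concrete, here is a minimal Python sketch that splits a Cache-Status value into members and parameters. It is not a full Structured Fields parser; the function names (split_outside_quotes, parse_value, parse_cache_status) are made up for this illustration, and it assumes a well-formed header value.

```python
import re


def split_outside_quotes(text: str, sep: str) -> list[str]:
    """Split on sep, ignoring separators inside double-quoted strings."""
    parts, current = [], []
    in_quotes = escaped = False
    for ch in text:
        if ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
        if escaped:
            escaped = False
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
    parts.append("".join(current))
    # Drop empty pieces so a stray trailing separator is tolerated.
    return [part.strip() for part in parts if part.strip()]


def parse_value(raw: str):
    """Convert a raw item to bool, str (if quoted), int, or a bare token."""
    if raw in ("?1", "?0"):
        return raw == "?1"
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return re.sub(r"\\(.)", r"\1", raw[1:-1])
    if re.fullmatch(r"-?\d+", raw):
        return int(raw)
    return raw  # a bare token such as uri-miss


def parse_cache_status(value: str) -> list[tuple[str, dict]]:
    """Return [(cache_name, {parameter: value}), ...] in header order."""
    members = []
    for member in split_outside_quotes(value, ","):
        name, *params = split_outside_quotes(member, ";")
        parsed = {}
        for param in params:
            key, has_value, raw = param.partition("=")
            # A parameter without a value is a boolean true, e.g. "hit".
            parsed[key.strip()] = parse_value(raw.strip()) if has_value else True
        members.append((parse_value(name), parsed))
    return members


if __name__ == "__main__":
    header = 'OriginCache; hit; ttl=1100, "CDN Company Here"; fwd=uri-miss'
    for cache, params in parse_cache_status(header):
        print(cache, params)
```

Splitting only outside quoted strings matters because a cache name or key may itself contain a ',' or ';'.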

Each member of the list can carry the following parameters:

hit = boolean
fwd = (bypass, method, uri-miss, vary-miss, miss, request, stale, partial)
fwd-status = integer
ttl = integer
stored = boolean
collapsed = boolean
key = string
detail = token / string

What the parameters mean

hit

The hit parameter signifies that the request was satisfied by the cache, without being forwarded. It is a boolean parameter, so its presence alone indicates a cache hit.

For example:

  Cache-Status: Peakhour.IO; hit

fwd

The fwd parameter indicates that the request was forwarded towards the origin server, and why. Its value is one of the following:

bypass - The cache was configured to not handle this request
method - The request method's semantics require the request to be forwarded
uri-miss - The cache did not contain any responses that matched the request URI
vary-miss - The cache contained a response that matched the request URI, but could not select a response based upon this request's headers and stored Vary headers.
miss - The cache did not contain any responses that could be used to satisfy this request (to be used when an implementation cannot distinguish between uri-miss and vary-miss)
request - The cache was able to select a fresh response for the request, but the request's semantics (e.g., Cache-Control request directives) did not allow its use
stale - The cache was able to select a response for the request, but it was stale
partial - The cache was able to select a partial response for the request, but it did not contain all of the requested ranges (or the request was for the complete response)

For example:

  Cache-Status: Peakhour.IO; fwd=uri-miss
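As a small illustration, the fwd values listed above can be kept in a lookup table so that a debugging tool can print a readable reason. This is only a sketch: FWD_REASONS and explain_fwd are hypothetical names, and the descriptions are shortened from the list above.

```python
# Hypothetical lookup table; the keys are the fwd values listed above.
FWD_REASONS = {
    "bypass": "the cache was configured not to handle this request",
    "method": "the request method requires the request to be forwarded",
    "uri-miss": "no stored response matched the request URI",
    "vary-miss": "the URI matched, but no stored response matched the Vary headers",
    "miss": "no stored response could be used (URI or Vary miss not distinguished)",
    "request": "a fresh response existed, but the request did not allow its use",
    "stale": "a stored response was selected, but it was stale",
    "partial": "a partial response was stored, but it lacked the requested ranges",
}


def explain_fwd(reason: str) -> str:
    """Return a readable explanation for a fwd value."""
    return FWD_REASONS.get(reason, f"unrecognised fwd value: {reason}")


if __name__ == "__main__":
    print(explain_fwd("uri-miss"))
```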

fwd-status

The fwd-status parameter indicates which status code the next hop returned in response to the forwarded request. It is only meaningful when fwd is present. For example, a complete miss would appear as fwd=uri-miss, with the HTTP status code of the next hop's response supplied in fwd-status.

The following example shows that the cache did not satisfy the request and that the next hop returned an HTTP 304 (Not Modified) response.

  Cache-Status: Peakhour.IO; fwd=uri-miss; fwd-status=304

ttl

Each cached item is associated with a lifetime, referred to as the Time To Live (TTL). The ttl parameter is given in seconds and indicates the remaining freshness of the resource, as calculated by the cache.

For example, the following shows a cache hit on a resource with 376 seconds of freshness remaining.

  Cache-Status: ExampleCache; hit; ttl=376

The next example shows a cache hit with a negative freshness, meaning the resource is stale.

  Cache-Status: ExampleCache; hit; ttl=-412
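The sign of ttl is what separates the two examples above. A minimal sketch of that reading follows; describe_ttl is a hypothetical helper, and treating ttl=0 as no longer fresh is an assumption based on the usual HTTP rule that a response is fresh only while its freshness lifetime exceeds its age.

```python
def describe_ttl(ttl: int) -> str:
    """Describe a ttl value in seconds as reported by a cache."""
    if ttl > 0:
        return f"fresh for another {ttl} seconds"
    if ttl == 0:
        # Assumption: zero remaining freshness is treated as just expired.
        return "freshness has just expired"
    return f"stale by {-ttl} seconds"


if __name__ == "__main__":
    print(describe_ttl(376))
    print(describe_ttl(-412))
```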

stored

The stored parameter indicates whether the received response was stored by the cache. The following example shows a cache miss, with the cache storing the response for the next request.

  Cache-Status: ExampleCache; fwd=uri-miss; stored

collapsed

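The collapsed parameter indicates whether the request was collapsed together with one or more other forwarded requests, so that a single response from the next hop served all of them.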

key

The key is the lookup index into a cache. It conveys a representation of how the cache looks up the resource used for the response. The cache key is implementation-specific.

The following example shows a cache hit, the cache key and a secondary key, and the remaining TTL of the resource.

  cache-status: peakhour.io; hit; key="https://example.com/calendar.css"; secondary-key="encoding::gzip"; ttl=30674859

detail

The detail parameter allows additional implementation-specific information that is not captured by the other parameters.
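Putting the parameters together, a debugging tool might turn one list member into a single readable line. The sketch below assumes the member has already been parsed into a name and a dictionary of parameters; describe_member and the wording of its output are illustrative only.

```python
def describe_member(name: str, params: dict) -> str:
    """Summarise one Cache-Status list member in plain words."""
    notes = []
    if params.get("hit"):
        notes.append("served from cache")
    if "fwd" in params:
        note = f"forwarded ({params['fwd']})"
        # fwd-status is only meaningful alongside fwd.
        if "fwd-status" in params:
            note += f", next hop returned {params['fwd-status']}"
        notes.append(note)
    if "ttl" in params:
        notes.append(f"ttl {params['ttl']}s")
    if params.get("stored"):
        notes.append("response stored")
    if params.get("collapsed"):
        notes.append("collapsed with another request")
    if "key" in params:
        notes.append(f"key {params['key']}")
    if "detail" in params:
        notes.append(f"detail {params['detail']}")
    return f"{name}: " + "; ".join(notes or ["no details reported"])


if __name__ == "__main__":
    print(describe_member("Peakhour.IO", {"fwd": "uri-miss", "fwd-status": 304}))
    print(describe_member("ExampleCache", {"hit": True, "ttl": 376}))
```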

Multiple layers of caching

The header supports multiple layers of caching; for example, a global CDN may sit in front of a local Varnish server. Each cache appends its own member to the header, so the first item is the cache closest to the actual application (the origin) and the last item is the cache closest to the accessing user.

  Cache-Status: OriginCache; hit; ttl=1100, "CDN Company Here"; hit; ttl=545
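One practical question with layered caches is which layer actually answered the current request. The sketch below assumes, as in the draft, that the list runs from the cache nearest the origin to the cache nearest the user, and that the header has already been parsed into (name, parameters) pairs. The served_by function is a hypothetical helper.

```python
def served_by(members: list[tuple[str, dict]]) -> str:
    """Return the name of the layer that satisfied the request.

    Walks from the cache nearest the user back towards the origin and
    stops at the first cache reporting a hit. If no cache reports a hit,
    the response came from the origin itself.
    """
    for name, params in reversed(members):
        if params.get("hit"):
            return name
    return "origin"


if __name__ == "__main__":
    both_hit = [
        ("OriginCache", {"hit": True, "ttl": 1100}),
        ("CDN Company Here", {"hit": True, "ttl": 545}),
    ]
    print(served_by(both_hit))  # the CDN answered from its own store
```

When the CDN reports a hit, the OriginCache member is simply what was recorded when the CDN originally stored the response.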

Security

This header has possible security implications. Making it public gives an attacker insight into how the cache is configured, possibly allowing them to bypass the cache entirely and reach an origin server directly. How the header is secured is provider-specific, but options include enabling it manually only when required, restricting it to specific IP addresses, or requiring a custom request header to trigger the Cache-Status header.
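As a rough sketch of the last two options, the check below only allows the header for clients on an allow-listed network or for requests carrying a secret debug header. Everything here is hypothetical: the X-Cache-Debug header name, the token, the network range and the should_emit_cache_status function are assumptions for illustration, not any provider's actual mechanism.

```python
import hmac
import ipaddress

# Hypothetical configuration values, chosen only for this example.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
DEBUG_HEADER = "x-cache-debug"
DEBUG_TOKEN = "replace-with-a-secret"


def should_emit_cache_status(client_ip: str, headers: dict[str, str]) -> bool:
    """Decide whether to include Cache-Status in the response."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in ALLOWED_NETWORKS):
        return True
    # Header names are case-insensitive, so normalise before the lookup.
    supplied = {k.lower(): v for k, v in headers.items()}.get(DEBUG_HEADER, "")
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied.encode(), DEBUG_TOKEN.encode())


if __name__ == "__main__":
    print(should_emit_cache_status("198.51.100.1", {"X-Cache-Debug": "guess"}))
```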

Peakhour.IO regularly sees automated scans sending the debug headers of other providers in an attempt to trigger a Cache-Status header in the response.

Conclusion

The Cache-Status header is a powerful mechanism that gives insight into how all levels of caching on a website interact. It offers a standard format for reporting how a cache handled a request, and it lets multiple caches work together to give a full picture of how that request was handled.

Peakhour fully supports this new header and it can be enabled/disabled and secured in the dashboard.
