CDN Cache Keys - flexible caching

cache learning

Content Delivery Networks (CDNs) are a must-have for any business website. CDNs speed up websites by storing website content and assets in caches on servers spread around the world. These caches are typically key/value stores. For example, suppose we request an image found at:

https://www.example.com/someimage.jpg

Depending on how the CDN works internally, it may use, say, /someimage.jpg (the key) to retrieve the image (the value) from its cache. Most CDNs are fairly rigid in what they use as keys, are complicated to configure (e.g. requiring programming skills), or make you pay for more flexibility.

The Peakhour.IO CDN cache combines flexibility and simplicity to give you sophisticated, fine-grained control of caching behaviour, so you can extract the maximum benefit for your application.

The Peakhour.IO cache key consists of a primary key, a sub key, and a secondary key. Both the primary and secondary keys must match for a request to count as a cache hit.
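The "both keys must match" rule can be pictured as a two-level lookup. The sketch below is only an illustration, not Peakhour.IO's implementation: the in-memory dictionary, the `lookup` function and the sample body are hypothetical, and the sub key is assumed to be already folded into the primary key string.

```python
from typing import Dict, Optional

# Hypothetical in-memory model: primary key -> {secondary key -> cached body}.
Cache = Dict[str, Dict[str, bytes]]


def lookup(cache: Cache, primary: str, secondary: str) -> Optional[bytes]:
    """Return the cached body only when both keys match, otherwise None (a miss)."""
    variants = cache.get(primary)
    if variants is None:
        return None  # nothing is stored for this URL at all
    # The URL is known, but this particular variant may still be missing.
    return variants.get(secondary)


cache: Cache = {
    "https://example.com/what-is-new.html": {"encoding::br": b"<brotli body>"},
}

hit = lookup(cache, "https://example.com/what-is-new.html", "encoding::br")
variant_miss = lookup(cache, "https://example.com/what-is-new.html", "encoding::gzip")
url_miss = lookup(cache, "https://example.com/other.html", "encoding::br")
```

Here only the first lookup is a hit. The second matches the primary key but not the secondary key, and the third matches neither.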

Primary key

The primary key is the part of the client request that uniquely identifies a resource. It consists of the scheme, host, path and query string from the request. Within our cache, the primary key may be augmented with other elements of the request, such as the presence of a particular header, or a header value such as a cookie value. At Peakhour.IO, we call these augmentations cache subkey vars.

Further options control how the query string contributes to the primary key. These include ignoring the query string entirely (great for defeating cache-busting techniques) and stripping certain parameters (most commonly UTM tags).
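As a rough illustration of these query string options, the sketch below builds a primary key from a URL. The function name `primary_key`, its parameters and the `utm_` prefix rule are assumptions made for this example, not Peakhour.IO configuration names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def primary_key(url: str, ignore_query: bool = False,
                strip_prefixes: tuple = ("utm_",)) -> str:
    """Build scheme://host/path[?query], optionally dropping query parameters."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc.lower()}{parts.path or '/'}"
    if ignore_query or not parts.query:
        return base  # every query variant collapses onto one cache entry
    # Keep only the parameters whose names do not start with a stripped prefix.
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if not name.lower().startswith(strip_prefixes)]
    return f"{base}?{urlencode(kept)}" if kept else base


tracked = primary_key("https://www.example.com/shoes?utm_source=news&colour=red")
busted = primary_key("https://www.example.com/app.js?v=12345", ignore_query=True)
```

With these inputs `tracked` becomes https://www.example.com/shoes?colour=red and `busted` becomes https://www.example.com/app.js, so links that differ only in tracking tags or cache-busting values share one cache entry.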

Secondary key

The secondary key describes the parts of the request message that influenced the content of the response from the origin. The following headers are used to build it:

Vary
Content-Encoding
User-Agent (browser/mobile)
Accept-Language
Variant-06

However, servers are very commonly misconfigured, and can send back headers that don't match the behaviour of the content. For example, it's very common for a server to send back Vary: user-agent, yet when you check the behaviour by sending different user agents, the content doesn't actually change. This causes unnecessary cache fragmentation and lower hit rates. Fortunately, Peakhour.IO allows you to override or ignore origin behaviour to correct for this.
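To make the fragmentation problem concrete, here is a minimal sketch that derives a secondary key from the origin's Vary header, with an optional list of header names to ignore. The `secondary_key` function, its `ignore` parameter and the `name::value` layout are hypothetical choices for this example. The layout only loosely mirrors the `encoding::br` form shown in the debug header and is not the real key format.

```python
from typing import Dict, Iterable


def secondary_key(request_headers: Dict[str, str], vary: str,
                  ignore: Iterable[str] = ()) -> str:
    """Join the request values of every header named in Vary, minus ignored ones."""
    ignored = {name.lower() for name in ignore}
    request = {name.lower(): value.strip() for name, value in request_headers.items()}
    # Header names are case-insensitive, so normalise and sort for a stable key.
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    return "|".join(f"{name}::{request.get(name, '')}"
                    for name in names if name not in ignored)


chrome = {"User-Agent": "Chrome", "Accept-Encoding": "br"}
firefox = {"User-Agent": "Firefox", "Accept-Encoding": "br"}
vary = "User-Agent, Accept-Encoding"

fragmented = secondary_key(chrome, vary) != secondary_key(firefox, vary)
shared = (secondary_key(chrome, vary, ignore=["User-Agent"])
          == secondary_key(firefox, vary, ignore=["User-Agent"]))
```

Without the override, each distinct User-Agent string produces its own cache entry. With User-Agent ignored, both browsers share the same entry.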

Figure: a complete picture of all the elements of a request/response that could make up a cache key.

For even more flexibility, Peakhour.IO can use specific cookie values in the secondary key. This is perfect for Content Management Systems like Magento, which sets a special cookie, X-Magento-Vary, to enable caching behaviour that wouldn't normally be achievable, e.g. is the user logged in, is the user part of a special group, or has the user chosen a different currency.
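A small sketch of the idea: pull one named cookie out of the Cookie request header and use only its value as part of the key, so that unrelated cookies such as session IDs do not fragment the cache. The helper names, the `cookie::` prefix and the sample cookie values are made up for illustration.

```python
def cookie_value(cookie_header: str, name: str) -> str:
    """Return the value of one cookie from a Cookie request header, or ''."""
    for pair in cookie_header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key == name:
            return value
    return ""


def cookie_key_part(cookie_header: str, name: str = "X-Magento-Vary") -> str:
    """Hypothetical key fragment built from a single cookie value."""
    return f"cookie::{name}={cookie_value(cookie_header, name)}"


# Two different sessions that belong to the same customer group and currency.
alice = cookie_key_part("PHPSESSID=abc123; X-Magento-Vary=9bf9a599")
bob = cookie_key_part("PHPSESSID=zzz999; X-Magento-Vary=9bf9a599")
guest = cookie_key_part("PHPSESSID=guest01")
```

In this sketch `alice` and `bob` map to the same key fragment despite different session cookies, while `guest`, who has no X-Magento-Vary cookie, gets a different one.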

How can I see my keys?

Cache keys are generated on the fly, based on the client's request and the origin's response. To see a cache key, enable Debug headers in the Peakhour.IO dashboard. As an example, for the following request and origin response:

GET /what-is-new.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br

HTTP/1.1 200
Vary: accept-encoding
content-encoding: br

we would generate the following cache-status header:

cache-status: peakhour.io; fwd=uri-miss; key="https://example.com/what-is-new.html"; secondary-key="encoding::br"; stored; ttl=31536000

Note that the key is the scheme://host/path, and the secondary-key is the encoding served by the origin.
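If you want to inspect these debug headers in a script, a minimal parser along the following lines may help. It is a simplified sketch that assumes a single cache entry in the header and no escaped quotes inside values. It is not a full structured-field parser, and `parse_cache_status` is a name invented for this example.

```python
from typing import Dict, Tuple, Union


def parse_cache_status(value: str) -> Tuple[str, Dict[str, Union[str, bool]]]:
    """Split one cache-status entry into the cache name and its parameters."""
    pieces, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:  # only split on ';' outside quotes
            pieces.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current).strip())

    params: Dict[str, Union[str, bool]] = {}
    for piece in pieces[1:]:
        if not piece:
            continue
        name, sep, raw = piece.partition("=")
        # A bare parameter such as "stored" is treated as a boolean flag.
        params[name.strip()] = raw.strip().strip('"') if sep else True
    return pieces[0], params


header = ('peakhour.io; fwd=uri-miss; key="https://example.com/what-is-new.html"; '
          'secondary-key="encoding::br"; stored; ttl=31536000')
cache_name, params = parse_cache_status(header)
```

For the header above, `params["key"]` holds the primary key, `params["secondary-key"]` holds the secondary key, and `params["stored"]` is a flag showing the response was written to the cache.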

What's in it for me?

By understanding cache keys and how they are constructed, you can tailor a web application's responses to make better use of a cache. Simple options for fine-tuning how cache keys are handled help you deliver a great user experience and high cache hit rates.

Interested in HTTP and RFCs? Peakhour.IO needs people who can read and implement flexible, performant code - we're hiring!
