

It's often said, "There are only two hard things in computer science: cache invalidation and naming things." The quote is a joke, but it points at a real problem: caching is hard to get right, and getting it right matters most when working with CDNs.
How CDNs Work and Why You Need Them
The primary purpose of a CDN is to serve web content from different servers based on the user’s geographical location. CDNs keep copies of data (images, large JavaScript files, etc.) on multiple high-throughput servers all around the world. By serving the data from the server that’s closest to the actual user, they reduce latency and improve the user experience.
There are multiple ways of configuring CDNs, but this post will investigate what’s often referred to as a “reverse proxy” setup. In this scenario, when a user requests content, the CDN first checks its cache. If the content is present (a cache hit), it's delivered from the closest server. If not (a cache miss), the CDN fetches it from the origin (your) server, caches it, and then delivers it to the user.
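The hit/miss flow described above can be sketched in a few lines of Python. This is a toy model, not how any particular CDN is implemented: the EdgeCache class, the fake_origin function, and the in-memory dictionary are all hypothetical names introduced for illustration, and the sketch ignores expiry entirely.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy stand-in for a single CDN edge server in a reverse proxy setup."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        # Cache hit: answer directly from the edge, origin is not contacted.
        if path in self._store:
            self.hits += 1
            return self._store[path]
        # Cache miss: fetch from the origin, keep a copy, then deliver it.
        self.misses += 1
        body = self._fetch_from_origin(path)
        self._store[path] = body
        return body


def fake_origin(path: str) -> bytes:
    """Hypothetical origin server that returns a body for any path."""
    return f"content of {path}".encode()


if __name__ == "__main__":
    edge = EdgeCache(fake_origin)
    edge.get("/logo.png")  # first request: miss, goes to the origin
    edge.get("/logo.png")  # second request: hit, served from the edge
    print(f"hits={edge.hits} misses={edge.misses}")
```

Only the first request for a path reaches the origin; every later request is answered from the edge until something tells the cache that its copy is no longer usable, which is what the headers below control.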
Some of the most renowned CDNs include:
- Cloudflare: Known for its security features and comprehensive free tier.
- CloudFront: Amazon's robust CDN solution, integrating seamlessly with other AWS services.
- Fastly: Praised for its real-time cache analytics and speedy content delivery.
- Akamai: One of the oldest and largest CDNs, catering to top-tier businesses.
The Power of Cache-Control Headers with CDNs
Without proper Cache-Control headers, a CDN might frequently contact the origin server or serve outdated content. Both scenarios negate the primary advantages of a CDN: speed and reduced load on the origin server. Cache-Control headers provide directives to the CDN about how, and for how long, to cache content.
Decoding Cache-Control: Using It Right
There's a plethora of cache-control directives, but here are the notable ones:
- max-age: Specifies how long (in seconds) a response counts as fresh, i.e. how long a cache may serve it without checking with the origin server.
- public: Indicates any cache, including CDNs, can cache the content.
- private: The response is specific to a user and should not be cached by shared caches like CDNs but can be cached by their browser.
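- no-cache: Despite the name, the response may be stored, but a cache must revalidate it with the origin server before every reuse.
- no-store: Prohibits caching altogether. Use e.g. for real-time API responses.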
It's crucial to note that relying on no-cache or no-store isn't ideal for semi-static content (e.g. images that change from time to time). While they seem like a safe bet, they can cripple the very benefits CDNs provide: every request still goes to the origin server, and the CDN becomes just one more hop that adds latency.
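To make the directives concrete, here is a minimal sketch of how a shared cache might read a Cache-Control header and decide whether it is allowed to keep a copy. The function names parse_cache_control and shared_cache_may_store are hypothetical, and the rules are deliberately simplified: real caches also consider the request method, the status code, the Authorization header, s-maxage, and quoted directive values that contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(header: str) -> bool:
    """Simplified check: may a shared cache such as a CDN store this response?"""
    directives = parse_cache_control(header)
    # no-store forbids storing anywhere; private forbids shared caches only.
    # no-cache does not forbid storing, it only forces revalidation on reuse.
    return "no-store" not in directives and "private" not in directives


if __name__ == "__main__":
    for value in ("public, max-age=3600", "private, max-age=60", "no-cache", "no-store"):
        print(value, "->", shared_cache_may_store(value))
```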
How ETags Work
ETags, or entity tags, are identifiers assigned to specific versions of a web resource. When a client first requests a resource, the server returns the content together with an ETag header. On subsequent requests, the client sends this value back in an If-None-Match header, essentially asking, "Has the content changed since this tag?" If it hasn't, the server responds with 304 Not Modified and no body, so the client keeps using its cached version. If it has, the server sends the updated content with a new ETag. This mechanism saves bandwidth, makes revalidation cheap, and helps keep the cache fresh.
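The exchange above could look roughly like this on the server side. It is a sketch under assumptions: make_etag and respond are hypothetical helpers, the tag is derived from a truncated SHA-256 hash of the body (servers are free to generate ETags any way they like), and If-None-Match is treated as a single tag, although the header may also carry a list of tags or "*".

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the body; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The client's copy is current: 304 Not Modified carries no body.
        return 304, headers, b""
    # First request, or the content changed: send the full response.
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b"version 1", None)
    print(status, headers["ETag"])
    # Revalidation with the tag received earlier, content unchanged.
    print(respond(b"version 1", headers["ETag"])[0])
    # Revalidation after the content changed on the server.
    print(respond(b"version 2", headers["ETag"])[0])
```

The same conditional request works between a CDN and the origin: the CDN sends the ETag of its stored copy and only downloads the body again when the origin reports a different version.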
The Elegance of must-revalidate and proxy-revalidate
Instead of the stark no-cache or no-store, the directives must-revalidate and proxy-revalidate are more nuanced and beneficial.
- must-revalidate: Once content becomes stale (its age exceeds max-age), caches must revalidate it with the origin server before serving it, and must not fall back to the stale copy if the origin is unreachable. Until then, the cached copy is served without contacting the origin, so content stays fresh without a check on every request.
- proxy-revalidate: The same as must-revalidate, but it applies only to shared caches like CDNs, not to the browser's private cache. Paired with ETags, the CDN can quickly ascertain whether content has changed or its cached version is still valid.
The combined use of must-revalidate, proxy-revalidate, and ETags creates a streamlined content delivery process. CDNs serve content speedily, and once content might be stale, these headers ensure that the CDN efficiently verifies and updates it as needed.
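The decision a cache makes for a stored response can be modelled as a small function. This is a simplified sketch, not the logic of any specific CDN: CachedEntry and decide are hypothetical names, and the model leaves out extensions such as stale-while-revalidate and stale-if-error.

```python
from dataclasses import dataclass


@dataclass
class CachedEntry:
    age_seconds: int            # time since the response was fetched or validated
    max_age: int                # freshness lifetime from Cache-Control: max-age
    must_revalidate: bool = False
    proxy_revalidate: bool = False


def decide(entry: CachedEntry, is_shared_cache: bool, origin_reachable: bool) -> str:
    """Return what a cache should do with a stored response."""
    # Still fresh: serve directly, no request to the origin.
    if entry.age_seconds < entry.max_age:
        return "serve-cached"
    # Stale and the origin answers: send a conditional request (e.g. with ETag).
    if origin_reachable:
        return "revalidate"
    # Stale and the origin is down: the revalidate directives forbid stale reuse.
    strict = entry.must_revalidate or (is_shared_cache and entry.proxy_revalidate)
    return "error-504" if strict else "may-serve-stale"


if __name__ == "__main__":
    entry = CachedEntry(age_seconds=120, max_age=60, proxy_revalidate=True)
    print(decide(entry, is_shared_cache=True, origin_reachable=True))
    print(decide(entry, is_shared_cache=True, origin_reachable=False))
    print(decide(entry, is_shared_cache=False, origin_reachable=False))
```

A header such as "Cache-Control: public, max-age=60, must-revalidate" therefore lets the CDN absorb all traffic for a minute at a time, and limits origin traffic after that to one cheap conditional request per expiry.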

In Conclusion
While CDNs can drastically improve web performance, their efficiency hinges on the judicious use of Cache-Control headers. By leveraging directives like must-revalidate and proxy-revalidate together with ETags, developers can harness the full potential of CDNs while ensuring content remains fresh and relevant for users.



