Beginner’s Guide to Cache Management


Caching in websites and other software applications means saving data in temporary storage to reduce loading time and, consequently, improve the user experience. Caching can be performed on the client side by saving data in the user’s browser, or on intermediate servers such as CDNs, which reduce the physical distance between users and their data.

For websites especially, caching plays an important role in improving the user experience when used strategically. By saving files that rarely change on the client side, websites can load faster on repeat visits and reduce the load on their servers. Generally, JavaScript and CSS files are cached on the client side, whereas files that update often, such as HTML or other template files, are rarely cached or are cached only for short periods. Otherwise, content updates may not reach all users in a timely fashion.

Done right, with the right files, caching improves loading time and the user experience. Done without the right knowledge, it can become quite problematic: a bug can linger in the system for a long time because stale files are still being served from cache.

In this article, we explain the basics of caching and how to configure it the right way.

How Caching Works

When you access the LibPixel website through the browser, it downloads and processes several HTML, JavaScript and CSS files to show you the different pages. For example, the homepage of LibPixel links to a CSS file that contains the styling for the whole website.

When you access LibPixel for the first time, you receive the CSS file from our server. Your browser parses it and saves it locally for future use. The figure below shows the interaction between your browser and LibPixel when you visit the website for the first time.

Figure: Getting resources from the server

Now, when you access another page, such as this article, the browser may require the same CSS file. Instead of requesting the file from the server again and waiting for it to download, the browser checks its local storage and finds that it already saved the file on a previous visit.

So, when you visit this article, the browser loads the CSS file from its local storage, where the file is cached. This is how caching works in a nutshell. You can observe it by noticing that websites load faster the second time you visit them. The speed-up comes from the fact that some of the assets required to load the website are already cached in your browser and need no additional bandwidth.

The following figure illustrates this, with the CSS file retrieved from the browser’s cache instead of being downloaded from the LibPixel server.

Figure: Getting resources from cache
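
The two figures can also be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real browser is implemented: BrowserCache, fake_server and the /static/site.css path are hypothetical names made up for illustration, and the cache is keyed by URL only, with no expiry.

```python
from typing import Callable, Dict


class BrowserCache:
    """Toy model of a browser cache keyed by URL (illustrative only)."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_server
        self._store: Dict[str, bytes] = {}
        self.server_requests = 0  # how many times we had to go to the server

    def get(self, url: str) -> bytes:
        # Cache hit: reuse the local copy and skip the network entirely.
        if url in self._store:
            return self._store[url]
        # Cache miss: download the file, then keep a copy for next time.
        self.server_requests += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


def fake_server(url: str) -> bytes:
    # Hypothetical stand-in for a real network request.
    return f"/* contents of {url} */".encode()


cache = BrowserCache(fake_server)
cache.get("/static/site.css")  # first visit: downloaded from the server
cache.get("/static/site.css")  # next page: served from the local copy
print(cache.server_requests)   # 1
```

The file is requested twice, but the server is contacted only once. A real browser also decides how long a saved copy may be reused, which is what the cache headers described later control.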

Role of CDNs

Websites generally serve people from all across the globe, and the geographical distance between the end user and the origin server can be thousands of miles. CDNs help in this case by caching static content such as images, JS and CSS files closer to the end user, which reduces loading time.

For example, suppose you host your website on a server somewhere in the middle of the USA and serve a user who is located in France. The following figure shows the journey of content from your server to France without the use of a CDN.

Figure: Journey from US to France without CDN

Now, if you were using CDNs placed strategically in Europe, the journey would be much faster. Generally, CDNs can speed up the loading time of an image-heavy page by up to 60%. The following figure shows the journey of the same content to the same user when a CDN is used.

Figure: Journey from US to France with a CDN

Similarly, LibPixel uses CDNs to manage delivery of images to your websites in an efficient way. By using LibPixel’s image API alongside our superfast CDNs, you can quickly change the presentation of images throughout regions without having to manage the caching yourself. 

With the concept of both public (CDNs) and private (Client-side) caching explained, let’s move on to explaining how caching is actually configured on the server side and how the cache headers work.

Cache Control

Here, we are going to talk about the best practices of cache management and how to configure caching while setting up your server. We’ll explain how you can ensure optimal caching of content and at the same time ensure that users always get the most updated content.

How Long to Cache

Two response headers control how long content may be cached by the end user: Expires and Cache-Control (through its max-age directive).

Expires Header

You can use the Expires header to specify the absolute time after which the cached content is considered stale and should no longer be used.

For example, consider the following definition of the header:


Expires: Thu, 10 Jun 2021 00:00:00 GMT
            

This Expires header specifies that the content should not be served from cache after the given date.
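
The date in an Expires header has to follow the HTTP date format, including the correct weekday name, so it is safer to generate it than to type it by hand. Here is a minimal sketch using only the Python standard library. The expires_header helper is a hypothetical name for illustration, and it assumes you pass in a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, lifetime: timedelta) -> str:
    """Return an Expires header line for a response created at `now`.

    `now` is assumed to be timezone-aware; the result is always in GMT.
    """
    expiry = (now + lifetime).astimezone(timezone.utc)
    # usegmt=True makes the zone read "GMT" instead of "+0000".
    return "Expires: " + format_datetime(expiry, usegmt=True)


created = datetime(2021, 6, 9, 0, 0, 0, tzinfo=timezone.utc)
print(expires_header(created, timedelta(days=1)))
# Expires: Thu, 10 Jun 2021 00:00:00 GMT
```

In a real application you would pass datetime.now(timezone.utc) as the creation time. A fixed date is used here only so that the output is predictable.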

Max-age Directive

The max-age directive is one of the many directives offered by the Cache-Control header, which provides granular control over how caching should behave. The max-age directive specifies the relative time, in seconds, for which the content can be cached. Consider the following example, where the content is cached for 1 hour (3,600 seconds).


Cache-Control: max-age=3600
            

Who Caches the Content?

The public and private directives control who may cache the content. Choose public if you want the user’s browser and all intermediate caches, such as CDNs, to cache the content. Choose private if only the user’s browser should cache it and intermediate caches such as CDNs should not.

These directives are usually combined with expiry directives to provide more granular control over how the content is cached. Here are a few examples:

Allow content to be cached anywhere for 30 minutes


Cache-Control: public, max-age=1800
            

Allow content to be cached only by the end user’s browser for 1 hour


Cache-Control: private, max-age=3600
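
If you set these headers from application code, a small helper can keep the syntax consistent. The following is only a sketch: cache_control is a hypothetical function name, not part of any framework, and it covers just the public, private and max-age directives discussed here.

```python
def cache_control(scope: str, max_age_seconds: int) -> str:
    """Build a Cache-Control header line from a scope and a lifetime."""
    if scope not in ("public", "private"):
        raise ValueError("scope must be 'public' or 'private'")
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    # Directives are separated by commas, with no trailing punctuation.
    return f"Cache-Control: {scope}, max-age={max_age_seconds}"


print(cache_control("public", 30 * 60))   # Cache-Control: public, max-age=1800
print(cache_control("private", 60 * 60))  # Cache-Control: private, max-age=3600
```

Writing the lifetime as 30 * 60 rather than 1800 makes it easier to see at a glance how long the content will be cached.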
            

We’re covering just the basics here. To learn more about caching headers and how you can enjoy even more control over the caching behaviour, read this guide by Mozilla.

Strategic Caching

Caching can be a headache if not handled strategically. HTML content can change frequently and may even depend on which user is accessing the website, so caching HTML files for long periods is a mistake that can show outdated or incorrect content to your end users. To prevent this, use the no-cache directive, which lets a cache keep a copy but forces it to check with the server before every reuse, or the no-store directive, which forbids storing the response at all.

On the other hand, static content like JS, CSS and images can be cached for a long period, up to 6 months, depending on the circumstances. If you need to change a static asset but the change is slow to propagate because of caching, you can purge the cache, which lets you update the asset quickly despite caching.
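
The split between frequently changing HTML and long-lived static assets can be expressed as a simple lookup. This is a sketch under assumptions: cache_policy, STATIC_EXTENSIONS and the list of extensions are illustrative choices rather than a recommendation for every site, and six months is approximated as 182 days.

```python
from pathlib import PurePosixPath

# Roughly six months, expressed in seconds (an approximation: 182 days).
SIX_MONTHS = 60 * 60 * 24 * 182

# Illustrative list of static asset types; adjust it to your own site.
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}


def cache_policy(path: str) -> str:
    """Pick a Cache-Control value based on the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Static assets: any cache may keep them for a long time.
        return f"public, max-age={SIX_MONTHS}"
    # HTML and everything else: always check with the server before reuse.
    return "no-cache"


print(cache_policy("/index.html"))      # no-cache
print(cache_policy("/static/app.js"))   # public, max-age=15724800
```

Defaulting to no-cache for unknown file types is the cautious choice here: an asset that is revalidated too often costs some speed, while a page that is cached too long shows stale content.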


The crux of it is that you can ensure fresh content is served by adding fingerprints to the URL with tools like webpack, which generate a hash of a file’s content and include it in the file name. If the file’s content changes, a new name is generated for it, so browsers and CDNs request the new file instead of reusing the old cached copy.
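
To make the idea of a fingerprint concrete, here is a minimal sketch of content hashing. It is not webpack’s actual implementation: fingerprinted_name is a hypothetical helper, and the choice of SHA-256 truncated to 8 characters is an assumption made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short hash of the content before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    # "site.css" becomes "site.<hash>.css"
    return f"{path.stem}.{digest}{path.suffix}"


old = fingerprinted_name("site.css", b"body { color: black; }")
new = fingerprinted_name("site.css", b"body { color: navy; }")
print(old)
print(new)
print(old != new)  # the name changes whenever the content changes
```

Because the name depends only on the content, an unchanged file keeps its name from one build to the next and stays cached, while an edited file gets a new URL that no cache has seen before.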

Cache Hit Ratio

A cache is “hit” when a client requests a file and the request is fulfilled from the cache instead of being fetched from the server. For CDNs, a “cache hit” occurs when a requested file is returned from the CDN’s cache instead of the CDN downloading a fresh copy from the origin server.

Hit ratio is simply the ratio between the number of hits and the total number of requests. We generally aim for a 90% cache hit ratio, and anything below 80% is considered suboptimal usage of CDNs.
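
As a quick worked example, the calculation looks like this. The cache_hit_ratio function and the request counts are made up for illustration; in practice the numbers would come from your CDN’s analytics.

```python
def cache_hit_ratio(hits: int, total_requests: int) -> float:
    """Return the share of requests that were answered from cache."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= hits <= total_requests:
        raise ValueError("hits must be between 0 and total_requests")
    return hits / total_requests


# Hypothetical numbers: 920 of 1,000 requests were served from cache.
ratio = cache_hit_ratio(hits=920, total_requests=1000)
print(f"{ratio:.0%}")  # 92%
```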

Improving Hit Ratio

If you are suffering from a low cache hit ratio then there are several factors that can be improved to increase the ratio.

Firstly, ensure that you are using consistent URLs for static assets. Two URLs that point to the same asset with the same parameters, but list those parameters in a different order, are treated as different URLs, so the asset is fetched from the server instead of the cache.

For the same reason, avoid using constantly changing values as fingerprints in the filename. One example is putting the current timestamp in the name. This should be avoided, since the current time will always be different.

Another way to improve the hit ratio is to reduce the number of image variations that you keep for the purpose of responsiveness. Having too many variations can mean a lower hit ratio. This can be improved by using LibPixel’s API to serve different variations of the same image without having to change the URLs. Moreover, you can also optimize your application for the most popular device types listed in Google Analytics and other sources.
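
One way to keep URLs consistent is to normalize them wherever your application generates them, for example by always sorting the query parameters. The sketch below uses only the Python standard library. The normalize_url helper and the example.com URLs are hypothetical, and it assumes that parameter order has no meaning for your server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return the URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    # Sort by name (then value) so every ordering yields the same string.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


a = normalize_url("https://example.com/photo.jpg?width=300&height=200")
b = normalize_url("https://example.com/photo.jpg?height=200&width=300")
print(a)       # https://example.com/photo.jpg?height=200&width=300
print(a == b)  # True
```

Both spellings now map to a single URL, so a cache stores one copy of the image instead of two.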

Conclusion

Caching plays an important role in improving the user experience. In this article, we learned how to manage caching effectively with the use of cache headers. While setting up your server, carefully configure the Cache-Control HTTP header to control who is allowed to cache the content, for how long, and which files should be cached.

The general rule is to avoid caching HTML and other files that update often, whereas static assets like JS and CSS files and images can be cached for long periods of time, up to 6 months, depending on your requirements.

For strategic cache management, use a tool like webpack, which automatically creates filenames with the file’s hash included. This makes it easier to retire old files and have updates take effect immediately. In the end, caching is an important tool in web development for improving the user experience, and it should be used wisely to avoid unintended consequences.
