Why won’t my site update? A story in data

A common query we hear from GovPress clients is “I’ve updated my WordPress site, but no-one can see my changes – what went wrong?” The slightly counterintuitive answer is that nothing went wrong: an old version of the page had been saved in a cache, and that was the version delivered to visitors.
Dynamic websites, whose content might change based on the information in a database, whether a user is logged in, and so on, need several moving parts to work. The code that powers the site is one part, alongside a database and the images, videos and other media that appear on the site. All of these parts need to work together to send a page to a browser, and if one or more of them becomes very busy the site can become slow to load or eventually fail to load at all. If you remember the Twitter “fail whale”, you may have seen this kind of failure before: sudden spikes in load are one potential cause.
To ensure that sites stay functional when large numbers of visitors access them, we can’t always add more and more databases and servers to the infrastructure. In some circumstances a spike in the number of visitors will happen too quickly for the underlying infrastructure to scale up. There is also a cost implication in adding more resources to a site, and eventually clients will run out of money for their hosting.
The chart below shows an example of this. A site we run, which normally has very few visitors, had a sudden spike and served a terabyte (1024GB) of data in a single day. To cope with the surge in traffic we used a content delivery network (CDN), which provides a cache of the current pages. Almost all of the 1TB of data served in this spike came from the cache (the green line in the chart) and not from the running infrastructure (the blue line), so the site was protected from potential failure – no Fail Whales here.

Some sites cache better than others
Caching protects sites from outages but it also helps with performance – a well-cached site will load faster in the browser because a cached page can usually be served more quickly than one that needs a database look-up.
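To make the trade-off concrete, here is a minimal sketch of a time-based cache sitting in front of a slow page renderer. It is an illustration only, not a description of how any particular CDN works: `PageCache`, `render`, `ttl_seconds` and `clock` are names invented for this example, and a real CDN adds cache keys, headers, purging rules and much more.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """A tiny time-based page cache (hypothetical, for illustration only)."""

    def __init__(
        self,
        render: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render      # slow path: builds the page, e.g. from the database
        self._ttl = ttl_seconds    # how long a stored copy may be reused
        self._clock = clock        # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]        # stored copy; may be older than the latest edit
        # Nothing stored, or the stored copy has expired: rebuild and store it.
        self.misses += 1
        page = self._render(path)
        self._store[path] = (now, page)
        return page

    def purge(self) -> None:
        """Empty the cache so the next request for each page is rebuilt."""
        self._store.clear()
```

With a 60-second lifetime, an edit made just after a page is cached stays invisible to visitors until that minute is up or the cache is purged, which is exactly the situation described at the top of this post.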
The chart below shows an extremely well-cached site. You’ll see the number of bytes served from the cache (the solid blue line) is always far higher than the bytes served from the database (the dashed red line).
There’s a small spike in the blue line around 3 October, which is likely to be a time when we emptied the cache to deploy a new version of the code that runs the site.
You’ll also notice a diurnal effect here – we serve many more bytes of data during the day than overnight. Having noticed that, the data in the table below won’t surprise you – most visitors to this site are from the UK, and so are in the same time zone.

Country | Requests | Request % | Bytes |
United Kingdom | 7,141,992 | 74.92% | 601.51 GB |
United States | 1,596,731 | 16.75% | 33.27 GB |
Singapore | 154,060 | 1.62% | 0.76 GB |
Germany | 79,881 | 0.84% | 1.82 GB |
Ireland | 78,830 | 0.83% | 3.61 GB |
France | 52,257 | 0.55% | 2.86 GB |
Netherlands | 32,849 | 0.34% | 1.85 GB |
India | 28,566 | 0.30% | 1.54 GB |
Canada | 23,518 | 0.25% | 1.16 GB |

What requests?
The data that we saw served from the cache is not just webpages and media. In some cases the response will be a redirect to another page (the light blue bars in the chart below), an error (red bars) or a login failure (the darkest blue bars). High numbers of login failures are usually the result of an unauthorised access attack on a site. Such attacks are a fact of life on the Internet, and we have a number of ways to mitigate them.
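As a rough sketch of how responses could be grouped into these categories, the snippet below buckets HTTP status codes. The mapping is an assumption made for illustration: this post does not say which status codes the CDN counts as a login failure, so treating 401 and 403 that way is a guess, and `classify` and `summarise` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable


def classify(status: int) -> str:
    """Place one HTTP status code in a reporting category."""
    # Assumption: login failures surface as 401 (Unauthorized) or 403 (Forbidden).
    if status in (401, 403):
        return "login failure"
    # 304 Not Modified is a 3xx code but is not a redirect, so count it as success.
    if 200 <= status < 300 or status == 304:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 600:
        return "error"
    return "other"


def summarise(statuses: Iterable[int]) -> Counter:
    """Count how many responses fall into each category."""
    return Counter(classify(s) for s in statuses)
```

Calling `summarise` on a list of status codes taken from a log gives the kind of per-category totals that a stacked bar chart would plot.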

Unsuccessful requests
The chart above shows that some visits to the site were unsuccessful, either because the user could not log in or (very rarely) due to some error on the site. A page can also fail to finish downloading for many other reasons. For example, the user might press the escape key or the “stop” button on their browser, their internet connection may be interrupted, and so on. The CDN data shows that only a very small number of visitors experience problems like this.

Not all sites cache well
Not all sites can be well-cached. Sites that make heavy use of the database or cookies may need to show slightly different versions of the site to different visitors. For example, a shopping site will need to show a different shopping basket to each user, and so the “basket” feature of the site can never be cached.
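One common way to express this rule is to refuse to cache any response that depends on a visitor-specific cookie. The sketch below is a simplified illustration, not our actual configuration: `is_cacheable` is a hypothetical helper, `wordpress_logged_in_` is the prefix WordPress uses for its login cookie, and `basket_` is a made-up stand-in for whatever cookie a shop plugin might set.

```python
from typing import Mapping

# Cookie-name prefixes that signal a visitor-specific page.
# "basket_" is invented for this example; a real shop would use its own name.
PERSONALISED_COOKIE_PREFIXES = ("wordpress_logged_in_", "basket_")


def is_cacheable(method: str, cookies: Mapping[str, str]) -> bool:
    """Return True if the response can be shared between visitors."""
    # Only read-only requests are candidates for a shared cache.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Any personalised cookie means the page may differ per visitor.
    return not any(
        name.startswith(PERSONALISED_COOKIE_PREFIXES) for name in cookies
    )
```

A visitor with no cookies gets the shared cached page, while a visitor carrying a basket or login cookie is always sent to the running infrastructure.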
Sites with very large numbers of less popular pages, or very few visitors, may also cache poorly. A page can only be cached once someone has requested it, and the chances that a rarely visited page has already been requested, and is still in the cache, when the next visitor arrives are low.
We can see this behaviour in the CDN data on how pages are served from the cache. The chart below shows a site with some dynamic features that cannot be cached. The number of bytes served from the running infrastructure is sometimes as high as or higher than the number of bytes served from the cache, and so there are quite long periods of time where the site is not as well protected by the cache as it might be.
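The comparison between the two lines can be reduced to a single number, the share of bytes that came from the cache. The sketch below shows the arithmetic under the assumption that we have one (cache bytes, infrastructure bytes) pair per time interval; the function names and the sample figures in the usage note are invented for illustration and are not taken from the chart.

```python
from typing import List, Tuple


def byte_hit_ratio(cache_bytes: float, origin_bytes: float) -> float:
    """Fraction of all bytes served that came from the cache (0.0 to 1.0)."""
    total = cache_bytes + origin_bytes
    if total == 0:
        return 0.0  # nothing was served, so there is no meaningful ratio
    return cache_bytes / total


def poorly_protected(samples: List[Tuple[float, float]]) -> List[int]:
    """Indexes of intervals where the running infrastructure served
    at least as many bytes as the cache."""
    return [
        i
        for i, (cache_b, origin_b) in enumerate(samples)
        if origin_b >= cache_b
    ]
```

For made-up samples such as `[(900, 100), (400, 600), (500, 500)]`, the first interval is well protected and the second and third are flagged, which corresponds to the stretches of the chart where the infrastructure line meets or crosses the cache line.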

Intranets are just different
Intranets are an exception to much of what I’ve said here. Most intranets expect users to be logged in to access the site, which limits how much data can be cached without causing a security vulnerability.
The chart below is from an intranet where a small amount of content can be cached, and you can see that the amount of data served from the running infrastructure (the dashed purple line) is always far higher than the amount of data served from the cache (the solid green line).
With intranets, scaling up the infrastructure is often the only thing that hosting companies can do to protect the site from spikes in the number of visitors. Unfortunately such spikes happen relatively often with intranets – it’s in their nature that sometimes most of the users will log in at once, to access a site-wide announcement or similar.

When should your site update?
When you make a change to the content of your site, when should visitors start to see the change? If your site caches well the answer to this will never be “straight away” but equally it’s unreasonable for visitors to wait “too long” to see a change.
How long counts as “too long” depends on the site, the nature of the information on it and the visitors who need it. Some hosting companies have very strict rules about caching that apply to all their sites, or all sites on particular price points. It isn’t uncommon to see hosts that cache images and media files for a month or more.
At GovPress, our default for WordPress sites is that we cache all content for 24 hours. However, we can be flexible with clients with different needs. For some sites we reduce the 24 hours to a shorter window, while other sites need different sorts of content cached in different ways. For example, a page showing a list of news items may need to have a shorter cache lifetime than the news items themselves, so that visitors can see that a new story has just been published.
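A policy like this can be written down as a short list of rules that map a page's path to a cache lifetime. The sketch below is only an illustration: the `/news/` paths, the five-minute lifetime for the listing page and the `cache_control` helper are all assumptions, and only the 24-hour default comes from the paragraph above. The value it returns uses the standard HTTP `Cache-Control` header syntax.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

DAY = 24 * 60 * 60  # the 24-hour default, in seconds

# (path pattern, lifetime in seconds); the first matching rule wins.
# Both patterns and the five-minute figure are hypothetical examples.
RULES: List[Tuple[str, int]] = [
    ("/news/", 5 * 60),   # the news listing: short lifetime so new stories appear
    ("/news/*", DAY),     # individual news items: rarely change once published
]
DEFAULT_TTL = DAY


def cache_control(path: str) -> str:
    """Return a Cache-Control header value for the given URL path."""
    for pattern, ttl in RULES:
        if fnmatchcase(path, pattern):
            return f"public, max-age={ttl}"
    return f"public, max-age={DEFAULT_TTL}"
```

Ordering the rules from most to least specific keeps the listing page from being swallowed by the broader pattern for individual stories.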
If your site needs a caching policy that differs from our default, get in touch with us.