Cache purging with Ghost and Cloudflare Workers

Now that I have moved my blog over to Ghost, I've realised that there are other parts of my technical ecosystem that will keep me busy tinkering. One such part is how to cache every page on this site (effectively) permanently whilst still allowing new posts and updates to be seen.

I won't belabour the point about active purging as other people online have covered it far more accurately and succinctly than I could. My aim is simply to reduce the number of origin hits my server receives whilst keeping pages in the CDN for as long as possible to improve the chance of really fast delivery.

One thing I noticed a couple of days after publishing my last article was that the RSS feed (and thus the auto-tweeter) hadn't updated. I knew I'd been pretty aggressive with Cloudflare caches so I turned to Google to find out how Ghost could purge it.

Quickly enough, I came across Paolo's blog post, written just a few days prior, about connecting Ghost with Cloudflare Workers to purge the cache on site change. I followed the instructions and they worked perfectly. However, I wanted to take his work a step further and:

  1. Change the logic around authentication
  2. Only trigger Workers on page publication or update
  3. Only purge the cache for specific pages rather than my whole domain

Changing the authentication

However many times I tried, I couldn't get the Cloudflare Worker to authenticate with username and password parameters in the URL. With that in mind, I decided to change the way I was constructing webhooks so that the parameters sit in the path itself. I constructed an arbitrary path that lets me trigger different cache purges on the same Worker and also check that the username and password are correct.

const publishPath = `/cf-purge/purge/publish/${WEBHOOK_USER}/${WEBHOOK_PASSWORD}/`;
const updatePath = `/cf-purge/purge/update/${WEBHOOK_USER}/${WEBHOOK_PASSWORD}/`;
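
As a minimal sketch of how a Worker might match an incoming request against those two paths: the helper name `matchPurgeAction` is hypothetical, and the credentials are passed in as arguments here purely so the function is easy to test (in the Worker they would come from the `WEBHOOK_USER` and `WEBHOOK_PASSWORD` values used above). This is an illustration, not necessarily the exact code I deployed.

```javascript
// Hypothetical helper: maps a request path onto a purge action.
// Returns "publish", "update", or null when the path or credentials don't match.
function matchPurgeAction(pathname, user, password) {
  const publishPath = `/cf-purge/purge/publish/${user}/${password}/`;
  const updatePath = `/cf-purge/purge/update/${user}/${password}/`;

  // Tolerate a missing trailing slash on the incoming path.
  const normalised = pathname.endsWith("/") ? pathname : `${pathname}/`;

  if (normalised === publishPath) return "publish";
  if (normalised === updatePath) return "update";
  return null;
}
```

In a fetch handler, the path would come from `new URL(request.url).pathname`, and a `null` result would be the point at which to return a 401 or 403 response.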

While there are some obvious flaws using this technique (not least that the username and password are directly in the URL), we are at least POSTing over HTTPS directly from my server to my Worker. There was also a thread I found about how basic auth parameters were becoming less supported, so I'm currently comfortable with what I changed.

Triggering on publish or update

When I publish a post on this blog, the only pages that really need to be purged are the front page where the list of articles is, and the RSS feed which is what dlvrit.com uses to post articles directly to my Twitter account.

When I update a post, it's highly unlikely that the front page or RSS will actually need changing as it'll more likely be a typo or additional edit to the post. In this case, the only item in the cache that I need to purge is the post itself.

Rather than using a single webhook for any site change, I used two different webhooks that posted to different paths on the same Worker. The top is triggered when I publish a post, the bottom is triggered when I update a post that is already published.

Ghost webhooks

Selectively purging

I try to be as defensive as possible about when I purge; not because I'm hosting mission-critical material but because I try to follow best practices at home where possible – practising what I preach.

I did some digging into what Ghost sends in the webhook for posts and updates so I could find out what I could switch on in order to send to Cloudflare for purging.

My first instinct was to dump whatever arrived at the Worker itself, but that proved slow going, given both my lack of JavaScript knowledge and the time spent deploying to the Worker. I could have set up ngrok to do it locally. I could have used wrangler dev. All I wanted to see was the POSTed body though, so I found Pipedream, which acts as a big endpoint for examining whatever is sent to it.

Pipedream Ghost output

The body content for the Published post updated hook looks similar to the above picture and has pretty much everything you'd want to use on a Worker. The key item for me was the body.post.current.url value as this is what I'd be purging.

With that captured, I could switch on whether the webhook path was publish or update and send different POST data to Cloudflare.
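
A rough sketch of that switch, under a few assumptions: `buildPurgeBody` is a hypothetical helper name, `payload` is the already-parsed webhook JSON (so the value shown in Pipedream as body.post.current.url is `payload.post.current.url`), and `siteUrl` is the blog's base URL. The returned object uses the `files` array that Cloudflare's purge-by-URL endpoint expects.

```javascript
// Hypothetical helper: builds the JSON body for a Cloudflare purge request.
// action  - "publish" or "update", taken from the webhook path
// payload - parsed Ghost webhook body
// siteUrl - base URL of the blog (assumption: passed in by the caller)
function buildPurgeBody(action, payload, siteUrl) {
  // Strip trailing slashes so the URLs below are built consistently.
  const base = siteUrl.replace(/\/+$/, "");

  if (action === "publish") {
    // A new post only affects the front page and the RSS feed.
    return { files: [`${base}/`, `${base}/rss/`] };
  }

  if (action === "update") {
    // An edit only affects the post itself.
    const url = payload?.post?.current?.url;
    if (!url) throw new Error("Webhook body has no post.current.url");
    return { files: [url] };
  }

  throw new Error(`Unknown action: ${action}`);
}
```

The result would then be serialised with `JSON.stringify` and POSTed to the zone's `purge_cache` endpoint on the Cloudflare API with the usual API credentials.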

I've created a gist with my tweaks to Paolo's code for people who would like similar functionality with different URLs. The main changes are some code I stole from Cloudflare's documentation to read the POSTed body and the code that changes how I purge based on path.

As a final note, I did try to use the Cache API from the Worker to directly purge the cache object without having to undertake a separate API request. After trying a couple of times, the purge was coming back successful, but I was still seeing cache hits with my test script.

Turns out, other people had run into this as well. Long story short, the Cache API only works within the datacentre the Worker is based in and as my Ghost server is geographically distanced from me, I wasn't seeing the purge from my local POP.

This wouldn't have been a good path for me to pursue anyway as I would want consistent purging and caching regardless of location.

Testing

Testing that this was working was the easiest part of the whole process and can be accomplished with only a terminal. I picked a handful of paths that wouldn't change as well as a handful of paths that would. I then wrote a quick bash one-liner to curl these URLs and check their cache status after I created new pages and updated existing ones.

# Testing after publishing a new page
$ for i in / /rss/ /about-me/ /public-keys/ /post/github/ /post/personal-resets/ ; do echo $i; curl -skILXGET https://www.adammalone.net$i | grep cf-cache-status; done
/
cf-cache-status: MISS
/rss/
cf-cache-status: MISS
/about-me/
cf-cache-status: HIT
/public-keys/
cf-cache-status: HIT
/post/github/
cf-cache-status: HIT
/post/personal-resets/
cf-cache-status: HIT

# Testing after updating the GitHub post
$ for i in / /rss/ /about-me/ /public-keys/ /post/github/ /post/personal-resets/ ; do echo $i; curl -skILXGET https://www.adammalone.net$i | grep cf-cache-status; done
/
cf-cache-status: HIT
/rss/
cf-cache-status: HIT
/about-me/
cf-cache-status: HIT
/public-keys/
cf-cache-status: HIT
/post/github/
cf-cache-status: MISS
/post/personal-resets/
cf-cache-status: HIT

Where can I improve?

While the quality of my JavaScript definitely leaves something to be desired, there are a handful of things that I'd like to improve at some point.

  • Detect if a post has changed its canonical URL and purge the old URL rather than the new one. My assumption would be that this is in body.post.previous but I haven't tested yet.
  • There are a few other ways to read the request body that look cleaner than the code I've got in there currently. These would be interesting to explore.
  • There has to be a better way to secure this than passing authentication parameters in the URL. While I will explore locking the worker to my server IP, I'd like to look at other mechanisms for security. Creating a tunnel directly from my server to Cloudflare would be ideal (and possible with cloudflared/Argo), but maybe not pragmatic for my efforts and budget.
  • Extend the cache clear by also pre-filling the cache so users get both warm caches and new content.
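
For the last item, a pre-fill step could be as simple as requesting each purged URL again once the purge has gone through. The sketch below is untested against my setup: `warmCache` is a hypothetical name, and the fetch function is injectable only so the logic can be exercised outside a Worker. I'd also expect a request made from the Worker to warm only the datacentre it runs in, much like the Cache API behaviour described earlier, so this is a starting point rather than a full answer.

```javascript
// Hypothetical helper: re-requests each URL so the CDN caches a fresh copy.
// urls    - the same list of URLs that was just purged
// fetchFn - fetch implementation (defaults to the global fetch)
async function warmCache(urls, fetchFn = fetch) {
  const results = [];
  for (const url of urls) {
    // Sequential requests keep the load on the origin predictable.
    const response = await fetchFn(url, { method: "GET" });
    results.push({ url, status: response.status });
  }
  return results;
}
```

The list returned by the purge step could be passed straight in, for example `await warmCache(purgeBody.files)`, where `purgeBody` is whatever was sent to Cloudflare.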