caching – LINISNIL

docker caches

April 3, 2025April 3, 2025 alan Leave a comment

“cache” is an extremely overloaded term in docker land nowadays because docker supports different types of caching. ill cover the three main cache terms – what they are and when they’re useful

build cache

this is what most people think when you mention a docker cache. the build cache caches built image layers with the cache key being a docker instruction and/or the checksum of file contents. not all instructions create new image layers that contribute to final size of image – basically any of the commands that can write to the filesystem like COPY, ADD, RUN will create new layers. any FROM command that pulls in another image will also be a new layer.

you don’t need to do anything special for the builder to make use of build cache – this just comes for free. you can try it yourself. if you build a dockerfile once, run it again and you’ll see that it skips the steps if nothing changed. invalidation happens automatically as well. if the instruction changes, cache is invalidated. if a file part of a COPY command changes, cache is invalidated.

the build cache is probably most useful locally when you’re working on a dockerized application and using the same internal cache on your host machine.

external (build) cache

this is just another form of a build cache, except instead of relying on the builders internal cache you can use an external cache storage. once you start building images in a ci/cd environment that may be running your image builds on different instances, it may not be sufficient to rely on the default internal build cache to store an image. one run of your build will cache its images on its own machine but a subsequent run on a different executor machine will rebuild because it has its own internal docker cache. this can be wildly inefficient

the solution for this docker offers external cache backends. these can be fully remote ones like s3 where you specify a bucket and region or more local (but outside of buildkits internal cache) cache stores that store the cache in a specific directory on the host machine

many companies run dev workflows using github actions and use gh actions as their primary ci/cd pipeline, and docker allows you to point the cache to githubs cache.

cache mount

finally, we have cache mounts. these are are newer addition to build caches and they’re often confused with build caches because of a couple of reasons

there’s cache in the name
this cache is available only during image build – not too different from build caches

so what’s the difference? you’re not using the cache to store image layers, you’re using it in a more narrow sense to store artifacts produced by a RUN command. cache mounts are intended to cache results of RUN instructions so that any subsequent executions of that command during build time will go faster.

it’s used by specifying --mount-type=cache and specifying the location. at the time of writing, you cannot specify an external storage for this cache so this is entirely internal to the builder on the machine. anyway here’s an example

FROM node:latest
WORKDIR /app
COPY package.json ./
RUN --mount=type=cache,target=/root/.npm npm install

in the above, the first ever build will copy package.json and install the NPM dependencies into the /root/.npm path of dockers internal cache. if package.json changes and both the COPY and RUN layers are invalidated, the NPM install will run a lot faster in the new build step because the dependencies are found in the cache.

as far as i know, at the moment supplying an external cache via --cache-from to the builder has no effect on the source of the cache mount. this makes it not very useful in situations like ci/cd where you’re dealing with ephemeral executors. you might be using an external build cache like s3, but cache mounts storage will still be internal to the builder instance on the machine. yeah, confusing.

once i actually learned the differences between these, searching docs became far easier. if i need to understand details on the build cache which caches image layers, ill search for build cache, image cache, or external cache. if i’m writing a new dockerfile that needs to cache dependencies for a particular build instruction to optimize local build times, i’ll look up cache mounts if i need an assist on the api

Distributed Caching Pattern: Write Through

June 3, 2023June 3, 2023 alan Leave a comment

Write-through caching is a caching mechanism where data modifications are written to both the source database and the cache. In this approach, every write operation triggers a write to a RAM cache (such as redis) and the source database (such as postgres).

This is commonly used to complement look-aside caching pattern so that when subsequent read operations request the same data, there’s a greater likelihood of a cache-hit, therefore reducing the response time and minimizing the load on the source data store. Additionally, the cached data being returned will be more fresh compared to a strategy that only uses TTL or expiration based invalidation.

The key benefit to using a write-through cache is data freshness. By updating the cache during updates to the source database, you ensure that the cached data remains up to date. One of the common problems with relying strictly on expiry based invalidation to serve fresh data is the risk of stale data during the period between when the data is cached and when it expires.

How to implement (Ruby on Rails Example)

Lets look at a simple example. When data needs to be written, it is first updated in the source database and then persisted in the cache. Here’s an example using rails. In this example, we’re reading a list of blog posts using look-aside caching. On writes (updates to an individual blog post), we update the cache. This keeps the list of blog posts in the cache fresh!

class PostsController < ApplicationController
  def index
    @posts = Rails.cache.fetch('posts', expires_in: 1.hour) do
      Post.all.to_a
    end
    render json: @posts
  end

  def create
    @post = Post.new(post_params)
    if @post.save
      update_posts_cache
      render json: @post, status: :created
    else
      render json: @post.errors, status: :unprocessable_entity
    end
  end

  def update
    @post = Post.find(params[:id])
    if @post.update(post_params)
      update_posts_cache
      render json: @post
    else
      render json: @post.errors, status: :unprocessable_entity
    end
  end

  def destroy
    @post = Post.find(params[:id])
    @post.destroy
    update_posts_cache
    head :no_content
  end

  private

  def post_params
    params.require(:post).permit(:title, :content)
  end

  def update_posts_cache
    Rails.cache.write('posts', Post.all.to_a, expires_in: 1.hour)
  end
end

In this example, the index action retrieves the list of blog posts from the cache using the key `posts`. If the cache contains the posts, they are returned directly. Otherwise, the posts are fetched from the database (Post.all.to_a) and stored in the cache with a 1-hour expiration time.

Implementing write-through isn’t always straightforward because cached data may not always be easily re-computed on writes. The extent to which you can re-compute cached data during a write greatly affects the feasibility of a write-through implementation. In the example above, we re-computed the list of posts in the cache by invoking Posts.all.to_a in the code for writing. But what if instead of serving all posts on reads, our application only serves unread posts for logged in users? If you have 10 logged in users, they need to be served different lists of posts! (As an exercise, think about how you would handle this!).

Sometimes cache re-computability is affected by the API paradigm you choose. RESTful services are able to leverage write through caching better than GraphQL services. Lets see why.

GraphQL and Write Through Caching

At work we heavily use GraphQL API’s and GraphQL presents unique challenges for caching. The entire paradigm of GraphQL rests on allowing clients to query for the data they need. This leads to a higher variability in queries of nearly unbounded complexity (you can have highly nested queries). Common caching techniques can be easily applied to applications serving a small set of frequently executed but fixed queries – but if every query is likely going to be unique, the value of caching is diminished.

There’s a bigger reason though why caching (particularly write-through caching) is trickier with GraphQL. In reality, most GraphQL servers are built for internal product teams and client queries aren’t infinitely variable. There’s typically some fixed number of clients that use some set of queries to power an experience.

For example, at a video streaming company there may be a recommendations team (the client) that uses a set of queries for fetching video metadata from an internal video catalog service (graphql server).

Lets assume…

There’s a high traffic query that looks like this defined within the recommendation client: query { videos { title createdAt reviews { reviewerFullName } } }.
Within the video service, query results are cached on read with look-aside caching.

Now lets say you want to implement write-through caching to keep the cache of the recommendation teams query results warm. If video metadata gets updated throughout the day in the video catalog service (lets just say by some other service via HTTP), how does the cache get updated?

Here’s what we know:

The query is defined on the client (unlike REST, where the query is defined on the server).
The result of the query operation is computed at runtime in the video service resolvers.

This is a situation where the cached data (JSON result of the GraphQL query) is no longer easily computable. In order to compute that data, you need to execute the original query against the schema using the same inputs. But wait, GraphQL servers don’t define the queries, clients do! 😞

Unless the application changes the type of data that gets cached or the server records queries being executed against the API (to be “replayed” on relevant writes), warming this type of cache using write-through is not really possible. So in practice, most GraphQL servers focus on lazy-loading read-based caching strategies such as automatic persisted queries using GET requests or fragment caching on the server side.

Risks

Adopting write-through caching isn’t without risks – before implementing it, it’s worth considering the following in the context of your unique problem.

Increased Write Latency. Write-through caching introduces additional overhead due to the need to write data to both the cache and the main data store before returning a result. This can be an issue for applications with high writes that cannot tolerate additional write latency. That said, most users of applications generally have higher tolerance for writes than for reads.
Redundant / Unnecessary data in cache. The cache may fill up regardless of whether the data being written is going to be used. Data being written isn’t really a guarantee of a future read, so you may end up with a cache filled with unread data (which defeats the purpose of a cache). If you don’t have proper eviction policies in place, systems with high write volumes with large amounts of data being written can rapidly fill up a cache and make it unusable. On the other hand, if there are eviction policies in place and you haven’t properly sized your cache, you end up with churn in your cache where data being written to the cache is evicted before it has a chance to be used by readers.
Delayed benefit. The cache is only updated when something changes as a result of a write. If nothing changes, the cache stays empty. This is also why you rarely see systems that rely exclusively on write-through caching. If updates are infrequent, the cache will stay cold forever unless you also cache when the data is being read.

Questions to ask before caching with a remote cache

June 3, 2023June 3, 2023 alan Leave a comment

Distributed caching with a system like memcached or redis comes at a cost! If you haven’t added a remote cache yet, here’s some costs to consider:

The operational cost of actually standing up and maintaining a remote caching service. Is it a managed service like AWS elasticache or self managed? What are the failure modes? How does it get secured and upgraded?
More complexity in having another system in the call stack that makes it harder to understand and troubleshoot application behavior. Data freshness and consistency issues come to the forefront whenever a cache (or really any derived data) is involved.
General exposure to a entire class of new problems (cascading failures) that comes with sharing load with another system.

Even if you already operate or use a remote cache, caching any bit of data still adds complexity to your application – and not the essential kind. It’s easy to use caching strategies as a hammer for every performance issue, but here are some questions I recommend you ask before reaching for caching as a solution.

Have I figured out what the problem really is? Is speed really an issue?

Programmers tend to want everything to be fast – and while it’s important that apps generally feel snappy and help users get stuff done efficiently, speed doesn’t impact the user experience the same way in every interaction they have with your app.

For example, users expect fast read experiences (like searching a catalog on ecommerce sites) but slower write / update experiences (like submitting a payment). If you spend all your time optimizing a slow payment submit process when most of your users are bouncing because they can’t be bothered to even look at your inventory, you’re wasting your time.

What specifically needs to be faster and by how much?

Just like most metrics, optimization has diminishing returns. How fast do you want an experience to be? Under a second? Under 100 milliseconds? If your p99 is < 100ms and there’s no reason to believe that your users are unhappy with performance, the return on investing time in reducing latency may be low.

The marginal value of optimization depends on your use-case and current baseline performance. If you’re a backend API team with an SLA, the specific numbers may even be hard coded into a contract. If there are no binding agreements on performance, the requirements are fuzzier and may be driven more by best practices and customer reports. In either case, do some cost benefit analysis – this doesn’t have to be rigorous and sometimes the reason to do it regardless of how high the lift is may be as simple as “users will stop using this dashboard if it continues taking five seconds to load”.

Finally, once you’ve picked a target don’t forget to measure your performance progress. Are there certain metrics (percentile latency, profiling stats, etc) you can use to describe the current performance and how will you know when you’ve reached your goal? I really enjoy performance work, but I also know I need to know when to stop at the risk of losing sight of other goals.

Can I optimize without caching?

Reaching for caching as a strategy in a first pass at performance work does two things that are not ideal for the resilience of your system.

It hides inefficiencies (poorly written queries, N^2 algorithms) that exist in your program that will be much more difficult to spot once the cache is in place.
It can make your cache load bearing over time, meaning unless your cache is healthy, your application is unusable under normal load. Since most distributed in-memory caches are, by design, meant to be volatile and ephemeral, cold-cache events become an Achilles heel for your system.

Before you cache data, consider common performance bottlenecks in your system. Some common examples for web apps are:

Excess database querying such as N+1 queries.
Excess memory allocation, especially for garbage collected languages. I currently use rails for word and Active Record object allocations are particularly expensive since they lead to longer GC cycles that impact latency.
Missing HTTP response compression (gzip compression for example at a reverse proxy).
In-efficient algorithms, such as slow compute inside nested loops. Worth knowing the basics of Big O here.

Can I cache outside of my origin server?

Ok, so you need to cache – but can you cache the data such that:

The data is as close to the requester as possible, therefore much faster to serve!
Your backend server can sit back and do zero work (isn’t that great?)

The good news for web applications is that browser clients have their own cache. By using proper cache-control HTTP headers, you can have control over what requests are cached and for how long. If the data is too dynamic (unique to individual users for example) to cache for any length of time, consider if you can make it static (same for all users). If you can’t make it static, can you extract the parts that are static and cache that?

Most enterprise applications with global customer bases also make use of Content Delivery Networks (CDN’s) that have points of presence (data centers strategically located in geographic locations around the world) that can serve requests out of its own cache without reaching to the origin servers.

Delegating caching behavior to systems closer to your clients isn’t a mutually exclusive strategy to caching at origins with remote in-memory systems, but they can be a low effort (implementation wise) and significantly faster caching approach so they’re worth trying out first if they’re not being used.

Distributed Caching Pattern: Cache Aside

May 24, 2023May 27, 2023 alan Leave a comment

A distributed cache is a remote caching system that an application uses via a network to reduce read latency. There are lots of ways an application can interact with this cache and there tends to be common access patterns or strategies – one of the most popular access pattern is called “Cache Aside” (sometimes also referred to as look-aside or lazy load) and I’ll cover both the benefits and risks of this pattern in the context of a web app.

In a cache-aside, the application is able to connect to both the cache store and the source store (this is typically the primary, higher engine latency data store). If the application is a web application, the way this pattern works is:

App receives a request for data.
App looks up the data in the cache. If it’s in the cache (cache-hit), return it. If not (cache-miss), fetch the data from the source.

As with any caching pattern, the usefulness of a cache is influenced by how much faster it is to access the same copy of data from the cache compared to the source and the cache hit ratio (also known as the hit ratio or hit rate).

A cache hit ratio is the percentage of total requests to the application that results in a cache hit (number of cache hits / (number of cache hits + number of cache misses)). If you have a cache hit ratio of 1, that means every request resulted in a cache hit. If you have a cache hit ratio of 0, that means no requests resulted in a cache hit.

Pros and Cons

Here are some advantages or benefits to using a cache-aside pattern:

It’s relatively straightforward to implement once you’ve identified what you want to cache. In most cases this involves adding a single new line of code to perform the lookup in the cache before executing the original source store lookup. As a maintainer of that code, it’s also easier to reason with since the caching decision is made explicit in the source code.
You’re more likely to cache data that is going to be requested multiple times because you’re always caching by demand. The cache store is only populated whenever there is a cache-miss, so you’re more likely to cache the data you actually want cached and the storage footprint is lower.
Since you’re caching on demand, you can start benefiting from this cache pattern immediately once it’s in place because the cache will naturally fill up overtime without requiring any sort of offline cache pre-population.
In the event of a cache fail-over event, there’s a natural fallback already in place in your application (hitting the source database). In other words, since the application knows how to connect to both data stores, there’s a built-in redundancy which makes it more resilient to caching system failures.

Risks or things to watch out for with this pattern:

Since we’re only caching on demand, there will always be a cache-miss for initial requests. This might be bad if the cost of a cache miss is very high (lets say it involves some heavy compute that will cause potential customers to your site to bounce). When there’s high load, you’re also susceptible to a cache stampede.
Cache-invalidation is something you still need to reason about since this pattern has zero say / opinion on how data stays fresh once it’s stored for the first time. How long does it stay in the cache (TTL) ? What are the requirements around data freshness? If the underlying data is something that changes often (lets say it’s a list of book recommendations that gets pre-computed offline), how do you push those changes to the cache if at all? These are generally important questions to ask regardless of what caching pattern you’re using, but they’re especially critical when you’re using this particular pattern.
It’s very easy for the cache to become load-bearing overtime. If an app cannot adequately service it’s normal levels of traffic without the cache, the cache is load bearing. This isn’t great because it means that the cache becomes a single point of failure for your business and this dependence creeps up with this particular caching pattern because it’s easy to ignore the real costs of data access once you’re serving the majority of your traffic via the cache.

On cache hit ratios

The cache hit ratio alone doesn’t say much about the usefulness of a cache.

What you’ll typically notice with this pattern is that the hit rate starts out very low (on a cold cache) and then gradually increases as more data is cached until it stabilizes. If you’ve just restarted your cache and it’s cold, having a hit rate of 0 for the first say 10 minutes doesn’t tell you much about the effectiveness of the caching pattern if eventually the hit rate rises and stabilizes at a satisfactory level.

Traffic patterns can also affect your cache – if you’re experiencing a period of low variability in queries, your hit rate is going to be high during that period which may be misleading if during normal periods of traffic you get a much wider distribution of unique requests that are likely to miss your cache.

Lastly, if your hit rate is high, it really only tells you that your cache is working but not whether it’s actually working better than the actual un-cached path. Long story short – it’s a data point, but don’t take it as gospel and look at it in context of your entire application.