Questions to ask before caching with a remote cache

Distributed caching with a system like Memcached or Redis comes at a cost! If you haven’t added a remote cache yet, here are some costs to consider:

  1. The operational cost of actually standing up and maintaining a remote caching service. Is it a managed service like AWS ElastiCache, or self-managed? What are the failure modes? How does it get secured and upgraded?
  2. Added complexity from having another system in the call stack, which makes application behavior harder to understand and troubleshoot. Data freshness and consistency issues come to the forefront whenever a cache (or really any derived data) is involved.
  3. Exposure to an entire class of new problems (like cascading failures) that comes with sharing load with another system.

Even if you already operate or use a remote cache, caching any bit of data still adds complexity to your application – and not the essential kind. It’s easy to use caching strategies as a hammer for every performance issue, but here are some questions I recommend you ask before reaching for caching as a solution.

Have I figured out what the problem really is? Is speed really an issue?

Programmers tend to want everything to be fast – and while it’s important that apps generally feel snappy and help users get stuff done efficiently, speed doesn’t impact the user experience the same way in every interaction they have with your app.

For example, users expect fast read experiences (like searching a catalog on ecommerce sites) but slower write / update experiences (like submitting a payment). If you spend all your time optimizing a slow payment submit process when most of your users are bouncing because they can’t be bothered to even look at your inventory, you’re wasting your time.

What specifically needs to be faster and by how much?

Like most engineering work, optimization has diminishing returns. How fast do you want an experience to be? Under a second? Under 100 milliseconds? If your p99 is already under 100ms and there’s no reason to believe your users are unhappy with performance, the return on investing time in reducing latency may be low.

The marginal value of optimization depends on your use case and current baseline performance. If you’re a backend API team with an SLA, the specific numbers may even be hard-coded into a contract. If there are no binding agreements on performance, the requirements are fuzzier and may be driven more by best practices and customer reports. In either case, do some cost-benefit analysis. It doesn’t have to be rigorous; sometimes the reason to do the work, regardless of how high the lift is, may be as simple as “users will stop using this dashboard if it keeps taking five seconds to load”.

Finally, once you’ve picked a target, don’t forget to measure your progress. Are there metrics (percentile latency, profiling stats, etc.) you can use to describe the current performance, and how will you know when you’ve reached your goal? I really enjoy performance work, but I also know I need to recognize when to stop, or I risk losing sight of other goals.
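If you don’t have an APM tool handy, percentile latency is easy to approximate from raw request timings. Here’s a minimal sketch in Ruby using the nearest-rank method (the sample values are hypothetical):

```ruby
# Nearest-rank percentile over a sample of request latencies.
def percentile(samples, pct)
  sorted = samples.sort
  rank = ((pct / 100.0) * (sorted.size - 1)).round
  sorted[rank]
end

latencies_ms = [42, 51, 48, 97, 110, 45, 60, 300, 55, 49] # hypothetical timings
puts "p50: #{percentile(latencies_ms, 50)}ms"
puts "p99: #{percentile(latencies_ms, 99)}ms"
```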

Can I optimize without caching?

Reaching for caching as a strategy in a first pass at performance work does two things that are not ideal for the resilience of your system.

  1. It hides inefficiencies in your program (poorly written queries, O(n²) algorithms) that will be much more difficult to spot once the cache is in place.
  2. It can make your cache load-bearing over time, meaning that unless your cache is healthy, your application is unusable under normal load. Since most distributed in-memory caches are, by design, meant to be volatile and ephemeral, cold-cache events become an Achilles’ heel for your system.
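To make that second point concrete, here’s a minimal cache-aside sketch (assuming Rails.cache backed by something like Redis; the model and key are hypothetical). On a cold or flushed cache, every request falls through to the block, so the full load lands on your database all at once:

```ruby
# Cache-aside: read from the cache, fall back to the database on a miss.
# If the cache is cold, *every* request runs the query inside the block.
def product_listing(category_id)
  # "Product" and the cache key are hypothetical
  Rails.cache.fetch("products/#{category_id}", expires_in: 10.minutes) do
    Product.where(category_id: category_id).to_a # only runs on a cache miss
  end
end
```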

Before you cache data, consider common performance bottlenecks in your system. Some common examples for web apps are:

  • Excess database querying, such as N+1 queries (see the sketch after this list).
  • Excess memory allocation, especially in garbage-collected languages. I currently use Rails for work, and Active Record object allocations are particularly expensive since they lead to longer GC cycles that impact latency.
  • Missing HTTP response compression (for example, gzip compression at a reverse proxy).
  • Inefficient algorithms, such as slow computation inside nested loops. It’s worth knowing the basics of Big O here.
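For instance, here’s what an N+1 query and its fix look like with Active Record (the Post and Comment models are hypothetical):

```ruby
# N+1: one query to load the posts, plus one query per post for comments.
Post.all.each { |post| puts post.comments.size }

# Eager loading collapses this to two queries total, no cache required.
Post.includes(:comments).each { |post| puts post.comments.size }
```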

Can I cache outside of my origin server?

Ok, so you need to cache – but can you cache the data such that:

  1. The data is as close to the requester as possible, and therefore much faster to serve!
  2. Your backend server can sit back and do zero work (isn’t that great?)

The good news for web applications is that browser clients have their own cache. By using proper cache-control HTTP headers, you can control which responses get cached and for how long. If the data is too dynamic (unique to individual users, for example) to cache for any length of time, consider whether you can make it static (the same for all users). If you can’t make it static, can you extract the parts that are static and cache those?
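In Rails, for example, the expires_in helper sets the Cache-Control header for you (the controller and model here are hypothetical):

```ruby
class CatalogController < ApplicationController
  def index
    # Sends "Cache-Control: max-age=600, public" so browsers (and any
    # shared caches like CDNs) can reuse the response for 10 minutes.
    expires_in 10.minutes, public: true
    @products = Product.all # hypothetical model
  end
end
```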

Most enterprise applications with global customer bases also make use of Content Delivery Networks (CDNs) with points of presence (data centers strategically located around the world) that can serve requests out of their own caches without reaching the origin servers.

Delegating caching to systems closer to your clients isn’t mutually exclusive with caching at the origin using remote in-memory systems, but it can be a lower-effort (implementation-wise) and significantly faster approach, so it’s worth trying first if it isn’t already in use.

Forward Proxy vs Reverse Proxy

Two main types of proxies are forward proxies and reverse proxies. Since they’re both proxies, it’s not immediately obvious from their names how they differ! All proxies act as a middleman in a network topology between two parties: the client (the thing requesting a resource) and a server (the thing providing the resource).

The proxy itself is technically also a server, but in most discussions and writing about proxies, the “server” refers to the one providing the resource (such as an HTML page) that the client is requesting. If you’re running a web service, that server would be your backend service, such as a Rails app.

Forward Proxy

A forward proxy is a proxy that clients connect to directly and are aware of. The proxy itself is not aware of backend server identities or IPs; it only knows how to forward and respond to requests. The client knows that it’s connecting to a proxy. By client I don’t mean the actual computer user, but rather the program (perhaps a browser) that’s connecting to the wider internet. Sometimes the real user doesn’t even know that there is a proxy involved!
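To make “the client is configured to use the proxy” concrete, Ruby’s standard Net::HTTP accepts a proxy address directly (the proxy host and port here are hypothetical):

```ruby
require "net/http"

# The client program explicitly knows the forward proxy's address and
# routes its request through it (hypothetical proxy address).
proxy_host, proxy_port = "proxy.example.internal", 8080
http = Net::HTTP.new("example.com", 80, proxy_host, proxy_port)
response = http.get("/")
puts response.code
```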

Here are a couple of real scenarios involving forward proxies:

  1. A VPN service that clients connect to in order to hide their origin IPs and protect their identity (maybe to bypass censorship or just to remain anonymous). VPNs are a special type of forward proxy that provides additional security and authentication features to protect the anonymity of the client requesting a resource. Since users are the ones seeking protection when they sign up for VPN services, they’re aware that the system they’re using involves a proxy server.
  2. A firewall proxy set up by school network administrators to intercept and block traffic to and from certain sites (social media sites, porn sites, etc.). In this situation the computer user may be unaware that they’re being restricted: a student tries to access a restricted site and finds it blocked, unaware that the client programs on their computer are configured to connect to a firewall proxy that is snooping on their requests.

Reverse Proxy

A reverse proxy is the opposite: client programs are not aware of it, while the proxy itself is aware of the backend servers.

Clients have no idea that they’re connecting to a reverse proxy. Many of the internet services you use day to day sit behind reverse proxies, but I’m sure you haven’t configured your programs to connect to each of those reverse proxy servers. Even though client requests reach reverse proxy servers just as they reach forward proxies, the clients don’t actually know they’re talking to proxies.

On the other hand, the reverse proxy is aware of backend servers. It accepts incoming requests from clients and then forwards them to specific servers.

A few of the most common scenarios involving reverse proxies are:

  1. To improve the reliability of a service by load balancing traffic between backend servers. Nginx is a popular reverse proxy used by some of the biggest sites in the world. If you’re a developer, you can set up nginx in front of your application servers and distribute traffic using algorithms like round robin or least time (the core of round robin is sketched after this list).
  2. To improve the security of a service by acting as the single point for concerns like TLS handling / termination, and by hiding the actual IPs of your backend servers.
  3. To improve the performance of a service by compressing outbound traffic (reducing response sizes to clients) and by caching static files, so that repeat requests for the same files, such as images, don’t need to go to your backend servers at all.
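Round robin, the simplest of those balancing algorithms, just cycles through the backend pool. Here’s a minimal Ruby sketch of that selection logic (backend addresses are hypothetical, and real proxies layer health checks and connection handling on top):

```ruby
# Thread-safe round-robin selection over a fixed pool of backends.
class RoundRobin
  def initialize(backends)
    @backends = backends
    @mutex = Mutex.new
    @index = 0
  end

  def next_backend
    @mutex.synchronize do
      backend = @backends[@index % @backends.size]
      @index += 1
      backend
    end
  end
end

# Hypothetical backend addresses.
pool = RoundRobin.new(["10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"])
6.times { puts pool.next_backend } # cycles through each backend in order
```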

To sum it up: a forward proxy cooperates with clients to achieve some goal (bypassing firewalls, or, in the case of firewall proxies, imposing limitations). A reverse proxy cooperates with backend servers to achieve some goal (service reliability, security, performance, etc.).