API Scaling Challenges
The only thing worse than having to solve problems of scale is not having to solve problems of scale. You’re doing at least one thing right: you’ve made an API other people want to use. Now, however, you’re likely to face slowdowns during peak traffic, especially on endpoints that are computationally expensive or require multiple round trips to the database. Scaling horizontally or vertically can get expensive, so what’s the best way to serve API requests under load?
While there are a variety of optimizations we can perform, one often-neglected strategy is edge caching. By instructing a Varnish cache or a CDN to hold onto an API response, and purging the cache when the data mutates, you can simultaneously decrease response latency and load on the API server. I highly recommend using a Varnish-based cache. For one thing, Varnish is very fast. Additionally, VCL (the Varnish Configuration Language) lets engineers perform many routine operations at the cache layer, before any traffic reaches the API server. You can even do authorization and feature flagging at the edge.
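To make the hold-then-purge idea concrete, here is a minimal sketch in Python. It is a toy in-memory stand-in for an edge cache, not Varnish or VCL; the `EdgeCache` class, the `origin` function, and the `/widgets/1` path are all hypothetical names invented for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy in-memory stand-in for an edge cache (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # Maps a request path to (expiry time, cached response body).
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str, fetch_from_origin: Callable[[str], str]) -> str:
        """Return the cached body, calling the origin only on a miss or expiry."""
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the API server is never contacted
        body = fetch_from_origin(path)
        self._store[path] = (now + self.ttl_seconds, body)
        return body

    def purge(self, path: str) -> None:
        """Drop a cached response, e.g. after the underlying data mutates."""
        self._store.pop(path, None)


# Hypothetical origin (the API server) that counts how often it is hit.
origin_state = {"calls": 0, "name": "widget-v1"}


def origin(path: str) -> str:
    origin_state["calls"] += 1
    return f"{path}:{origin_state['name']}"


cache = EdgeCache(ttl_seconds=300)
first = cache.get("/widgets/1", origin)   # miss: goes to the origin
second = cache.get("/widgets/1", origin)  # hit: served from the cache

origin_state["name"] = "widget-v2"        # the data mutates...
cache.purge("/widgets/1")                 # ...so the cached copy is purged
third = cache.get("/widgets/1", origin)   # miss again: refetched
```

Three requests reach the cache, but only two reach the origin. A real edge cache adds a lot on top of this (cache keys, variants, grace periods), so treat it as a picture of the behavior rather than a design.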
Don’t forget client-side caching! For people living in remote areas far from a CDN Point of Presence (PoP), client-side caching can significantly improve perceived latency and performance. The two most common approaches rely on the ETag or Last-Modified response header. Both allow browsers and other clients to make a conditional GET request (sending If-None-Match or If-Modified-Since, respectively) to check whether the response has changed since it was last fetched; if it hasn’t, the server replies with 304 Not Modified and no body.
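As a rough illustration of ETag revalidation, the sketch below shows how a server might answer a conditional GET. The `make_etag` and `respond` helpers are hypothetical, and hashing the body is just one assumed way to derive an ETag; a real service might use a version number or an updated-at timestamp instead.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def strip_weak_prefix(tag: str) -> str:
    """If-None-Match uses weak comparison, so a leading W/ is ignored."""
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*".
        candidates = [strip_weak_prefix(t.strip())
                      for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client's copy is still current: send no body at all.
            return 304, headers, b""
    return 200, headers, body


# First fetch: the client has nothing cached yet.
status, headers, payload = respond(b'{"id": 1}', None)

# Revalidation: the client echoes the ETag back in If-None-Match.
revalidated = respond(b'{"id": 1}', headers["ETag"])

# The resource changed in the meantime, so the full body is sent again.
changed = respond(b'{"id": 2}', headers["ETag"])
```

The saving comes from the 304 path: the client still pays for a round trip, but the response carries headers only, which matters most on slow or distant connections.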
How We Did It
For more technical information on how we’ve done that at my current company, check out this blog post. For the TL;DR, we used a set of headers to define a cache TTL for several of our most-accessed endpoints. This has allowed us to reduce load significantly while serving requests much more quickly: the average time to first byte on a cache hit is less than 10 milliseconds. Our client-side caching strategy has also improved performance for users in remote locations.
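The exact headers we used are covered in the linked post and are not reproduced here. As a generic sketch of the technique, a per-endpoint TTL can be expressed with the standard Cache-Control header, where `s-maxage` applies to shared caches such as a CDN and `max-age` applies to the browser. The `cache_headers` helper, the endpoint paths, and the TTL values below are all assumptions chosen for illustration.

```python
from typing import Dict


def cache_headers(edge_ttl_seconds: int,
                  browser_ttl_seconds: int = 0) -> Dict[str, str]:
    """Build a Cache-Control header with separate edge and browser TTLs."""
    if edge_ttl_seconds < 0 or browser_ttl_seconds < 0:
        raise ValueError("TTLs must be non-negative")
    return {
        "Cache-Control": (
            f"public, max-age={browser_ttl_seconds}, "
            f"s-maxage={edge_ttl_seconds}"
        )
    }


# Hypothetical endpoints and TTLs; tune these to how often the data changes.
ENDPOINT_TTLS = {
    "/v1/products": 300,  # changes rarely: cache at the edge for 5 minutes
    "/v1/prices": 30,     # changes often: keep the edge TTL short
}

headers_by_endpoint = {
    path: cache_headers(edge_ttl_seconds=ttl)
    for path, ttl in ENDPOINT_TTLS.items()
}
```

Keeping `max-age` at zero while setting a longer `s-maxage` is a deliberate choice in this sketch: the edge copy can be purged when data mutates, whereas a copy held in a browser cannot.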
API caching is a powerful strategy for alleviating many of the performance problems associated with scaling a service. While it isn’t a silver bullet, caching can significantly reduce the load GET requests place on your service while serving responses much faster than even the most optimized web server could.