Testing Linear Elastic Caching
To ensure our theory holds up in the real world, we conducted extensive experiments using two main sources:
- Production workloads: We integrated the system into Spanner.
- Public traces: We tested several publicly available cache traces from industry benchmarks to check that the results were not specific to Google’s infrastructure.
Production Workloads
We developed a practical algorithm that assigns a time-to-live (TTL) to the cached page on each page request, based on access patterns and page costs. Since Spanner processes billions of requests per second, this TTL prediction model must be extremely lightweight. We opted for a shallow decision tree that can be translated into a few lines of C++ code. The resulting code is also easy to interpret and provides useful information about workload characteristics. To predict the best lifetime for each page, the model uses features such as data size, the cost of a cache miss (when data is not in the cache and the system must fetch it from a slower system such as a disk), and the type of database operation performed.
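To make the idea of a shallow decision tree concrete, here is a minimal sketch in Python (the production model is described as C++, and its actual structure is not given here). The feature names, operation labels, thresholds, and TTL values are all hypothetical placeholders chosen for illustration, not values from the Spanner deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    size_bytes: int    # size of the requested page
    miss_cost: float   # relative cost of refetching the page after a miss
    op_type: str       # hypothetical operation labels: "read", "scan", "write"


def predict_ttl_seconds(req: PageRequest) -> float:
    """Depth-3 decision tree. Every threshold and TTL below is a placeholder."""
    if req.op_type == "scan":
        # Assumption: pages touched by scans are rarely reused, so do not keep them.
        return 0.0
    small = req.size_bytes <= 64 * 1024
    if req.miss_cost >= 10.0:
        # Expensive to refetch: keep longer, especially if the page is small.
        return 600.0 if small else 120.0
    # Cheap to refetch: keep only briefly.
    return 30.0 if small else 5.0
```

A tree of this shape compiles down to a handful of comparisons per request, which is why it suits a very hot code path, and each branch can be read directly as a statement about the workload.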
We integrated the elastic caching policy into Spanner’s production servers over several months. Compared to a standard fixed-size cache, the results were substantial:
- Memory usage: Reduced by 15.5%.
- Cache misses: Increased by only 5.5%.
- Total Cost of Ownership (TCO): Reduced by approximately 5%.
Importantly, because the algorithm is cost-aware, the slight increase in cache misses was concentrated on data that is inexpensive to fetch from storage, so the impact on actual I/O costs was negligible, at 0.5%.
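The gap between the increase in miss count and the increase in I/O cost can be illustrated with a toy calculation. The miss counts and per-miss costs below are invented for illustration only and are not measurements from Spanner. They simply show how extra misses that land on cheap pages barely move the cost-weighted total.

```python
def relative_increase(before: float, after: float) -> float:
    """Fractional change from before to after."""
    return (after - before) / before


# Hypothetical profiles: class name -> (number of misses, cost per miss).
baseline = {"cheap": (1000, 1.0), "expensive": (1000, 20.0)}
elastic = {"cheap": (1110, 1.0), "expensive": (1000, 20.0)}  # extra misses are all cheap


def miss_count(profile) -> int:
    return sum(n for n, _ in profile.values())


def io_cost(profile) -> float:
    return sum(n * cost for n, cost in profile.values())


miss_delta = relative_increase(miss_count(baseline), miss_count(elastic))
cost_delta = relative_increase(io_cost(baseline), io_cost(elastic))
print(f"misses: {miss_delta:+.1%}, I/O cost: {cost_delta:+.2%}")
```

In this made-up profile, the miss count grows by 110 out of 2000, while the cost-weighted total grows by only 110 out of 21000.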
Public Traces
We also evaluated our elastic caching approach on several publicly available cache traces. As the baseline fixed-size cache policy, we used an optimized implementation of the Greedy-Dual-Size-Frequency (GDSF) eviction algorithm, a generalization of the well-known LRU policy that handles pages of different sizes.
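For readers unfamiliar with GDSF, the following is a minimal, unoptimized sketch of the standard rule: each cached page gets priority `clock + frequency * cost / size`, the lowest-priority page is evicted first, and `clock` is raised to the evicted page's priority. The class and method names are illustrative, and the linear scan for the eviction victim is a simplification; this is not the optimized implementation used in the experiments.

```python
class GDSFCache:
    """Fixed-size cache with Greedy-Dual-Size-Frequency eviction (sketch)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0   # inflation value, raised on every eviction
        self.entries = {}  # key -> [priority, frequency, size, cost]

    def access(self, key, size: int, cost: float = 1.0) -> bool:
        """Record one access. Returns True on a hit, False on a miss."""
        if key in self.entries:
            entry = self.entries[key]
            entry[1] += 1
            # Recompute priority with the stored size and cost of the page.
            entry[0] = self.clock + entry[1] * entry[3] / entry[2]
            return True
        if size > self.capacity:
            return False  # page can never fit, so it is not cached
        while self.used + size > self.capacity:
            # Linear scan for the lowest priority; a heap would be faster.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            self.clock = self.entries[victim][0]
            self.used -= self.entries[victim][2]
            del self.entries[victim]
        self.entries[key] = [self.clock + cost / size, 1, size, cost]
        self.used += size
        return False
```

Because the priority divides by size, large pages must be accessed more often, or be more expensive to refetch, to stay in the cache.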
We considered four variations of elastic caching, depending on which ski rental algorithm we used and whether or not we used a machine learning model. Since the public traces do not include application-level features for training, we did not use decision trees for prediction. Instead, we developed a simple learning strategy that splits each trace in half and uses the first half for training. For each individual page in the training trace, we computed the TTL that minimizes that page's cost over the training trace.
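The per-page TTL search can be sketched as follows. The cost model is an assumption made for this example, in the spirit of ski rental: keeping a page cached costs `rent_per_sec` for each second it is held, and a miss costs `miss_cost`. The function names, the single shared cost pair, and the `(timestamp, page_id)` trace format are also hypothetical.

```python
from collections import defaultdict


def cost_for_ttl(gaps, ttl, rent_per_sec, miss_cost):
    """Total cost of one page under a fixed TTL, given its inter-access gaps."""
    total = 0.0
    for gap in gaps:
        if gap <= ttl:
            total += rent_per_sec * gap              # held in cache until reuse
        else:
            total += rent_per_sec * ttl + miss_cost  # expired, then refetched
    return total


def best_ttl(gaps, rent_per_sec, miss_cost):
    """Cost rises linearly between gap values, so only 0 and the gaps need checking."""
    candidates = [0.0] + sorted(set(gaps))
    return min(candidates, key=lambda t: cost_for_ttl(gaps, t, rent_per_sec, miss_cost))


def train_ttls(trace, rent_per_sec, miss_cost):
    """trace: iterable of (timestamp, page_id) pairs sorted by timestamp."""
    last_seen = {}
    gaps = defaultdict(list)
    for ts, page in trace:
        if page in last_seen:
            gaps[page].append(ts - last_seen[page])
        last_seen[page] = ts
    # Pages seen only once have no gaps and therefore get no learned TTL.
    return {page: best_ttl(g, rent_per_sec, miss_cost) for page, g in gaps.items()}
```

With a trace held in a list, the training half could be taken as `trace[: len(trace) // 2]`. Whether the split is made by request count or by time is not stated in the text, so that choice is also an assumption.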
Because cache behavior depends on what is initially in the cache, a common practice, known as “warming,” is to use a prefix of the cache trace to fill the cache without measuring performance on it. We warmed up all caches with a day’s worth of requests from the second half of the trace and used the rest for testing and measurement. During the test trace, if we encounter a page that was seen during training, we set its TTL to the best TTL precomputed for that page. Otherwise, we set the TTL using either the equilibrium (break-even) policy or the randomized policy.
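The test-time TTL choice might look like the sketch below. The text gives no formulas for the two fallback policies, so they are read here as the classic ski rental rules, which is an interpretation: the deterministic rule keeps a page until the accumulated holding cost equals the miss cost, and the randomized rule draws a fraction of that time from the density e^x / (e - 1) on [0, 1]. The `rent_per_sec` and `miss_cost` parameters and all function names are hypothetical.

```python
import math
import random


def break_even_ttl(miss_cost: float, rent_per_sec: float) -> float:
    """Hold the page until the rent paid equals the cost of one miss."""
    return miss_cost / rent_per_sec


def randomized_ttl(miss_cost: float, rent_per_sec: float, rng=random) -> float:
    """Sample a fraction of the break-even time via the inverse CDF."""
    u = rng.random()
    # CDF is (e^x - 1) / (e - 1) on [0, 1], so x = ln(1 + u * (e - 1)).
    fraction = math.log(1.0 + u * (math.e - 1.0))
    return fraction * miss_cost / rent_per_sec


def choose_ttl(page, learned_ttls, miss_cost, rent_per_sec,
               randomized=False, rng=random):
    """Use the TTL learned in training if available, else a ski rental fallback."""
    if page in learned_ttls:
        return learned_ttls[page]
    if randomized:
        return randomized_ttl(miss_cost, rent_per_sec, rng)
    return break_even_ttl(miss_cost, rent_per_sec)
```

Crossing the two fallback rules with the option of passing an empty `learned_ttls` gives four configurations, which is one plausible reading of the four variations described earlier.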

