Optimizing Cloud Economics with Linear Elastic Caching

Testing Linear Elastic Caching

To ensure our theory holds up in the real world, we conducted extensive experiments using two main sources:

Production workloads: We integrated the system into Spanner.

Public traces: We tested a range of publicly available cache traces from industry benchmarks to ensure the results were not specific to Google’s infrastructure.

Production Workloads

We developed a practical algorithm that assigns a time-to-live (TTL) to each page request based on access patterns and page costs. Because Spanner processes billions of requests per second, the TTL prediction model must be extremely lightweight. We opted for a shallow decision tree that can be translated into a few lines of C++ code. The resulting code is also easy to interpret and reveals useful information about workload characteristics. The model uses features such as data size, the cost of a cache miss (when data is not in the cache and the system must retrieve it from a slower system such as disk), and the type of database operation to predict the optimal TTL for each page.
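A minimal sketch of what such a shallow tree might look like, written in Python for brevity rather than the C++ mentioned above. The feature names, operation labels, thresholds, and TTL values are all hypothetical and are not taken from the Spanner model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    size_bytes: int    # size of the requested page
    miss_cost: float   # relative cost of fetching the page on a miss
    operation: str     # hypothetical labels: "point_read", "scan", "write"


def predict_ttl_seconds(req: PageRequest) -> float:
    """Depth-2 decision tree. All thresholds and leaf values are made up."""
    if req.operation == "scan":
        # Assumption: scanned pages are unlikely to be reused soon.
        return 0.0 if req.size_bytes > 64 * 1024 else 5.0
    if req.miss_cost >= 10.0:
        # Expensive misses justify paying for memory longer.
        return 600.0
    return 60.0 if req.size_bytes <= 16 * 1024 else 20.0
```

A tree this small compiles down to a handful of comparisons, which is what makes it cheap enough to evaluate on every request, and each branch can be read directly as a statement about the workload.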

We integrated the elastic caching policy into Spanner’s production servers over several months. Compared to a standard fixed-size cache, the results were substantial:

Memory usage: reduced by 15.5%.

Cache misses: increased by only 5.5%.

Total cost of ownership (TCO): reduced by approximately 5%.

Importantly, because the algorithm is cost-aware, the slight increase in cache misses was concentrated on data that is inexpensive to fetch from storage, so actual I/O costs rose by a negligible 0.5%.
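To see how a miss count can rise by several percent while I/O cost barely moves, consider a toy calculation. The two page classes, their miss counts, and their per-miss costs are invented for illustration and are not the measured Spanner breakdown.

```python
def totals(classes):
    """classes maps a class name to (miss_count, cost_per_miss)."""
    misses = sum(n for n, _ in classes.values())
    cost = sum(n * c for n, c in classes.values())
    return misses, cost


def relative_increase(before, after):
    return (after - before) / before


# Hypothetical numbers: the extra misses fall only on the cheap class.
fixed_size = {"cheap": (1000, 1.0), "expensive": (1000, 20.0)}
elastic = {"cheap": (1110, 1.0), "expensive": (1000, 20.0)}

base_misses, base_cost = totals(fixed_size)
new_misses, new_cost = totals(elastic)

miss_increase = relative_increase(base_misses, new_misses)  # 110 / 2000
cost_increase = relative_increase(base_cost, new_cost)      # 110 / 21000
```

In this toy setup the miss count grows by 5.5% while the cost-weighted total grows by roughly 0.5%, because every additional miss lands on pages that are cheap to refetch.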

Public Traces

We also evaluated our elastic caching approach on several publicly available cache traces. As the fixed-size baseline policy, we used an optimized implementation of the Greedy Dual Size Frequency (GDSF) eviction algorithm, a generalization of the well-known LRU policy that supports pages of different sizes.
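For readers unfamiliar with GDSF, the following is a simplified, unoptimized sketch of the textbook policy, not the implementation used in the experiments. Each entry gets priority L + frequency × cost / size, the lowest-priority entry is evicted first, and the inflation value L is raised to the evicted priority. The class and method names are illustrative.

```python
class GDSFCache:
    """Textbook GDSF with a linear scan for the eviction victim."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # "L": priority of the most recent victim
        self.entries = {}      # key -> [priority, frequency, size, cost]

    def _priority(self, freq, cost, size):
        return self.inflation + freq * cost / size

    def access(self, key, size, cost):
        """Return True on a hit and False on a miss."""
        if key in self.entries:
            entry = self.entries[key]
            entry[1] += 1
            entry[0] = self._priority(entry[1], entry[3], entry[2])
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            self.inflation = self.entries[victim][0]
            self.used -= self.entries[victim][2]
            del self.entries[victim]
        self.entries[key] = [self._priority(1, cost, size), 1, size, cost]
        self.used += size
        return False
```

With all sizes and costs equal and frequency ignored, the inflation term alone orders entries by recency, which is the sense in which this family of policies generalizes LRU. A production version would replace the linear scan with a priority queue.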

We considered four variants of elastic caching, depending on which ski rental algorithm we used and whether we used a machine learning model. Because the public traces do not include application-level features for training, we did not use decision trees for prediction. Instead, we developed a simple learning strategy that splits each trace in half and uses the first half for training. For each page in the training trace, we computed the TTL that minimizes that page’s cost over the training trace.
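The per-page training step can be sketched as follows. The cost model here is an assumption, since the text does not spell one out: holding a page costs `memory_rate * size` per unit of time, a gap between accesses longer than the TTL adds one `miss_cost`, and the time after a page's last access is ignored. Under that model the optimal TTL is always zero or one of the observed gaps, so only those candidates need checking.

```python
from collections import defaultdict


def best_ttl(access_times, size, miss_cost, memory_rate):
    """Return the TTL minimizing the assumed cost on one page's accesses."""
    gaps = [b - a for a, b in zip(access_times, access_times[1:])]

    def cost(ttl):
        total = 0.0
        for gap in gaps:
            total += memory_rate * size * min(gap, ttl)  # memory held
            if gap > ttl:
                total += miss_cost  # page expired before its next access
        return total

    # Candidates are scanned in ascending order, so ties favor a shorter TTL.
    return min(sorted({0.0, *gaps}), key=cost)


def learn_ttls(trace, sizes, miss_costs, memory_rate):
    """trace: time-ordered list of (time, page); train on the first half."""
    times = defaultdict(list)
    for t, page in trace[: len(trace) // 2]:
        times[page].append(t)
    return {
        page: best_ttl(ts, sizes[page], miss_costs[page], memory_rate)
        for page, ts in times.items()
    }
```

For a page accessed at times 0, 1, 2, and 100 with unit size, unit memory rate, and a miss cost of 10, the cheapest choice under this model is a TTL of 1: it keeps the page through the two short gaps and accepts a single miss on the long one.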

Because cache behavior depends on what is initially in the cache, a common practice known as “warming” is to replay a prefix of the trace to fill the cache without measuring its performance. We warmed all caches with one day of requests from the second half of the trace and used the remainder for testing and measurement. During the test trace, if we encounter a page that was seen during training, we set its TTL to the best precomputed TTL for that page. Otherwise, we set the TTL using either the break-even or the randomized ski rental policy.
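The fallback for unseen pages can be sketched by reading the two policies as the classic deterministic and randomized ski rental rules, which is an assumption about what the experiments used. The deterministic rule keeps a page until the memory cost of holding it equals the cost of one miss. The randomized rule draws a fraction of that break-even time from the density e^x / (e - 1) on [0, 1]. Function and parameter names are illustrative.

```python
import math
import random


def break_even_ttl(size, miss_cost, memory_rate):
    """TTL at which the holding cost equals the cost of a single miss."""
    return miss_cost / (memory_rate * size)


def randomized_ttl(size, miss_cost, memory_rate, rng=random):
    """Sample a fraction of the break-even TTL by inverse-CDF sampling."""
    u = rng.random()
    # CDF is (e^x - 1) / (e - 1) on [0, 1], so x = ln(1 + u * (e - 1)).
    fraction = math.log(1.0 + u * (math.e - 1.0))
    return fraction * break_even_ttl(size, miss_cost, memory_rate)


def choose_ttl(page, learned_ttls, size, miss_cost, memory_rate,
               randomized=False, rng=random):
    """Use the learned TTL when available, else a ski rental fallback."""
    if page in learned_ttls:
        return learned_ttls[page]
    if randomized:
        return randomized_ttl(size, miss_cost, memory_rate, rng)
    return break_even_ttl(size, miss_cost, memory_rate)
```

In the standard ski rental analysis the break-even rule pays at most twice the optimal offline cost, while the randomized rule improves the expected ratio to e / (e - 1), roughly 1.58.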

