Cache-testing software for LLM-provider-style tiered ephemeral caches? [D]
I'm looking for a cache simulator / benchmark suite suited to the kind of tiered ephemeral cache that LLM providers use — e.g. Anthropic's 4-tier prompt cache, where context sits across several tiers with different residency windows, costs, and eviction rules.
I've already tried libCacheSim. It's a solid piece of software for classical caches (LRU, FIFO, ARC, SIEVE, S3-FIFO, W-TinyLFU, Belady oracle, plugin API, trace replay), and I got a plugin + synthetic trace working against it. But it seems fundamentally aimed at single, flat caches:
- One cache, not a hierarchy of tiers with different costs
- No notion of partial / multi-tier residency of the same object
- Misses are uniform-cost — no way to express "miss to L1 vs miss to L3 vs full recompute," which is the whole point in LLM prompt caching
- Trace model is atomic get/put, not edit streams where cached objects mutate in place
- No first-class support for token-weighted object sizes
So it works as a baseline comparator, but it's not really the right shape for evaluating LLM-cache policies.
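To make the gap concrete, here's a minimal sketch of the non-uniform miss-cost model I'd want a simulator to support. Everything here is hypothetical (tier names, costs, residency windows are made up): a request is charged the hit cost of the shallowest tier where the object is still resident, and the full recompute cost only when it has aged out of every tier.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    hit_cost: float       # cost charged when served from this tier
    residency_s: float    # ephemeral window: time resident after last touch
    entries: dict = field(default_factory=dict)  # obj_id -> last-touch time

RECOMPUTE_COST = 100.0    # hypothetical full-recompute cost

def lookup(tiers, obj_id, now):
    """Cost of serving obj_id: hit cost of the shallowest still-resident
    tier, else full recompute (reinstalling into every tier)."""
    for tier in tiers:                        # ordered shallow -> deep
        last = tier.entries.get(obj_id)
        if last is not None and now - last <= tier.residency_s:
            tier.entries[obj_id] = now        # hit refreshes residency
            return tier.hit_cost
    for tier in tiers:                        # miss everywhere: recompute
        tier.entries[obj_id] = now
    return RECOMPUTE_COST

tiers = [
    Tier("L1", hit_cost=1.0, residency_s=300),    # e.g. 5-minute window
    Tier("L2", hit_cost=5.0, residency_s=3600),   # e.g. 1-hour window
]
print(lookup(tiers, "prompt-A", now=0.0))    # 100.0  (cold: recompute)
print(lookup(tiers, "prompt-A", now=10.0))   # 1.0    (L1 hit)
print(lookup(tiers, "prompt-A", now=400.0))  # 5.0    (L1 expired, L2 hit)
```

The third lookup is the case libCacheSim can't express: the object has fallen out of L1 but is still resident in L2, so the miss is partial and cheaper than a full recompute.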
Does anyone know of cache-testing software specifically targeting LLM-provider-style caches? Something that models multiple tiers with per-tier cost/residency, tokenised objects, and edit-driven workloads would be ideal. Academic code, research prototypes, internal tools that got open-sourced — all welcome. Even partial matches (e.g. KV-cache simulators for inference servers) would be useful pointers.
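For anyone wondering what I mean by "edit-driven workloads," here's a toy sketch (the trace format is entirely made up): each request carries the full token sequence for a conversation, and the token-weighted reuse is the longest common token prefix with the previous request on the same conversation, since an in-place edit invalidates everything after the edit point.

```python
from collections import defaultdict

def prefix_reuse(trace):
    """Token-weighted accounting over an edit stream: for each request,
    count tokens reusable from the cached prefix vs tokens recomputed."""
    last = {}                      # conv_id -> previous token sequence
    stats = defaultdict(int)
    for conv_id, tokens in trace:
        prev = last.get(conv_id, [])
        common = 0
        for a, b in zip(prev, tokens):
            if a != b:
                break              # first edited token kills the suffix
            common += 1
        stats["reused"] += common
        stats["recomputed"] += len(tokens) - common
        last[conv_id] = tokens
    return dict(stats)

trace = [
    ("c1", [1, 2, 3]),            # cold start: 3 tokens recomputed
    ("c1", [1, 2, 3, 4, 5]),      # append: 3 reused, 2 recomputed
    ("c1", [1, 2, 9, 4, 5]),      # edit at position 2: only 2 reused
]
print(prefix_reuse(trace))        # {'reused': 5, 'recomputed': 8}
```

An atomic get/put trace model collapses all three requests into opaque keys and loses exactly this structure.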