Cache-testing software for LLM-provider-style tiered ephemeral caches? [D]
I'm looking for a cache simulator / benchmark suite suited to the kind of tiered ephemeral cache that LLM providers use — e.g. Anthropic's 4-tier prompt cache, where context sits across several tiers with different residency windows, costs, and eviction rules.
I've already tried libCacheSim. It's a solid piece of software for classical caches (LRU, FIFO, ARC, SIEVE, S3-FIFO, W-TinyLFU, Belady oracle, plugin API, trace replay), and I got a plugin + synthetic trace working against it. But it seems fundamentally aimed at single, flat caches:
- One cache, not a hierarchy of tiers with different costs
- No notion of partial / multi-tier residency of the same object
- Misses are uniform-cost — no way to express "miss to L1 vs miss to L3 vs full recompute," which is the whole point in LLM prompt caching
- Trace model is atomic get/put, not edit streams where cached objects mutate in place
- No first-class support for token-weighted object sizes
So it works as a baseline comparator, but it's not really the right shape for evaluating LLM-cache policies.
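To make the gap concrete, here's a minimal sketch of the non-uniform miss-cost model I'd want a simulator to support. Everything here is hypothetical (tier names, costs, residency windows are made up): a request is charged the hit cost of the shallowest tier where the object is still resident, and the full recompute cost only when it has aged out of every tier.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    hit_cost: float       # cost charged when served from this tier
    residency_s: float    # ephemeral window: time resident after last touch
    entries: dict = field(default_factory=dict)  # obj_id -> last-touch time

RECOMPUTE_COST = 100.0    # hypothetical full-recompute cost

def lookup(tiers, obj_id, now):
    """Cost of serving obj_id: hit cost of the shallowest still-resident
    tier, else full recompute (reinstalling into every tier)."""
    for tier in tiers:                        # ordered shallow -> deep
        last = tier.entries.get(obj_id)
        if last is not None and now - last <= tier.residency_s:
            tier.entries[obj_id] = now        # hit refreshes residency
            return tier.hit_cost
    for tier in tiers:                        # miss everywhere: recompute
        tier.entries[obj_id] = now
    return RECOMPUTE_COST

tiers = [
    Tier("L1", hit_cost=1.0, residency_s=300),    # e.g. 5-minute window
    Tier("L2", hit_cost=5.0, residency_s=3600),   # e.g. 1-hour window
]
print(lookup(tiers, "prompt-A", now=0.0))    # 100.0  (cold: recompute)
print(lookup(tiers, "prompt-A", now=10.0))   # 1.0    (L1 hit)
print(lookup(tiers, "prompt-A", now=400.0))  # 5.0    (L1 expired, L2 hit)
```

The third lookup is the case libCacheSim can't express: the object has fallen out of L1 but is still resident in L2, so the miss is partial and cheaper than a full recompute.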
Does anyone know of cache-testing software specifically targeting LLM-provider-style caches? Something that models multiple tiers with per-tier cost/residency, tokenised objects, and edit-driven workloads would be ideal. Academic code, research prototypes, internal tools that got open-sourced — all welcome. Even partial matches (e.g. KV-cache simulators for inference servers) would be useful pointers.
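For anyone wondering what I mean by "edit-driven workloads," here's a toy sketch (the trace format is entirely made up): each request carries the full token sequence for a conversation, and the token-weighted reuse is the longest common token prefix with the previous request on the same conversation, since an in-place edit invalidates everything after the edit point.

```python
from collections import defaultdict

def prefix_reuse(trace):
    """Token-weighted accounting over an edit stream: for each request,
    count tokens reusable from the cached prefix vs tokens recomputed."""
    last = {}                      # conv_id -> previous token sequence
    stats = defaultdict(int)
    for conv_id, tokens in trace:
        prev = last.get(conv_id, [])
        common = 0
        for a, b in zip(prev, tokens):
            if a != b:
                break              # first edited token kills the suffix
            common += 1
        stats["reused"] += common
        stats["recomputed"] += len(tokens) - common
        last[conv_id] = tokens
    return dict(stats)

trace = [
    ("c1", [1, 2, 3]),            # cold start: 3 tokens recomputed
    ("c1", [1, 2, 3, 4, 5]),      # append: 3 reused, 2 recomputed
    ("c1", [1, 2, 9, 4, 5]),      # edit at position 2: only 2 reused
]
print(prefix_reuse(trace))        # {'reused': 5, 'recomputed': 8}
```

An atomic get/put trace model collapses all three requests into opaque keys and loses exactly this structure.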