CacheBlend#
CacheBlend lets LMCache reuse the KV cache of any repeated text chunk – not only a shared prefix – by selectively recomputing a small fraction of tokens at chunk boundaries. This cuts time-to-first-token for RAG and multi-document workloads where the reusable context is not a clean prefix.
Enabling CacheBlend (MP mode)#
Start the LMCache server with the blend engine:
lmcache server --l1-size-gb 20 --eviction-policy LRU --engine-type blend
The blend engine composes a BlendModule into the server and requires
--supported-transfer-mode to be lmcache_driven or auto (the default). See
Configuration Reference for the related server flags.
Note
The in-process CacheBlend documentation – configuration knobs such as
LMCACHE_ENABLE_BLENDING and an end-to-end example – is preserved in
the Legacy section: Blending.