LMCache Backend
- class lmcache.storage_backend.abstract_backend.LMCBackendInterface(dst_device: str = 'cuda')
- batched_get(keys: Iterable[CacheEngineKey]) → Iterable[Tensor | None]
Retrieve the KV cache chunks for the given keys in a batched manner.
- Parameters:
keys – an iterable of keys of the token chunks, each including the prefix hash and format
- Returns:
an iterable of KV cache chunks, each in the format of a big tensor, with None in place of any key that is not found
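A minimal usage sketch for pairing results back with their keys. The `backend` instance and the `keys` list are assumptions here; any concrete subclass of LMCBackendInterface would work:

```python
# Assumption: `backend` is a concrete LMCBackendInterface implementation
# and `keys` is a list of CacheEngineKey objects built elsewhere.
results = backend.batched_get(keys)
for key, kv in zip(keys, results):
    if kv is None:
        print(f"miss: {key}")  # this chunk must be recomputed
    else:
        print(f"hit: {key} -> tensor of shape {tuple(kv.shape)}")
```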
- batched_put(keys_and_chunks: Iterable[Tuple[CacheEngineKey, Tensor]], blocking=True) → int
Store multiple keys and KV cache chunks into the cache engine in a batched manner.
- Parameters:
keys_and_chunks – an iterable of (key, KV chunk) pairs, where each key is a CacheEngineKey and each KV chunk is a big tensor
blocking – whether to block the call until the operation is completed
- Returns:
the number of chunks stored
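A sketch of a batched store, again assuming a hypothetical `backend` plus parallel `keys` and `chunks` lists; the return value reports how many chunks were actually stored:

```python
# Assumption: `keys` and `chunks` are parallel lists of CacheEngineKey
# objects and KV tensors (no "batch" dimension; see the note at the end
# of this section).
num_stored = backend.batched_put(zip(keys, chunks), blocking=True)
print(f"stored {num_stored} of {len(keys)} chunks")
```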
- abstract close()
Perform any cleanup. Child classes should override this method if necessary.
- abstract contains(key: CacheEngineKey) → bool
Query whether a key is in the cache.
- abstract get(key: CacheEngineKey) → Tensor | None
Retrieve the KV cache chunk by the given key.
- Parameters:
key – the key of the token chunk, including the prefix hash and format
- Returns:
the KV cache of the token chunk, in the format of a big tensor, or None if the key is not found
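A single-key lookup sketch, with `backend` and `key` assumed as in the sketches above. contains() can avoid a wasted fetch, though the None check on get() remains the authoritative miss signal:

```python
# Assumption: `backend` and `key` are defined as in the earlier sketches;
# `feed_to_model` is a hypothetical consumer of the KV tensor.
if backend.contains(key):
    kv_chunk = backend.get(key)
    if kv_chunk is not None:
        feed_to_model(kv_chunk)
```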
- abstract put(key: CacheEngineKey, kv_chunk: Tensor, blocking=True) → None
Store the KV cache of the tokens into the cache engine.
- Parameters:
key – the key of the token chunk, in the format of CacheEngineKey
kv_chunk – the KV cache of the token chunk, as a big tensor
blocking – whether to block the call until the operation is completed
- Returns:
None
Note
The KV cache should NOT have the “batch” dimension.
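To make the abstract contract concrete, here is a minimal in-memory subclass sketch. The import path for CacheEngineKey, the dict-based storage, and the handling of dst_device are assumptions for illustration; a real backend would also honor the blocking flag and manage capacity and eviction:

```python
from typing import Dict, Optional

import torch

from lmcache.storage_backend.abstract_backend import LMCBackendInterface
# Assumption: CacheEngineKey is importable from lmcache.utils; adjust the
# path to wherever it lives in your LMCache version.
from lmcache.utils import CacheEngineKey


class InMemoryBackend(LMCBackendInterface):
    """Toy backend that keeps KV chunks in a Python dict (illustration only)."""

    def __init__(self, dst_device: str = "cuda"):
        super().__init__(dst_device)
        self._dst_device = dst_device
        self._store: Dict[CacheEngineKey, torch.Tensor] = {}

    def contains(self, key: CacheEngineKey) -> bool:
        return key in self._store

    def put(self, key: CacheEngineKey, kv_chunk: torch.Tensor, blocking=True) -> None:
        # kv_chunk is one big tensor with no "batch" dimension (see the note
        # above). The blocking flag is ignored: this toy store is synchronous.
        self._store[key] = kv_chunk.detach().to("cpu")

    def get(self, key: CacheEngineKey) -> Optional[torch.Tensor]:
        chunk = self._store.get(key)
        return None if chunk is None else chunk.to(self._dst_device)

    def close(self) -> None:
        self._store.clear()
```

Since batched_get() and batched_put() are not marked abstract above, this subclass inherits them from the base class, presumably implemented on top of the per-key get() and put().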