LMCache Backend#

class lmcache.storage_backend.abstract_backend.LMCBackendInterface(dst_device: str = 'cuda')[source]#
batched_get(keys: Iterable[CacheEngineKey]) → Iterable[Tensor | None][source]#

Retrieve the KV cache chunks for the given keys in a batched manner.

Parameters:

keys – an iterable of keys of the token chunks, each key including the prefix hash and format

Returns:

an iterable of KV caches of the token chunks, each a single large tensor, with None in place of any key that is not found
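A minimal usage sketch (illustrative: `backend` stands for some concrete subclass instance, `keys` for a list of CacheEngineKey objects, and the returned iterable is assumed to align positionally with the input keys):

# Illustrative sketch: `backend` is a concrete LMCBackendInterface
# implementation and `keys` is a list of CacheEngineKey objects.
results = backend.batched_get(keys)

# Assumption: results align positionally with the input keys; a miss yields None.
for key, chunk in zip(keys, results):
    if chunk is None:
        print(f"miss: {key}")
    else:
        print(f"hit: {key} -> tensor of shape {tuple(chunk.shape)}")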

batched_put(keys_and_chunks: Iterable[Tuple[CacheEngineKey, Tensor]], blocking=True) → int[source]#

Store multiple keys and KV cache chunks into the cache engine in a batched manner.

Parameters:
  • keys_and_chunks – an iterable of (key, kv_chunk) pairs, where each key is a CacheEngineKey and each kv_chunk is the KV cache of the corresponding token chunk as a single large tensor

  • blocking – whether to block until the operation completes

Returns:

the number of chunks stored
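A batched store sketch (illustrative names; keys are paired with their chunk tensors to match the signature above):

# Illustrative sketch: `keys` is a list of CacheEngineKey objects and
# `kv_chunks` is a list of corresponding KV chunk tensors.
keys_and_chunks = list(zip(keys, kv_chunks))

# With blocking=True, the call returns only once the chunks are stored
# and reports how many chunks were stored.
num_stored = backend.batched_put(keys_and_chunks, blocking=True)
print(f"stored {num_stored} of {len(keys_and_chunks)} chunks")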

abstract close()[source]#

Perform any necessary cleanup. Subclasses should override this method if needed.

abstract contains(key: CacheEngineKey) → bool[source]#

Check whether the given key is present in the cache.

abstract get(key: CacheEngineKey) → Tensor | None[source]#

Retrieve the KV cache chunk for the given key.

Parameters:

key – the key of the token chunk, including the prefix hash and format

Returns:

the KV cache of the token chunk as a single large tensor, or None if the key is not found
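A short lookup sketch (illustrative; get() signals a miss by returning None, so a separate contains() check is optional):

# Illustrative sketch: fetch a chunk and fall back on a miss.
kv_chunk = backend.get(key)
if kv_chunk is None:
    # Miss: the KV cache for this token chunk must be recomputed.
    ...
else:
    # Hit: kv_chunk is a single large tensor (presumably on dst_device).
    ...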

abstract put(key: CacheEngineKey, kv_chunk: Tensor, blocking=True) → None[source]#

Store the KV cache of the token chunk into the cache engine.

Parameters:
  • key – the key of the token chunk, in the format of CacheEngineKey

  • kv_chunk – the KV cache of the token chunk, as a single large tensor

  • blocking – whether to block until the operation completes

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.
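To make the contract concrete, below is a minimal in-memory subclass sketch, for illustration only (not part of LMCache): it keeps chunks in a plain dict and implements just the four abstract methods, inheriting the default batched_get()/batched_put() from the base class. The CacheEngineKey import path is an assumption; adjust it to your LMCache version.

from typing import Optional

import torch

from lmcache.storage_backend.abstract_backend import LMCBackendInterface
from lmcache.utils import CacheEngineKey  # assumed import path


class InMemoryBackend(LMCBackendInterface):
    """Illustrative dict-backed backend; for demonstration only."""

    def __init__(self, dst_device: str = "cuda"):
        super().__init__(dst_device)
        self._dst_device = dst_device
        self._store = {}  # CacheEngineKey -> torch.Tensor

    def contains(self, key: CacheEngineKey) -> bool:
        return key in self._store

    def put(self, key: CacheEngineKey, kv_chunk: torch.Tensor, blocking=True) -> None:
        # Per the note above, kv_chunk must NOT carry a "batch" dimension.
        self._store[key] = kv_chunk

    def get(self, key: CacheEngineKey) -> Optional[torch.Tensor]:
        chunk = self._store.get(key)
        # Move retrieved chunks to the destination device on the way out.
        return chunk.to(self._dst_device) if chunk is not None else None

    def close(self) -> None:
        self._store.clear()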