LMCache Storage Backend#
Subpackages#
- lmcache.storage_backend.connector package
- lmcache.storage_backend.serde package
- Submodules
- lmcache.storage_backend.serde.cachegen_basics module
- lmcache.storage_backend.serde.cachegen_decoder module
- lmcache.storage_backend.serde.cachegen_encoder module
- lmcache.storage_backend.serde.fast_serde module
- lmcache.storage_backend.serde.safe_serde module
- lmcache.storage_backend.serde.serde module
- lmcache.storage_backend.serde.torch_serde module
- lmcache.storage_backend.evictor package
- lmcache.storage_backend.mem_pool package
Submodules#
lmcache.storage_backend.abstract_backend module#
- class lmcache.storage_backend.abstract_backend.LMCBackendInterface(dst_device: str = 'cuda')[source]#
- batched_get(keys: Iterable[CacheEngineKey]) Iterable[Tensor | None] [source]#
Retrieve the kv cache chunks by the given keys in a batched manner
- Parameters:
keys – the iterator of keys of the token chunks, including prefix hash and format
- Returns:
an iterator over the KV caches of the token chunks, each in the format of a big tensor, or None if the key is not found
- batched_put(keys_and_chunks: Iterable[Tuple[CacheEngineKey, Tensor]], blocking=True) int [source]#
Store multiple keys and KV cache chunks into the cache engine in a batched manner.
- Parameters:
keys – the iterable of keys of the token chunks, in the format of CacheEngineKey
kv_chunks – the iterable of kv cache of the token chunks, in the format of a big tensor
blocking – whether to block the call until the operation is completed
- Returns:
the number of chunks stored
- abstract close()[source]#
Perform cleanup. Child classes should override this method if necessary.
- abstract contains(key: CacheEngineKey) bool [source]#
Query if a key is in the cache or not
- abstract get(key: CacheEngineKey) Tensor | None [source]#
Retrieve the KV cache chunk by the given key
- Parameters:
key – the key of the token chunk, including prefix hash and format
- Returns:
the KV cache of the token chunk, in the format of a big tensor, or None if the key is not found
- abstract put(key: CacheEngineKey, kv_chunk: Tensor, blocking=True) None [source]#
Store the KV cache of the tokens into the cache engine.
- Parameters:
key – the key of the token chunk, in the format of CacheEngineKey
kv_chunk – the kv cache of the token chunk, as a big tensor.
blocking – whether to block the call until the operation is completed.
- Returns:
None
Note
The KV cache should NOT have the “batch” dimension.
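A new storage backend only needs to implement the four abstract methods above; batched_get and batched_put provide default batched behavior on top of them. Below is a minimal sketch of a dictionary-backed subclass. The import path of CacheEngineKey and the handling of dst_device by the base class are assumptions made for illustration.

```python
from typing import Dict, Optional

import torch

from lmcache.storage_backend.abstract_backend import LMCBackendInterface
# NOTE: the import path of CacheEngineKey is an assumption for this sketch.
from lmcache.utils import CacheEngineKey


class InMemoryBackend(LMCBackendInterface):
    """Toy backend that keeps KV chunks in a Python dict (illustration only)."""

    def __init__(self, dst_device: str = "cuda"):
        super().__init__(dst_device)
        self.dst_device = dst_device  # stored explicitly; base-class behavior is assumed
        self._store: Dict[CacheEngineKey, torch.Tensor] = {}

    def contains(self, key: CacheEngineKey) -> bool:
        return key in self._store

    def put(self, key: CacheEngineKey, kv_chunk: torch.Tensor, blocking=True) -> None:
        # Keep a CPU copy so the caller may reuse its buffer afterwards.
        self._store[key] = kv_chunk.detach().cpu()

    def get(self, key: CacheEngineKey) -> Optional[torch.Tensor]:
        chunk = self._store.get(key)
        return None if chunk is None else chunk.to(self.dst_device)

    def close(self) -> None:
        self._store.clear()
```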
lmcache.storage_backend.hybrid_backend module#
- class lmcache.storage_backend.hybrid_backend.LMCHybridBackend(config: LMCacheEngineConfig, metadata: LMCacheEngineMetadata, mpool_metadata: LMCacheMemPoolMetadata, dst_device: str = 'cuda')[source]#
Bases:
LMCBackendInterface
A hybrid backend that uses both a local and a remote backend to store and retrieve data. It implements write-through and read-through caching.
- batched_get(keys: Iterable[CacheEngineKey]) Iterable[Tensor | None] [source]#
Retrieve the kv cache chunks by the given keys in a batched manner
- Parameters:
keys – the iterator of keys of the token chunks, including prefix hash and format
- Returns:
an iterator over the KV caches of the token chunks, each in the format of a big tensor, or None if the key is not found
- contains(key: CacheEngineKey) bool [source]#
Query if a key is in the cache or not
- get(key: CacheEngineKey) Tensor | None [source]#
Retrieve the KV cache chunk by the given key
- Parameters:
key – the key of the token chunk, including prefix hash and format
- Returns:
the KV cache of the token chunk, in the format of a big tensor, or None if the key is not found
- put(key: CacheEngineKey, value: Tensor, blocking: bool = True)[source]#
Store the KV cache of the tokens into the cache engine.
- Parameters:
key – the key of the token chunk, in the format of CacheEngineKey
kv_chunk – the kv cache of the token chunk, as a big tensor.
blocking – whether to block the call until the operation is completed.
- Returns:
None
Note
The KV cache should NOT have the “batch” dimension.
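The write-through and read-through policies can be summarized as: every put goes to both backends, and a get that misses locally falls back to the remote backend and backfills the local cache. A simplified sketch of the policy, not the actual LMCHybridBackend code:

```python
# Simplified illustration of write-through / read-through caching.
# `local` and `remote` stand for any two LMCBackendInterface instances.
def hybrid_put(local, remote, key, value, blocking=True):
    local.put(key, value, blocking)    # write-through: update the local cache...
    remote.put(key, value, blocking)   # ...and propagate the write to the remote

def hybrid_get(local, remote, key):
    chunk = local.get(key)
    if chunk is not None:              # local hit
        return chunk
    chunk = remote.get(key)
    if chunk is not None:              # read-through: backfill the local cache
        local.put(key, chunk)
    return chunk
```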
lmcache.storage_backend.local_backend module#
- class lmcache.storage_backend.local_backend.LMCLocalBackend(config: LMCacheEngineConfig, metadata: LMCacheMemPoolMetadata, dst_device: str = 'cuda')[source]#
Bases:
LMCBackendInterface
Cache engine for storing the KV cache of the tokens in local CPU/GPU memory.
- contains(key: CacheEngineKey) bool [source]#
Check if the cache engine contains the key.
- Input:
key: the key of the token chunk, including prefix hash and format
- Returns:
True if the cache engine contains the key, False otherwise
- get(key: CacheEngineKey) Tensor | None [source]#
Retrieve the KV cache chunk by the given key
- Input:
key: the key of the token chunk, including prefix hash and format
- Output:
the KV cache of the token chunk, in the format of nested tuples, or None if the key is not found
- put(key: CacheEngineKey, kv_chunk: Tensor, blocking: bool = True) None [source]#
Store the KV cache of the tokens into the cache engine.
- Input:
key: the key of the token chunk, including prefix hash and format
kv_chunk: the kv cache of the token chunk, in the format of nested tuples
- Returns:
None
Note
The KV cache should NOT have the “batch” dimension.
- remove(key: CacheEngineKey) None [source]#
Remove the KV cache chunk by the given key
- Input:
key: the key of the token chunk, including prefix hash and format
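A typical put/get round trip with the local backend might look as follows. The construction of the config and metadata objects is elided, and make_key is a hypothetical helper standing in for however the cache engine builds a CacheEngineKey:

```python
import torch

from lmcache.storage_backend.local_backend import LMCLocalBackend

# `config` and `metadata` are assumed to be valid LMCacheEngineConfig /
# LMCacheMemPoolMetadata instances created elsewhere.
backend = LMCLocalBackend(config, metadata, dst_device="cuda")

key = make_key(...)  # hypothetical helper that produces a CacheEngineKey
kv_chunk = torch.randn(2, 32, 256, 8, 128)  # illustrative shape; note: no batch dimension

backend.put(key, kv_chunk)
if backend.contains(key):
    restored = backend.get(key)   # tensor on dst_device, or None on a miss
backend.remove(key)               # evict the chunk explicitly
```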
- class lmcache.storage_backend.local_backend.LMCLocalDiskBackend(config: LMCacheEngineConfig, metadata: LMCacheMemPoolMetadata, dst_device: str = 'cuda')[source]#
Bases:
LMCBackendInterface
Cache engine for storing the KV cache of the tokens in the local disk.
- contains(key: CacheEngineKey) bool [source]#
Check if the cache engine contains the key.
- Input:
key: the key of the token chunk, including prefix hash and format
- Returns:
True if the cache engine contains the key, False otherwise
- get(key: CacheEngineKey) Tuple[Tuple[Tensor, Tensor], ...] | None [source]#
Retrieve the KV cache chunk by the given key
- Input:
key: the key of the token chunk, including prefix hash and format
- Output:
the KV cache of the token chunk, in the format of nested tuples, or None if the key is not found
- put(key: CacheEngineKey, kv_chunk: Tensor, blocking: bool = True) None [source]#
Store the KV cache of the tokens into the cache engine.
- Input:
key: the key of the token chunk, including prefix hash and format
kv_chunk: the kv cache of the token chunk, in the format of nested tuples
- Returns:
None
Note
The KV cache should NOT have the “batch” dimension.
- put_blocking(key: CacheEngineKey, kv_chunk: Tensor) None [source]#
- put_nonblocking(key: CacheEngineKey, kv_chunk: Tensor) None [source]#
- remove(key: CacheEngineKey) None [source]#
Remove the KV cache chunk by the given key
- Input:
key: the key of the token chunk, including prefix hash and format
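The disk backend exposes both write paths directly: put_blocking returns only after the chunk is persisted, while put_nonblocking schedules the write and returns immediately. A brief sketch; that put dispatches between the two based on its blocking flag is an assumption about the implementation:

```python
# `disk_backend` is an LMCLocalDiskBackend built as in the previous sketch;
# `key` and `kv_chunk` are as before.

# Synchronous write: returns only after the chunk is on disk.
disk_backend.put_blocking(key, kv_chunk)

# Asynchronous write: returns immediately; the write completes in the background.
disk_backend.put_nonblocking(key, kv_chunk)

# The generic entry point takes a `blocking` flag; dispatching to the two
# methods above is an assumption, not confirmed by the documentation.
disk_backend.put(key, kv_chunk, blocking=False)
```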
lmcache.storage_backend.remote_backend module#
- class lmcache.storage_backend.remote_backend.LMCPipelinedRemoteBackend(config: LMCacheEngineConfig, metadata: LMCacheEngineMetadata, dst_device: str = 'cuda')[source]#
Bases:
LMCRemoteBackend
Implements the pipelined get functionality for the remote backend.
- batched_get(keys: Iterator[CacheEngineKey]) Iterable[Tensor | None] [source]#
Retrieve the kv cache chunks by the given keys in a batched manner
- Parameters:
keys – the iterator of keys of the token chunks, including prefix hash and format
- Returns:
an iterator over the KV caches of the token chunks, each in the format of a big tensor, or None if the key is not found
- class lmcache.storage_backend.remote_backend.LMCRemoteBackend(config: LMCacheEngineConfig, metadata: LMCacheEngineMetadata, dst_device: str = 'cuda')[source]#
Bases:
LMCBackendInterface
Cache engine for storing the KV cache of the tokens on a remote server.
- contains(key: CacheEngineKey) bool [source]#
Check if the cache engine contains the key.
- Input:
key: the key of the token chunk, including prefix hash and format
- Returns:
True if the cache engine contains the key, False otherwise
- get(key: CacheEngineKey) Tensor | None [source]#
Retrieve the KV cache chunk (in a single big tensor) by the given key
- list() List[CacheEngineKey] [source]#
List the remote keys (and also update the cached set of existing keys)
- put(key: CacheEngineKey, kv_chunk: Tensor, blocking: bool = True) None [source]#
Store the KV cache of the tokens into the cache engine.
- Input:
key: the key of the token chunk, including prefix hash and format
kv_chunk: the kv cache of the token chunk, in a single big tensor
blocking: whether to block until the put is done
- Returns:
None
Note
The KV cache should NOT have the “batch” dimension.
- put_blocking(key: CacheEngineKey, kv_chunk: Tensor) None [source]#
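To see why the pipelined variant exists: a naive batched_get fetches and deserializes chunks strictly one after another, whereas pipelining lets the network fetch of the next key overlap with deserialization of the current one. A simplified sketch of that overlap pattern, not the actual LMCPipelinedRemoteBackend code; fetch and deserialize are hypothetical stand-ins for the connector and serde layers:

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_batched_get(fetch, deserialize, keys):
    """Yield deserialized chunks while the next network fetch runs in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for key in keys:
            nxt = pool.submit(fetch, key)      # start fetching this key's bytes
            if pending is not None:
                data = pending.result()        # wait for the previous fetch...
                yield None if data is None else deserialize(data)  # ...then decode it
            pending = nxt
        if pending is not None:                # drain the last in-flight fetch
            data = pending.result()
            yield None if data is None else deserialize(data)
```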