LMCache Storage Backend#

Subpackages#

Submodules#

lmcache.storage_backend.abstract_backend module#

class lmcache.storage_backend.abstract_backend.LMCBackendInterface(dst_device: str = 'cuda')[source]#
batched_get(keys: Iterable[CacheEngineKey]) Iterable[Tensor | None][source]#

Retrieve the kv cache chunks by the given keys in a batched manner

Parameters:

keys – an iterable of keys of the token chunks, each including prefix hash and format

Returns:

an iterable of the KV caches of the token chunks, each as a single big tensor, or None if the corresponding key is not found

batched_put(keys_and_chunks: Iterable[Tuple[CacheEngineKey, Tensor]], blocking=True) int[source]#

Store the multiple keys and KV cache chunks into the cache engine in a batched manner.

Parameters:
  • keys_and_chunks – an iterable of (key, KV chunk) pairs, where each key is a CacheEngineKey and each chunk is a single big tensor

  • blocking – whether to block until the operation completes

Returns:

the number of chunks stored
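
As an illustration of the batched API, the helper below stores a list of chunks and immediately reads them back. backend, keys, and chunks are assumed to come from elsewhere; constructing CacheEngineKey objects is outside the scope of this sketch:

from typing import List, Tuple

import torch


def roundtrip(backend, keys, chunks: List[torch.Tensor]) -> Tuple[int, int]:
    """Store chunks under keys, then read them back and count the hits."""
    stored = backend.batched_put(zip(keys, chunks), blocking=True)
    # batched_get yields one entry per key: a tensor on a hit, None on a miss.
    reloaded = list(backend.batched_get(keys))
    hits = sum(chunk is not None for chunk in reloaded)
    return stored, hits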

abstract close()[source]#

Do the cleanup. Child classes should override this method if necessary.

abstract contains(key: CacheEngineKey) bool[source]#

Query whether a key is in the cache.

abstract get(key: CacheEngineKey) Tensor | None[source]#

Retrieve the KV cache chunk by the given key

Parameters:

key – the key of the token chunk, including prefix hash and format

Returns:

the KV cache of the token chunk as a single big tensor, or None if the key is not found

abstract put(key: CacheEngineKey, kv_chunk: Tensor, blocking=True) None[source]#

Store the KV cache of the tokens into the cache engine.

Parameters:
  • key – the key of the token chunk, in the format of CacheEngineKey

  • kv_chunk – the kv cache of the token chunk, as a big tensor.

  • blocking – whether to block until the operation completes

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.
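
Only close(), contains(), get(), and put() are marked abstract; batched_get() and batched_put() are not, so subclasses appear to get default batched implementations built on the single-key methods. Purely as an illustration of the interface (not part of LMCache), a dict-backed subclass could look like the sketch below, assuming CacheEngineKey objects are hashable:

from typing import Optional

import torch

from lmcache.storage_backend.abstract_backend import LMCBackendInterface


class DictBackend(LMCBackendInterface):
    """Illustrative in-memory backend keyed by CacheEngineKey."""

    def __init__(self, dst_device: str = "cuda"):
        super().__init__(dst_device)
        self._device = dst_device
        self._store = {}

    def contains(self, key) -> bool:
        return key in self._store

    def put(self, key, kv_chunk: torch.Tensor, blocking=True) -> None:
        # Keep chunks on CPU so the compute device is not held by the cache.
        self._store[key] = kv_chunk.detach().cpu()

    def get(self, key) -> Optional[torch.Tensor]:
        chunk = self._store.get(key)
        return None if chunk is None else chunk.to(self._device)

    def close(self) -> None:
        self._store.clear()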

lmcache.storage_backend.hybrid_backend module#

class lmcache.storage_backend.hybrid_backend.LMCHybridBackend(config: LMCacheEngineConfig, metadata: LMCacheEngineMetadata, mpool_metadata: LMCacheMemPoolMetadata, dst_device: str = 'cuda')[source]#

Bases: LMCBackendInterface

A hybrid backend that uses both a local and a remote backend to store and retrieve data. It implements write-through and read-through caching.
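
The read-through/write-through behaviour amounts to roughly the flow below. This is a simplified sketch of the pattern, not the actual implementation; local_store and remote_store stand in for the underlying local and remote backends:

def read_through_get(local_store, remote_store, key):
    chunk = local_store.get(key)
    if chunk is None:                       # local miss: fall back to remote
        chunk = remote_store.get(key)
        if chunk is not None:
            local_store.put(key, chunk)     # backfill the local tier
    return chunk


def write_through_put(local_store, remote_store, key, kv_chunk, blocking=True):
    local_store.put(key, kv_chunk, blocking=blocking)   # write both tiers
    remote_store.put(key, kv_chunk, blocking=blocking)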

batched_get(keys: Iterable[CacheEngineKey]) Iterable[Tensor | None][source]#

Retrieve the kv cache chunks by the given keys in a batched manner

Parameters:

keys – an iterable of keys of the token chunks, each including prefix hash and format

Returns:

an iterable of the KV caches of the token chunks, each as a single big tensor, or None if the corresponding key is not found

close()[source]#

Do the cleanup. Child classes should override this method if necessary.

contains(key: CacheEngineKey) bool[source]#

Query whether a key is in the cache.

get(key: CacheEngineKey) Tensor | None[source]#

Retrieve the KV cache chunk by the given key

Parameters:

key – the key of the token chunk, including prefix hash and format

Returns:

the KV cache of the token chunk as a single big tensor, or None if the key is not found

put(key: CacheEngineKey, value: Tensor, blocking: bool = True)[source]#

Store the KV cache of the tokens into the cache engine.

Parameters:
  • key – the key of the token chunk, in the format of CacheEngineKey

  • value – the KV cache of the token chunk, as a single big tensor

  • blocking – whether to block until the operation completes

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

lmcache.storage_backend.local_backend module#

class lmcache.storage_backend.local_backend.LMCLocalBackend(config: LMCacheEngineConfig, metadata: LMCacheMemPoolMetadata, dst_device: str = 'cuda')[source]#

Bases: LMCBackendInterface

Cache engine for storing the KV cache of the tokens in local CPU/GPU memory.

close()[source]#

Do the cleanup. Child classes should override this method if necessary.

contains(key: CacheEngineKey) bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: CacheEngineKey) Tensor | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as a single big tensor, or None if the key is not found

put(key: CacheEngineKey, kv_chunk: Tensor, blocking: bool = True) None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk: the KV cache of the token chunk, as a single big tensor

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

put_blocking(key, kv_chunk)[source]#
put_nonblocking(key, kv_chunk)[source]#
put_worker()[source]#
remove(key: CacheEngineKey) None[source]#

Remove the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format
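
The put_blocking / put_nonblocking / put_worker methods, together with LocalBackendEndSignal below, suggest a producer-consumer design: non-blocking puts enqueue work for a background worker thread, and an end signal tells the worker to stop. A generic sketch of that pattern (not LMCache's code), using a plain queue.Queue:

import queue
import threading


class EndSignal:
    """Stands in for LocalBackendEndSignal in this sketch."""


def make_async_put(put_blocking):
    """Wrap a blocking put in a queue plus worker thread; returns (put, stop)."""
    work_queue: queue.Queue = queue.Queue()

    def put_worker():
        while True:
            item = work_queue.get()
            if isinstance(item, EndSignal):
                break                        # drain stops on the end signal
            key, kv_chunk = item
            put_blocking(key, kv_chunk)      # the slow, blocking store

    worker = threading.Thread(target=put_worker, daemon=True)
    worker.start()

    def put_nonblocking(key, kv_chunk):
        work_queue.put((key, kv_chunk))      # returns immediately

    def stop():
        work_queue.put(EndSignal())
        worker.join()

    return put_nonblocking, stop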

class lmcache.storage_backend.local_backend.LMCLocalDiskBackend(config: LMCacheEngineConfig, metadata: LMCacheMemPoolMetadata, dst_device: str = 'cuda')[source]#

Bases: LMCBackendInterface

Cache engine for storing the KV cache of the tokens on the local disk.

buffer_sweeper()[source]#

Sweep the future pool to free up memory.

close()[source]#

Do the cleanup. Child classes should override this method if necessary.

contains(key: CacheEngineKey) bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: CacheEngineKey) Tuple[Tuple[Tensor, Tensor], ...] | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk, in the format of nested tuples, or None if the key is not found

put(key: CacheEngineKey, kv_chunk: Tensor, blocking: bool = True) None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk: the KV cache of the token chunk, as a single big tensor

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

put_blocking(key: CacheEngineKey, kv_chunk: Tensor) None[source]#
put_nonblocking(key: CacheEngineKey, kv_chunk: Tensor) None[source]#
put_worker()[source]#
remove(key: CacheEngineKey) None[source]#

Remove the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

class lmcache.storage_backend.local_backend.LocalBackendEndSignal[source]#
lmcache.storage_backend.local_backend.save_disk(path: str, kv_chunk: Tensor)[source]#
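
save_disk persists a single chunk tensor at the given path. A minimal stand-in with the same shape, assuming a torch.save-based format (the actual on-disk format used by LMCache may differ):

import torch


def save_disk_sketch(path: str, kv_chunk: torch.Tensor) -> None:
    # Move to CPU before serializing so GPU-resident chunks can be written.
    torch.save(kv_chunk.detach().cpu(), path)


def load_disk_sketch(path: str) -> torch.Tensor:
    return torch.load(path, map_location="cpu")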

lmcache.storage_backend.remote_backend module#

class lmcache.storage_backend.remote_backend.LMCPipelinedRemoteBackend(config: LMCacheEngineConfig, metadata: LMCacheEngineMetadata, dst_device: str = 'cuda')[source]#

Bases: LMCRemoteBackend

Implements the pipelined get functionality for the remote backend.

batched_get(keys: Iterator[CacheEngineKey]) Iterable[Tensor | None][source]#

Retrieve the kv cache chunks by the given keys in a batched manner

Parameters:

keys – the iterator of keys of the token chunks, including prefix hash and format

Returns:

an iterable of the KV caches of the token chunks, each as a single big tensor, or None if the corresponding key is not found

close()[source]#

Do the cleanup. Child classes should override this method if necessary.

deserialize_worker()[source]#
network_worker()[source]#
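
network_worker and deserialize_worker point to a two-stage pipeline: one stage fetches serialized chunks over the network while the other deserializes previously fetched ones, so network and CPU costs overlap. A generic sketch of that pipeline (not LMCache's implementation), with fetch and deserialize supplied by the caller:

import queue
import threading
from typing import Callable, Iterable, List, Optional

_END = object()   # sentinel marking the end of the fetch stream


def pipelined_get(keys: Iterable,
                  fetch: Callable,         # key -> serialized bytes, or None
                  deserialize: Callable    # bytes -> tensor
                  ) -> List[Optional[object]]:
    key_list = list(keys)
    fetched: queue.Queue = queue.Queue()

    def network_stage():
        for key in key_list:
            fetched.put(fetch(key))          # overlaps with deserialization
        fetched.put(_END)

    threading.Thread(target=network_stage, daemon=True).start()

    results: List[Optional[object]] = []
    while True:
        data = fetched.get()
        if data is _END:
            break
        results.append(None if data is None else deserialize(data))
    return results
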
class lmcache.storage_backend.remote_backend.LMCRemoteBackend(config: LMCacheEngineConfig, metadata: LMCacheEngineMetadata, dst_device: str = 'cuda')[source]#

Bases: LMCBackendInterface

Cache engine for storing the KV cache of the tokens in the remote server.

close()[source]#

Do the cleanup. Child classes should override this method if necessary.

contains(key: CacheEngineKey) bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: CacheEngineKey) Tensor | None[source]#

Retrieve the KV cache chunk (in a single big tensor) by the given key

list() List[CacheEngineKey][source]#

List the remote keys (and also update the cached set of existing keys).
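
list() can be used, for example, to decide which chunks still need to be uploaded. In the sketch below, remote_backend is an already-constructed LMCRemoteBackend (construction is omitted) and CacheEngineKey is assumed to be hashable:

def keys_missing_remotely(remote_backend, candidate_keys):
    # Keys currently on the remote server (this also refreshes the cached set).
    existing = set(remote_backend.list())
    return [key for key in candidate_keys if key not in existing]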

put(key: CacheEngineKey, kv_chunk: Tensor, blocking: bool = True) None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk: the kv cache of the token chunk, in a single big tensor

blocking: whether to block until the put is done

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

put_blocking(key: CacheEngineKey, kv_chunk: Tensor) None[source]#
put_worker()[source]#
class lmcache.storage_backend.remote_backend.RemoteBackendEndSignal[source]#