lmcache.server.server_storage_backend package#

Submodules#

lmcache.server.server_storage_backend.abstract_backend module#

class lmcache.server.server_storage_backend.abstract_backend.LMSBackendInterface[source]#
abstract close()[source]#

Perform any cleanup. Child classes should override this method if necessary.

abstract contains(key: str) → bool[source]#

Query whether a key is in the cache.

abstract get(key: str) → Tensor | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as a single tensor, or None if the key is not found

abstract list_keys() → List[str][source]#

List all keys currently stored in the cache server.

Output:

a list of the keys in the cache

abstract put(key: str, kv_chunk_bytes: bytearray, blocking=True) → None[source]#

Store the KV cache of the tokens into the cache server.

Parameters:
  • key – the key of the token chunk, as a str

  • kv_chunk_bytes – the KV cache of the token chunk, as a bytearray serialized from a single large tensor

  • blocking – whether to block the call until the operation is completed

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.
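The sketch below shows one way a custom backend might satisfy this interface, using a plain in-memory dict as the store. It is a minimal illustration, not part of the package: the class name InMemoryBackend and its dict-based storage are hypothetical, and it assumes the base class needs no constructor arguments. Although the abstract get() is annotated as returning a tensor, the concrete backends documented below return raw bytes, and the sketch mirrors that.

    from typing import List, Optional

    from lmcache.server.server_storage_backend.abstract_backend import (
        LMSBackendInterface,
    )


    class InMemoryBackend(LMSBackendInterface):
        """Hypothetical dict-backed backend, for illustration only."""

        def __init__(self):
            super().__init__()  # assumption: the base class takes no arguments
            self._store = {}

        def contains(self, key: str) -> bool:
            # Query whether the key is in the cache.
            return key in self._store

        def get(self, key: str) -> Optional[bytearray]:
            # Return the stored chunk unchanged, or None if the key is not found.
            return self._store.get(key)

        def list_keys(self) -> List[str]:
            # All keys currently held by this backend.
            return list(self._store.keys())

        def put(self, key: str, kv_chunk_bytes: bytearray, blocking=True) -> None:
            # A dict insert is synchronous, so `blocking` has no effect here.
            self._store[key] = kv_chunk_bytes

        def close(self) -> None:
            # Nothing to release for an in-memory dict; just drop the entries.
            self._store.clear()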

lmcache.server.server_storage_backend.local_backend module#

class lmcache.server.server_storage_backend.local_backend.LMSLocalBackend[source]#

Bases: LMSBackendInterface

Cache engine for storing the KV cache of the tokens in local CPU/GPU memory.

close()[source]#

Perform any cleanup. Child classes should override this method if necessary.

contains(key: str) → bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: str) → bytearray | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as a bytearray, or None if the key is not found

list_keys() → List[str][source]#

List all keys currently stored in the cache engine.

Output:

a list of the keys in the cache

put(key: str, kv_chunk_bytes: bytearray, blocking: bool = True) → None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk_bytes: the KV cache of the token chunk, in the format of a bytearray

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

remove(key: str) → None[source]#

Remove the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format
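A minimal usage sketch for this backend, assuming it can be constructed without arguments as the class signature above suggests. The key string is a placeholder; real keys carry the prefix hash and format described above.

    from lmcache.server.server_storage_backend.local_backend import LMSLocalBackend

    backend = LMSLocalBackend()

    # Placeholder key; real keys include a prefix hash and format.
    key = "example-prefix-hash"
    chunk = bytearray(b"serialized kv cache chunk")

    # Store the chunk (blocking put), then read it back.
    backend.put(key, chunk, blocking=True)
    assert backend.contains(key)
    assert backend.get(key) == chunk
    print(backend.list_keys())

    # Drop the chunk and shut the backend down.
    backend.remove(key)
    backend.close()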

class lmcache.server.server_storage_backend.local_backend.LMSLocalDiskBackend(path: str)[source]#

Bases: LMSBackendInterface

Cache engine for storing the KV cache of the tokens on the local disk.

close()[source]#

Perform any cleanup. Child classes should override this method if necessary.

contains(key: str) → bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: str) → bytes | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as bytes, or None if the key is not found

list_keys() → List[str][source]#

List all keys currently stored in the cache engine.

Output:

a list of the keys in the cache

put(key: str, kv_chunk_bytes: bytearray, blocking: bool = True) → None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk_bytes: the KV cache of the token chunk, in the format of a bytearray

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

remove(key: str) → None[source]#

Remove the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format
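Usage mirrors the in-memory backend, with the storage location passed at construction. The sketch below assumes path names a writable directory for the on-disk cache files; the temporary directory and placeholder key are illustrative only.

    import tempfile

    from lmcache.server.server_storage_backend.local_backend import LMSLocalDiskBackend

    # Assumption: `path` names a writable directory for the on-disk cache files.
    cache_dir = tempfile.mkdtemp()
    backend = LMSLocalDiskBackend(path=cache_dir)

    key = "example-prefix-hash"  # placeholder key
    backend.put(key, bytearray(b"serialized kv cache chunk"))

    # get() on this backend returns bytes, or None if the key is missing.
    data = backend.get(key)
    assert data == b"serialized kv cache chunk"

    backend.remove(key)
    backend.close()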