lmcache.server.server_storage_backend package#

Submodules#

lmcache.server.server_storage_backend.abstract_backend module#

class lmcache.server.server_storage_backend.abstract_backend.LMSBackendInterface[source]#
abstract close()[source]#

Perform any cleanup. Child classes should override this method if necessary.

abstract contains(key: str) → bool[source]#

Query whether a key is in the cache.

abstract get(key: str) → Tensor | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as a single tensor, or None if the key is not found

abstract list_keys() → List[str][source]#

List all keys currently stored in the cache server.

Output:

a list of the keys in the cache

abstract put(key: str, kv_chunk_bytes: bytearray, blocking=True) → None[source]#

Store the KV cache of the tokens into the cache server.

Parameters:
  • key – the key of the token chunk, as a str

  • kv_chunk_bytes – the KV cache of the token chunk, as a bytearray serialized from a single large tensor

  • blocking – whether to block the call until the operation is completed

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.
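The sketch below shows one way a custom backend might satisfy this interface, using a plain in-memory dict as the store. It is a minimal illustration, not part of the package: the class name InMemoryBackend and its dict-based storage are hypothetical, and it assumes the base class needs no constructor arguments. Although the abstract get() is annotated as returning a tensor, the concrete backends documented below return raw bytes, and the sketch mirrors that.

    from typing import List, Optional

    from lmcache.server.server_storage_backend.abstract_backend import (
        LMSBackendInterface,
    )


    class InMemoryBackend(LMSBackendInterface):
        """Hypothetical dict-backed backend, for illustration only."""

        def __init__(self):
            super().__init__()  # assumption: the base class takes no arguments
            self._store = {}

        def contains(self, key: str) -> bool:
            # Query whether the key is in the cache.
            return key in self._store

        def get(self, key: str) -> Optional[bytearray]:
            # Return the stored chunk unchanged, or None if the key is not found.
            return self._store.get(key)

        def list_keys(self) -> List[str]:
            # All keys currently held by this backend.
            return list(self._store.keys())

        def put(self, key: str, kv_chunk_bytes: bytearray, blocking=True) -> None:
            # A dict insert is synchronous, so `blocking` has no effect here.
            self._store[key] = kv_chunk_bytes

        def close(self) -> None:
            # Nothing to release for an in-memory dict; just drop the entries.
            self._store.clear()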

lmcache.server.server_storage_backend.local_backend module#

class lmcache.server.server_storage_backend.local_backend.LMSLocalBackend[source]#

Bases: LMSBackendInterface

Cache engine for storing the KV cache of the tokens in local CPU/GPU memory.

close()[source]#

Perform any cleanup. Child classes should override this method if necessary.

contains(key: str) → bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: str) → bytearray | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as a bytearray, or None if the key is not found

list_keys() → List[str][source]#

List all keys currently stored in the cache engine.

Output:

a list of the keys in the cache

put(key: str, kv_chunk_bytes: bytearray, blocking: bool = True) → None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk_bytes: the KV cache of the token chunk, in the format of a bytearray

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

remove(key: str) → None[source]#

Remove the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format
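A minimal usage sketch for this backend, assuming it can be constructed without arguments as the class signature above suggests. The key string is a placeholder; real keys carry the prefix hash and format described above.

    from lmcache.server.server_storage_backend.local_backend import LMSLocalBackend

    backend = LMSLocalBackend()

    # Placeholder key; real keys include a prefix hash and format.
    key = "example-prefix-hash"
    chunk = bytearray(b"serialized kv cache chunk")

    # Store the chunk (blocking put), then read it back.
    backend.put(key, chunk, blocking=True)
    assert backend.contains(key)
    assert backend.get(key) == chunk
    print(backend.list_keys())

    # Drop the chunk and shut the backend down.
    backend.remove(key)
    backend.close()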

class lmcache.server.server_storage_backend.local_backend.LMSLocalDiskBackend(path: str)[source]#

Bases: LMSBackendInterface

Cache engine for storing the KV cache of the tokens on the local disk.

close()[source]#

Perform any cleanup. Child classes should override this method if necessary.

contains(key: str) → bool[source]#

Check if the cache engine contains the key.

Input:

key: the key of the token chunk, including prefix hash and format

Returns:

True if the cache engine contains the key, False otherwise

get(key: str) → bytes | None[source]#

Retrieve the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format

Output:

the KV cache of the token chunk as bytes, or None if the key is not found

list_keys() → List[str][source]#

List all keys currently stored in the cache engine.

Output:

a list of the keys in the cache

put(key: str, kv_chunk_bytes: bytearray, blocking: bool = True) → None[source]#

Store the KV cache of the tokens into the cache engine.

Input:

key: the key of the token chunk, including prefix hash and format

kv_chunk_bytes: the KV cache of the token chunk, in the format of a bytearray

Returns:

None

Note

The KV cache should NOT have the “batch” dimension.

remove(key: str) → None[source]#

Remove the KV cache chunk by the given key

Input:

key: the key of the token chunk, including prefix hash and format
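Usage mirrors the in-memory backend, with the storage location passed at construction. The sketch below assumes path names a writable directory for the on-disk cache files; the temporary directory and placeholder key are illustrative only.

    import tempfile

    from lmcache.server.server_storage_backend.local_backend import LMSLocalDiskBackend

    # Assumption: `path` names a writable directory for the on-disk cache files.
    cache_dir = tempfile.mkdtemp()
    backend = LMSLocalDiskBackend(path=cache_dir)

    key = "example-prefix-hash"  # placeholder key
    backend.put(key, bytearray(b"serialized kv cache chunk"))

    # get() on this backend returns bytes, or None if the key is missing.
    data = backend.get(key)
    assert data == b"serialized kv cache chunk"

    backend.remove(key)
    backend.close()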