LMCache Controller#

Overview#

The overall architecture of the LMCache Controller is shown in the figure below. It consists of two main parts: the Controller Manager and the LMCache Worker.

The Controller Manager mainly consists of the KV Controller, the Reg Controller, and the Cluster Executor; a toy sketch of how they divide the work follows the list below.

  • KV Controller: Maintains the chunk information reported by LMCache Workers and answers lookup requests against it.

  • Reg Controller: Handles register, deregister, and heartbeat requests from LMCache Workers.

  • Cluster Executor: When the Controller Manager receives a user request, such as Clear or Move, it sends the corresponding command to the LMCache Workers through the Cluster Executor.
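
As a rough mental model only, here is a toy Python sketch of this division of labor. The class, method, and field names below are illustrative assumptions, not LMCache's actual implementation:

class ToyControllerManager:
    """Illustrative only; not LMCache's actual classes or message formats."""

    def __init__(self):
        self.chunk_index = {}      # KV Controller state: chunk hash -> worker ids
        self.live_workers = set()  # Reg Controller state

    # KV Controller: record chunk reports and answer lookups.
    def on_chunk_report(self, worker_id, chunk_hash, admitted):
        workers = self.chunk_index.setdefault(chunk_hash, set())
        if admitted:
            workers.add(worker_id)
        else:
            workers.discard(worker_id)

    def lookup(self, chunk_hash):
        return self.chunk_index.get(chunk_hash, set())

    # Reg Controller: track worker registration and liveness.
    def on_register(self, worker_id):
        self.live_workers.add(worker_id)

    def on_deregister(self, worker_id):
        self.live_workers.discard(worker_id)

    # Cluster Executor: fan a user command (Clear, Move, ...) out to workers.
    def execute(self, command, send_fn):
        for worker_id in self.live_workers:
            send_fn(worker_id, command)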

The LMCache Worker is a thread within each rank process. It is responsible for the following tasks (a matching toy sketch follows the list):

  • sends register, deregister, and heartbeat messages to the Reg Controller.

  • sends chunk information, including admit and evict messages, to the KV Controller.

  • listens on a port for commands from the Cluster Executor and performs the corresponding processing.
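
To make these responsibilities concrete, here is a minimal toy sketch of a worker's control flow in Python. Again, all names and message shapes are illustrative assumptions, not LMCache's code:

import threading
import time

class ToyCacheWorker:
    """Illustrative sketch of the three responsibilities above; not LMCache's code."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def _send(self, target, msg):
        # Stub transport; the real worker talks to the controller over sockets.
        print(f"worker {self.worker_id} -> {target}: {msg}")

    def start(self):
        # 1. Register with the Reg Controller, then heartbeat in the background.
        self._send("RegController", {"type": "register"})
        threading.Thread(target=self._heartbeat, daemon=True).start()

    def _heartbeat(self):
        while True:
            self._send("RegController", {"type": "heartbeat"})
            time.sleep(5)

    def on_chunk_admitted(self, chunk_hash):
        # 2. Report chunk information to the KV Controller.
        self._send("KVController", {"type": "admit", "chunk": chunk_hash})

    def handle_command(self, cmd):
        # 3. Commands (Clear, Move, ...) arrive from the Cluster Executor
        #    on the worker's listening port.
        print(f"worker {self.worker_id} executing {cmd}")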

LMCache Controller Architecture Diagram

Key Features#

  1. Exposes a set of APIs for users and orchestrators to manage the KV cache.

Currently, the controller provides the following APIs (a usage sketch follows the list):

  • Clear: Clear the KV caches.

  • Compress: Compress the KV cache.

  • Health: Check the health status of cache workers.

  • Lookup: Lookup the KV cache for a given list of tokens.

  • Move: Move the KV cache to a different location.

  • Pin: Pin the KV cache to prevent it from being evicted.

  • CheckFinish: Check whether a (non-blocking) control event has finished or not.

  • QueryWorkerInfo: Query information about the cache workers.
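
For example, a client could call the controller over HTTP on the default port 9000. The endpoint paths and payload fields below are assumptions for illustration; consult the controller API reference for the exact schema:

import requests

CONTROLLER = "http://localhost:9000"

# Hypothetical paths and payloads; verify against the actual API reference.
health = requests.post(f"{CONTROLLER}/health", json={"instance_id": "lmcache_instance_id"})
print(health.json())

lookup = requests.post(f"{CONTROLLER}/lookup", json={"tokens": [1, 2, 3, 4]})
print(lookup.json())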

  2. Interacts with the LMCache worker.

Currently, the LMCache worker supports the following functions (illustrative message shapes follow the list):

  • register with the controller

  • deregister from the controller

  • heartbeat

  • report admit/evict chunk information (LocalCPUBackend or LocalDiskBackend)
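
As a rough illustration of what these reports could look like, consider the sketch below. The field names are invented for this sketch; the real LMCache wire format may differ:

# Illustrative message shapes only; not the actual LMCache wire format.
register_msg = {"type": "register", "instance_id": "lmcache_instance_id", "worker_id": 0}
heartbeat_msg = {"type": "heartbeat", "instance_id": "lmcache_instance_id", "worker_id": 0}
admit_msg = {"type": "admit", "chunk_hashes": ["<hash-0>"], "location": "LocalCPUBackend"}
evict_msg = {"type": "evict", "chunk_hashes": ["<hash-0>"], "location": "LocalDiskBackend"}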

Quick Start#

Start the Controller

python3 -m lmcache.v1.api_server

Expected output:

[2025-11-11 11:15:35,277] LMCache WARNING: Argument --monitor-port will be deprecated soon. Please use --monitor-ports instead. (__main__.py:361:__main__)
INFO 11-11 11:15:36 [__init__.py:239] Automatically detected platform cuda.
/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_fields.py:198: UserWarning: Field name "copy" in "create_app.<locals>.MoveRequest" shadows an attribute in parent "BaseModel"
warnings.warn(
[2025-11-11 11:15:37,956] LMCache INFO: Starting LMCache controller at 0.0.0.0:9000 (__main__.py:371:__main__)
[2025-11-11 11:15:37,956] LMCache INFO: Monitoring lmcache workers at ports None (__main__.py:372:__main__)
INFO:     Started server process [50664]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)

Controller Configuration

  • --host: default is 0.0.0.0.

  • --port: default is 9000; the externally exposed port through which APIs such as Lookup are accessed.

  • --monitor-port: default is 9001; the port through which the LMCache Worker communicates with the Controller Manager (deprecated; it sets the pull port in --monitor-ports, with the reply port left as None).

  • --monitor-ports: default is None; if configured, it takes a JSON-format string such as {"pull": 8300, "reply": 8400}.
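
For example, to start the controller with explicit pull/reply monitor ports (the port values here are just examples):

python3 -m lmcache.v1.api_server --host 0.0.0.0 --port 9000 --monitor-ports '{"pull": 8300, "reply": 8400}'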

YAML Configuration

enable_controller: True
lmcache_instance_id: "lmcache_instance_id"

controller_pull_url: ip:pull_port
# if controller reply port is None, no need to configure reply url
controller_reply_url: ip:reply_port
# ports for the LMCache Workers; the number of ports must equal the number of ranks
lmcache_worker_ports: [1, 2, 3]

# p2p configuration
p2p_host: localhost
p2p_init_port: [11, 12, 13]
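
Putting it together with the controller invocation above, a filled-in configuration for a two-rank instance might look like this (all addresses and ports are illustrative):

enable_controller: True
lmcache_instance_id: "instance-0"

controller_pull_url: "192.168.1.10:8300"
controller_reply_url: "192.168.1.10:8400"
# two ranks, so two worker ports
lmcache_worker_ports: [8500, 8501]

p2p_host: localhost
p2p_init_port: [8600, 8601]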