LMCache Controller#

Overview#

The overall architecture of the LMCache Controller is shown in the figure below. It consists of two main parts: the Controller Manager and the LMCache Worker.

The Controller Manager mainly consists of the KV Controller, the Reg Controller, and the Cluster Executor; a toy sketch of how they divide the work follows the list below.

  • KV Controller: Maintains the chunk information reported by LMCache Workers and answers lookup requests against it.

  • Reg Controller: Handles register, deregister, and heartbeat requests from LMCache Workers.

  • Cluster Executor: When the Controller Manager receives a user request, such as Clear or Move, it sends the corresponding command to the LMCache Workers through the Cluster Executor.
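
As a rough mental model only, here is a toy Python sketch of this division of labor. The class, method, and field names below are illustrative assumptions, not LMCache's actual implementation:

class ToyControllerManager:
    """Illustrative only; not LMCache's actual classes or message formats."""

    def __init__(self):
        self.chunk_index = {}      # KV Controller state: chunk hash -> worker ids
        self.live_workers = set()  # Reg Controller state

    # KV Controller: record chunk reports and answer lookups.
    def on_chunk_report(self, worker_id, chunk_hash, admitted):
        workers = self.chunk_index.setdefault(chunk_hash, set())
        if admitted:
            workers.add(worker_id)
        else:
            workers.discard(worker_id)

    def lookup(self, chunk_hash):
        return self.chunk_index.get(chunk_hash, set())

    # Reg Controller: track worker registration and liveness.
    def on_register(self, worker_id):
        self.live_workers.add(worker_id)

    def on_deregister(self, worker_id):
        self.live_workers.discard(worker_id)

    # Cluster Executor: fan a user command (Clear, Move, ...) out to workers.
    def execute(self, command, send_fn):
        for worker_id in self.live_workers:
            send_fn(worker_id, command)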

The LMCache Worker is a thread within each rank process. It is responsible for the following tasks (a matching toy sketch follows the list):

  • sends register, deregister, and heartbeat messages to the Reg Controller.

  • sends chunk information, including admit and evict messages, to the KV Controller.

  • listens on a port for commands from the Cluster Executor and performs the corresponding processing.
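
To make these responsibilities concrete, here is a minimal toy sketch of a worker's control flow in Python. Again, all names and message shapes are illustrative assumptions, not LMCache's code:

import threading
import time

class ToyCacheWorker:
    """Illustrative sketch of the three responsibilities above; not LMCache's code."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def _send(self, target, msg):
        # Stub transport; the real worker talks to the controller over sockets.
        print(f"worker {self.worker_id} -> {target}: {msg}")

    def start(self):
        # 1. Register with the Reg Controller, then heartbeat in the background.
        self._send("RegController", {"type": "register"})
        threading.Thread(target=self._heartbeat, daemon=True).start()

    def _heartbeat(self):
        while True:
            self._send("RegController", {"type": "heartbeat"})
            time.sleep(5)

    def on_chunk_admitted(self, chunk_hash):
        # 2. Report chunk information to the KV Controller.
        self._send("KVController", {"type": "admit", "chunk": chunk_hash})

    def handle_command(self, cmd):
        # 3. Commands (Clear, Move, ...) arrive from the Cluster Executor
        #    on the worker's listening port.
        print(f"worker {self.worker_id} executing {cmd}")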

LMCache Controller Architecture Diagram

Key Features#

  1. Exposes a set of APIs for users and orchestrators to manage the KV cache.

Currently, the controller provides the following APIs (a usage sketch follows the list):

  • Clear: Clear the KV caches.

  • Compress: Compress the KV cache.

  • Health: Check the health status of cache workers.

  • Lookup: Lookup the KV cache for a given list of tokens.

  • Move: Move the KV cache to a different location.

  • Pin: Pin the KV cache to prevent it from being evicted.

  • CheckFinish: Check whether a (non-blocking) control event has finished or not.

  • QueryWorkerInfo: Query information about the cache workers.
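
For example, a client could call the controller over HTTP on the default port 9000. The endpoint paths and payload fields below are assumptions for illustration; consult the controller API reference for the exact schema:

import requests

CONTROLLER = "http://localhost:9000"

# Hypothetical paths and payloads; verify against the actual API reference.
health = requests.post(f"{CONTROLLER}/health", json={"instance_id": "lmcache_instance_id"})
print(health.json())

lookup = requests.post(f"{CONTROLLER}/lookup", json={"tokens": [1, 2, 3, 4]})
print(lookup.json())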

  2. Interacts with the LMCache worker.

Currently, the LMCache worker supports the following functions (illustrative message shapes follow the list):

  • register with the controller

  • deregister from the controller

  • heartbeat

  • report admit/evict chunk information (LocalCPUBackend or LocalDiskBackend)
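
As a rough illustration of what these reports could look like, consider the sketch below. The field names are invented for this sketch; the real LMCache wire format may differ:

# Illustrative message shapes only; not the actual LMCache wire format.
register_msg = {"type": "register", "instance_id": "lmcache_instance_id", "worker_id": 0}
heartbeat_msg = {"type": "heartbeat", "instance_id": "lmcache_instance_id", "worker_id": 0}
admit_msg = {"type": "admit", "chunk_hashes": ["<hash-0>"], "location": "LocalCPUBackend"}
evict_msg = {"type": "evict", "chunk_hashes": ["<hash-0>"], "location": "LocalDiskBackend"}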

Quick Start#

Start the Controller

python3 -m lmcache.v1.api_server

Expected output:

[2025-11-11 11:15:35,277] LMCache WARNING: Argument --monitor-port will be deprecated soon. Please use --monitor-ports instead. (__main__.py:361:__main__)
INFO 11-11 11:15:36 [__init__.py:239] Automatically detected platform cuda.
/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_fields.py:198: UserWarning: Field name "copy" in "create_app.<locals>.MoveRequest" shadows an attribute in parent "BaseModel"
warnings.warn(
[2025-11-11 11:15:37,956] LMCache INFO: Starting LMCache controller at 0.0.0.0:9000 (__main__.py:371:__main__)
[2025-11-11 11:15:37,956] LMCache INFO: Monitoring lmcache workers at ports None (__main__.py:372:__main__)
INFO:     Started server process [50664]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)

Controller Configuration

  • --host: default is 0.0.0.0.

  • --port: default is 9000; the externally exposed port through which APIs such as Lookup are accessed.

  • --monitor-port: default is 9001; the port through which the LMCache Worker communicates with the Controller Manager (deprecated; it sets the pull port in --monitor-ports, with the reply port left as None).

  • --monitor-ports: default is None; if configured, it takes a JSON-format string such as {"pull": 8300, "reply": 8400}.
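
For example, to start the controller with explicit pull/reply monitor ports (the port values here are just examples):

python3 -m lmcache.v1.api_server --host 0.0.0.0 --port 9000 --monitor-ports '{"pull": 8300, "reply": 8400}'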

YAML Configuration

enable_controller: True
lmcache_instance_id: "lmcache_instance_id"

controller_pull_url: ip:pull_port
# if controller reply port is None, no need to configure reply url
controller_reply_url: ip:reply_port
# ports for the LMCache Workers; the number of ports must equal the number of ranks
lmcache_worker_ports: [1, 2, 3]

# p2p configuration
p2p_host: localhost
p2p_init_port: [11, 12, 13]
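
Putting it together with the controller invocation above, a filled-in configuration for a two-rank instance might look like this (all addresses and ports are illustrative):

enable_controller: True
lmcache_instance_id: "instance-0"

controller_pull_url: "192.168.1.10:8300"
controller_reply_url: "192.168.1.10:8400"
# two ranks, so two worker ports
lmcache_worker_ports: [8500, 8501]

p2p_host: localhost
p2p_init_port: [8600, 8601]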