lmcache server
The lmcache server command launches the standalone LMCache
Multi-Process (MP) server, which exposes a ZMQ control plane and an HTTP
frontend (status, healthcheck, cache-clear, checksum APIs). It is the server
that lmcache describe, lmcache ping kvcache, lmcache kvcache, and
lmcache bench server talk to.
Note
This command requires the full lmcache installation with CUDA
extensions. It is not available in the lightweight lmcache-cli
package.
lmcache server [options]
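Because the command ships only with the full installation, a wrapper script may want to check for it before trying to launch anything. The sketch below is one way to do that from Python, using only `lmcache server --help`, which this page documents. The helper name `server_command_available` is hypothetical, and the check assumes that an installation without the server subcommand exits with a non-zero status; that behavior is not confirmed here.

```python
import shutil
import subprocess


def server_command_available(executable: str = "lmcache") -> bool:
    """Return True if `<executable> server --help` runs and exits with status 0."""
    # The executable has to be on PATH before anything else can be checked.
    if shutil.which(executable) is None:
        return False
    # Assumption: an installation that lacks the server subcommand
    # (for example the lightweight CLI package) exits with a non-zero status.
    try:
        result = subprocess.run(
            [executable, "server", "--help"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=60,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A caller could print an installation hint and exit early when this returns `False`.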
Quick start
lmcache server \
--host 0.0.0.0 --port 5555 \
--l1-size-gb 100 \
--eviction-policy LRU
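The same invocation can be driven from a Python script, for example in a test harness that needs to know when the server is accepting connections. The following is a minimal sketch, not part of LMCache: the function names are hypothetical, only the four flags from the quick start are used, and the readiness check assumes the ZMQ control plane is bound over TCP so that a plain TCP connect to the port succeeds once the server is listening.

```python
import socket
import subprocess
import time


def build_server_command(host="0.0.0.0", port=5555, l1_size_gb=100,
                         eviction_policy="LRU"):
    """Build the argv list for the quick-start invocation shown above."""
    return [
        "lmcache", "server",
        "--host", host,
        "--port", str(port),
        "--l1-size-gb", str(l1_size_gb),
        "--eviction-policy", eviction_policy,
    ]


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def launch_server(host="0.0.0.0", port=5555, ready_timeout=60.0, **kwargs):
    """Start the server and return the process once its port accepts connections."""
    process = subprocess.Popen(build_server_command(host=host, port=port, **kwargs))
    # A server bound to all interfaces is probed through the loopback address.
    probe_host = "127.0.0.1" if host == "0.0.0.0" else host
    if not wait_for_port(probe_host, port, timeout=ready_timeout):
        process.terminate()
        process.wait()
        raise RuntimeError(f"server did not open port {port} in {ready_timeout}s")
    return process
```

Calling `launch_server(port=5555, l1_size_gb=100)` would return a `subprocess.Popen` handle that the caller is responsible for terminating. An open port only shows that something is listening; it does not prove the server is healthy.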
Options
The server composes its arguments from several configuration modules: the multiprocess server, the storage manager (L1, L2 adapters, and eviction), the HTTP frontend, and the Prometheus / telemetry observability layer. The full, authoritative list is large and evolves with the runtime, so consult:
lmcache server --help
Commonly used flags include:
| Flag | Description |
|---|---|
| `--host` | Bind address for the server. |
| `--port` | ZMQ control-plane port. |
| | KV cache chunk size in tokens. |
| `--l1-size-gb` | L1 (CPU/DRAM) cache capacity in GB. |
| `--eviction-policy` | L1 eviction policy (e.g. `LRU`). |
| | L1 fill ratio at which eviction begins. |
| | Fraction of L1 cleared per eviction cycle. |
| | Number of server worker processes. |
| | Enable storage-level trace recording (see lmcache trace). |
| | Destination for recorded traces. |
L2 adapters, observability, and Prometheus exporters are configured through
their own flag groups; see lmcache server --help for the complete set.
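To make the interaction between the L1 capacity, the fill ratio at which eviction begins, and the fraction cleared per cycle more concrete, here is a toy model of watermark-triggered LRU eviction. It is an illustrative sketch only and not LMCache's implementation: the class `WatermarkLRU` and its parameter names are hypothetical, sizes are abstract units, and it assumes the cleared fraction is measured against total capacity rather than current usage.

```python
from collections import OrderedDict


class WatermarkLRU:
    """Toy LRU cache that evicts in batches once a fill ratio is reached."""

    def __init__(self, capacity, trigger_ratio, evict_fraction):
        self.capacity = capacity              # total size, in abstract units
        self.trigger_ratio = trigger_ratio    # fill ratio at which eviction begins
        self.evict_fraction = evict_fraction  # share of capacity freed per cycle
        self.used = 0
        self._entries = OrderedDict()         # key -> size, least recent first

    def get(self, key):
        """Return True on a hit and mark the entry as most recently used."""
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)
        return True

    def put(self, key, size):
        """Insert an entry and return the keys evicted by this call."""
        if key in self._entries:
            self.used -= self._entries.pop(key)
        self._entries[key] = size
        self.used += size

        evicted = []
        if self.used >= self.trigger_ratio * self.capacity:
            target = self.evict_fraction * self.capacity
            freed = 0
            # Drop least recently used entries until the target is freed.
            while self._entries and freed < target:
                old_key, old_size = self._entries.popitem(last=False)
                self.used -= old_size
                freed += old_size
                evicted.append(old_key)
        return evicted
```

With `WatermarkLRU(capacity=100, trigger_ratio=0.8, evict_fraction=0.2)`, inserting seven entries of size 10 triggers nothing. After a `get` on the first entry, inserting an eighth brings usage to 80, which reaches the trigger ratio, so the two least recently used entries are evicted and usage falls to 60.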