Configuration Reference#
This page documents every CLI argument accepted by the LMCache multiprocess server. Arguments are grouped by the config module that defines them.
MP Server#
Source: lmcache/v1/multiprocess/config.py
| Argument | Default | Description |
|---|---|---|
| `--host` |  | Host address to bind the ZMQ server. |
| `--port` |  | Port to bind the ZMQ server. |
| `--chunk-size` |  | Chunk size for KV cache operations (in tokens). |
| `--max-workers` |  | Maximum number of worker threads for handling ZMQ requests. |
| `--hash-algorithm` |  | Hash algorithm for token-based operations (e.g. `blake3`). |
HTTP Frontend#
Source: lmcache/v1/multiprocess/config.py
Only available when running `http_server.py`.
| Argument | Default | Description |
|---|---|---|
| `--host` |  | Host to bind the HTTP (FastAPI/uvicorn) server. |
| `--port` |  | Port to bind the HTTP server. |
L1 Memory Manager#
Source: lmcache/v1/distributed/config.py
| Argument | Default | Description |
|---|---|---|
| `--l1-size-gb` | required | Size of L1 memory in GB. |
| `--l1-use-lazy` |  | Use lazy allocation for L1 memory. |
| `--l1-init-size-gb` |  | Initial allocation size (GB) when using lazy allocation. |
| `--l1-align-bytes` | 4096 | Alignment size in bytes (default 4 KB). |
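An alignment setting like `--l1-align-bytes` typically means allocation sizes are rounded up to a multiple of the alignment. A minimal sketch of that arithmetic (illustrative only, assuming round-up semantics; this is not LMCache's actual allocator code):

```python
def aligned_size(nbytes: int, align: int = 4096) -> int:
    """Round an allocation size up to the next multiple of `align` (4 KB default)."""
    return (nbytes + align - 1) // align * align

# A 10,000-byte object occupies 12,288 bytes (three 4 KB blocks) when aligned.
print(aligned_size(10_000))   # 12288
print(aligned_size(4_096))    # 4096 (already aligned)
```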
L1 Manager TTLs#
Source: lmcache/v1/distributed/config.py
| Argument | Default | Description |
|---|---|---|
| `--l1-write-ttl-seconds` |  | Time-to-live for each object’s write lock (seconds). |
| `--l1-read-ttl-seconds` |  | Time-to-live for each object’s read lock (seconds). |
Eviction Policy#
Source: lmcache/v1/distributed/config.py
| Argument | Default | Description |
|---|---|---|
| `--eviction-policy` | required | Eviction policy. Currently only `LRU` is supported. |
| `--eviction-trigger-watermark` |  | Memory usage ratio (0.0–1.0) that triggers eviction. |
| `--eviction-ratio` |  | Fraction of allocated memory to evict when triggered (0.0–1.0). |
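As a worked illustration of how the two knobs interact (assuming the watermark applies to total L1 size and the ratio to the memory allocated at trigger time, as the descriptions suggest):

```python
l1_size_gb = 100        # --l1-size-gb
watermark = 0.9         # --eviction-trigger-watermark
ratio = 0.1             # --eviction-ratio

trigger_gb = l1_size_gb * watermark   # eviction starts once ~90 GB is in use
evicted_gb = trigger_gb * ratio       # ~9 GB of the allocated memory is freed
print(f"trigger at {trigger_gb:.0f} GB, evict ~{evicted_gb:.0f} GB")
```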
L2 Adapters#
Source: lmcache/v1/distributed/l2_adapters/config.py
L2 adapters are configured via repeatable `--l2-adapter <JSON>` arguments.
Each JSON object must include a `"type"` field that selects the adapter type.
The order of `--l2-adapter` arguments determines the adapter order (cascade).
Registered adapter types: `nixl_store`, `mock`.
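A repeatable JSON-valued flag like this maps naturally onto argparse's `append` action. The sketch below is illustrative only (the parser and attribute names are assumptions, not LMCache's actual CLI code); it shows how the cascade order is preserved:

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument(
    "--l2-adapter",
    dest="l2_adapters",
    action="append",   # each occurrence appends one adapter spec
    default=[],
    type=json.loads,   # decode the JSON payload while parsing argv
)

args = parser.parse_args([
    "--l2-adapter", '{"type": "nixl_store", "backend": "POSIX", "pool_size": 64}',
    "--l2-adapter", '{"type": "mock", "max_size_gb": 256, "mock_bandwidth_gb": 10}',
])

# Every spec must carry a "type"; command-line order is the cascade order.
for spec in args.l2_adapters:
    assert "type" in spec, f"adapter spec missing 'type': {spec}"
print([spec["type"] for spec in args.l2_adapters])   # ['nixl_store', 'mock']
```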
nixl_store – NIXL-based persistent storage#
Fields:
- `backend` (required): One of `POSIX`, `GDS`, `GDS_MT`, `HF3FS`, `OBJ`.
- `backend_params` (required for file-based backends): Dict of string key-value pairs. File-based backends (`GDS`, `GDS_MT`, `POSIX`, `HF3FS`) require `file_path` and `use_direct_io`.
- `pool_size` (required): Number of storage descriptors to pre-allocate (> 0).
Examples:
```shell
# POSIX backend (local file system)
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'

# GDS backend (GPU Direct Storage)
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'

# GDS_MT backend (multi-threaded GDS)
--l2-adapter '{"type": "nixl_store", "backend": "GDS_MT", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'

# HF3FS backend (shared file system)
--l2-adapter '{"type": "nixl_store", "backend": "HF3FS", "backend_params": {"file_path": "/mnt/hf3fs/lmcache", "use_direct_io": "false"}, "pool_size": 64}'

# OBJ backend (object store -- no file_path needed)
--l2-adapter '{"type": "nixl_store", "backend": "OBJ", "backend_params": {}, "pool_size": 32}'
```
mock – Mock adapter for testing#
Fields:
- `max_size_gb` (required): Maximum size of the adapter in GB (> 0).
- `mock_bandwidth_gb` (required): Simulated bandwidth in GB/sec (> 0).
Example:
```shell
--l2-adapter '{"type": "mock", "max_size_gb": 256, "mock_bandwidth_gb": 10}'
```
Multiple adapters (cascade)#
Pass `--l2-adapter` multiple times. Adapters are used in the order given:

```shell
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/ssd/l2", "use_direct_io": "false"}, "pool_size": 64}' \
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true"}, "pool_size": 128}'
```
Prometheus Observability#
Source: lmcache/v1/mp_observability/config.py
| Argument | Default | Description |
|---|---|---|
|  |  | Disable Prometheus metrics collection and HTTP server. |
| `--prometheus-port` |  | Port to expose the Prometheus metrics endpoint. |
| `--prometheus-log-interval` |  | How often (seconds) to flush accumulated stats to Prometheus. |
Telemetry#
Source: lmcache/v1/mp_observability/telemetry/config.py
| Argument | Default | Description |
|---|---|---|
| `--enable-telemetry` |  | Enable the telemetry event system. |
|  |  | Maximum events in the telemetry queue before tail-drop. |
| `--telemetry-processor` | (none) | Processor spec as JSON (repeatable). Must include a `"type"` field. |
logging processor#
The built-in processor. Logs telemetry events via LMCache’s logger.
Fields:
- `log_level`: Log level to use (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). Default is `DEBUG`.
Examples:
```shell
--telemetry-processor '{"type": "logging", "log_level": "DEBUG"}'
--telemetry-processor '{"type": "logging", "log_level": "INFO"}'
```
vLLM Client Configuration#
On the vLLM side, specify the LMCache server host and port via the `kv_connector_extra_config` parameter:
```shell
vllm serve Qwen/Qwen3-14B \
  --kv-transfer-config \
  '{"kv_connector":"LMCacheMPConnector", "kv_role":"kv_both", "kv_connector_extra_config": {"lmcache.mp.host": "127.0.0.1", "lmcache.mp.port": 6000}}'
```
Environment Variables#
| Variable | Description |
|---|---|
| `LMCACHE_LOG_LEVEL` | Log level for LMCache (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). |
| `PYTHONHASHSEED` | Set to a fixed value for reproducible hashing across processes. |
Full Example#
```shell
python3 -m lmcache.v1.multiprocess.server \
  --host 0.0.0.0 \
  --port 6555 \
  --chunk-size 512 \
  --max-workers 4 \
  --hash-algorithm blake3 \
  --l1-size-gb 100 \
  --l1-use-lazy \
  --l1-init-size-gb 20 \
  --l1-align-bytes 4096 \
  --l1-write-ttl-seconds 600 \
  --l1-read-ttl-seconds 300 \
  --eviction-policy LRU \
  --eviction-trigger-watermark 0.9 \
  --eviction-ratio 0.1 \
  --l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}' \
  --prometheus-port 9090 \
  --prometheus-log-interval 10 \
  --enable-telemetry \
  --telemetry-processor '{"type": "logging", "log_level": "DEBUG"}'
```