Configuring LMCache#
LMCache supports two types of configurations:
Configuration file: a YAML (recommended) or JSON file that contains the configuration items.
Environment variables: environment variables that start with LMCACHE_.
To use a configuration file, set the LMCACHE_CONFIG_FILE environment variable to the path of the configuration file.
Note
Settings supplied through environment variables are ignored if a configuration file is present.
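As a minimal sketch, a configuration file could look like the following. The file name is hypothetical, and the keys and values are the documented defaults from the General Configurations table; point LMCACHE_CONFIG_FILE at wherever the file is saved.

```yaml
# lmcache_config.yaml (hypothetical file name)
chunk_size: 256           # documented default
local_cpu: true           # documented default
max_local_cpu_size: 5.0   # in GB, documented default
```

The same settings could instead be given as LMCACHE_CHUNK_SIZE, LMCACHE_LOCAL_CPU, and LMCACHE_MAX_LOCAL_CPU_SIZE, but not mixed with a configuration file, because the file takes precedence.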
General Configurations#
Basic cache settings that control the core functionality of LMCache.
YAML Config Name | Environment Variable | Description |
---|---|---|
chunk_size | LMCACHE_CHUNK_SIZE | Size of cache chunks. Default: 256 |
local_cpu | LMCACHE_LOCAL_CPU | Whether to enable CPU caching. Values: true/false. Default: true |
max_local_cpu_size | LMCACHE_MAX_LOCAL_CPU_SIZE | Maximum CPU cache size in GB. Default: 5.0 |
local_disk | LMCACHE_LOCAL_DISK | Path to local disk cache. Format: “file:///path/to/cache”. |
max_local_disk_size | LMCACHE_MAX_LOCAL_DISK_SIZE | Maximum disk cache size in GB. Default: 0.0 |
remote_url | LMCACHE_REMOTE_URL | Remote storage URL. Format: “protocol://host:port”. |
remote_serde | LMCACHE_REMOTE_SERDE | Serialization format. Values: “naive” or “cachegen”. Default: “naive” |
save_decode_cache | LMCACHE_SAVE_DECODE_CACHE | Whether to store decode KV cache. Values: true/false. Default: false |
use_layerwise | LMCACHE_USE_LAYERWISE | Whether to enable layerwise pipelining. Values: true/false. Default: false |
pre_caching_hash_algorithm | LMCACHE_PRE_CACHING_HASH_ALGORITHM | Hash algorithm for prefix-caching. Default: “builtin” |
save_unfull_chunk | LMCACHE_SAVE_UNFULL_CHUNK | Whether to save unfull chunks. Values: true/false. Default: true |
blocking_timeout_secs | LMCACHE_BLOCKING_TIMEOUT_SECS | Timeout for blocking operations in seconds. Default: 10 |
py_enable_gc | LMCACHE_PY_ENABLE_GC | Whether to enable Python garbage collection. Values: true/false. Default: true |
cache_policy | LMCACHE_CACHE_POLICY | Cache eviction policy (e.g. “LRU”, “LFU”, “FIFO”). Default: “LRU” |
numa_mode | LMCACHE_NUMA_MODE | NUMA-aware memory allocation mode. Values: “auto” (detect from system), “manual” (use extra_config mapping), null (disabled). When enabled, allocates pinned memory on specific NUMA nodes for better GPU-CPU memory bandwidth. Default: null |
external_lookup_client | LMCACHE_EXTERNAL_LOOKUP_CLIENT | External KV lookup service URI (e.g., “mooncakestore://address”). If null, defaults to LMCache’s internal lookup client. Default: null |
priority_limit | LMCACHE_PRIORITY_LIMIT | Caches requests only if priority value ≤ limit. (Not applicable for PD Disaggregation) Type: int. Default: None |
extra_config | LMCACHE_EXTRA_CONFIG={“key”: value, …} | Additional configuration as JSON dict. For NUMA manual mode, include “gpu_to_numa_mapping”: {gpu_id: numa_node, …}. Default: {} |
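To illustrate how several of these options combine, the sketch below enables CPU and disk caching and uses manual NUMA mode. The disk size and the GPU-to-NUMA mapping are assumptions chosen for illustration (a hypothetical four-GPU, two-node host); only the key names and formats come from the table above.

```yaml
local_cpu: true
max_local_cpu_size: 5.0
local_disk: "file:///path/to/cache"   # format taken from the table
max_local_disk_size: 10.0             # in GB; illustrative value (default is 0.0)
cache_policy: "LRU"
numa_mode: "manual"                   # manual mode reads the mapping from extra_config
extra_config:
  gpu_to_numa_mapping:                # gpu_id: numa_node (hypothetical topology)
    0: 0
    1: 0
    2: 1
    3: 1
```

With numa_mode set to “auto”, the mapping would be detected from the system and the gpu_to_numa_mapping entry would not be needed.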
Cache Blending Configurations#
Settings related to cache blending functionality.
Note
Cache blending is not supported in the latest version. We are working on it and will add it back soon.
YAML Config Name | Environment Variable | Description |
---|---|---|
enable_blending | LMCACHE_ENABLE_BLENDING | Whether to enable blending. Values: true/false. Default: false |
blend_recompute_ratio | LMCACHE_BLEND_RECOMPUTE_RATIO | Ratio of blending recompute. Default: 0.15 |
blend_min_tokens | LMCACHE_BLEND_MIN_TOKENS | Minimum number of tokens for blending. Default: 256 |
blend_special_str | LMCACHE_BLEND_SPECIAL_STR | Separator string for blending. Default: “ # # “ |
Peer-to-Peer Sharing Configurations#
Settings for enabling and configuring peer-to-peer CPU KV cache sharing and global KV cache lookup.
YAML Config Name | Environment Variable | Description |
---|---|---|
enable_p2p | LMCACHE_ENABLE_P2P | Whether to enable peer-to-peer sharing. Values: true/false. Default: false |
lookup_url | LMCACHE_LOOKUP_URL | URL of the lookup server. Required if enable_p2p is true |
distributed_url | LMCACHE_DISTRIBUTED_URL | URL of the distributed server. Required if enable_p2p is true |
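A peer-to-peer setup needs all three keys together. The sketch below shows the shape of such a configuration; the host names, ports, and the host:port form of the URLs are assumptions, since this page does not specify the URL format.

```yaml
enable_p2p: true
lookup_url: "lookup-host:8100"        # hypothetical address of the lookup server
distributed_url: "this-host:8200"     # hypothetical address of the distributed server
```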
Controller Configurations#
Settings for the KV cache controller functionality.
YAML Config Name | Environment Variable | Description |
---|---|---|
enable_controller | LMCACHE_ENABLE_CONTROLLER | Whether to enable controller. Values: true/false. Default: false |
lmcache_instance_id | LMCACHE_LMCACHE_INSTANCE_ID | ID of the LMCache instance. Default: “lmcache_default_instance” |
controller_url | LMCACHE_CONTROLLER_URL | URL of the controller server |
lmcache_worker_port | LMCACHE_LMCACHE_WORKER_PORT | Port number for LMCache worker |
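As a rough sketch, a controller-enabled instance might be configured as follows. The instance ID, controller address, and worker port are hypothetical values, and the host:port form of controller_url is an assumption.

```yaml
enable_controller: true
lmcache_instance_id: "lmcache_instance_a"   # hypothetical; default is "lmcache_default_instance"
controller_url: "controller-host:9000"      # hypothetical controller address
lmcache_worker_port: 8001                   # hypothetical port
```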
Nixl (Disaggregated Prefill) Configurations#
Settings for Nixl-based disaggregated prefill functionality. The latest (default) Nixl backend and connector are implemented in lmcache/v1/storage_backend/nixl_backend_v3 and lmcache/v1/storage_backend/connector/nixl_connector_v3.py.
Note
When Nixl is enabled, the following restrictions apply (contributions to remove these restrictions are welcome):
remote_url must be null
save_decode_cache must be false
enable_p2p must be false
YAML Config Name | Environment Variable | Description |
---|---|---|
enable_nixl | LMCACHE_ENABLE_NIXL | Whether to enable Nixl. Values: true/false. Default: false |
enable_xpyd | LMCACHE_ENABLE_XPYD | Should be true when enable_nixl=true to use latest v3 nixl backend/connector. Values: true/false. Default: false |
nixl_role | LMCACHE_NIXL_ROLE | Nixl role. Values: “sender” (prefiller) or “receiver” (decoder). Required when enable_nixl=true |
nixl_buffer_size | LMCACHE_NIXL_BUFFER_SIZE | Transport buffer size for Nixl in bytes. Required for both senders and receivers when enable_nixl=true |
nixl_buffer_device | LMCACHE_NIXL_BUFFER_DEVICE | Device for Nixl buffer. Values: “cpu”, “cuda”. Required for both senders and receivers when enable_nixl=true |
nixl_backends | LMCACHE_NIXL_BACKENDS | List of Nixl transport backends. Useful for non-disaggregated use case (see below). UCX default is sufficient for disagg use case. Default: [“UCX”] |
nixl_enable_gc | LMCACHE_NIXL_ENABLE_GC | Whether to enable Nixl garbage collection. Values: true/false. Default: false |
nixl_peer_host | LMCACHE_NIXL_PEER_HOST | Host for peer connections. Required for receivers to bind to |
nixl_peer_init_port | LMCACHE_NIXL_PEER_INIT_PORT | Initialization port for peer connections. Required for receivers to bind to |
nixl_peer_alloc_port | LMCACHE_NIXL_PEER_ALLOC_PORT | Allocation port for peer connections. Required for receivers to bind to |
nixl_proxy_host | LMCACHE_NIXL_PROXY_HOST | Host for proxy server. Required for senders to connect to inform the proxy when transfer to decoder has been completed |
nixl_proxy_port | LMCACHE_NIXL_PROXY_PORT | Port for proxy server. Required for senders to connect to inform the proxy when transfer to decoder has been completed |
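The sender and receiver roles need different subsets of these keys. The sketch below shows one possible pair of configurations as two YAML documents (in practice, two separate files). Hosts, ports, and the buffer size are hypothetical values; the key names and the restrictions on remote_url, save_decode_cache, and enable_p2p come from this section.

```yaml
# Sender (prefiller) configuration
enable_nixl: true
enable_xpyd: true                 # selects the v3 backend/connector
nixl_role: "sender"
nixl_buffer_size: 1073741824      # bytes (1 GiB, illustrative)
nixl_buffer_device: "cuda"
nixl_proxy_host: "proxy-host"     # hypothetical proxy address
nixl_proxy_port: 7500             # hypothetical
remote_url: null                  # required restriction
save_decode_cache: false          # required restriction
enable_p2p: false                 # required restriction
---
# Receiver (decoder) configuration
enable_nixl: true
enable_xpyd: true
nixl_role: "receiver"
nixl_buffer_size: 1073741824      # bytes (1 GiB, illustrative)
nixl_buffer_device: "cuda"
nixl_peer_host: "0.0.0.0"         # hypothetical bind address
nixl_peer_init_port: 7300         # hypothetical
nixl_peer_alloc_port: 7400        # hypothetical
remote_url: null
save_decode_cache: false
enable_p2p: false
```

nixl_backends is omitted in both documents because the UCX default is described as sufficient for the disaggregated use case.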
Nixl (as a storage backend) Configurations#
Settings for using Nixl as a storage backend instead of disaggregated prefill. This mode requires additional configurations in extra_config.
Note
This is a different mode from disaggregated prefill. When using Nixl as a storage backend, you need to configure it through extra_config.
extra_config:
  # enable_nixl_storage will disable disaggregated prefill mode, even if
  # enable_nixl is true.
  enable_nixl_storage: true
  nixl_backend: "POSIX" # Options: "GDS", "GDS_MT", "POSIX", "HF3FS"
  nixl_path: "/path/to/storage/"
  nixl_file_pool_size: 64
Configuration Key | Description |
---|---|
enable_nixl_storage | Whether to enable Nixl storage backend. Values: true/false |
nixl_backend | Storage backend type. Options: “GDS”, “GDS_MT”, “POSIX”, “HF3FS” |
nixl_path | File system path for Nixl storage |
nixl_file_pool_size | Number of files in the storage pool |
Additional Storage Configurations#
Settings for different storage backends and paths.
YAML Config Name | Environment Variable | Description |
---|---|---|
weka_path | LMCACHE_WEKA_PATH | Path for Weka storage backend |
gds_path | LMCACHE_GDS_PATH | Path for GDS backend |
cufile_buffer_size | LMCACHE_CUFILE_BUFFER_SIZE | Buffer size for cuFile operations |
Internal API Server Configurations#
Settings for the internal API server that provides management and debugging APIs for LMCache engines. The API server runs on each worker and scheduler, allowing you to inspect and control LMCache behavior at runtime.
Note
The internal API server provides endpoints for:
Metrics: Performance and cache statistics
Configuration: Runtime configuration inspection
Metadata: Engine and model metadata
Threads: Thread debugging information
Log Level: Dynamic log level adjustment
Script Execution: Run custom Python scripts with access to the LMCache engine
Configuration Options#
YAML Config Name | Environment Variable | Description |
---|---|---|
internal_api_server_enabled | LMCACHE_INTERNAL_API_SERVER_ENABLED | Whether to enable internal API server. Default: false |
internal_api_server_host | LMCACHE_INTERNAL_API_SERVER_HOST | Host for internal API server to bind to. Default: “0.0.0.0” |
internal_api_server_port_start | LMCACHE_INTERNAL_API_SERVER_PORT_START | Starting port for internal API server. Port assignment: Scheduler = port_start + 0, Worker i = port_start + i + 1. Example: If port_start=6999, then Scheduler=6999, Worker 0=7000, Worker 1=7001, etc. Default: 6999 |
internal_api_server_include_index_list | LMCACHE_INTERNAL_API_SERVER_INCLUDE_INDEX_LIST | List of worker/scheduler indices to enable API server on. Use 0 for scheduler, 1 for worker 0, 2 for worker 1, etc. If null, enables on all workers/scheduler. Example: [0, 1] enables only on scheduler and worker 0. Default: null |
internal_api_server_socket_path_prefix | LMCACHE_INTERNAL_API_SERVER_SOCKET_PATH_PREFIX | If specified, use Unix domain sockets instead of TCP ports. Socket paths will be “{prefix}_{port}”. Example: “/tmp/lmcache_api_socket” creates “/tmp/lmcache_api_socket_6999”, “/tmp/lmcache_api_socket_7000”, etc. Default: null |
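The port and index rules above can be seen in a small example. This sketch uses only the documented keys and defaults, and restricts the server to the scheduler and worker 0.

```yaml
internal_api_server_enabled: true
internal_api_server_host: "0.0.0.0"              # documented default
internal_api_server_port_start: 6999             # documented default
internal_api_server_include_index_list: [0, 1]   # index 0 = scheduler, index 1 = worker 0
```

Following the port assignment rule, the scheduler would listen on 6999 and worker 0 on 7000; worker 1 and above would have no API server because their indices are not in the list.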
Plugin Configurations#
Settings for the plugin system.
YAML Config Name | Environment Variable | Description |
---|---|---|
plugin_locations | LMCACHE_PLUGIN_LOCATIONS | List of plugin locations. Default: [] |
Deprecated Configurations#
These configurations are deprecated and may be removed in future versions.
YAML Config Name | Environment Variable | Description |
---|---|---|
audit_actual_remote_url | LMCACHE_AUDIT_ACTUAL_REMOTE_URL | (Deprecated) URL of actual remote LMCache instance for auditing. Use extra_config[‘audit_actual_remote_url’] instead |
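A sketch of the suggested replacement, moving the setting under extra_config, is shown below. The URL is a placeholder that reuses the “protocol://host:port” form given for remote_url; whether the audit URL follows the same format is an assumption.

```yaml
extra_config:
  audit_actual_remote_url: "protocol://host:port"   # placeholder URL
```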