L2 Storage (Persistent Cache)
LMCache multiprocess mode supports a two-tier storage architecture:
L1 (in-memory) – Fast CPU memory managed by the L1 Manager. All KV cache chunks live here during active use.
L2 (persistent) – Durable storage backends accessed through NIXL. The StoreController asynchronously pushes data from L1 to L2, and the PrefetchController loads data from L2 back into L1 on cache misses.
Data Flow
Write path (L1 -> L2):
1. vLLM stores KV cache chunks into L1 via the STORE RPC.
2. The StoreController detects new objects (via eventfd) and asynchronously submits store tasks to each configured L2 adapter.
3. The L2 adapter writes the data to its backend (e.g., local SSD via GDS).
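The write path can be pictured with a small toy model. This is a minimal sketch, not LMCache's implementation: `DictAdapter`, `ToyStoreController`, and `store_rpc` are hypothetical names, a `queue.Queue` stands in for the eventfd notification, and a plain dict stands in for L1.

```python
import queue
import threading


class DictAdapter:
    """Hypothetical L2 adapter backed by a dict (stands in for a real backend)."""

    def __init__(self):
        self.data = {}

    def store(self, key, chunk):
        self.data[key] = chunk


class ToyStoreController:
    """Drains notifications about new L1 keys and copies them to every adapter."""

    def __init__(self, l1, adapters):
        self.l1 = l1
        self.adapters = adapters
        self.pending = queue.Queue()  # assumption: replaces the eventfd signal
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def notify(self, key):
        self.pending.put(key)

    def _run(self):
        while True:
            key = self.pending.get()
            try:
                if key is None:  # sentinel: shut down the worker
                    return
                for adapter in self.adapters:
                    adapter.store(key, self.l1[key])
            finally:
                self.pending.task_done()

    def close(self):
        self.pending.put(None)
        self.worker.join()


def store_rpc(l1, controller, key, chunk):
    """Step 1: the chunk lands in L1; the push to L2 happens in the background."""
    l1[key] = chunk
    controller.notify(key)


l1 = {}
adapter = DictAdapter()
controller = ToyStoreController(l1, [adapter])
store_rpc(l1, controller, "chunk-0", b"kv-bytes")
controller.pending.join()  # wait until the background store has finished
controller.close()
print(adapter.data)
```

The point of the sketch is that `store_rpc` returns as soon as the chunk is in L1; the copy to L2 happens on a separate worker.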
Read path (L2 -> L1):
1. A LOOKUP RPC checks L1 for prefix hits.
2. For keys not found in L1, the PrefetchController submits lookup requests to L2 adapters.
3. If found in L2, the data is loaded back into L1 and read-locked for the pending RETRIEVE RPC.
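The read path can be sketched in the same toy style. The function below is hypothetical and uses plain dicts for L1 and L2. It assumes that a prefix hit means a run of consecutive keys from the start of the request, and that prefetching stops at the first key missing from L2; neither detail is confirmed by this page.

```python
def lookup_and_prefetch(keys, l1, l2, read_locks):
    """Count L1 prefix hits, then pull the remaining keys from L2 into L1.

    Returns (l1_prefix_hits, keys_loaded_from_l2).
    """
    # Step 1: count consecutive hits in L1 from the start of the key list.
    hits = 0
    for key in keys:
        if key not in l1:
            break
        hits += 1

    # Step 2: everything after the L1 prefix is looked up in L2.
    loaded = []
    for key in keys[hits:]:
        if key not in l2:
            break  # assumption: the usable prefix ends at the first L2 miss
        # Step 3: load into L1 and read-lock for the pending retrieve.
        l1[key] = l2[key]
        read_locks[key] = read_locks.get(key, 0) + 1
        loaded.append(key)
    return hits, loaded


l1 = {"a": b"A"}
l2 = {"b": b"B", "c": b"C"}
locks = {}
result = lookup_and_prefetch(["a", "b", "c", "d"], l1, l2, locks)
print(result)
```

In this run the first key is an L1 hit, the next two come from L2, and the last key is a miss in both tiers.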
Adapter Types
nixl_store – NIXL-based persistent storage
The primary production adapter. Uses NIXL (NVIDIA Inference Xfer Library) for high-performance storage I/O.
Required fields:
backend: Storage backend – one of POSIX, GDS, GDS_MT, HF3FS, OBJ.
pool_size: Number of storage descriptors to pre-allocate (must be > 0).
Backend-specific parameters (backend_params):
File-based backends (GDS, GDS_MT, POSIX, HF3FS) require:
file_path: Directory path for storing L2 data.
use_direct_io: "true" or "false" – whether to use direct I/O.
The OBJ backend (object store) does not require file_path.
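The field rules above can be checked before launching the server. The helper below is a hypothetical pre-flight check written for illustration; it only encodes the constraints listed in this section and is not LMCache's own validation logic.

```python
FILE_BACKENDS = {"POSIX", "GDS", "GDS_MT", "HF3FS"}
ALL_BACKENDS = FILE_BACKENDS | {"OBJ"}


def check_nixl_store_config(cfg):
    """Return a list of problems found in a nixl_store adapter config dict."""
    problems = []

    backend = cfg.get("backend")
    if backend not in ALL_BACKENDS:
        problems.append(f"backend must be one of {sorted(ALL_BACKENDS)}")

    pool_size = cfg.get("pool_size")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(pool_size, bool) or not isinstance(pool_size, int) or pool_size <= 0:
        problems.append("pool_size must be an integer > 0")

    params = cfg.get("backend_params", {})
    if backend in FILE_BACKENDS:
        if "file_path" not in params:
            problems.append(f"{backend} requires backend_params.file_path")
        if params.get("use_direct_io") not in ("true", "false"):
            problems.append('use_direct_io must be the string "true" or "false"')
    return problems


good = {
    "type": "nixl_store",
    "backend": "POSIX",
    "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"},
    "pool_size": 64,
}
bad = {"type": "nixl_store", "backend": "GDS", "backend_params": {}, "pool_size": 0}
print(check_nixl_store_config(good))
print(check_nixl_store_config(bad))
```

Note that `use_direct_io` is compared against the strings `"true"` and `"false"`, matching how the value is quoted in the configuration examples.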
Backend descriptions:
| Backend | Description |
|---|---|
| POSIX | Standard POSIX file I/O. Works on any file system. No direct I/O. |
| GDS | NVIDIA GPU Direct Storage. Enables direct GPU-to-storage transfers bypassing the CPU. Requires NVMe SSDs with GDS support. |
| GDS_MT | Multi-threaded variant of GDS for higher throughput. |
| HF3FS | Shared file system backend (e.g., for distributed/networked storage). |
| OBJ | Object store backend. No local file path required. |
Configuration examples:
# POSIX backend
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'
# GDS backend
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'
# GDS_MT backend
--l2-adapter '{"type": "nixl_store", "backend": "GDS_MT", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'
# HF3FS backend
--l2-adapter '{"type": "nixl_store", "backend": "HF3FS", "backend_params": {"file_path": "/mnt/hf3fs/lmcache", "use_direct_io": "false"}, "pool_size": 64}'
# OBJ backend
--l2-adapter '{"type": "nixl_store", "backend": "OBJ", "backend_params": {}, "pool_size": 32}'
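When the adapter JSON is generated from a script, quoting mistakes are easy to make. One way to avoid them, sketched below, is to build the config as a dict and let `json.dumps` and `shlex.quote` produce the shell-safe argument. The helper name `l2_adapter_arg` is hypothetical; only the `--l2-adapter` flag and the JSON fields come from the examples above.

```python
import json
import shlex


def l2_adapter_arg(config):
    """Render one adapter config dict as a shell-safe --l2-adapter argument."""
    return "--l2-adapter " + shlex.quote(json.dumps(config))


obj_config = {
    "type": "nixl_store",
    "backend": "OBJ",
    "backend_params": {},
    "pool_size": 32,
}
arg = l2_adapter_arg(obj_config)
print(arg)

# Round trip: splitting the argument the way a shell would and parsing the
# JSON gives back the original dict.
flag, payload = shlex.split(arg)
print(json.loads(payload) == obj_config)
```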
mock – Mock adapter for testing
Simulates L2 storage with configurable size and bandwidth. Useful for testing the L2 pipeline without real storage hardware.
Fields:
max_size_gb: Maximum size in GB (> 0).
mock_bandwidth_gb: Simulated bandwidth in GB/sec (> 0).
--l2-adapter '{"type": "mock", "max_size_gb": 256, "mock_bandwidth_gb": 10}'
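To get a feel for what a given bandwidth setting implies, the arithmetic is just size divided by bandwidth. The sketch below assumes 1 GB = 10**9 bytes and a transfer time that scales linearly with size; how the mock adapter actually applies `mock_bandwidth_gb` internally is not described here, so treat the function as an estimate only.

```python
def simulated_transfer_seconds(num_bytes, mock_bandwidth_gb):
    """Estimated transfer time at the configured bandwidth (1 GB = 10**9 bytes assumed)."""
    if mock_bandwidth_gb <= 0:
        raise ValueError("mock_bandwidth_gb must be > 0")
    return num_bytes / (mock_bandwidth_gb * 10**9)


# With the example config above: filling all 256 GB at 10 GB/sec.
fill_seconds = simulated_transfer_seconds(256 * 10**9, 10)
print(fill_seconds)
```

Under these assumptions, filling the full 256 GB at 10 GB/sec takes about 25.6 seconds.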
Multiple Adapters (Cascade)
You can configure multiple L2 adapters by repeating the --l2-adapter
argument. Adapters are used in the order they are specified. The
StoreController pushes data to all configured adapters, and the
PrefetchController queries adapters in order during lookups.
# SSD (fast, smaller) + NVMe GDS (larger capacity)
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/ssd/l2", "use_direct_io": "false"}, "pool_size": 64}' \
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true"}, "pool_size": 128}'
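The cascade semantics (store to all adapters, look up in the configured order) can be modelled in a few lines. This is a toy sketch: `NamedAdapter`, `store_all`, and `lookup_in_order` are hypothetical, and returning on the first adapter that has the key is an assumption about how "in order" resolves.

```python
class NamedAdapter:
    """Hypothetical in-memory adapter used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def store_all(adapters, key, chunk):
    """Push the chunk to every configured adapter."""
    for adapter in adapters:
        adapter.data[key] = chunk


def lookup_in_order(adapters, key):
    """Query adapters in configuration order; return (adapter_name, chunk) or None."""
    for adapter in adapters:
        if key in adapter.data:
            return adapter.name, adapter.data[key]
    return None


ssd = NamedAdapter("ssd")
nvme = NamedAdapter("nvme")
adapters = [ssd, nvme]  # same order as the --l2-adapter arguments

store_all(adapters, "chunk-0", b"kv-bytes")
first = lookup_in_order(adapters, "chunk-0")

# If the first adapter no longer holds the key, the next one answers.
del ssd.data["chunk-0"]
second = lookup_in_order(adapters, "chunk-0")
print(first, second)
```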
Verifying L2 Storage
Set LMCACHE_LOG_LEVEL=DEBUG to see L2 activity in the server logs:
LMCACHE_LOG_LEVEL=DEBUG python3 -m lmcache.v1.multiprocess.server \
--l1-size-gb 100 --eviction-policy LRU \
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'
Expected log messages when L2 is active:
LMCache DEBUG: Submitted store task ...
LMCache DEBUG: L2 store task N completed ...
LMCache DEBUG: Prefetch request submitted: X total keys, Y L1 prefix hits, Z remaining for L2
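For automated checks, the prefetch line can be parsed with a regular expression. The pattern below assumes that X, Y, and Z are printed as plain integers and that the wording matches the message shown above exactly. The sample line is made up for illustration, so verify the format against real server output before relying on it.

```python
import re

PREFETCH_RE = re.compile(
    r"Prefetch request submitted: (\d+) total keys, "
    r"(\d+) L1 prefix hits, (\d+) remaining for L2"
)


def parse_prefetch_line(line):
    """Extract the three counters from a prefetch log line, or return None."""
    match = PREFETCH_RE.search(line)
    if match is None:
        return None
    total, l1_hits, l2_remaining = (int(group) for group in match.groups())
    return {"total": total, "l1_hits": l1_hits, "l2_remaining": l2_remaining}


# Hypothetical sample line with made-up numbers.
sample = (
    "LMCache DEBUG: Prefetch request submitted: "
    "8 total keys, 5 L1 prefix hits, 3 remaining for L2"
)
parsed = parse_prefetch_line(sample)
print(parsed)
```

A non-zero `l2_remaining` value indicates that the request went past L1 and exercised the L2 lookup path.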