L2 Storage (Persistent Cache)#

LMCache multiprocess mode supports a two-tier storage architecture:

  • L1 (in-memory) – Fast CPU memory managed by the L1 Manager. All KV cache chunks live here during active use.

  • L2 (persistent) – Durable storage backends accessed through NIXL. The StoreController asynchronously pushes data from L1 to L2, and the PrefetchController loads data from L2 back into L1 on cache misses.

Data Flow#

Write path (L1 -> L2):

  1. vLLM stores KV cache chunks into L1 via the STORE RPC.

  2. The StoreController detects new objects (via eventfd) and asynchronously submits store tasks to each configured L2 adapter.

  3. The L2 adapter writes the data to its backend (e.g., local SSD via GDS).
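The write path above can be sketched as a small producer/consumer loop. This is a hypothetical illustration, not the actual LMCache implementation: a `queue.Queue` stands in for the real eventfd notification, and the `StoreControllerSketch` and adapter `.store()` names are invented for the example.

```python
import queue
import threading

class StoreControllerSketch:
    """Illustrative sketch of the L1 -> L2 write path (names hypothetical).
    New L1 objects are queued (standing in for the eventfd wake-up) and
    pushed asynchronously to every configured L2 adapter."""

    def __init__(self, adapters):
        self.adapters = adapters          # objects exposing .store(key, data)
        self.pending = queue.Queue()      # stands in for the eventfd signal
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def on_l1_store(self, key, data):
        # Step 1: vLLM stored a chunk into L1; notify the controller.
        self.pending.put((key, data))

    def _drain(self):
        # Step 2: detect new objects and submit store tasks per adapter.
        while True:
            key, data = self.pending.get()
            if key is None:               # shutdown sentinel
                break
            for adapter in self.adapters:
                # Step 3: the adapter writes to its backend (e.g. SSD via GDS).
                adapter.store(key, data)
            self.pending.task_done()

    def close(self):
        self.pending.put((None, None))
        self.worker.join()
```

The key property shown: every configured adapter receives every stored object, and the vLLM-facing STORE path never blocks on backend I/O.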

Read path (L2 -> L1):

  1. A LOOKUP RPC checks L1 for prefix hits.

  2. For keys not found in L1, the PrefetchController submits lookup requests to L2 adapters.

  3. If found in L2, the data is loaded back into L1 and read-locked for the pending RETRIEVE RPC.
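The read path can likewise be sketched as a lookup cascade. Again a hypothetical illustration: `prefetch_plan` and the adapter `.lookup()` method are invented names, a plain dict stands in for L1, and the real system's read-locking is only noted in a comment.

```python
def prefetch_plan(keys, l1_cache, l2_adapters):
    """Illustrative sketch of the L2 -> L1 read path (names hypothetical).
    Keys present in L1 are prefix hits; the rest are looked up in each
    L2 adapter in order, and found data is loaded back into L1."""
    l1_hits = [k for k in keys if k in l1_cache]
    remaining = [k for k in keys if k not in l1_cache]
    loaded = []
    for key in remaining:
        for adapter in l2_adapters:       # query adapters in configured order
            data = adapter.lookup(key)
            if data is not None:
                l1_cache[key] = data      # loaded back into L1; the real system
                loaded.append(key)        # read-locks it for the pending RETRIEVE
                break
    return l1_hits, loaded
```

A key still missing after all adapters are tried is simply a cache miss, and vLLM recomputes it.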

Adapter Types#

nixl_store – NIXL-based persistent storage#

The primary production adapter. Uses NIXL (NVIDIA Inference Xfer Library) for high-performance storage I/O.

Required fields:

  • backend: Storage backend – one of POSIX, GDS, GDS_MT, HF3FS, OBJ.

  • pool_size: Number of storage descriptors to pre-allocate (must be > 0).

Backend-specific parameters (backend_params):

File-based backends (GDS, GDS_MT, POSIX, HF3FS) require:

  • file_path: Directory path for storing L2 data.

  • use_direct_io: "true" or "false" – whether to use direct I/O.

The OBJ backend (object store) does not require file_path.
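These field requirements can be captured in a small validator. This is a sketch of the rules stated above, not code from LMCache; the function name and error messages are invented for illustration.

```python
# Backends that require backend_params.file_path, per the rules above.
FILE_BACKENDS = {"POSIX", "GDS", "GDS_MT", "HF3FS"}
ALL_BACKENDS = FILE_BACKENDS | {"OBJ"}

def validate_nixl_store(cfg):
    """Illustrative validator for a nixl_store adapter config (hypothetical
    helper, following the field requirements described above)."""
    if cfg.get("backend") not in ALL_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(ALL_BACKENDS)}")
    if not isinstance(cfg.get("pool_size"), int) or cfg["pool_size"] <= 0:
        raise ValueError("pool_size must be an integer > 0")
    params = cfg.get("backend_params", {})
    if cfg["backend"] in FILE_BACKENDS:
        if "file_path" not in params:
            raise ValueError("file-based backends require backend_params.file_path")
        if params.get("use_direct_io") not in ("true", "false"):
            raise ValueError('use_direct_io must be "true" or "false"')
    return cfg
```

Note that `use_direct_io` is a JSON string ("true"/"false"), not a boolean, matching the examples below.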

Backend descriptions:

  • POSIX – Standard POSIX file I/O. Works on any file system. No direct I/O.

  • GDS – NVIDIA GPUDirect Storage. Enables direct GPU-to-storage transfers bypassing the CPU. Requires NVMe SSDs with GDS support.

  • GDS_MT – Multi-threaded variant of GDS for higher throughput.

  • HF3FS – Shared file system backend (e.g., for distributed/networked storage).

  • OBJ – Object store backend. No local file path required.

Configuration examples:

# POSIX backend
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'

# GDS backend
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'

# GDS_MT backend
--l2-adapter '{"type": "nixl_store", "backend": "GDS_MT", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'

# HF3FS backend
--l2-adapter '{"type": "nixl_store", "backend": "HF3FS", "backend_params": {"file_path": "/mnt/hf3fs/lmcache", "use_direct_io": "false"}, "pool_size": 64}'

# OBJ backend
--l2-adapter '{"type": "nixl_store", "backend": "OBJ", "backend_params": {}, "pool_size": 32}'

mock – Mock adapter for testing#

Simulates L2 storage with configurable size and bandwidth. Useful for testing the L2 pipeline without real storage hardware.

Fields:

  • max_size_gb: Maximum size in GB (> 0).

  • mock_bandwidth_gb: Simulated bandwidth in GB/sec (> 0).

--l2-adapter '{"type": "mock", "max_size_gb": 256, "mock_bandwidth_gb": 10}'
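The effect of `mock_bandwidth_gb` is easy to reason about: a simulated transfer takes (data moved) / (configured bandwidth) seconds. The helper below is an illustrative sketch of that arithmetic, not LMCache's internal mock logic.

```python
def mock_transfer_seconds(size_gb, mock_bandwidth_gb):
    """Hypothetical helper: simulated transfer time for the mock adapter,
    i.e. gigabytes moved divided by the configured GB/sec bandwidth."""
    if size_gb <= 0 or mock_bandwidth_gb <= 0:
        raise ValueError("size and bandwidth must be > 0")
    return size_gb / mock_bandwidth_gb
```

For example, under the configuration shown above (10 GB/sec), filling all 256 GB would take about 25.6 seconds of simulated I/O.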

Multiple Adapters (Cascade)#

You can configure multiple L2 adapters by repeating the --l2-adapter argument. Adapters are used in the order they are specified. The StoreController pushes data to all configured adapters, and the PrefetchController queries adapters in order during lookups.

# SSD (fast, smaller) + NVMe GDS (larger capacity)
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/ssd/l2", "use_direct_io": "false"}, "pool_size": 64}' \
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true"}, "pool_size": 128}'
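Hand-quoting JSON inside shell strings is error-prone when several adapters are involved. A small sketch of generating the repeated `--l2-adapter` flags programmatically (the adapter dicts mirror the example above; the script itself is an illustration, not part of LMCache):

```python
import json

# Adapter configs in cascade order: SSD first, then NVMe GDS.
adapters = [
    {"type": "nixl_store", "backend": "POSIX",
     "backend_params": {"file_path": "/data/ssd/l2", "use_direct_io": "false"},
     "pool_size": 64},
    {"type": "nixl_store", "backend": "GDS",
     "backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true"},
     "pool_size": 128},
]

# Emit one --l2-adapter flag per config; json.dumps guarantees valid JSON.
args = []
for adapter in adapters:
    args += ["--l2-adapter", json.dumps(adapter)]
```

The resulting `args` list can be appended to the server invocation (e.g. via `subprocess`), preserving the adapter order that the controllers rely on.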

Verifying L2 Storage#

Set LMCACHE_LOG_LEVEL=DEBUG to see L2 activity in the server logs:

LMCACHE_LOG_LEVEL=DEBUG python3 -m lmcache.v1.multiprocess.server \
    --l1-size-gb 100 --eviction-policy LRU \
    --l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'

Expected log messages when L2 is active:

LMCache DEBUG: Submitted store task ...
LMCache DEBUG: L2 store task N completed ...
LMCache DEBUG: Prefetch request submitted: X total keys, Y L1 prefix hits, Z remaining for L2