L2 Storage (Persistent Cache)#
LMCache multiprocess mode supports a two-tier storage architecture:
L1 (in-memory) – Fast CPU memory managed by the L1 Manager. All KV cache chunks live here during active use.
L2 (persistent) – Durable storage backends (NIXL-based or plain file-system/raw-block). The StoreController asynchronously pushes data from L1 to L2, and the PrefetchController loads data from L2 back into L1 on cache misses.
Data Flow#
Write path (L1 -> L2):
1. vLLM stores KV cache chunks into L1 via the STORE RPC.
2. The StoreController detects new objects (via eventfd) and asynchronously submits store tasks to each configured L2 adapter.
3. The L2 adapter writes the data to its backend (e.g., local SSD via GDS).
Read path (L2 -> L1):
1. A LOOKUP RPC checks L1 for prefix hits.
2. For keys not found in L1, the PrefetchController submits lookup requests to L2 adapters.
3. If found in L2, the data is loaded back into L1 and read-locked for the pending RETRIEVE RPC.
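The two paths can be modeled with a small sketch. The class and method names below are illustrative stand-ins, not the actual LMCache multiprocess API:

```python
# Illustrative model of the two-tier flow; these classes are hypothetical,
# not the real StoreController/PrefetchController implementations.

class DictAdapter:
    """A toy L2 adapter backed by a plain dict."""
    def __init__(self):
        self._store = {}
    def put(self, key, chunk):
        self._store[key] = chunk
    def get(self, key):
        return self._store.get(key)

class TwoTierCache:
    def __init__(self, l2_adapters):
        self.l1 = {}                     # fast in-memory tier
        self.l2_adapters = l2_adapters   # ordered list of L2 backends

    def store(self, key, chunk):
        # Write path: chunks land in L1 first, then are pushed
        # asynchronously to every configured L2 adapter.
        self.l1[key] = chunk
        for adapter in self.l2_adapters:
            adapter.put(key, chunk)

    def lookup(self, key):
        # Read path: check L1 first, then fall through to L2 in order;
        # an L2 hit is loaded back into L1.
        if key in self.l1:
            return self.l1[key]
        for adapter in self.l2_adapters:
            chunk = adapter.get(key)
            if chunk is not None:
                self.l1[key] = chunk
                return chunk
        return None
```

In the real system the L1-to-L2 push and the L2-to-L1 load are asynchronous (eventfd-driven tasks and prefetch requests); the sketch collapses them into synchronous calls to show the data movement only.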
Adapter Types#
nixl_store – NIXL-based persistent storage#
The primary production adapter. Uses NIXL (NVIDIA Inference Xfer Library) for high-performance storage I/O.
Required fields:
backend: Storage backend – one of POSIX, GDS, GDS_MT, HF3FS, OBJ, AZURE_BLOB.
pool_size: Number of storage descriptors to pre-allocate (must be > 0).
Backend-specific parameters (backend_params):
File-based backends (GDS, GDS_MT, POSIX, HF3FS) require:
file_path: Directory path for storing L2 data.
use_direct_io: "true" or "false" – whether to use direct I/O.
The OBJ and AZURE_BLOB backends (object stores) do not require file_path.
Backend descriptions:
| Backend | Description |
|---|---|
| POSIX | Standard POSIX file I/O. Works on any file system. No direct I/O. |
| GDS | NVIDIA GPU Direct Storage. Enables direct GPU-to-storage transfers bypassing the CPU. Requires NVMe SSDs with GDS support. |
| GDS_MT | Multi-threaded variant of GDS for higher throughput. |
| HF3FS | Shared file system backend (e.g., for distributed/networked storage). |
| OBJ | Object store backend. No local file path required. |
| AZURE_BLOB | Object store backend for Azure Blob Storage. No local file path required. |
Configuration examples:
# POSIX backend
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'
# GDS backend
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'
# GDS_MT backend
--l2-adapter '{"type": "nixl_store", "backend": "GDS_MT", "backend_params": {"file_path": "/data/nvme/lmcache", "use_direct_io": "true"}, "pool_size": 128}'
# HF3FS backend
--l2-adapter '{"type": "nixl_store", "backend": "HF3FS", "backend_params": {"file_path": "/mnt/hf3fs/lmcache", "use_direct_io": "false"}, "pool_size": 64}'
# OBJ backend
--l2-adapter '{"type": "nixl_store", "backend": "OBJ", "backend_params": {}, "pool_size": 32}'
# AZURE_BLOB backend
--l2-adapter '{"type": "nixl_store", "backend": "AZURE_BLOB", "backend_params": {"account_url": "https://<account_name>.blob.core.windows.net", "container_name": "<container_name>"}, "pool_size": 32}'
nixl_store_dynamic – NIXL-based dynamic storage with persist/recover#
A dynamic variant of the NIXL adapter that opens and registers files per-operation instead of pre-allocating them at init. This enables:
Persist/recover – cached KV metadata survives restarts.
No fd limits – files are opened and closed per transfer, so the cache can grow beyond OS open-file-descriptor limits.
Note
Only file-based backends are supported (POSIX, GDS, GDS_MT,
HF3FS). The OBJ and AZURE_BLOB backends are not supported yet.
Required fields:
backend: Storage backend – one of POSIX, GDS, GDS_MT, HF3FS.
Backend-specific parameters (backend_params):
file_path: Directory path for storing L2 data files.
use_direct_io: "true" or "false".
max_capacity_gb: Maximum storage capacity in GB. The adapter rejects stores when this limit is reached. Required for the eviction controller to compute usage.
Optional fields (for persist):
persist_enabled (bool, default true): If true, data files are kept on disk at shutdown. If false, all data files are deleted on shutdown.
Lookup always checks secondary storage (disk) on miss and lazily populates the in-memory index when a file is found.
Configuration examples:
# Basic dynamic POSIX backend (persist enabled by default)
--l2-adapter '{"type": "nixl_store_dynamic", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false", "max_capacity_gb": "10"}}'
# Explicitly disable persist
--l2-adapter '{"type": "nixl_store_dynamic", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false", "max_capacity_gb": "10"}, "persist_enabled": false}'
# With eviction
--l2-adapter '{"type": "nixl_store_dynamic", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true", "max_capacity_gb": "50"}, "eviction": {"eviction_policy": "LRU", "trigger_watermark": 0.9, "eviction_ratio": 0.1}}'
Persist / secondary lookup behaviour:
On shutdown, the adapter keeps data files on disk by default (persist_enabled defaults to true). If explicitly set to false, all data files are deleted to avoid orphaned storage.
On startup, the in-memory index is empty. Every lookup miss falls through to a secondary lookup on disk: if the deterministic file exists, it is treated as a hit and the in-memory index is populated lazily from the file size.
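The secondary-lookup behaviour can be sketched as follows. The file-naming scheme and class name here are illustrative, not the adapter's actual on-disk layout:

```python
import os
import tempfile

# Hypothetical sketch of nixl_store_dynamic's lazy secondary lookup:
# after a restart the in-memory index is empty, so misses fall through
# to a deterministic file path on disk.

class DynamicIndex:
    def __init__(self, file_path):
        self.file_path = file_path
        self.index = {}                  # empty after every restart

    def _path_for(self, key):
        # Deterministic path derived from the key (illustrative scheme).
        return os.path.join(self.file_path, f"{key}.bin")

    def lookup(self, key):
        if key in self.index:
            return True
        path = self._path_for(key)
        if os.path.exists(path):
            # Secondary hit: lazily populate the index from the file size.
            self.index[key] = os.path.getsize(path)
            return True
        return False

# Demo: simulate a restart by creating a persisted data file
# before the (empty) index exists.
root = tempfile.mkdtemp()
with open(os.path.join(root, "chunk0.bin"), "wb") as f:
    f.write(b"abcd")

idx = DynamicIndex(root)      # fresh process, empty in-memory index
hit = idx.lookup("chunk0")    # secondary lookup finds the persisted file
```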
fs – File-system backed storage#
A pure file-system L2 adapter using async I/O (aiofiles). Each KV cache
object is stored as a raw .data file whose name encodes the full
ObjectKey. Does not require NIXL – works on any POSIX file system.
Required fields:
base_path: Directory for storing KV cache files.
Optional fields:
relative_tmp_dir: Relative sub-directory for temporary files during writes (atomic rename on completion).
read_ahead_size: Trigger file-system read-ahead by reading this many bytes first (positive integer, optional).
use_odirect: true or false (default false) – bypass the page cache via O_DIRECT.
Configuration examples:
# Basic FS adapter
--l2-adapter '{"type": "fs", "base_path": "/data/lmcache/l2"}'
# With temp directory
--l2-adapter '{"type": "fs", "base_path": "/data/lmcache/l2", "relative_tmp_dir": ".tmp"}'
# With O_DIRECT for bypassing page cache
--l2-adapter '{"type": "fs", "base_path": "/data/lmcache/l2", "use_odirect": true}'
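The write-then-rename pattern behind relative_tmp_dir can be sketched like this. The function and path layout are illustrative, not the adapter's exact naming scheme:

```python
import os
import tempfile

def atomic_store(base_path, relative_tmp_dir, name, data: bytes):
    """Write to a temp file, then rename into place (illustrative sketch).

    The rename is atomic on POSIX filesystems when source and destination
    live on the same filesystem, so readers never observe a partial file.
    """
    tmp_dir = os.path.join(base_path, relative_tmp_dir)
    os.makedirs(tmp_dir, exist_ok=True)
    tmp_path = os.path.join(tmp_dir, name)
    final_path = os.path.join(base_path, name + ".data")
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # ensure bytes reach the disk
    os.rename(tmp_path, final_path)   # atomic publish
    return final_path

# Demo against a throwaway directory.
base = tempfile.mkdtemp()
stored = atomic_store(base, ".tmp", "obj0", b"kv-bytes")
```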
dax – Device-DAX fixed-slot storage#
An L2 adapter that maps a single Device-DAX path, such as /dev/dax1.0,
and stores KV cache objects in fixed-size slots. This adapter is intended for
byte-addressable memory devices such as persistent memory or CXL memory.
The MP dax adapter is volatile in this release. It keeps the key index in
server memory and rebuilds an empty index on restart. Old bytes may remain on
the DAX device, but they are unreachable after the LMCache server restarts.
Required fields:
device_path: Path to the mmap-able DAX device or test file.
max_dax_size_gb: Number of GiB to map from device_path.
slot_bytes: Fixed slot size in bytes. This must be large enough for one full LMCache chunk because MP memory descriptors do not expose the non-MP full-chunk size.
Optional fields:
num_store_workers (int, default 1): Store worker threads.
num_lookup_workers (int, default 1): Lookup worker threads.
num_load_workers (int, default min(4, os.cpu_count())): Load worker threads.
persist_enabled (bool): Accepted by common L2 config parsing but has no effect for dax because restart recovery is not implemented.
Configuration example:
--l2-adapter '{
"type": "dax",
"device_path": "/dev/dax1.0",
"max_dax_size_gb": 100,
"slot_bytes": 268435456,
"num_store_workers": 1,
"num_lookup_workers": 1,
"num_load_workers": 4,
"eviction": {
"eviction_policy": "LRU",
"trigger_watermark": 0.9,
"eviction_ratio": 0.1
}
}'
Current limits:
Uses one server-owned mapped DAX path. Per-TP partitions and multi-device striping are not implemented.
Only single-buffer objects are supported. Multi-tensor objects are rejected.
Capacity is slot-based, not payload-byte-based. L2 eviction and usage metrics count occupied slots.
Lookups acquire DAX-side external locks. submit_unlock releases those locks after load/retrieve completes, making entries evictable again.
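The slot-based capacity model can be illustrated with a couple of lines of arithmetic (function names here are illustrative):

```python
# Slot arithmetic for the dax adapter's fixed-slot layout (illustrative).
# Capacity is counted in slots, not payload bytes: a 1-byte object still
# occupies one full slot.

def dax_slot_count(max_dax_size_gb: int, slot_bytes: int) -> int:
    """Total number of fixed-size slots in the mapped region."""
    return (max_dax_size_gb * (1 << 30)) // slot_bytes

def slot_offset(slot_index: int, slot_bytes: int) -> int:
    """Byte offset of a slot inside the mmap'd device."""
    return slot_index * slot_bytes
```

With the example configuration above (max_dax_size_gb 100, slot_bytes 268435456, i.e. 256 MiB), the device holds 400 slots, and eviction/usage metrics count occupied slots out of those 400.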
fs_native – Native C++ file-system connector#
A file-system L2 adapter backed by the native C++ LMCacheFSClient
wrapped with NativeConnectorL2Adapter. I/O is dispatched through a
C++ worker-thread pool with eventfd-driven completions, giving a true
I/O queue depth on a single Python thread.
Required fields:
base_path: Directory for storing KV cache files.
Optional fields:
num_workers (int, default 4, > 0): Number of C++ worker threads inside the connector. This is the real I/O queue depth – raise it to push throughput on filesystems whose aggregate bandwidth exceeds per-stream bandwidth.
relative_tmp_dir (str, default ""): Relative sub-directory for temporary files during writes (atomic rename on completion).
use_odirect (bool, default false): Bypass the page cache via O_DIRECT. Required to measure real disk bandwidth. See the alignment caveat below.
read_ahead_size (int, optional): Trigger filesystem readahead by issuing a warm-up read of this many bytes at open time.
max_capacity_gb (float, default 0): Maximum L2 capacity in GB for client-side usage tracking. Default 0 disables tracking.
Important
O_DIRECT has two independent alignment requirements:
Length alignment. The transfer length must be a multiple of the filesystem's block size. The connector queries the disk block size at construction time and, on each operation, checks len % disk_block_size. If the length is not a multiple, the connector silently falls back to a buffered open (no O_DIRECT) for that operation – correctness is preserved but you do not get true direct I/O. To ensure O_DIRECT is actually used, choose --chunk-size so that the resulting per-chunk byte size is a multiple of the FS block size. GPFS and similar parallel filesystems often use large blocks (e.g. several MiB).
Memory-buffer alignment. The I/O buffer pointer itself must also be aligned (typically to 4096 bytes on local disks, or to the FS block size on parallel filesystems). This is controlled by --l1-align-bytes (default 4096) – raise it to match the FS block size when running on a filesystem with larger blocks. If the buffer is misaligned, the underlying read/write syscall returns EINVAL (this is not caught by the length-fallback path above and will surface as a runtime error).
If unsure, start with use_odirect: false and confirm correctness
before enabling O_DIRECT.
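The two checks can be sketched as simple predicates (function names are illustrative; in practice the connector queries the real block size from the filesystem):

```python
# Illustrative model of the two O_DIRECT alignment requirements.

def odirect_usable(length: int, disk_block_size: int) -> bool:
    """Length alignment: when this fails, the connector silently
    falls back to a buffered (non-O_DIRECT) open for the operation."""
    return length % disk_block_size == 0

def buffer_aligned(ptr: int, align_bytes: int) -> bool:
    """Memory-buffer alignment: a misaligned buffer pointer makes the
    underlying read/write syscall fail with EINVAL at runtime."""
    return ptr % align_bytes == 0
```

For example, a 256 KiB per-chunk byte size passes the length check against a 4 KiB local-disk block size but fails against a 1 MiB parallel-filesystem block size, which is exactly the case where --chunk-size must be adjusted.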
Configuration examples:
# Basic native FS adapter
--l2-adapter '{"type": "fs_native", "base_path": "/data/lmcache/l2"}'
# Many worker threads for a parallel filesystem (e.g. GPFS, Lustre)
--l2-adapter '{"type": "fs_native", "base_path": "/data/lmcache/l2", "num_workers": 32}'
# O_DIRECT for real-disk benchmarking
--l2-adapter '{"type": "fs_native", "base_path": "/data/lmcache/l2", "num_workers": 32, "use_odirect": true}'
Buffer-only mode example. L1 acts as a pure write buffer that absorbs the peak burst of in-flight chunks while the C++ worker pool drains them to disk; nothing is retained in L1 once a store completes:
lmcache server \
--host 0.0.0.0 --port 5555 \
--max-workers 32 \
--l1-size-gb 32 --l1-use-lazy \
--eviction-policy noop \
--l2-store-policy skip_l1 \
--l2-adapter '{"type": "fs_native", "base_path": "/data/lmcache/l2", "num_workers": 32, "use_odirect": true}'
raw_block – Raw block device backed persistent storage#
A built-in L2 adapter that stores KV objects in fixed-size slots on a raw block device or pre-sized file using the Rust raw-device I/O bindings. It reuses the existing raw-block metadata checkpoint model and writes directly into the caller-provided load buffers during prefetch.
Required fields:
device_path: Raw device path or pre-sized file path.
slot_bytes: Fixed slot size in bytes. Must be aligned to block_align.
Optional fields:
capacity_bytes: Optional cap on the usable device bytes. Default 0 means use the full device/file size.
use_odirect: true or false (default true).
block_align: Device alignment in bytes (default 4096).
header_bytes: Per-slot header reservation (default 4096).
meta_total_bytes: Reserved metadata checkpoint region (default 256 MiB).
meta_magic / meta_version: Metadata checkpoint identity/version knobs.
meta_checkpoint_interval_sec / meta_idle_quiet_ms / meta_enable_periodic / meta_verify_on_load: Checkpoint and recovery controls carried over from the legacy raw-block backend.
load_checkpoint_on_init: Load an existing on-device metadata checkpoint during startup (default true). Set to false to start with an empty in-memory index instead.
enable_zero_copy: Try aligned direct-buffer I/O when possible.
io_engine: Rust raw-block I/O engine. Valid values are "posix" (default synchronous pread/pwrite path) and "io_uring" (direct Rust io_uring syscall path).
iouring_queue_depth: Queue depth for io_engine="io_uring".
num_store_workers / num_lookup_workers / num_load_workers: Worker-thread counts for each operation type.
Notes:
raw_block is a server-owned MP adapter. It does not support per-TP device-path mappings in MP mode.
raw_block remains "type": "raw_block" for both supported engines.
raw_block owns on-device slot allocation, checkpointing, and recovery through RawBlockCore. Slot reclamation is driven by the shared/global L2 eviction controller or explicit delete() calls.
If use_odirect is enabled, the server's --l1-align-bytes should be at least block_align.
persist_enabled must remain true for this adapter.
Configuration examples:
--l2-adapter '{"type": "raw_block", "device_path": "/dev/nvme0n1", "slot_bytes": 1048576, "block_align": 4096, "header_bytes": 4096, "meta_total_bytes": 268435456, "use_odirect": true, "num_store_workers": 2, "num_lookup_workers": 1, "num_load_workers": 4}'
--l2-adapter '{"type": "raw_block", "device_path": "/dev/nvme0n1", "slot_bytes": 1048576, "io_engine": "io_uring", "iouring_queue_depth": 256, "use_odirect": true}'
--l2-adapter '{"type": "raw_block", "device_path": "/dev/nvme0n1", "slot_bytes": 1048576, "load_checkpoint_on_init": false, "eviction": {"eviction_policy": "LRU", "trigger_watermark": 0.9, "eviction_ratio": 0.1}}'
mooncake_store – Mooncake Store native connector#
An L2 adapter backed by the native C++ Mooncake Store connector. Uses Mooncake for high-performance distributed KV cache storage with RDMA support.
When Mooncake is configured with "protocol": "rdma", LMCache must also
have a valid contiguous L1 memory region available. The distributed storage
manager passes this L1 memory descriptor to the adapter factory automatically
in MP mode. If the descriptor is missing or invalid, adapter creation fails
with ValueError instead of silently falling back to a non-RDMA path.
Prerequisites – Building with Mooncake support:
The Mooncake extension is not built by default. You must explicitly enable it:
BUILD_MOONCAKE=1 pip install -e . --verbose
The BUILD_MOONCAKE environment variable controls compilation:
BUILD_MOONCAKE=1: Enable the Mooncake C++ extension.
BUILD_MOONCAKE=0: Force disable (highest priority), even if MOONCAKE_INCLUDE_DIR is set.
Not set: Falls back to checking MOONCAKE_INCLUDE_DIR for backward compatibility. If MOONCAKE_INCLUDE_DIR is also unset, the extension is skipped.
If the Mooncake headers are not installed in the system include path
(e.g., /usr/local/include), you must point to them explicitly:
BUILD_MOONCAKE=1 \
MOONCAKE_INCLUDE_DIR=/path/to/mooncake/include \
MOONCAKE_LIB_DIR=/path/to/mooncake/lib \
pip install -e . --verbose
LMCache-specific fields:
num_workers: Number of C++ worker threads (default 4, must be > 0).
Mooncake fields:
All other keys in the JSON config (except type, num_workers,
and eviction) are forwarded as-is to Mooncake’s
store.setup(config: dict) API (introduced in
Mooncake PR #1445).
Older Mooncake builds that only expose the positional-arg setup()
signature are still supported – LMCache transparently falls back to
the legacy form on TypeError. Refer to the
Mooncake documentation
for available setup keys (e.g., local_hostname,
metadata_server, master_server_addr, protocol,
rdma_devices, global_segment_size).
Configuration example:
--l2-adapter '{
"type": "mooncake_store",
"num_workers": 4,
"local_hostname": "node01",
"metadata_server": "http://localhost:8080/metadata",
"master_server_addr": "localhost:50051",
"protocol": "tcp",
    "local_buffer_size": "3221225472",
    "global_segment_size": "3221225472"
}'
For full Mooncake setup instructions (master service, metadata server, etc.), see the Mooncake documentation.
RDMA notes:
protocol: "rdma" requires a valid LMCache L1 memory descriptor.
When using protocol: "rdma", it is recommended to disable lazy L1 allocation with --no-l1-use-lazy so the L1 buffer is fully allocated before Mooncake registers it.
protocol: "tcp" does not require L1 preregistration.
If Mooncake RDMA initialization fails at adapter creation time, verify that LMCache L1 memory is enabled and that the descriptor has a non-zero pointer and size.
s3 – S3-compatible object store#
An L2 adapter that stores KV cache objects as S3 objects using the AWS Common Runtime (CRT). Works with AWS S3, S3 Express One Zone, and any S3-compatible endpoint (MinIO, Ceph RGW, etc.).
Required fields:
s3_endpoint: Bucket URL – either "s3://<bucket>" or the bare host form (used for non-AWS endpoints).
s3_region: AWS region string (e.g. "us-west-2").
Optional fields:
s3_num_io_threads (int, default 64): Number of CRT I/O threads.
s3_prefer_http2 (bool, default true): Negotiate HTTP/2 via ALPN.
s3_enable_s3express (bool, default false): Enable S3 Express signing for S3 Express One Zone buckets.
disable_tls (bool, default false): Bypass TLS when pointing at a plain-HTTP endpoint (e.g. a local MinIO).
aws_access_key_id / aws_secret_access_key (string): Static credentials; omit both to use the AWS default credential provider chain (environment, EC2 instance profile, etc.).
max_capacity_gb (float, default 0.0): Aggregate capacity used by get_usage(). A value of 0 disables aggregate eviction (usage_fraction == -1.0).
Configuration examples:
# AWS S3 with default credentials
--l2-adapter '{"type": "s3", "s3_endpoint": "s3://my-bucket", "s3_region": "us-west-2"}'
# Static credentials, HTTP/2 disabled
--l2-adapter '{"type": "s3", "s3_endpoint": "s3://my-bucket", "s3_region": "us-west-2", "s3_prefer_http2": false, "aws_access_key_id": "AKIA...", "aws_secret_access_key": "..."}'
# Local MinIO over plain HTTP
--l2-adapter '{"type": "s3", "s3_endpoint": "minio.local:9000", "s3_region": "us-east-1", "disable_tls": true, "aws_access_key_id": "minio", "aws_secret_access_key": "minio123"}'
mock – Mock adapter for testing#
Simulates L2 storage with configurable size and bandwidth. Useful for testing the L2 pipeline without real storage hardware.
Fields:
max_size_gb: Maximum size in GB (> 0).
mock_bandwidth_gb: Simulated bandwidth in GB/sec (> 0).
--l2-adapter '{"type": "mock", "max_size_gb": 256, "mock_bandwidth_gb": 10}'
Multiple Adapters (Cascade)#
You can configure multiple L2 adapters by repeating the --l2-adapter
argument. Adapters are used in the order they are specified. The
StoreController pushes data to all configured adapters, and the
PrefetchController queries adapters in order during lookups.
# SSD (fast, smaller) + NVMe GDS (larger capacity)
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/ssd/l2", "use_direct_io": "false"}, "pool_size": 64}' \
--l2-adapter '{"type": "nixl_store", "backend": "GDS", "backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true"}, "pool_size": 128}'
Store and Prefetch Policies#
The store policy controls how keys flow from L1 to L2: which adapters receive each key and whether keys are deleted from L1 after a successful L2 store. The prefetch policy controls how keys flow from L2 back to L1: when multiple adapters have the same key, the policy decides which adapter loads it.
Select policies via CLI:
--l2-store-policy default \
--l2-prefetch-policy default
Built-in policies:
| Flag | Name | Behaviour |
|---|---|---|
| --l2-store-policy | default | Store all keys to all adapters. Never delete from L1. |
| --l2-store-policy | skip_l1 | Buffer-only mode. Store all keys to all adapters, then delete them from L1 immediately. Pair with --eviction-policy noop. |
| --l2-prefetch-policy | default | For each key, pick the first (lowest-indexed) adapter that has it. Prefetched keys are temporary (deleted after the reader finishes). |
| --l2-prefetch-policy | | Same load plan as default. |
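The difference between the two store policies can be sketched in a few lines. This is a model of the behaviour, not the StoreController implementation, and the adapters are stand-in dicts:

```python
# Illustrative model of the built-in L2 store policies.

def run_store_policy(policy: str, l1: dict, adapters: list, key):
    chunk = l1[key]
    for adapter in adapters:          # both policies store to all adapters
        adapter[key] = chunk
    if policy == "skip_l1":
        del l1[key]                   # buffer-only: drop from L1 immediately

# Demo: same key, two policies.
l1_default, l1_skip = {"k": b"v"}, {"k": b"v"}
disk_a, disk_b = {}, {}
run_store_policy("default", l1_default, [disk_a], "k")
run_store_policy("skip_l1", l1_skip, [disk_b], "k")
```

Under default the key survives in L1 after the store; under skip_l1 it exists only in the L2 adapters, which is why skip_l1 is paired with --eviction-policy noop (there is nothing left in L1 for the LRU tracker to manage).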
Prefetch Concurrency#
The --l2-prefetch-max-in-flight flag limits the number of concurrent
prefetch requests that the PrefetchController can have in flight at
any time. A higher value increases L2-to-L1 throughput but also
increases L1 memory pressure from in-flight data.
| Flag | Default | Description |
|---|---|---|
| --l2-prefetch-max-in-flight | | Maximum number of concurrent prefetch requests. |
Buffer-Only Mode#
When L1 is used purely as a write buffer (all data lives in L2), use
--l2-store-policy skip_l1 together with --eviction-policy noop.
This combination deletes keys from L1 as soon as they are stored to L2
and disables the LRU eviction tracker entirely, reducing memory and CPU
overhead.
--eviction-policy noop \
--l2-store-policy skip_l1 \
--l2-prefetch-policy default
Policies are extensible – new policies can be added by creating a file
in storage_controllers/ and calling register_store_policy() or
register_prefetch_policy() at import time. See the design doc
l2_adapters/design_docs/overall.md for details.
Serde (compression / quantization)#
Each adapter can optionally run a serde (serializer / deserializer) that transforms data on the way in and out of L2 — e.g. fp8 quantization for disk backends, or encryption for remote adapters. See L2 Serde (Serialization / Deserialization) for details and configuration.
Eviction#
LMCache supports eviction at both storage tiers so that each tier can operate within a fixed capacity budget.
L1 Eviction#
L1 eviction runs a single background thread that monitors overall L1
memory usage. When usage exceeds trigger_watermark, the eviction
policy evicts a fraction of the least-recently-used keys.
CLI flags:
| Flag | Default | Description |
|---|---|---|
| --eviction-policy | (required) | Policy name: LRU or noop. |
| --eviction-trigger-watermark | | L1 usage fraction [0, 1] above which eviction is triggered. |
| --eviction-ratio | | Fraction of currently allocated L1 memory to evict per cycle. |
Example:
--eviction-policy LRU \
--eviction-trigger-watermark 0.8 \
--eviction-ratio 0.2
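As a sketch of the arithmetic behind these flags (function names are illustrative): with the example values above, eviction fires once L1 usage exceeds 80% and each cycle reclaims 20% of the currently allocated memory.

```python
# Illustrative model of the L1 eviction trigger and amount.

def should_evict(used_bytes: int, total_bytes: int,
                 trigger_watermark: float) -> bool:
    """Eviction fires when usage fraction exceeds the watermark."""
    return used_bytes / total_bytes > trigger_watermark

def bytes_to_evict(used_bytes: int, eviction_ratio: float) -> int:
    """Each cycle reclaims a fraction of currently allocated memory."""
    return int(used_bytes * eviction_ratio)
```

For a 100 GB L1 at 85 GB used, should_evict is true at watermark 0.8, and a ratio of 0.2 targets roughly 17 GB for reclamation in that cycle.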
L2 Eviction#
L2 eviction is per-adapter and opt-in. Each adapter can
independently declare an eviction policy by adding an "eviction"
sub-object to its --l2-adapter JSON spec. Adapters without an
"eviction" key have no eviction controller.
When L2 eviction is enabled for an adapter, a dedicated background
thread monitors that adapter’s get_usage() value. Once usage
exceeds trigger_watermark, the policy evicts keys until usage
drops by eviction_ratio.
"eviction" sub-object fields:
| Field | Default | Description |
|---|---|---|
| eviction_policy | (required) | Policy name, e.g. LRU. |
| trigger_watermark | | Adapter usage fraction [0, 1] above which eviction is triggered. |
| eviction_ratio | | Fraction of used capacity to evict per cycle. |
Example — nixl_store with LRU eviction:
--l2-adapter '{
"type": "nixl_store",
"backend": "POSIX",
"backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"},
"pool_size": 128,
"eviction": {
"eviction_policy": "LRU",
"trigger_watermark": 0.8,
"eviction_ratio": 0.2
}
}'
Adapter support:
| Adapter | L2 Eviction Support |
|---|---|
| nixl_store | Full support. |
| nixl_store_dynamic | Full support. |
| dax | Full support. |
| s3 | Full support. |
| mock | Full support. Useful for testing eviction behaviour without real storage hardware. |
| raw_block | Full shared/global eviction support. |
| fs | No eviction support. |
| fs_native | No eviction support (native connector adapter). |
| mooncake_store | No eviction support (native connector). |
Note
Each L2 adapter instance gets its own independent eviction controller and policy. Two adapters of the same type can have different watermarks or policies.
Combined L1 + L2 Eviction Example#
--l1-size-gb 100 \
--eviction-policy LRU \
--eviction-trigger-watermark 0.8 \
--eviction-ratio 0.2 \
--l2-adapter '{
"type": "nixl_store",
"backend": "GDS",
"backend_params": {"file_path": "/data/nvme/l2", "use_direct_io": "true"},
"pool_size": 256,
"eviction": {
"eviction_policy": "LRU",
"trigger_watermark": 0.9,
"eviction_ratio": 0.1
}
}'
In this setup:
L1 evicts from memory when it is 80 % full, reclaiming 20 % of allocated memory per cycle.
L2 (NIXL/GDS) evicts from the storage pool when 90 % of pool slots are occupied, reclaiming 10 % per cycle.
Both tiers use independent LRU policies, so each evicts its own least-recently-used keys.
Verifying L2 Storage#
Set LMCACHE_LOG_LEVEL=DEBUG to see L2 activity in the server logs:
LMCACHE_LOG_LEVEL=DEBUG lmcache server \
--l1-size-gb 100 --eviction-policy LRU \
--l2-adapter '{"type": "nixl_store", "backend": "POSIX", "backend_params": {"file_path": "/data/lmcache/l2", "use_direct_io": "false"}, "pool_size": 64}'
Expected log messages when L2 is active:
LMCache DEBUG: Submitted store task ...
LMCache DEBUG: L2 store task N completed ...
LMCache DEBUG: Prefetch request submitted: X total keys, Y L1 prefix hits, Z remaining for L2