lmcache describe
The lmcache describe command shows the detailed status of a running LMCache service, including cache health, L1 storage, registered models, and L2 adapters.
lmcache describe kvcache --url http://localhost:8000
============ LMCache KV Cache Service ============
Health: OK
URL: http://localhost:8000
Engine type: BlendEngine
Chunk size: 256
L1 capacity (GB): 60.00
L1 used (GB): 42.30 (70.5%)
Eviction policy: LRU
Cached objects: 1024
Active sessions: 3
---- Model: meta-llama/Llama-3.1-70B-Instruct ----
Model: meta-llama/Llama-3.1-70B-Instruct
World size: 4
GPU IDs: 0, 1, 2, 3
Num layers: 80
Num blocks: 2048
Cache size per token (bytes): 327680
--- Kernel group 0 (meta-llama/Llama-3.1-70B-Instruct) ---
Kernel group index: 0
Engine group index: 0
Object group index: 0
Num layers: 80
Physical block size: 128
Compress ratio: 1
Dtype: torch.float16
MLA: False
Attention backend: vLLM non-MLA flash attention
GPU KV shape: NL x [2, NB, BS, NH, HS]
GPU KV tensor shape: 80 x [2, 2048, 128, 8, 128]
------------- L2: NixlStoreL2Adapter -------------
Type: NixlStoreL2Adapter
Health: OK
Backend: nixl_rdma
Stored objects: 512
Pool used: 480 / 512 (93.8%)
==================================================
The output shows:
Overview: health status, engine type, and chunk size.
L1 storage: capacity, usage, eviction policy, and number of cached objects.
Registered models: the per-model KV cache layout, given as a context-wide summary followed by one section per kernel group, each with the GPU KV tensor shape (symbolic and concrete), attention backend, and group geometry.
L2 adapters: type, health, backend, stored objects, and pool utilization.
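As a sanity check on the sample output above, the kernel group geometry can be related to the reported "Cache size per token (bytes)". The Python sketch below assumes the per-token size is the product of the layer count, the leading factor of 2 in the shape (taken here to be keys and values), the head count, the head size, and the element width. That relationship is an assumption that happens to match the sample figures, not a formula stated by LMCache.

```python
# Values copied from the sample output above.
num_layers = 80        # NL
kv_factor = 2          # leading 2 in the shape, assumed to be K and V
num_blocks = 2048      # NB
block_size = 128       # BS (physical block size)
num_heads = 8          # NH
head_size = 128        # HS
dtype_bytes = 2        # torch.float16 uses 2 bytes per element

# Assumed relationship: bytes needed to cache one token across all layers.
bytes_per_token = num_layers * kv_factor * num_heads * head_size * dtype_bytes
print(bytes_per_token)  # 327680, matching "Cache size per token (bytes)"

# Token slots and total size implied by the concrete tensor shape.
token_slots = num_blocks * block_size
total_gib = bytes_per_token * token_slots / 2**30
print(token_slots, total_gib)  # 262144 80.0
```

If the two numbers disagree on a real deployment, the assumed relationship most likely does not hold for that attention backend or compression setting.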
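Options
| Flag | Description |
|---|---|
| | Target to describe (positional, required; currently only supports …) |
| --url | LMCache HTTP server URL (default: …) |
| --format | Output format: … |
| | Save metrics to a file (format follows …) |
| | Suppress standard output; only the exit code is returned. |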
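The command can also be driven from a script. The sketch below builds the argument list from the positional target plus the --url and --format flags exactly as they are spelled in the examples on this page, and wraps the call in Python's subprocess module. The helper names build_describe_cmd and run_describe are hypothetical, and the sketch assumes the lmcache executable is on PATH.

```python
import subprocess
from typing import List


def build_describe_cmd(url: str, fmt: str = "json",
                       target: str = "kvcache") -> List[str]:
    """Build the argument list for `lmcache describe` (hypothetical helper)."""
    # Only the positional target, --url and --format are used, because those
    # are the only spellings that appear in the examples on this page.
    return ["lmcache", "describe", target, "--url", url, "--format", fmt]


def run_describe(url: str) -> subprocess.CompletedProcess:
    """Run the command and capture its output (assumes `lmcache` is on PATH)."""
    return subprocess.run(
        build_describe_cmd(url),
        capture_output=True,
        text=True,
        check=False,  # leave returncode handling to the caller
    )


# Show the command that would be executed, without running it.
print(" ".join(build_describe_cmd("http://localhost:8000")))
```

Calling run_describe("http://localhost:8000") returns a CompletedProcess whose returncode and stdout can then be inspected by the caller.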
JSON output
Use --format json for machine-readable output. Models, kernel groups, and
L2 adapters are collected into lists for easy programmatic access:
lmcache describe kvcache --url http://localhost:8000 --format json
{
"title": "LMCache KV Cache Service",
"metrics": {
"health": "OK",
"url": "http://localhost:8000",
"engine_type": "BlendEngine",
"chunk_size": 256,
"l1_capacity_gb": 60.0,
"l1_used_gb": "42.30 (70.5%)",
"eviction_policy": "LRU",
"cached_objects": 1024,
"active_sessions": 3,
"models": [
{
"model": "meta-llama/Llama-3.1-70B-Instruct",
"world_size": 4,
"gpu_ids": "0, 1, 2, 3",
"num_layers": 80,
"num_blocks": 2048,
"cache_size_per_token": 327680
}
],
"kernel_groups": [
{
"model": "meta-llama/Llama-3.1-70B-Instruct",
"kernel_group_idx": 0,
"engine_group_idx": 0,
"object_group_idx": 0,
"num_layers": 80,
"physical_block_size": 128,
"compress_ratio": 1,
"dtype": "torch.float16",
"is_mla": false,
"attention_backend": "vLLM non-MLA flash attention",
"gpu_kv_shape": "NL x [2, NB, BS, NH, HS]",
"gpu_kv_concrete_shape": "80 x [2, 2048, 128, 8, 128]"
}
],
"l2_adapters": [
{
"type": "NixlStoreL2Adapter",
"health": "OK",
"backend": "nixl_rdma",
"stored_object_count": 512,
"pool_used": "480 / 512 (93.8%)"
}
]
}
}
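Values such as l1_used_gb and pool_used appear in the sample as display strings rather than numbers, so a consumer has to extract the percentage itself. The sketch below does this for a trimmed copy of the sample above. The helper names and the 90% threshold are illustrative assumptions, not part of LMCache.

```python
import json
import re
from typing import List, Optional

# Trimmed copy of the sample JSON shown above.
SAMPLE = """
{
  "title": "LMCache KV Cache Service",
  "metrics": {
    "health": "OK",
    "l1_capacity_gb": 60.0,
    "l1_used_gb": "42.30 (70.5%)",
    "l2_adapters": [
      {"type": "NixlStoreL2Adapter", "health": "OK",
       "pool_used": "480 / 512 (93.8%)"}
    ]
  }
}
"""

_PERCENT = re.compile(r"\(([\d.]+)%\)")


def percent_of(text: str) -> Optional[float]:
    """Return the number inside a "(NN.N%)" suffix, or None if absent."""
    match = _PERCENT.search(text)
    return float(match.group(1)) if match else None


def find_warnings(report: dict, threshold: float = 90.0) -> List[str]:
    """Collect readable warnings; the threshold is an arbitrary example."""
    metrics = report["metrics"]
    warnings = []
    if metrics.get("health") != "OK":
        warnings.append("service health is " + str(metrics.get("health")))
    l1 = percent_of(str(metrics.get("l1_used_gb", "")))
    if l1 is not None and l1 >= threshold:
        warnings.append(f"L1 usage at {l1}%")
    for adapter in metrics.get("l2_adapters", []):
        used = percent_of(str(adapter.get("pool_used", "")))
        if used is not None and used >= threshold:
            warnings.append(f"{adapter.get('type')} pool at {used}%")
    return warnings


report = json.loads(SAMPLE)
print(find_warnings(report))  # ['NixlStoreL2Adapter pool at 93.8%']
```

Reading the lists with .get and a default keeps the check usable when a service reports no L2 adapters.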
GPU KV shape abbreviations
The gpu_kv_shape field uses short names from the GPUKVFormat enum:
| Abbreviation | Meaning |
|---|---|
| NB | num_blocks |
| NL | num_layers |
| BS | block size |
| NH | number of heads |
| HS | head size |
| PBS | page buffer size (NB × BS) |
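To make the abbreviations concrete, the symbolic gpu_kv_shape string can be paired position by position with gpu_kv_concrete_shape. The sketch below assumes both strings follow the "N x [a, b, ...]" pattern seen in the sample JSON on this page; the function name expand_shape is hypothetical.

```python
import re
from typing import Dict

_SHAPE = re.compile(r"^\s*(\w+)\s*x\s*\[(.*)\]\s*$")


def expand_shape(symbolic: str, concrete: str) -> Dict[str, int]:
    """Pair each abbreviation in the symbolic shape with its concrete size."""
    sym = _SHAPE.match(symbolic)
    con = _SHAPE.match(concrete)
    if sym is None or con is None:
        raise ValueError("shape does not look like 'N x [a, b, ...]'")
    names = [sym.group(1)] + [s.strip() for s in sym.group(2).split(",")]
    sizes = [con.group(1)] + [s.strip() for s in con.group(2).split(",")]
    if len(names) != len(sizes):
        raise ValueError("symbolic and concrete shapes differ in rank")
    # Literal dimensions such as the leading 2 appear in both strings; skip them.
    return {n: int(v) for n, v in zip(names, sizes) if not n.isdigit()}


dims = expand_shape("NL x [2, NB, BS, NH, HS]", "80 x [2, 2048, 128, 8, 128]")
print(dims)  # {'NL': 80, 'NB': 2048, 'BS': 128, 'NH': 8, 'HS': 128}

# PBS is listed in the table as NB × BS.
print(dims["NB"] * dims["BS"])  # 262144
```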