Clear KV Cache

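Warning

This page documents the behavior of LMCache's in-process mode, which is deprecated. Consider using LMCache MP mode for better feature support and performance.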

The clear interface is defined as follows:

clear(instance_id: str, location: str) -> event_id: str, num_tokens: int

This function removes the KV cache that the instance identified by instance_id stores at location. It returns an event_id and the number of tokens scheduled for clearing.
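As a minimal sketch of how a caller might model this interface in Python, the snippet below wraps the two arguments and the two return values in small helpers. The names `ClearResult`, `build_clear_payload`, and `parse_clear_response` are hypothetical and not part of LMCache; the JSON field names are taken from the request and reply shown in the example on this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ClearResult:
    """Hypothetical container for the two values returned by clear."""
    event_id: str
    num_tokens: int


def build_clear_payload(instance_id: str, location: str) -> dict:
    """Build the JSON body for a clear request (illustrative helper)."""
    if not instance_id or not location:
        raise ValueError("instance_id and location must be non-empty")
    return {"instance_id": instance_id, "location": location}


def parse_clear_response(body: str) -> ClearResult:
    """Parse a reply such as {"event_id": "xxx", "num_tokens": 12}."""
    data = json.loads(body)
    return ClearResult(
        event_id=str(data["event_id"]),
        num_tokens=int(data["num_tokens"]),
    )
```

Keeping the result in a small typed container makes it harder to confuse the event identifier with the token count when the reply is passed around.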

Example usage:

First, create a yaml file named example.yaml to configure the lmcache instance:

chunk_size: 256
local_cpu: True
max_local_cpu_size: 5

# cache controller configurations
enable_controller: True
lmcache_instance_id: "lmcache_default_instance"
controller_pull_url: "localhost:9001"
lmcache_worker_ports: 8001

# Peer identifiers
p2p_host: "localhost"
p2p_init_ports: 8200
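The `lmcache_instance_id` set here is the value later passed as `instance_id` in the clear request. As a rough sketch, a script could derive that ID from the parsed config instead of hard-coding it. The helper `instance_id_from_config` is hypothetical, the config is written as a Python dict rather than loaded from the file, and the check on `enable_controller` is an assumption that the controller endpoints are only usable when that option is on.

```python
def instance_id_from_config(config: dict) -> str:
    """Return the instance ID that a clear request should target.

    `config` is the already-parsed content of example.yaml. This helper
    is illustrative only and not part of LMCache.
    """
    # Assumption: controller requests only make sense when this is enabled.
    if not config.get("enable_controller", False):
        raise ValueError("enable_controller must be True to use the controller")
    instance_id = config.get("lmcache_instance_id")
    if not instance_id:
        raise ValueError("lmcache_instance_id is missing from the config")
    return str(instance_id)


# The same values as in example.yaml, written as a Python dict.
example_config = {
    "chunk_size": 256,
    "local_cpu": True,
    "max_local_cpu_size": 5,
    "enable_controller": True,
    "lmcache_instance_id": "lmcache_default_instance",
    "controller_pull_url": "localhost:9001",
    "lmcache_worker_ports": 8001,
    "p2p_host": "localhost",
    "p2p_init_ports": 8200,
}
```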

Start the vllm/lmcache instance on port 8000:

CUDA_VISIBLE_DEVICES=0 LMCACHE_CONFIG_FILE=example.yaml vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 4096 \
  --gpu-memory-utilization 0.8 --port 8000 --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'

Start the lmcache controller on port 9000 and the monitor on port 9001:

lmcache_controller --host localhost --port 9000 --monitor-port 9001

Send a request to vllm:

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Explain the significance of KV cache in language models.",
        "max_tokens": 10
      }'

Clear the KV cache in the system:

curl -X POST http://localhost:9000/clear \
  -H "Content-Type: application/json" \
  -d '{
        "instance_id": "lmcache_default_instance",
        "location": "LocalCPUBackend"
      }'
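The same request could also be sent from Python. The following is a sketch using only the standard library; `post_json` and `clear_kv_cache` are hypothetical helper names, and the default controller URL is simply the one used in this example.

```python
import json
import urllib.request


def post_json(url: str, payload: dict, opener=urllib.request.urlopen) -> dict:
    """POST a JSON body and decode the JSON reply.

    `opener` is injectable so the helper can be exercised without a server.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def clear_kv_cache(instance_id: str, location: str,
                   controller_url: str = "http://localhost:9000",
                   opener=urllib.request.urlopen) -> dict:
    """Ask the controller to clear one instance's KV cache at one location."""
    payload = {"instance_id": instance_id, "location": location}
    return post_json(f"{controller_url}/clear", payload, opener=opener)
```

With the controller running, calling `clear_kv_cache("lmcache_default_instance", "LocalCPUBackend")` is intended to mirror the curl command above.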

The controller replies with a message similar to the following:

{"event_id": "xxx", "num_tokens": 12}

This indicates that the KV cache for 12 tokens has been scheduled for clearing. We can verify that the cache has been cleared by performing a lookup:

curl -X POST http://localhost:9000/lookup \
  -H "Content-Type: application/json" \
  -d '{
        "tokens": [128000, 849, 21435, 279, 26431, 315, 85748, 6636, 304, 4221, 4211, 13]
      }'

The lookup should return an empty result, confirming that the KV cache for the given tokens has been cleared.