CLI Reference
The lmcache command-line interface provides tools for launching,
managing, inspecting, and benchmarking LMCache servers and the inference
engines in front of them.
lmcache <command> [options]
After installing LMCache, the lmcache command is available globally.
Run lmcache -h to see all commands, or lmcache <command> -h for a
specific command.
Installation
The lmcache CLI ships in two packages:
| Package | Install | When to use |
|---|---|---|
| | | Full install: server, CLI, and CUDA extensions. Required for |
| | | CLI only: |
Note
Do not install both packages in the same environment — they both provide
the lmcache entry point.
Available Commands
| Command | Description |
|---|---|
| | Launch the LMCache MP server (ZMQ + HTTP). Requires the full install. |
| | Launch the LMCache MP coordinator (HTTP instance registry). |
| | Show detailed status of a running LMCache service. |
| | Liveness check for LMCache or vLLM servers. |
| | Single-shot query interface for the serving engine. |
| | Run sustained benchmarks against an inference engine ( |
| | Manage KV cache state (e.g. clear L1 cache) on a running server. |
| | Manage per-salt cache quotas (set, get, list, delete). |
| | Inspect and replay storage-level trace files. |
| | Run offline analysis tools (e.g. the cache simulator). |
Output Formats
Commands that produce metrics share three common flags:
- --format {terminal,json}: stdout format (default: terminal).
- --output PATH: also write metrics to a file (uses --format).
- -q/--quiet: suppress stdout; rely on the exit code.
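For scripted use, the JSON format and the exit code can be consumed together. The following is a minimal Python sketch, not part of LMCache: it assumes that a metrics-producing command run with --format json prints a single JSON object on stdout, and that a non-zero exit code signals failure. The helper name run_and_parse, the sample value, and the stand-in child process are illustrative only; the key round_trip_time_ms is borrowed from the example key given in this section.

```python
import json
import subprocess
import sys
from typing import Sequence


def run_and_parse(argv: Sequence[str]) -> dict:
    """Run a command and parse its stdout as one JSON object.

    check=True raises CalledProcessError on a non-zero exit code,
    so a failed command never reaches the JSON parser.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in child process so the sketch runs without LMCache installed.
# The payload below is made-up sample data, not real LMCache output.
fake_output = json.dumps({"round_trip_time_ms": 1.25})
demo_argv = [sys.executable, "-c", f"print({fake_output!r})"]

metrics = run_and_parse(demo_argv)
round_trip_ms = metrics["round_trip_time_ms"]
```

Against a real installation, demo_argv would be replaced by something like ["lmcache", "<command>", "--format", "json"], with <command> standing for whichever metrics-producing subcommand is in use.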
The terminal output uses human-readable labels (e.g. "Round trip time
(ms)"), while JSON uses machine-readable keys (e.g.
"round_trip_time_ms").
Adding New Commands
New CLI subcommands are added by creating a BaseCommand subclass under
lmcache/cli/commands/; they are discovered and registered automatically.
See Extending the CLI for details.
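As a rough illustration of the subclass-and-discover pattern described above, the sketch below uses Python's argparse with a hypothetical stand-in base class. DemoBaseCommand, its name/help attributes, and its add_arguments/run methods are assumptions made for this example; the real BaseCommand interface and LMCache's discovery mechanism may differ.

```python
import argparse
from typing import List


class DemoBaseCommand:
    """Hypothetical stand-in, NOT the real LMCache BaseCommand."""

    name = ""
    help = ""

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        """Register command-specific flags (none by default)."""

    def run(self, args: argparse.Namespace) -> int:
        raise NotImplementedError


class EchoCommand(DemoBaseCommand):
    """Illustrative subcommand that prints its argument."""

    name = "echo"
    help = "Print a message."

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("message")

    def run(self, args: argparse.Namespace) -> int:
        print(args.message)
        return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Discover every direct subclass and register it as a subcommand.
    for cls in DemoBaseCommand.__subclasses__():
        command = cls()
        sub = subparsers.add_parser(command.name, help=command.help)
        command.add_arguments(sub)
        sub.set_defaults(handler=command.run)
    return parser


def main(argv: List[str]) -> int:
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Here, defining a new subclass is enough to make it appear as a subcommand, which mirrors the automatic registration described above in spirit only.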