lmcache query#

The lmcache query command sends a single OpenAI-compatible inference request and reports token and latency metrics. It has two targets:

lmcache query {engine,kvcache} [options]

engine — send one request to a serving engine’s HTTP API.
kvcache — query KV-cache endpoints (not implemented yet).

query engine#

The query engine subcommand sends one request to the engine API and reports metrics. --prompt supports placeholders: {lmcache} loads lmcache/cli/documents/lmcache.txt, and custom documents can be passed with --documents NAME=PATH. The prompt token count is taken directly from the usage data reported by the engine (stream_options: {include_usage: true}).

lmcache query engine --url http://localhost:8000/v1 \
  --prompt "{lmcache} Summarize LMCache usage." \
  --format terminal \
  --max-tokens 128

================= Query Engine =================
Model:                         facebook/opt-125m
Input tokens:                                618
--------------- Latency Metrics ----------------
Output tokens:                                 9
TTFT (ms):                                 26.88
TPOT (ms/token):                            0.91
Total latency (ms):                        35.05
Throughput (tokens/s):                   1100.64
================================================

Options#

Flag	Required	Description
`--url URL`	Yes	Serving engine base URL (e.g. `http://localhost:8000/v1`).
`--prompt TEXT`	Yes	Prompt text with optional `{name}` placeholders. `{lmcache}` expands to the bundled sample document.
`--model ID`	No	Model ID for the serving engine. Auto-detected from the engine’s reported usage if omitted.
`--max-tokens N`	No	Maximum completion tokens (default: 128).
`--timeout SECS`	No	HTTP timeout in seconds (default: 30).
`--documents NAME=PATH`	No	Load file text for `{NAME}` in `--prompt`. Accepts one or more `NAME=PATH` values.
`--completions`	No	Use `POST /v1/completions` only.
`--chat-first`	No	Try `/v1/chat/completions` first, then fall back to `/v1/completions`.
`--format`	No	Output format: `terminal` (default) or `json`.
`--output PATH`	No	Save metrics to a file (format follows `--format`).
`-q` / `--quiet`	No	Suppress stdout output. Exit code only.