Internal API Server Metrics

Another way to retrieve LMCache metrics is through the internal API server.

Overview

The internal API server exposes Prometheus-compatible metrics endpoints in your LMCache deployment.

Configure your vLLM instance to enable the internal API server:

LMCACHE_INTERNAL_API_SERVER_ENABLED=true \
vllm serve $model \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'

Retrieve metrics from the worker’s endpoint:

curl http://$IP:7000/metrics
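
Because the endpoint is described as Prometheus-compatible, the response is assumed here to follow the Prometheus text exposition format, in which `# HELP` and `# TYPE` comment lines accompany the samples. A minimal sketch of a filter that keeps only the sample lines follows; `samples_only` is a hypothetical helper, not part of LMCache:

```shell
# samples_only: hypothetical helper (not an LMCache command).
# Reads metrics text on stdin and drops comment lines (# HELP, # TYPE)
# and blank lines, leaving only the "name{labels} value" sample lines.
samples_only() {
  grep -v -e '^#' -e '^[[:space:]]*$'
}

# Usage (requires a running server; $IP is the host, as in the command above):
#   curl -s "http://$IP:7000/metrics" | samples_only
```

Filtering this way can make the output easier to scan or to pipe into other tools, at the cost of losing the metric descriptions and types.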

The command above does not set the following environment variables, so they take their default values:

Default Port Configuration
Environment Variable	Default Value	Description
`LMCACHE_INTERNAL_API_SERVER_HOST`	`0.0.0.0`	Host address for the internal API server to bind to.
`LMCACHE_INTERNAL_API_SERVER_PORT_START`	`6999`	Starting port number. Each process listens on port_start plus an offset, e.g.: scheduler: port_start + 0 (6999); worker 0: port_start + 1 (7000); worker 1: port_start + 2 (7001).

Therefore, the curl command above uses port 7000, which is the endpoint of worker 0 (port_start + 1).
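
The port arithmetic from the table can be sketched as a pair of small shell helpers. The functions `metrics_port` and `worker_port` and the variable `PORT_START` are hypothetical names introduced for illustration; only the environment variable and the offset scheme come from the table above.

```shell
# PORT_START mirrors LMCACHE_INTERNAL_API_SERVER_PORT_START (default 6999).
PORT_START="${LMCACHE_INTERNAL_API_SERVER_PORT_START:-6999}"

# metrics_port: hypothetical helper, not part of LMCache.
# Prints the port for a given offset from the starting port.
# Offset 0 is the scheduler; offset N + 1 is worker N.
metrics_port() {
  local offset="$1"
  echo $(( PORT_START + offset ))
}

# worker_port: hypothetical helper that maps a worker index to its port.
worker_port() {
  local worker_index="$1"
  metrics_port $(( worker_index + 1 ))
}

# Usage (requires a running server; $IP is the host address):
#   curl "http://$IP:$(metrics_port 0)/metrics"   # scheduler
#   curl "http://$IP:$(worker_port 0)/metrics"    # worker 0
```

With the defaults, `worker_port 0` yields 7000, matching the curl command shown earlier. If the starting port is overridden, the same offsets apply to the new value.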

For more testing and configuration options, including detailed examples and best practices, refer to Testing the Server.