Metrics by vLLM API#
LMCache provides detailed metrics via a Prometheus endpoint, allowing for in-depth monitoring of cache performance and behavior.
This section outlines how to enable and access these metrics through the /metrics
API endpoint embedded in vLLM.
Quick Start Guide#
1) On vLLM/LMCache side#
In v1, vLLM and LMCache run in separate processes, so you have to use Prometheus in multi-process mode.
The PROMETHEUS_MULTIPROC_DIR
environment variable must be set to the same path in both processes, since it serves as their IPC directory.
PROMETHEUS_MULTIPROC_DIR=/tmp/lmcache_prometheus \
# .. other environment variables \
vllm serve $MODEL --port 8000 ...
Once the HTTP server is running, you can access the LMCache metrics at the /metrics
endpoint.
curl http://<vllm-worker-ip>:8000/metrics | grep lmcache
# Replace <vllm-worker-ip> with the IP address of a vLLM worker
You will also find some .db
files in the $PROMETHEUS_MULTIPROC_DIR
directory.
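The role of this shared directory can be illustrated with a simplified sketch: each process writes metrics only to its own per-PID file, and the process serving /metrics aggregates across all files. Note this is a conceptual illustration only; the real prometheus_client multi-process mode uses memory-mapped .db files, not JSON.

```python
import json
import os
import tempfile

def record(metric_dir: str, pid: int, name: str, value: float) -> None:
    """Each process writes only to its own per-PID file (no cross-process locks)."""
    path = os.path.join(metric_dir, f"counter_{pid}.json")
    with open(path, "w") as f:
        json.dump({name: value}, f)

def aggregate(metric_dir: str, name: str) -> float:
    """The exporting process sums the counter across every process's file."""
    total = 0.0
    for fname in os.listdir(metric_dir):
        with open(os.path.join(metric_dir, fname)) as f:
            total += json.load(f).get(name, 0.0)
    return total

# Two processes (illustrative PIDs) share one directory, as vLLM and LMCache
# share PROMETHEUS_MULTIPROC_DIR.
metric_dir = tempfile.mkdtemp()
record(metric_dir, 101, "lmcache_requests", 3)
record(metric_dir, 102, "lmcache_requests", 5)
print(aggregate(metric_dir, "lmcache_requests"))  # 8.0
```

This is why both processes must see the same PROMETHEUS_MULTIPROC_DIR: the aggregating side can only report counters whose files land in the directory it scans.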
2) Prometheus Configuration#
To scrape the LMCache metrics with a Prometheus server, add the following job to your prometheus.yml
configuration, or an equivalent configuration that scrapes the metrics endpoint:
scrape_configs:
  - job_name: 'lmcache'
    static_configs:
      - targets: ['<vllm-worker-ip>:8000']
    scrape_interval: 15s
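To sanity-check a scrape outside Prometheus, the text exposition format returned by /metrics can be filtered and parsed with a few lines of Python. The metric names below are illustrative placeholders, not LMCache's actual metric names, and the parser handles only simple unlabeled samples:

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse unlabeled samples from the Prometheus text exposition format,
    skipping comment lines (# HELP / # TYPE)."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

# Example scrape body with placeholder names.
body = """\
# HELP lmcache_example_requests Total number of retrieve requests
# TYPE lmcache_example_requests counter
lmcache_example_requests 42
lmcache_example_hit_rate 0.75
"""
metrics = {k: v for k, v in parse_prometheus_text(body).items()
           if k.startswith("lmcache")}
print(metrics)  # {'lmcache_example_requests': 42.0, 'lmcache_example_hit_rate': 0.75}
```

The `startswith("lmcache")` filter mirrors the `grep lmcache` in the curl example above.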
Available Metrics#
LMCache exposes a variety of metrics to monitor its performance. The following table lists all available metrics organized by category:
| Metric Name | Type | Description |
|---|---|---|
| **Core Request Metrics** | | |
| | Counter | Total number of retrieve requests |
| | Counter | Total number of store requests |
| | Counter | Total number of lookup requests |
| | Counter | Total number of tokens requested for retrieval |
| | Counter | Total number of cache hit tokens from retrieval |
| | Counter | Total number of tokens requested in lookup operations |
| | Counter | Total number of tokens hit in lookup operations |
| | Counter | Number of hit tokens in vLLM |
| **Hit Rate Metrics** | | |
| | Gauge | The hit rate for retrieve requests |
| | Gauge | The hit rate for lookup requests |
| **Cache Usage Metrics** | | |
| | Gauge | Local cache usage in bytes |
| | Gauge | Remote cache usage in bytes |
| | Gauge | Local storage usage in bytes |
| **Performance Metrics** | | |
| | Histogram | Time taken to retrieve from the cache (seconds) |
| | Histogram | Time taken to store to the cache (seconds) |
| | Histogram | Retrieval speed (tokens per second) |
| | Histogram | Storage speed (tokens per second) |
| **Remote Backend Metrics** | | |
| | Counter | Total number of read requests to remote backends |
| | Counter | Total number of bytes read from remote backends |
| | Counter | Total number of write requests to remote backends |
| | Counter | Total number of bytes written to remote backends |
| | Histogram | Time taken to get data from remote backends (milliseconds) |
| | Histogram | Time taken to put data to remote backends (milliseconds) |
| | Histogram | Time taken to get data from remote backends synchronously (milliseconds) |
| **Network Monitoring Metrics** | | |
| | Gauge | Latest ping latency to remote backends (milliseconds) |
| | Counter | Number of ping errors to remote backends |
| | Counter | Number of ping successes to remote backends |
| | Gauge | Latest ping error code to remote backends |
| **Local CPU Backend Metrics** | | |
| | Counter | Total number of evictions in the local CPU backend |
| | Counter | Total number of evicted keys in the local CPU backend |
| | Counter | Total number of failed evictions in the local CPU backend |
| | Gauge | The size of the hot cache |
| | Gauge | The size of the keys in request |
| **Memory Management Metrics** | | |
| | Gauge | The number of active memory objects |
| | Gauge | The number of pinned memory objects |
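The hit-rate gauges above relate the token counters to each other: hit tokens divided by requested tokens. A minimal sketch of that relationship (the function name and values are illustrative, not part of LMCache's API):

```python
def hit_rate(hit_tokens: float, requested_tokens: float) -> float:
    """Hit rate = hit tokens / requested tokens; 0.0 when nothing was requested."""
    return hit_tokens / requested_tokens if requested_tokens else 0.0

# Placeholder counter values as they might appear in a single scrape.
print(hit_rate(768, 1024))  # 0.75
```

The same ratio can be computed over time windows in Prometheus by dividing the rates of the two counters, which is often more useful for dashboards than the instantaneous gauge.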