Metrics Reference

LMCache exports Prometheus metrics for monitoring performance, cache efficiency, and system health. When LMCache is integrated with vLLM, these metrics are exposed on the vLLM `/metrics` endpoint; they can also be served by the LMCache internal API server.
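
As a rough illustration of consuming this output, the sketch below parses Prometheus text exposition data and keeps only the samples whose names start with `lmcache:`. The function name `parse_lmcache_metrics`, the sample text, and the `model_name` label are assumptions made for this example; in a real deployment the text would come from an HTTP GET of the `/metrics` endpoint. Prometheus client libraries commonly append `_total` to counter names in the exposition output, so the scraped sample names may differ slightly from the names in the tables on this page.

```python
import re

# Matches "name{labels} value" or "name value" lines of the Prometheus text format.
# Simplification: label values containing "}" are not handled.
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_lmcache_metrics(text, prefix="lmcache:"):
    """Return {sample_name: value summed over all label sets} for LMCache samples.

    Summing over label sets is meaningful for counters; for gauges such as hit
    rates you would normally keep the label sets separate.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE/comment lines
        match = _SAMPLE_RE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        name, value = match.group(1), float(match.group(3))
        totals[name] = totals.get(name, 0.0) + value
    return totals


# Made-up exposition text, for illustration only.
SAMPLE = """\
# made-up sample, not real LMCache output
lmcache:num_hit_tokens_total{model_name="demo"} 768.0
lmcache:num_requested_tokens_total{model_name="demo"} 1024.0
vllm:num_requests_running{model_name="demo"} 2.0
"""

metrics = parse_lmcache_metrics(SAMPLE)
print(metrics)
```

The `vllm:` sample is dropped because it does not carry the `lmcache:` prefix.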

Available Metrics

The following tables list all available LMCache metrics organized by category.

Core Request Metrics

Metric Name	Type	Description
`lmcache:num_retrieve_requests`	Counter	Total number of retrieve requests sent to LMCache.
`lmcache:num_store_requests`	Counter	Total number of store requests sent to LMCache.
`lmcache:num_lookup_requests`	Counter	Total number of lookup requests sent to LMCache.

Token Metrics

Metric Name	Type	Description
`lmcache:num_requested_tokens`	Counter	Total number of tokens requested for retrieval.
`lmcache:num_hit_tokens`	Counter	Total number of tokens hit in LMCache during retrieval.
`lmcache:num_stored_tokens`	Counter	Total number of tokens stored in LMCache.
`lmcache:num_lookup_tokens`	Counter	Total number of tokens requested in lookup operations.
`lmcache:num_lookup_hits`	Counter	Total number of tokens hit in lookup operations.
`lmcache:num_vllm_hit_tokens`	Counter	Number of hit tokens in vLLM.
`lmcache:num_prompt_tokens`	Counter	Number of prompt tokens in LMCache.
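
Because these are cumulative counters, a rate over a time window is usually derived from the difference between two scrapes. The sketch below computes a retrieval token hit rate that way. It assumes the hit rate is `num_hit_tokens` divided by `num_requested_tokens`, which this page does not state explicitly, and the function name `token_hit_rate` and the snapshot values are made up for the example.

```python
def token_hit_rate(prev, curr,
                   hit_key="lmcache:num_hit_tokens",
                   req_key="lmcache:num_requested_tokens"):
    """Hit rate over the interval between two counter snapshots.

    Returns None when no tokens were requested in the interval.
    """
    hits = curr[hit_key] - prev[hit_key]
    requested = curr[req_key] - prev[req_key]
    if hits < 0 or requested < 0:
        # A counter went backwards, which suggests a process restart:
        # fall back to the values accumulated since the restart.
        hits, requested = curr[hit_key], curr[req_key]
    if requested == 0:
        return None
    return hits / requested


# Two hypothetical scrapes taken some time apart.
prev = {"lmcache:num_hit_tokens": 100, "lmcache:num_requested_tokens": 400}
curr = {"lmcache:num_hit_tokens": 400, "lmcache:num_requested_tokens": 800}
print(token_hit_rate(prev, curr))  # 0.75
```

The same pattern applies to the lookup counters by passing `lmcache:num_lookup_hits` and `lmcache:num_lookup_tokens` as the two keys.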

Hit Rate Metrics

Metric Name	Type	Description
`lmcache:retrieve_hit_rate`	Gauge	The hit rate for retrieve requests since last log.
`lmcache:lookup_hit_rate`	Gauge	The hit rate for lookup requests since last log.
`lmcache:request_cache_hit_rate`	Histogram	Distribution of hit rates per request.
`lmcache:lookup_0_hit_requests`	Counter	Total number of lookup requests with zero hits.

Performance & Latency Metrics

Metric Name	Type	Description
`lmcache:time_to_retrieve`	Histogram	Time taken to retrieve from the cache (seconds).
`lmcache:time_to_store`	Histogram	Time taken to store to the cache (seconds).
`lmcache:time_to_lookup`	Histogram	Time taken to perform a lookup in the cache (seconds).
`lmcache:retrieve_speed`	Histogram	Retrieval speed (tokens per second).
`lmcache:store_speed`	Histogram	Storage speed (tokens per second).
`lmcache:num_slow_retrieval_by_time`	Counter	Total number of slow retrievals exceeding the time threshold.
`lmcache:num_slow_retrieval_by_speed`	Counter	Total number of slow retrievals below the speed threshold.
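
Prometheus histograms such as `lmcache:time_to_retrieve` are exported as cumulative bucket counts, so percentiles have to be estimated from the buckets. The sketch below does this with linear interpolation inside a bucket, similar in spirit to PromQL's `histogram_quantile`. The bucket boundaries and counts are invented for the example and are not LMCache's actual bucket layout.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs and must
    include the +Inf bucket, as in the Prometheus exposition format.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies in the +Inf bucket: report the largest
                # finite bound, since nothing more precise is known.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets for a latency histogram, in seconds.
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print("p50 ~", histogram_quantile(0.5, buckets))
print("p95 ~", histogram_quantile(0.95, buckets))
```

The estimate is only as precise as the bucket layout allows: every observation inside a bucket is treated as uniformly spread between its bounds.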

Detailed Profiling Metrics

Profiling Metrics
Metric Name	Type	Description
`lmcache:retrieve_process_tokens_time`	Histogram	Time to process tokens in retrieve (seconds).
`lmcache:retrieve_broadcast_time`	Histogram	Time to broadcast memory objects in retrieve (seconds).
`lmcache:retrieve_to_gpu_time`	Histogram	Time to move data to GPU in retrieve (seconds).
`lmcache:store_process_tokens_time`	Histogram	Time to process tokens in store (seconds).
`lmcache:store_from_gpu_time`	Histogram	Time to move data from GPU in store (seconds).
`lmcache:store_put_time`	Histogram	Time to put data to storage in store (seconds).
`lmcache:remote_backend_batched_get_blocking_time`	Histogram	Time spent waiting for data from remote backend (seconds).
`lmcache:instrumented_connector_batched_get_time`	Histogram	Time spent in the connector layer (seconds).

Cache Usage & Lifecycle Metrics

Cache Usage Metrics
Metric Name	Type	Description
`lmcache:local_cache_usage`	Gauge	Local cache usage in bytes.
`lmcache:remote_cache_usage`	Gauge	Remote cache usage in bytes.
`lmcache:local_storage_usage`	Gauge	Local storage usage in bytes.
`lmcache:request_cache_lifespan`	Histogram	Distribution of request cache lifespan in minutes.

Remote Backend & Network Metrics

Remote Backend Metrics
Metric Name	Type	Description
`lmcache:num_remote_read_requests`	Counter	Total number of read requests to remote backends.
`lmcache:num_remote_read_bytes`	Counter	Total number of bytes read from remote backends.
`lmcache:num_remote_write_requests`	Counter	Total number of write requests to remote backends.
`lmcache:num_remote_write_bytes`	Counter	Total number of bytes written to remote backends.
`lmcache:remote_time_to_get`	Histogram	Time taken to get data from remote backends (ms).
`lmcache:remote_time_to_put`	Histogram	Time taken to put data to remote backends (ms).
`lmcache:remote_time_to_get_sync`	Histogram	Time taken to get data from remote backends synchronously (ms).
`lmcache:remote_ping_latency`	Gauge	Latest ping latency to remote backends (ms).
`lmcache:remote_ping_errors`	Counter	Total number of ping errors to remote backends.
`lmcache:remote_ping_successes`	Counter	Total number of successful pings to remote backends.
`lmcache:remote_ping_error_code`	Gauge	Latest ping error code to remote backends.
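
The remote counters combine naturally into a few derived indicators. The sketch below computes a ping success ratio and the average bytes per remote read from a dictionary of counter values. The function name `remote_backend_summary` and the input values are hypothetical, and the ratio assumes that each ping is counted exactly once as either a success or an error.

```python
def remote_backend_summary(m):
    """Derive simple indicators from remote backend counters.

    `m` maps metric names to their current counter values.
    Entries are None when the denominator is zero.
    """
    successes = m["lmcache:remote_ping_successes"]
    pings = successes + m["lmcache:remote_ping_errors"]
    reads = m["lmcache:num_remote_read_requests"]
    return {
        "ping_success_ratio": successes / pings if pings else None,
        "avg_read_bytes": m["lmcache:num_remote_read_bytes"] / reads if reads else None,
    }


# Made-up counter values for illustration.
counters = {
    "lmcache:remote_ping_successes": 98,
    "lmcache:remote_ping_errors": 2,
    "lmcache:num_remote_read_requests": 4,
    "lmcache:num_remote_read_bytes": 4096,
}
summary = remote_backend_summary(counters)
print(summary)
```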

Local CPU Backend Metrics

Metric Name	Type	Description
`lmcache:local_cpu_evict_count`	Counter	Total number of evictions in local CPU backend.
`lmcache:local_cpu_evict_keys_count`	Counter	Total number of evicted keys in local CPU backend.
`lmcache:local_cpu_evict_failed_count`	Counter	Total number of failed evictions in local CPU backend.
`lmcache:local_cpu_hot_cache_count`	Gauge	Current number of items in the hot cache.
`lmcache:local_cpu_keys_in_request_count`	Gauge	Current number of keys being processed in requests.

Memory Management Metrics

Metric Name	Type	Description
`lmcache:active_memory_objs_count`	Gauge	The number of currently active memory objects.
`lmcache:pinned_memory_objs_count`	Gauge	The number of currently pinned memory objects.
`lmcache:forced_unpin_count`	Counter	Total number of forced unpins due to timeout.
`lmcache:pin_monitor_pinned_objects_count`	Gauge	The number of pinned objects tracked by the PinMonitor.

P2P Transfer Metrics

Metric Name	Type	Description
`lmcache:num_p2p_requests`	Counter	Total number of P2P transfer requests.
`lmcache:num_p2p_transferred_tokens`	Counter	Total number of tokens transferred via P2P.
`lmcache:p2p_time_to_transfer`	Histogram	Time taken for P2P transfers (seconds).
`lmcache:p2p_transfer_speed`	Histogram	P2P transfer speed (tokens per second).

Health & Internal System Metrics

Health & Internal Metrics
Metric Name	Type	Description
`lmcache:lmcache_is_healthy`	Gauge	Overall health status of LMCache (1 = healthy, 0 = unhealthy).
`lmcache:interval_get_blocking_failed_count`	Gauge	Number of failed blocking get operations in the current interval.
`lmcache:kv_msg_queue_size`	Gauge	Size of the KV message queue in the BatchedMessageSender.
`lmcache:remote_put_task_num`	Gauge	Number of pending remote put tasks.
`lmcache:storage_events_ongoing_count`	Gauge	Number of storage events currently in progress.
`lmcache:storage_events_done_count`	Gauge	Number of storage events completed successfully.
`lmcache:storage_events_not_found_count`	Gauge	Number of storage events where the requested data was not found.
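
A simple way to act on these gauges is a threshold check over a dictionary of scraped values. The sketch below is one possible shape for such a check. The function name `check_health` and the thresholds `max_queue_size` and `max_pending_puts` are arbitrary values chosen for illustration, not LMCache defaults or recommendations.

```python
def check_health(metrics, max_queue_size=1000, max_pending_puts=100):
    """Return a list of human-readable problems; an empty list means no alert."""
    problems = []
    # A missing health gauge is treated as unhealthy (a choice made for this sketch).
    if metrics.get("lmcache:lmcache_is_healthy", 0) != 1:
        problems.append("lmcache_is_healthy is not 1")
    if metrics.get("lmcache:kv_msg_queue_size", 0) > max_queue_size:
        problems.append("kv_msg_queue_size above threshold")
    if metrics.get("lmcache:remote_put_task_num", 0) > max_pending_puts:
        problems.append("remote_put_task_num above threshold")
    if metrics.get("lmcache:interval_get_blocking_failed_count", 0) > 0:
        problems.append("blocking get failures in the current interval")
    return problems


# Hypothetical scraped values.
scraped = {
    "lmcache:lmcache_is_healthy": 1,
    "lmcache:kv_msg_queue_size": 2500,
    "lmcache:remote_put_task_num": 3,
    "lmcache:interval_get_blocking_failed_count": 0,
}
for problem in check_health(scraped):
    print("ALERT:", problem)
```

In a production setup the same conditions would more typically be expressed as Prometheus alerting rules; the Python form is used here only to make the logic explicit.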

Chunk Statistics Metrics

Metric Name	Type	Description
`lmcache:chunk_statistics_enabled`	Gauge	Whether chunk statistics collection is enabled (1 = enabled, 0 = disabled).
`lmcache:chunk_statistics_total_requests`	Gauge	Total number of requests processed by chunk statistics.
`lmcache:chunk_statistics_total_chunks`	Gauge	Total number of chunks processed.
`lmcache:chunk_statistics_unique_chunks`	Gauge	Estimated number of unique chunks encountered.
`lmcache:chunk_statistics_reuse_rate`	Gauge	Chunk reuse rate (0.0 to 1.0).
`lmcache:chunk_statistics_bloom_filter_size_mb`	Gauge	Memory usage of the Bloom filter in megabytes.
`lmcache:chunk_statistics_bloom_filter_fill_rate`	Gauge	Fill rate of the Bloom filter (0.0 to 1.0).
`lmcache:chunk_statistics_file_count`	Gauge	Number of files created when using the `file_hash` strategy.
`lmcache:chunk_statistics_current_file_size`	Gauge	Current size of the active statistics file in bytes.

Connector Metrics

Metric Name	Type	Description
`lmcache:scheduler_unfinished_requests_count`	Gauge	Current count of unfinished requests in the scheduler.
`lmcache:connector_load_specs_count`	Gauge	Number of load specifications currently in the connector.
`lmcache:connector_request_trackers_count`	Gauge	Number of active request trackers in the connector.
`lmcache:connector_kv_caches_count`	Gauge	Number of KV caches currently managed by the connector.
`lmcache:connector_layerwise_retrievers_count`	Gauge	Number of layer-wise retrievers active in the connector.
`lmcache:connector_invalid_block_ids_count`	Gauge	Number of invalid block IDs encountered by the connector.
`lmcache:connector_requests_priority_count`	Gauge	Number of requests prioritized by the connector.