Metrics Reference
LMCache provides comprehensive metrics via Prometheus to help you monitor performance, cache efficiency, and system health. These metrics are exposed via the vLLM /metrics endpoint when LMCache is integrated with vLLM, or via the LMCache internal API server.
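As a minimal sketch of consuming this endpoint, the Python below parses Prometheus text-format output and keeps only the samples whose name starts with a given prefix. The `lmcache` prefix, the sample metric name, and any URL passed to `fetch_metrics` are assumptions for illustration, not names confirmed by this reference.

```python
import urllib.request
from typing import Dict

# Hypothetical exposition text; the metric names here are illustrative only.
SAMPLE = """\
# HELP lmcache_example_requests_total Hypothetical counter for illustration.
# TYPE lmcache_example_requests_total counter
lmcache_example_requests_total{model="m"} 42.0
vllm_other_metric 7
"""


def parse_metrics(text: str, prefix: str = "lmcache") -> Dict[str, float]:
    """Parse Prometheus text exposition into {sample_name: value}.

    Only samples whose name starts with `prefix` are kept. The default
    prefix is an assumption; adjust it to match your deployment.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            split_at = line.rindex("}") + 1
            name, rest = line[:split_at], line[split_at:]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()  # value, then an optional timestamp
        if not fields or not name.startswith(prefix):
            continue
        samples[name] = float(fields[0])
    return samples


def fetch_metrics(url: str) -> str:
    """Download the raw metrics text from a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

Against a running server, `parse_metrics(fetch_metrics(url))` would return the same kind of dictionary, where `url` points at whichever host and port serve `/metrics` in your setup.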
Available Metrics
The following tables list all available LMCache metrics organized by category.
Core Request Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Counter | Total number of retrieve requests sent to LMCache. |
| | Counter | Total number of store requests sent to LMCache. |
| | Counter | Total number of lookup requests sent to LMCache. |

Token Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Counter | Total number of tokens requested for retrieval. |
| | Counter | Total number of tokens hit in LMCache during retrieval. |
| | Counter | Total number of tokens stored in LMCache. |
| | Counter | Total number of tokens requested in lookup operations. |
| | Counter | Total number of tokens hit in lookup operations. |
| | Counter | Number of hit tokens in vLLM. |
| | Counter | Number of prompt tokens in LMCache. |

Hit Rate Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Gauge | The hit rate for retrieve requests since last log. |
| | Gauge | The hit rate for lookup requests since last log. |
| | Histogram | Distribution of hit rates per request. |
| | Counter | Total number of lookup requests with zero hits. |

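An interval hit rate can also be derived from the token counters by differencing two scrapes. The sketch below assumes that the hit rate is the ratio of hit tokens to requested tokens over the interval, which this reference does not state explicitly. The function and parameter names are hypothetical.

```python
def interval_hit_rate(
    prev_hit: float,
    prev_requested: float,
    curr_hit: float,
    curr_requested: float,
) -> float:
    """Return hit tokens / requested tokens between two counter scrapes.

    Assumes both counters only increase between the two scrapes. If no
    tokens were requested in the interval (or a counter was reset), the
    function returns 0.0 instead of dividing by zero.
    """
    hit_delta = curr_hit - prev_hit
    requested_delta = curr_requested - prev_requested
    if requested_delta <= 0 or hit_delta < 0:
        return 0.0
    return hit_delta / requested_delta


if __name__ == "__main__":
    # 60 of the 100 tokens requested in this interval were cache hits.
    print(interval_hit_rate(100, 400, 160, 500))
```

Differencing counters this way gives a rate over a window you choose, whereas the gauges above report the rate since the last log.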
Performance & Latency Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Histogram | Time taken to retrieve from the cache (seconds). |
| | Histogram | Time taken to store to the cache (seconds). |
| | Histogram | Time taken to perform a lookup in the cache (seconds). |
| | Histogram | Retrieval speed (tokens per second). |
| | Histogram | Storage speed (tokens per second). |
| | Counter | Total number of slow retrievals exceeding the time threshold. |
| | Counter | Total number of slow retrievals below the speed threshold. |

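Histogram metrics are exported as cumulative buckets, so percentiles have to be estimated rather than read directly. The sketch below estimates a quantile by linear interpolation inside the bucket that contains the target rank, similar in spirit to PromQL's `histogram_quantile`. The bucket boundaries and counts are made up for illustration and do not reflect LMCache's actual bucket layout.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by upper
    bound, with the final bound being +Inf. Values are assumed to be
    non-negative, so the first bucket starts at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][1] == 0:
        return math.nan  # no observations yet

    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            # The +Inf bucket has no width, so fall back to the highest
            # finite bound; also avoid dividing by an empty bucket.
            if math.isinf(bound) or in_bucket == 0:
                return prev_bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical latency buckets in seconds.
    example = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
    print(histogram_quantile(0.5, example))   # roughly 0.42
    print(histogram_quantile(0.95, example))  # falls in the +Inf bucket
```

The estimate is only as fine-grained as the bucket boundaries, so a quantile that lands in a wide bucket carries a correspondingly wide error.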
Detailed Profiling Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Histogram | Time to process tokens in retrieve (seconds). |
| | Histogram | Time to broadcast memory objects in retrieve (seconds). |
| | Histogram | Time to move data to GPU in retrieve (seconds). |
| | Histogram | Time to process tokens in store (seconds). |
| | Histogram | Time to move data from GPU in store (seconds). |
| | Histogram | Time to put data to storage in store (seconds). |
| | Histogram | Time spent waiting for data from remote backend (seconds). |
| | Histogram | Time spent in the connector layer (seconds). |

Cache Usage & Lifecycle Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Gauge | Local cache usage in bytes. |
| | Gauge | Remote cache usage in bytes. |
| | Gauge | Local storage usage in bytes. |
| | Histogram | Distribution of request cache lifespan in minutes. |

Remote Backend & Network Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Counter | Total number of read requests to remote backends. |
| | Counter | Total number of bytes read from remote backends. |
| | Counter | Total number of write requests to remote backends. |
| | Counter | Total number of bytes written to remote backends. |
| | Histogram | Time taken to get data from remote backends (ms). |
| | Histogram | Time taken to put data to remote backends (ms). |
| | Histogram | Time taken to get data from remote backends synchronously (ms). |
| | Gauge | Latest ping latency to remote backends (ms). |
| | Counter | Total number of ping errors to remote backends. |
| | Counter | Total number of successful pings to remote backends. |
| | Gauge | Latest ping error code to remote backends. |

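The remote backend timings are reported in milliseconds, while most other latency metrics in this reference use seconds, so it helps to normalize units before combining them. The sketch below shows two small derived values: the mean payload size per request from counter deltas, and an effective throughput from a byte count and a millisecond latency. All function and parameter names are hypothetical.

```python
def mean_bytes_per_request(bytes_delta: float, requests_delta: float) -> float:
    """Average payload size over an interval, from two counter deltas."""
    if requests_delta <= 0:
        return 0.0  # no requests in the interval
    return bytes_delta / requests_delta


def throughput_bytes_per_second(num_bytes: float, latency_ms: float) -> float:
    """Effective throughput for one transfer, converting ms to seconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    latency_s = latency_ms / 1000.0
    return num_bytes / latency_s


if __name__ == "__main__":
    # 1 MiB read across 4 requests in the interval.
    print(mean_bytes_per_request(1_048_576, 4))
    # 1,000,000 bytes fetched in 250 ms.
    print(throughput_bytes_per_second(1_000_000, 250.0))
```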
Local CPU Backend Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Counter | Total number of evictions in local CPU backend. |
| | Counter | Total number of evicted keys in local CPU backend. |
| | Counter | Total number of failed evictions in local CPU backend. |
| | Gauge | Current number of items in the hot cache. |
| | Gauge | Current number of keys being processed in requests. |

Memory Management Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Gauge | The number of currently active memory objects. |
| | Gauge | The number of currently pinned memory objects. |
| | Counter | Total number of forced unpins due to timeout. |
| | Gauge | The number of pinned objects tracked by the PinMonitor. |

P2P Transfer Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Counter | Total number of P2P transfer requests. |
| | Counter | Total number of tokens transferred via P2P. |
| | Histogram | Time taken for P2P transfers (seconds). |
| | Histogram | P2P transfer speed (tokens per second). |

Health & Internal System Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Gauge | Overall health status of LMCache (1 = healthy, 0 = unhealthy). |
| | Gauge | Number of failed blocking get operations in the current interval. |
| | Gauge | Size of the KV message queue in the BatchedMessageSender. |
| | Gauge | Number of pending remote put tasks. |
| | Gauge | Number of storage events currently in progress. |
| | Gauge | Number of storage events completed successfully. |
| | Gauge | Number of storage events where the requested data was not found. |

Chunk Statistics Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Gauge | Whether chunk statistics collection is enabled (1 = enabled, 0 = disabled). |
| | Gauge | Total number of requests processed by chunk statistics. |
| | Gauge | Total number of chunks processed. |
| | Gauge | Estimated number of unique chunks encountered. |
| | Gauge | Chunk reuse rate (0.0 to 1.0). |
| | Gauge | Memory usage of the Bloom filter in megabytes. |
| | Gauge | Fill rate of the Bloom filter (0.0 to 1.0). |
| | Gauge | Number of files created when using the |
| | Gauge | Current size of the active statistics file in bytes. |

Connector Metrics

| Metric Name | Type | Description |
|---|---|---|
| | Gauge | Current count of unfinished requests in the scheduler. |
| | Gauge | Number of load specifications currently in the connector. |
| | Gauge | Number of active request trackers in the connector. |
| | Gauge | Number of KV caches currently managed by the connector. |
| | Gauge | Number of layer-wise retrievers active in the connector. |
| | Gauge | Number of invalid block IDs encountered by the connector. |
| | Gauge | Number of requests prioritized by the connector. |