Observability

  • Metrics by vLLM API
    • Quick Start Guide
    • Available Metrics
  • Internal API Server Metrics
    • Overview
    • Quick Start Guide
    • Port Configuration
    • Advanced Usage
  • Metrics Reference
    • Available Metrics
  • Chunk Statistics
    • Overview
    • Recording Strategies
    • Quick Start Guide
    • Configuration Options
    • Advanced Usage
    • Best Practices
    • Troubleshooting
  • Health Monitor
    • Overview
    • Architecture
    • Auto-Discovery
    • Configuration
    • How It Works
    • Built-in Health Checks
    • Prometheus Metrics
    • Error Codes
    • Extending the Health Monitor
  • Periodic Thread Monitoring API
    • Overview
    • API Endpoints
    • Thread Levels
    • IrrecoverableException Handling
    • Usage Examples
  • LMCache Frontend
    • Features
    • Installation
    • Quick Start
    • Configuration
    • Proxying Requests
    • Contributing
    • More Information

© 2024, The LMCache Team