
Use LMCache in Production

Deploying, scaling, and operating LMCache in production.

  • Deployment Guide
    • Docker
    • Kubernetes
    • Production Best Practices
    • Transfer Mode (--supported-transfer-mode, --shm-name)
  • Kubernetes Deployment
  • Kubernetes Operator
    • Why Use the Operator
    • Prerequisites
    • Installing the Operator
    • Deploying an LMCacheEngine
    • Connecting vLLM
    • Verifying the Deployment
    • CRD Spec Reference
    • Examples
    • CacheBlend
    • Operator vs Manual Deployment
    • Security Considerations
    • Development
  • Runtime Plugins
    • Key Use Cases
    • Configuration
    • Runtime Plugin Naming Convention
    • Execution Model
    • Example Runtime Plugins
    • Best Practices
  • Dynamo Integration

© 2024, The LMCache Team Built with Sphinx 8.2.3