Extension Guide

This section describes how to extend LMCache by adding new CLI commands, plugins, or other components.

  • Extending the CLI
    • Architecture Overview
    • Directory Layout
    • Level 1: Adding a Top-Level Command
    • Level 2: Adding a Subcommand Group
    • Level 2: Adding a Subcommand to an Existing Group
    • Level N: Arbitrary Nesting
    • Real-World Example
    • Summary

© 2024, The LMCache Team Built with Sphinx 8.2.3