Observability

LMCache can be monitored through two complementary metric surfaces: metrics exported via the vLLM API and metrics served by LMCache's internal API server. A quick sanity-check script follows the outline below.

  • Metrics by vLLM API
    • Quick Start Guide
    • Available Metrics
  • Internal API Server Metrics
    • Overview
    • Quick Start Guide
    • Port Configuration
    • Advanced Usage
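
The sketch below is a minimal example under stated assumptions, not LMCache's authoritative interface: it assumes a vLLM server with LMCache enabled is listening on localhost:8000 (vLLM's default API port), that LMCache counters are published on vLLM's Prometheus-format /metrics endpoint, and that their names contain "lmcache". Consult the Available Metrics page for the real metric names.

    # Minimal sketch: list LMCache-related Prometheus metrics exposed
    # through a running vLLM server. The localhost:8000 address and the
    # "lmcache" name filter are assumptions; see "Available Metrics"
    # for the authoritative metric list.
    import urllib.request

    VLLM_METRICS_URL = "http://localhost:8000/metrics"

    with urllib.request.urlopen(VLLM_METRICS_URL) as resp:
        body = resp.read().decode("utf-8")

    # /metrics returns Prometheus text format: "# HELP" / "# TYPE"
    # comment lines followed by one sample per line.
    for line in body.splitlines():
        if "lmcache" in line.lower():
            print(line)

The Internal API Server Metrics pages cover the second surface: an endpoint served by LMCache itself on a separate, configurable port (see Port Configuration). Once that server is enabled and its port is known, the same kind of scrape applies against it.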