Observability

  • Metrics by vLLM API
    • Quick Start Guide
    • Available Metrics
  • Internal API Server Metrics
    • Overview
    • Quick Start Guide
    • Port Configuration
    • Advanced Usage

© 2024, The LMCache Team. Built with Sphinx 8.2.3.