Welcome to LMCache!
Supercharge Your LLM with the Fastest KV Cache Layer.
Note
We are currently upgrading our documentation to provide better guidance and examples, so some sections may be under construction. Thank you for your patience!
LMCache lets LLMs prefill each text only once. By storing the KV caches of all reusable texts, LMCache can reuse the KV cache of any reused text (not necessarily a prefix) in any serving engine instance. This reduces prefill delay, i.e., time to first token (TTFT), and saves precious GPU cycles and memory. Combined with vLLM, LMCache achieves 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
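To make the "prefill each text only once" idea concrete, here is a toy sketch in plain Python. It is not LMCache's API or implementation: `chunk_key`, `fake_prefill`, and `prefill_with_cache` are hypothetical names invented for illustration, and the "KV cache" is a placeholder tuple rather than a tensor. It also ignores how a real system must handle token positions and attention across chunks when the reused text is not a prefix. It only shows the bookkeeping: text that was seen before is looked up by content instead of being prefilled again, wherever it appears in the prompt.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a KV cache; a real engine would store tensors.
KV = Tuple[str, int]


def chunk_key(text: str) -> str:
    """Content hash used as the cache key, so identical text maps to one entry."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fake_prefill(text: str) -> KV:
    """Placeholder for the expensive prefill step (illustrative only)."""
    return (text, len(text.split()))


def prefill_with_cache(
    chunks: List[str],
    store: Dict[str, KV],
    compute_kv: Callable[[str], KV] = fake_prefill,
) -> Tuple[List[KV], int]:
    """Return one KV entry per chunk and how many chunks had to be prefilled."""
    kvs: List[KV] = []
    computed = 0
    for chunk in chunks:
        key = chunk_key(chunk)
        if key not in store:
            # Cache miss: pay the prefill cost once and remember the result.
            store[key] = compute_kv(chunk)
            computed += 1
        kvs.append(store[key])
    return kvs, computed


store: Dict[str, KV] = {}
doc = "LMCache stores KV caches of reusable texts."

# First request: every chunk is new, so all three are prefilled.
_, first = prefill_with_cache(
    ["You are a helpful assistant.", doc, "What is LMCache?"], store
)
# Second request: the document reappears in the middle of a different prompt,
# not as a prefix, and is served from the store.
_, second = prefill_with_cache(
    ["Answer briefly.", doc, "Why does TTFT drop?"], store
)
print(first, second)  # prints "3 2"
```

In this sketch the shared `store` plays the role of the cache layer: any request, from any caller holding a reference to it, skips the prefill for text already seen.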
For more information, check out the following:
Our papers:
Documentation
Welcome to LMCache
KV Cache offloading and sharing
Non-KV caching
Multiprocess Mode
KV Cache management
Use LMCache in production
Internal API Server
API Reference