Using NIXL

Warning

This page documents the behavior of LMCache’s in-process mode (deprecated). Please consider using LMCache MP mode for better feature support and performance. For the MP mode equivalent of this page, see Disaggregated Prefill.

NIXL (NVIDIA Inference Xfer Library) is a high-performance library designed to accelerate point-to-point communication in AI inference frameworks. It provides an abstraction over various types of memory (CPU and GPU) and storage through a modular plug-in architecture, enabling efficient data transfer and coordination between the components of the inference pipeline.

LMCache supports using NIXL as the underlying communication library for prefill-decode disaggregation.

For detailed instructions on installing LMCache with NIXL, please refer to our installation guide.

Examples