Using NIXL#

NIXL (NVIDIA Inference Xfer Library) is a high-performance library designed to accelerate point-to-point communication in AI inference frameworks. It abstracts over various types of memory (CPU and GPU) and storage through a modular plug-in architecture, enabling efficient data transfer and coordination between different components of the inference pipeline.

LMCache supports using NIXL as the underlying communication library for prefill-decode disaggregation.
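As a rough illustration, a NIXL-backed deployment is typically configured through LMCache's YAML config, with one side acting as the sender (prefiller) and the other as the receiver (decoder). The sketch below is illustrative only; the exact option names and defaults may differ across LMCache versions, so consult the examples and configuration reference for your release.

```yaml
# Illustrative sender-side (prefiller) config sketch -- option names
# are assumptions based on LMCache's NIXL examples and may vary.
enable_nixl: True
nixl_role: "sender"            # the decoder side would use "receiver"
nixl_receiver_host: "localhost"
nixl_receiver_port: 55555
nixl_buffer_size: 1073741824   # staging buffer size in bytes (1 GiB)
nixl_buffer_device: "cuda"     # where the transfer buffer is allocated
```

The receiver (decode) instance would use a matching config with `nixl_role: "receiver"` and listen on the same host/port so the prefiller can push KV caches to it.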

For detailed instructions on installing LMCache with NIXL, please refer to our installation guide.

Examples#