Installation Guide#

LMCache is a Python library that also contains pre-compiled C++ and CUDA (12.1) binaries.

Requirements#

  • OS: Linux

  • Python: 3.10 or higher

  • CUDA: 12.1

Note

LMCache requires CUDA 12.1. You can run nvcc --version to check which CUDA version is loaded. If it is not 12.1, add the following to your ~/.bashrc file:

cuda_version=12.1
export CUDA_HOME=/usr/local/cuda-${cuda_version}
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH
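
To confirm the toolkit is picked up after reloading your shell, a quick check (nvcc should report release 12.1 on its last line):

$ source ~/.bashrc
$ nvcc --version  # look for "release 12.1"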

Install from source (v1)#

You can install the latest code from the GitHub repository:

# vLLM version: 0.7.4.dev160+g28943d36
# NOTE: Run the script below in a virtual environment to avoid messing up your default environment
$ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
$ git clone https://github.com/LMCache/LMCache.git
$ cd LMCache
$ pip install -e .
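
To verify that the editable install is importable, a quick check (the printed path and version will vary with your setup):

$ python -c "import lmcache; print(lmcache.__file__)"
$ python -c "import vllm; print(vllm.__version__)"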

Note

For LMCache v1, set LMCACHE_USE_EXPERIMENTAL=True to enable the experimental features. The relevant source code lives in the lmcache/experimental directory on the dev branch of the LMCache repository. Source installation is the same for v0 and v1, but v0 does not require LMCACHE_USE_EXPERIMENTAL=True.

Note

For LMCache v1, please refer to the examples in the LMCache v1 section. LMCache v1 can be run directly with the vllm serve command, as sketched below.
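
As a rough sketch of such an invocation (the config file and model name below are placeholders, and the kv-transfer settings follow the LMCache v1 examples; see the LMCache v1 section for the exact flags):

$ LMCACHE_USE_EXPERIMENTAL=True LMCACHE_CONFIG_FILE=example.yaml \
    vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --kv-transfer-config '{"kv_connector":"LMCacheConnector","kv_role":"kv_both"}'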

Install from source (v0)#

You can install the latest code from the GitHub repository:

# Install vLLM version
$ pip install vllm==0.6.2.3

# Clone and install LMCache
$ git clone git@github.com:LMCache/LMCache.git
$ cd LMCache
$ pip install -e .
$ cd ..

# Clone and install LMCache-vLLM
$ git clone git@github.com:LMCache/lmcache-vllm.git
$ cd lmcache-vllm
$ pip install -e .
$ cd ..
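
A quick import check confirms both packages are visible from the same environment (assuming the module names match the package names):

$ python -c "import lmcache, lmcache_vllm; print('ok')"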

Version Compatibility Matrix#

LMCache       LMCache_vLLM   vLLM
v1            N/A            0.7.3
0.1.4 (v0)    0.6.2.3        0.6.2
0.1.3 (v0)    0.6.2.2        0.6.1.post2

Install pip released versions (v0)#

You can install LMCache using pip:

$ # (Recommended) Create a new conda environment.
$ conda create -n venv python=3.10 -y
$ conda activate venv

$ # Install LMCache and its vLLM integration (built against CUDA 12.1).
$ pip install lmcache==0.1.4 lmcache_vllm==0.6.2.3
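
With v0, vLLM is typically launched through the lmcache_vllm wrapper rather than plain vllm serve; a minimal sketch (the model name is only an example):

$ lmcache_vllm serve lmsys/longchat-7b-16k --gpu-memory-utilization 0.8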

Note

Although we recommend using conda to create and manage Python environments, we strongly recommend installing LMCache itself with pip. pip installs torch with separately packaged libraries such as NCCL, whereas conda installs torch with statically linked NCCL, which can cause issues when vLLM tries to use NCCL. Since LMCache depends on vLLM as a backend, vLLM must be installed correctly.
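
One way to sanity-check that the pip-installed torch sees a working NCCL (torch.cuda.nccl.version() is standard PyTorch API; the exact version printed will vary):

$ python -c "import torch; print(torch.cuda.nccl.version())"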

Note

A pip-installable release of LMCache v1 is not available yet (it will be released soon). Please install LMCache v1 from source for now.