Installation#

Prerequisites: Linux · Python 3.9–3.13 · NVIDIA GPU (compute capability 7.0+) · CUDA 12.1+ · uv
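The prerequisites above can be sanity-checked before installing. The following is a minimal sketch, not part of LMCache: the helper names `check_prerequisites` and `compute_capability_ok` are hypothetical, and the presence of `nvidia-smi` on PATH is used only as a rough proxy for an NVIDIA driver. It does not verify the CUDA toolkit version.

```python
import platform
import shutil
import sys


def compute_capability_ok(cap: str, minimum=(7, 0)) -> bool:
    """Return True if a capability string such as '8.6' meets the minimum."""
    major, minor = (int(part) for part in cap.strip().split("."))
    return (major, minor) >= minimum


def check_prerequisites() -> dict:
    """Map each checked prerequisite to True/False (hypothetical helper)."""
    return {
        "linux": platform.system() == "Linux",
        "python_3.9_to_3.13": (3, 9) <= sys.version_info[:2] <= (3, 13),
        "uv_on_path": shutil.which("uv") is not None,
        # Assumption: nvidia-smi on PATH implies an NVIDIA driver is installed.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

On recent NVIDIA drivers, the capability string can typically be obtained with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` and passed to `compute_capability_ok`.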

Install LMCache#

uv venv --python 3.12
source .venv/bin/activate
uv pip install lmcache

Important

You’re all set! You can now start using LMCache. For hands-on guides and more usage examples, see the More Examples section.

The CUDA 12.9 wheel is published to a dedicated GitHub Release rather than PyPI.

uv venv --python 3.12
source .venv/bin/activate
VERSION=0.4.3  # replace with target release
uv pip install lmcache==${VERSION} \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --find-links https://github.com/LMCache/LMCache/releases/expanded_assets/v${VERSION}-cu129 \
    --index-strategy unsafe-best-match

Note

--extra-index-url https://download.pytorch.org/whl/cu129 ensures that the CUDA 12.9 build of PyTorch is resolved. Without it, the resolver may select a PyTorch build for a different CUDA version.
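One way to confirm which CUDA variant of PyTorch was actually resolved is to read `torch.version.cuda` after installation. This is a small sketch under the assumption that torch is installed in the active environment. The helpers `torch_cuda_version` and `matches_cuda` are hypothetical names used for illustration.

```python
import importlib.util


def torch_cuda_version():
    """Return the CUDA version torch was built against, or None.

    None is returned when torch is not installed or is not a CUDA build.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return torch.version.cuda  # a string such as "12.9" for CUDA builds


def matches_cuda(found, expected: str) -> bool:
    """Return True if the found version equals or extends the expected one."""
    return found is not None and (
        found == expected or found.startswith(expected + ".")
    )


if __name__ == "__main__":
    found = torch_cuda_version()
    print(f"torch CUDA build: {found}")
    print(f"matches 12.9: {matches_cuda(found, '12.9')}")
```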

Nightly wheels are built from the latest dev branch each day at 07:30 UTC and published to GitHub Releases. No version pinning is required: --pre picks the latest nightly automatically.

CUDA 13.0:

uv venv --python 3.12
source .venv/bin/activate
uv pip install lmcache --pre \
    --extra-index-url https://download.pytorch.org/whl/cu130 \
    --find-links https://github.com/LMCache/LMCache/releases/expanded_assets/nightly \
    --index-strategy unsafe-best-match

CUDA 12.9:

uv venv --python 3.12
source .venv/bin/activate
uv pip install lmcache --pre \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --find-links https://github.com/LMCache/LMCache/releases/expanded_assets/nightly-cu129 \
    --index-strategy unsafe-best-match

When building from source, --no-build-isolation ensures that the kernels are compiled against the torch version already installed in your environment, which prevents undefined symbol errors at runtime.

CUDA 13 (default):

git clone https://github.com/LMCache/LMCache.git
cd LMCache

uv venv --python 3.12
source .venv/bin/activate

uv pip install -r requirements/build.txt
uv pip install vllm  # pulls in required torch version (cu13)
uv pip install -e . --no-build-isolation

CUDA 12.9:

git clone https://github.com/LMCache/LMCache.git
cd LMCache

uv venv --python 3.12
source .venv/bin/activate

uv pip install -r requirements/build.txt
# Pin vLLM (and torch) to the cu12.9 wheel index so the local
# CUDA 12 toolchain matches what the extensions are built against.
uv pip install vllm \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --index-strategy unsafe-best-match
# LMCACHE_CUDA_MAJOR=12 makes setup.py pick cupy-cuda12x / nixl-cu12
# for install_requires instead of the cu13 defaults.
LMCACHE_CUDA_MAJOR=12 \
    uv pip install -e . --no-build-isolation

ROCm (AMD GPUs):

git clone https://github.com/LMCache/LMCache.git
cd LMCache

uv venv --python 3.12
source .venv/bin/activate

# Install the build dependencies manually, because the build below runs without build isolation
uv pip install -r requirements/build.txt

# Install torch from the ROCm wheel index
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.0

# Build LMCache. BUILD_WITH_HIP=1 makes setup.py pick cupy-rocm-7-0 automatically.
# PYTORCH_ROCM_ARCH selects the target GPU(s):
#   gfx942  -> MI300X / MI325X
#   gfx950  -> MI350X / MI355X
# Comma-separate to build a fat binary for multiple archs.
PYTORCH_ROCM_ARCH="gfx942,gfx950" \
TORCH_DONT_CHECK_COMPILER_ABI=1 \
CXX=hipcc \
BUILD_WITH_HIP=1 \
uv pip install -e . --no-build-isolation

SYCL backend:

git clone https://github.com/LMCache/LMCache.git
cd LMCache

uv venv --python 3.12
source .venv/bin/activate

# Install the build dependencies manually, because the build below runs without build isolation
uv pip install -r requirements/build.txt

# Build LMCache with SYCL backend.
BUILD_WITH_SYCL=1 uv pip install --no-build-isolation -e .

Prebuilt Docker images (pull the one that matches your platform):

docker pull lmcache/vllm-openai                                   # default tag
docker pull lmcache/vllm-openai:latest-cu129                      # CUDA 12.9
docker pull lmcache/vllm-openai:latest-nightly                    # nightly
docker pull lmcache/vllm-openai:latest-nightly-cu129              # nightly, CUDA 12.9
docker pull rocm/vllm-dev:nightly_0624_rc2_0624_rc2_20250620      # ROCm
docker pull intel/vllm:0.17.0-xpu                                 # Intel XPU

See Docker deployment for running the container and ROCm images.

lmcache-cli is a lightweight, CLI-only package for querying or benchmarking a remote LMCache server. It does not require CUDA and works on any OS.

pip install lmcache-cli

Note

lmcache-cli and lmcache ship the same lmcache CLI command. Do not install both in the same environment.
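To detect whether both distributions have ended up in the same environment, the installed package metadata can be inspected. This is a minimal sketch using only the standard library. The helper names `is_installed` and `cli_conflict` are hypothetical, and the distribution names are taken from the install commands above.

```python
from importlib import metadata


def is_installed(dist_name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return True


def cli_conflict() -> bool:
    """Return True if both lmcache and lmcache-cli are installed together."""
    return is_installed("lmcache") and is_installed("lmcache-cli")


if __name__ == "__main__":
    if cli_conflict():
        print("Both lmcache and lmcache-cli are installed; remove one of them.")
    else:
        print("No lmcache / lmcache-cli conflict detected.")
```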

Build the Docker Image#

Instead of pulling a prebuilt image, you can build the LMCache image (integrated with vLLM) yourself from the Dockerfile provided in docker/.

From the root of the LMCache repository:

docker build --tag <IMAGE_NAME>:<TAG> --target image-build --file docker/Dockerfile .

Replace <IMAGE_NAME> and <TAG> with your desired image name and tag. See the example build file in docker/ for an explanation of all build arguments.
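As an illustration of filling in the placeholders, the sketch below assembles the build command shown above for a given image name and tag. The name `my-lmcache` and tag `dev` are placeholder values chosen for this example, and `docker_build_command` is a hypothetical helper. It only builds the command string and does not invoke Docker.

```python
import shlex


def docker_build_command(image_name: str, tag: str) -> str:
    """Return the documented docker build command with name and tag filled in."""
    args = [
        "docker", "build",
        "--tag", f"{image_name}:{tag}",
        "--target", "image-build",
        "--file", "docker/Dockerfile",
        ".",
    ]
    # shlex.join quotes any argument that needs it for safe shell use.
    return shlex.join(args)


if __name__ == "__main__":
    # Placeholder values; substitute your own image name and tag.
    print(docker_build_command("my-lmcache", "dev"))
```

Run the printed command from the root of the LMCache repository, as described above.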

Verify Installation#

python -c "import lmcache.c_ops"
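The one-liner above raises a traceback when the import fails. A slightly friendlier variant, sketched below, catches the failure and reports the reason. The function name `verify_import` is hypothetical. The module name `lmcache.c_ops` comes from the command above. Missing modules and unresolved-symbol failures when loading a compiled extension both surface as `ImportError` in Python.

```python
import importlib


def verify_import(module_name: str = "lmcache.c_ops"):
    """Try to import a module; return (ok, message) instead of raising."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"{module_name} failed to import: {exc}"
    return True, f"{module_name} imported successfully"


if __name__ == "__main__":
    ok, message = verify_import()
    print(message)
```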