Docker Installation#
LMCache offers an official Docker image for deploying LMCache v1. The image is available on Docker Hub at lmcache/vllm-openai.
Note
Make sure you have Docker installed on your machine. You can install Docker by following the official instructions at https://docs.docker.com/engine/install/.
Note
The Docker image lmcache/lmcache_vllm for LMCache v0 is no longer maintained.
Pulling the Docker Image#
To get started, pull the official Docker image with the following command:
docker pull lmcache/vllm-openai
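Pulling without a tag fetches the latest tag. To pin a specific release instead, pull it explicitly and confirm the image is available locally (the tag below is a placeholder; check Docker Hub for the tags that actually exist):
docker pull lmcache/vllm-openai:<TAG>
docker images lmcache/vllm-openai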
Running the Docker Container#
Run the Docker command below. Note that LMCache settings are passed as environment variables with the LMCACHE_ prefix:
IMAGE=<IMAGE_NAME>:<TAG>   # e.g. lmcache/vllm-openai:latest
docker run --runtime nvidia --gpus all \
    --env "HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN>" \
    --env "LMCACHE_USE_EXPERIMENTAL=True" \
    --env "LMCACHE_CHUNK_SIZE=256" \
    --env "LMCACHE_LOCAL_CPU=True" \
    --env "LMCACHE_MAX_LOCAL_CPU_SIZE=5" \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --network host \
    --entrypoint "/usr/local/bin/vllm" \
    $IMAGE \
    serve mistralai/Mistral-7B-Instruct-v0.2 --kv-transfer-config \
    '{"kv_connector":"LMCacheConnector","kv_role":"kv_both"}' \
    --enable-chunked-prefill false
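As an alternative to passing each setting through an LMCACHE_* environment variable, LMCache can read its configuration from a YAML file pointed to by the LMCACHE_CONFIG_FILE environment variable. A minimal sketch (the file name and mount path below are arbitrary choices, not fixed by LMCache):
# lmcache_config.yaml
chunk_size: 256
local_cpu: true
max_local_cpu_size: 5
Mount the file into the container and reference it, e.g. add -v $PWD/lmcache_config.yaml:/config/lmcache_config.yaml and --env "LMCACHE_CONFIG_FILE=/config/lmcache_config.yaml" to the docker run command above.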
Save the docker run command above in a file named run.sh, then make it executable and run it:
chmod +x run.sh
./run.sh
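The container may take several minutes to download the model weights and start the server. A quick readiness check (assuming the default vLLM port 8000) is to list the served models from another terminal:
curl http://127.0.0.1:8000/v1/models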
Testing the Docker Container#
To verify the setup, you can test it using the following curl command:
curl -X 'POST' \
    'http://127.0.0.1:8000/v1/chat/completions' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "messages": [
            {"role": "system", "content": "You are a helpful AI coding assistant."},
            {"role": "user", "content": "Write a segment tree implementation in python"}
        ],
        "max_tokens": 150
    }'
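Because LMCache stores the KV cache of served prompts, repeating a request with the same prefix should produce its first token noticeably faster. A rough way to observe this (assuming you saved the curl command above in a script named request.sh; the name is arbitrary, and total wall-clock time is only a coarse proxy for time-to-first-token):
time ./request.sh   # first run: full prefill
time ./request.sh   # repeat: LMCache serves the cached prefix KV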
Building Docker from Source#
Note
This section is for users who want to build the Docker image from source. For instructions, please see the LMCache docker guide.