Docker deployment

Running the container image

You can run the LMCache-integrated vLLM image with Docker as follows:

IMAGE=<IMAGE_NAME>:<TAG>
docker run --runtime nvidia --gpus all \
    --env "HF_TOKEN=<REPLACE_WITH_YOUR_HF_TOKEN>" \
    --env "LMCACHE_CHUNK_SIZE=256" \
    --env "LMCACHE_LOCAL_CPU=True" \
    --env "LMCACHE_MAX_LOCAL_CPU_SIZE=5" \
    --volume ~/.cache/huggingface:/root/.cache/huggingface \
    --network host \
    $IMAGE \
    meta-llama/Llama-3.1-8B-Instruct --kv-transfer-config \
    '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'

The image name and tag can be found on Docker Hub under lmcache/vllm-openai. See the example run file in the docker directory for more details.
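
Once the container is running, you can sanity-check the server with a request to the OpenAI-compatible API. This sketch assumes vLLM's default port 8000, which is reachable on localhost because the container uses --network host:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Explain KV caching in one sentence.",
        "max_tokens": 64
    }'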

Note

Docker Hub contains the following image types (example pull commands follow the list):

  • Nightly build images of the latest LMCache and vLLM code (tagged with latest-nightly and nightly-<date>)

  • Images of stable releases of LMCache and vLLM (tagged with v0.x.x); the exact vLLM version a given LMCache version was built with is listed in the compatibility matrix in the installation section

  • A lightweight image that cannot run prefill-decode (PD) disaggregation (tagged with lightweight)
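
A minimal sketch of pulling each image type, assuming the lmcache/vllm-openai repository named above; substitute a concrete version for the v0.x.x placeholder:

# Nightly build of the latest LMCache and vLLM code
docker pull lmcache/vllm-openai:latest-nightly
# Stable release; replace v0.x.x with an actual version
docker pull lmcache/vllm-openai:v0.x.x
# Lightweight image (no PD disaggregation)
docker pull lmcache/vllm-openai:lightweight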