Phi-3 / Phi-4#

Validated models#

vLLM

Engine documentation: Phi3ForCausalLM in vLLM supported models (architecture Phi3ForCausalLM).

Status: Validated with LMCache.

Start the LMCache MP server:

lmcache server --l1-size-gb 100 --eviction-policy LRU

Start vLLM with the LMCache MP connector:

Phi-4-mini-instruct (1 GPU):

vllm serve microsoft/Phi-4-mini-instruct \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser phi4_mini_json \
    --kv-transfer-config \
    '{"kv_connector":"LMCacheMPConnector", "kv_role":"kv_both"}'

Phi-3-medium-128k-instruct (1 GPU):

vllm serve microsoft/Phi-3-medium-128k-instruct \
    --trust-remote-code \
    --kv-transfer-config \
    '{"kv_connector":"LMCacheMPConnector", "kv_role":"kv_both"}'

Adjust --tensor-parallel-size to match your hardware. For the generic LMCache + vLLM wiring (ports, remote hosts), see Quickstart.

SGLang

Status: Not validated with LMCache.

TRT-LLM

Status: Supported. See Quickstart for TRT-LLM + LMCache setup.

CacheBlend support#

Compression support#

Method	Status	Notes
CacheGen	Not validated

Caveats#

None known.