MiniMaxM2ForCausalLM#

Validated models#

vLLM#

Engine documentation: MiniMax-M2 in vLLM supported models (architecture MiniMaxM2ForCausalLM).

Status: Validated with LMCache.

Start the LMCache MP server:

lmcache server --l1-size-gb 100 --eviction-policy LRU
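Here --l1-size-gb caps the server's first-tier cache at 100 GB and --eviction-policy LRU drops the least recently used entries once that cap is reached; size the cache to the memory you can spare on the host.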

Start vLLM with the LMCache MP connector:

vllm serve MiniMaxAI/MiniMax-M2 \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --kv-transfer-config \
    '{"kv_connector":"LMCacheMPConnector", "kv_role":"kv_both"}'

Adjust --tensor-parallel-size to match your hardware. The "kv_role":"kv_both" setting lets this vLLM instance both save KV cache to LMCache and load it back. For the generic LMCache + vLLM wiring (ports, remote hosts, in-process mode), see Quick Start.
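
Once both processes are running, a quick smoke test is an OpenAI-compatible completion request against vLLM, assuming the default port 8000 on localhost (adjust if you pass --host/--port to vllm serve):

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "MiniMaxAI/MiniMax-M2", "prompt": "Hello, world", "max_tokens": 32}'

Sending the same prompt a second time should let the request reuse the KV cache stored in LMCache instead of recomputing the prefill.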

SGLang#

Engine documentation: MiniMax-M2 SGLang cookbook, MiniMax M2.5/M2.1/M2 usage guide.

Status: Not validated with LMCache.

TensorRT-LLM#

Status: Not supported. LMCache TRT-LLM integration is in progress.

CacheBlend support#

Compression support#

| Method   | Status        | Notes |
|----------|---------------|-------|
| CacheGen | Not validated |       |

Caveats#

None known.