MiniMaxM2ForCausalLM#
vLLM#

Engine documentation: MiniMax-M2 in vLLM supported models (architecture MiniMaxM2ForCausalLM).

Status: Validated with LMCache.
Start the LMCache MP server:

```bash
lmcache server --l1-size-gb 100 --eviction-policy LRU
```
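The MP server and vLLM run as separate processes, so the server must stay up while vLLM is serving. A minimal sketch for keeping it in the background (the nohup/log-file choice is just one option, not an LMCache requirement):

```bash
# Keep the LMCache MP server alive in the background and capture its log.
# The log path is illustrative; size the L1 cache to fit your host memory.
nohup lmcache server --l1-size-gb 100 --eviction-policy LRU > lmcache-server.log 2>&1 &
```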
Start vLLM with the LMCache MP connector:

```bash
vllm serve MiniMaxAI/MiniMax-M2 \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --kv-transfer-config \
    '{"kv_connector":"LMCacheMPConnector", "kv_role":"kv_both"}'
```
Adjust --tensor-parallel-size to match your hardware. For the
generic LMCache + vLLM wiring (ports, remote hosts, in-process mode),
see Quick Start.
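Once both processes are up, a quick sanity check is to hit vLLM's OpenAI-compatible endpoint; the sketch below assumes the default port 8000 on localhost and the served model name shown above. Sending the same (ideally long) prompt twice should show a lower time to first token on the second request, since the prefix KV cache is reused through LMCache.

```bash
# Assumes vLLM's default OpenAI-compatible server at localhost:8000.
# Repeat the request with an identical prompt: the second run should start
# streaming sooner because the prefix KV cache is served from LMCache.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [{"role": "user", "content": "Explain KV cache reuse in one paragraph."}],
        "max_tokens": 64
      }'
```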
SGLang#

Engine documentation: MiniMax-M2 SGLang cookbook, MiniMax M2.5/M2.1/M2 usage guide.
Status: Not validated with LMCache.
TensorRT-LLM#

Status: Not supported. LMCache TRT-LLM integration is in progress.
CacheBlend support#
Compression support#
| Method | Status | Notes |
|---|---|---|
| | Not validated | |
Caveats#
None known.