Supported Models

LMCache supports models through vLLM's offline inference interface. For any supported model, use the model card name exactly as it appears on Huggingface:

from lmcache_vllm.vllm import LLM, SamplingParams

# Model card name, exactly as it appears on Huggingface
model_card = "insert here"

# Load the model by its Huggingface model card name
model = LLM(model=model_card)

# Generate up to 100 new tokens for the prompt
sampling_params = SamplingParams(max_tokens=100)
outputs = model.generate(["Hello, my name is"], sampling_params)

for output in outputs:
    print(output.outputs[0].text)

Note

Many models require a Huggingface access token, which becomes usable after you accept the model's terms and conditions. To log in, add the following to the top of your Python script:

from huggingface_hub import login
login()

# You will now be prompted to enter your Huggingface access token.

For more information on Huggingface login, please refer to the Huggingface documentation.
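In non-interactive environments (CI jobs, batch scripts) the interactive prompt is inconvenient. As a sketch, you can read the token from the conventional HF_TOKEN environment variable and pass it to login directly; get_hf_token below is a hypothetical helper written for this example, not part of huggingface_hub:

```python
import os

# Hypothetical helper: read a Huggingface token from the environment
# so scripts can authenticate without an interactive prompt.
def get_hf_token():
    # HF_TOKEN is the conventional environment variable name
    return os.environ.get("HF_TOKEN")

token = get_hf_token()
if token:
    # Only attempt a login when a token is actually present
    from huggingface_hub import login  # assumes huggingface_hub is installed
    login(token=token)
```

With this pattern the script still runs (without authentication) when no token is configured, which keeps local experimentation friction-free.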