Supported Models
LMCache works with the models supported by vLLM. To use vLLM's offline inference with LMCache, pass the model card name exactly as it appears on Hugging Face.
from lmcache_vllm.vllm import LLM, SamplingParams

# Model card name in Hugging Face format, e.g. "mistralai/Mistral-7B-Instruct-v0.2"
model_card = "insert here"

# Load the model
model = LLM(model=model_card)

# Use the model: generate up to 100 new tokens
outputs = model.generate("Hello, my name is", SamplingParams(max_tokens=100))
print(outputs[0].outputs[0].text)
Note
To use gated models, you typically need to log in with a Hugging Face access token after accepting the model's terms and conditions. To do so, add the following to the top of your Python script:
from huggingface_hub import login
login()
# You will be prompted to enter your Hugging Face access token.
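In non-interactive environments (CI jobs, batch scripts) the interactive prompt is inconvenient. As a sketch, the token can instead be supplied through the HF_TOKEN environment variable, which huggingface_hub reads automatically; the token value and the script name below are placeholders:

```shell
# Non-interactive alternative: export a Hugging Face access token before
# launching the script; huggingface_hub picks it up automatically.
# "hf_xxx" is a placeholder for your real token, and my_inference_script.py
# stands in for your own script.
export HF_TOKEN="hf_xxx"
python my_inference_script.py
```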
For more information on Hugging Face authentication, please refer to the Hugging Face documentation.