Configuring the Internal API Server#
The internal_api_server provides APIs for managing the LMCache engine. Below are the configuration options and usage examples.
Configuration Parameters#
The following parameters can be configured in the YAML file:
# Enable/disable the internal API server
internal_api_server_enabled: True
# Base port for the API server
# actual_port = internal_api_server_port_start + index
# Scheduler → 6999 + 0 = 6999
# Worker 0 → 6999 + 1 = 7000
internal_api_server_port_start: 6999
# List of scheduler/worker indices: 0 for scheduler, 1 for worker 0, 2 for worker 1, etc.
internal_api_server_include_index_list: [0, 1]
# Socket path prefix for the API server. If configured, the server will use a Unix socket instead of listening on a port.
internal_api_server_socket_path_prefix: "/tmp/lmcache_internal_api_server/socket"
# Actual socket files will be:
# /tmp/lmcache_internal_api_server/socket_6999 (scheduler)
# /tmp/lmcache_internal_api_server/socket_7000 (worker 0)
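To make the addressing rule concrete, here is a small Python sketch that computes the address for a given index under the configuration above. The helper name is illustrative only and not part of LMCache's API:
# Illustrative helper mirroring the port/socket formulas above
# (not part of LMCache's API).
PORT_START = 6999
SOCKET_PREFIX = "/tmp/lmcache_internal_api_server/socket"

def server_address(index: int, use_socket: bool = False) -> str:
    """index 0 = scheduler, index 1 = worker 0, index 2 = worker 1, ..."""
    port = PORT_START + index
    if use_socket:
        # Unix socket path: <prefix>_<port>
        return f"{SOCKET_PREFIX}_{port}"
    return f"http://localhost:{port}"

print(server_address(0))                   # http://localhost:6999 (scheduler)
print(server_address(1))                   # http://localhost:7000 (worker 0)
print(server_address(1, use_socket=True))  # /tmp/lmcache_internal_api_server/socket_7000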
Testing the Server#
You can test the server by querying its endpoints. The examples below target worker 0 on port 7000, per the configuration above.
/metrics endpoint for metrics:
curl http://localhost:7000/metrics
/conf endpoint for configuration:
curl http://localhost:7000/conf
/meta endpoint for metadata:
curl http://localhost:7000/meta
/threads endpoint for threads:
curl http://localhost:7000/threads
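For scripted checks, the same read-only endpoints can be queried from Python. This is a minimal sketch, assuming worker 0 listens on localhost:7000 as in the curl examples; it prints each raw response body, since the exact format varies per endpoint:
# Poll the read-only endpoints shown above (sketch).
import requests

BASE = "http://localhost:7000"

for path in ("/metrics", "/conf", "/meta", "/threads"):
    resp = requests.get(BASE + path, timeout=5)
    # Print the raw body; the exact format varies per endpoint.
    print(f"== {path} ({resp.status_code}) ==")
    print(resp.text)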
/loglevel endpoint for log level:
# Get all loggers info
curl http://localhost:7000/loglevel
# Get specified logger level
curl "http://localhost:7000/loglevel?logger_name=lmcache.v1.cache_engine"
# Set specified logger level
curl "http://localhost:7000/loglevel?logger_name=lmcache.v1.cache_engine&level=DEBUG"
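Note that the URLs are quoted so the shell does not interpret ? and &. The same calls can be made from Python with requests, where passing the query string via params avoids shell quoting entirely (a sketch, assuming the server is reachable on localhost:7000):
# Equivalent to the curl calls above (sketch).
import requests

BASE = "http://localhost:7000"

# Get all loggers
print(requests.get(f"{BASE}/loglevel", timeout=5).text)

# Set a specific logger to DEBUG
resp = requests.get(
    f"{BASE}/loglevel",
    params={"logger_name": "lmcache.v1.cache_engine", "level": "DEBUG"},
    timeout=5,
)
print(resp.text)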
/run_script endpoint for running script:
curl -X POST http://localhost:7000/run_script \
-F "script=@/Users/msy/scratch.py"
Example response:
{'is_first_rank': True, 'model_version': (27, 1, 64, 1, 576), 'LocalCPUBackend.use_hot': False}
scratch.py (the script runs inside the server process with the FastAPI app in scope; the dictionary bound to result is returned as the response, as in the example output above):
# Get cache_engine from app.state
lmcache_engine = app.state.lmcache_adapter.lmcache_engine
# Print the worker ID and model name
print(f"Worker ID: {lmcache_engine.metadata.worker_id}")
print(f"Model name: {lmcache_engine.metadata.model_name}")
# Toggle LocalCPUBackend.use_hot (set to False in this example)
lmcache_engine.storage_manager.storage_backends["LocalCPUBackend"].use_hot = False
# Bind the response payload to `result`; it is returned as the endpoint's output
result = {
"is_first_rank": lmcache_engine.metadata.is_first_rank(),
"model_version": lmcache_engine.metadata.kv_shape,
"LocalCPUBackend.use_hot": lmcache_engine.storage_manager.storage_backends["LocalCPUBackend"].use_hot
}
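As an alternative to curl, here is a sketch of uploading the same script with Python requests; the multipart field name "script" mirrors the curl -F example above:
# Upload scratch.py to the /run_script endpoint (sketch).
import requests

with open("scratch.py", "rb") as f:
    resp = requests.post(
        "http://localhost:7000/run_script",
        files={"script": f},
        timeout=30,
    )
print(resp.text)  # e.g. {'is_first_rank': True, ...} as shown above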
How to extend the Internal API Server#
You can extend the internal_api_server by adding new endpoint files to the lmcache/v1/internal_api_server/ directory.
Ensure your new file name ends with _api.py. Additionally, you need to define a router = APIRouter() in your file and add your endpoints to it.
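For example, a minimal endpoint file might look like the sketch below. The file name, route, and response body are illustrative; only the _api.py suffix and the module-level router = APIRouter() are requirements stated above:
# lmcache/v1/internal_api_server/hello_api.py (illustrative example)
from fastapi import APIRouter

# The server picks this module up because the file name ends with _api.py
# and it exposes a module-level `router`.
router = APIRouter()

@router.get("/hello")
def hello():
    # Return any JSON-serializable object.
    return {"message": "hello from a custom endpoint"}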