HF Bucket
An L2 adapter that stores KV cache objects in a Hugging Face Bucket using the
huggingface_hub bucket APIs. Blocking Hub calls run on a bounded thread
pool driven by an asyncio loop on a daemon thread, so the L2 controller thread
is never blocked on network I/O.
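The threading model described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the class name BlockingCallRunner and its methods are hypothetical, and only the standard-library asyncio, threading, and concurrent.futures APIs are used.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class BlockingCallRunner:
    """Hypothetical helper: runs blocking calls without blocking the caller."""

    def __init__(self, num_workers: int = 4) -> None:
        # Bounded pool: at most num_workers blocking calls run concurrently.
        self._pool = ThreadPoolExecutor(max_workers=num_workers)
        # The event loop lives on its own daemon thread.
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def submit(self, fn, *args):
        """Schedule fn(*args) on the pool; returns a concurrent.futures.Future."""

        async def _run():
            return await self._loop.run_in_executor(self._pool, fn, *args)

        # Safe to call from any thread, e.g. a controller thread.
        return asyncio.run_coroutine_threadsafe(_run(), self._loop)

    def close(self) -> None:
        """Stop the loop, then release the thread and the pool."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._pool.shutdown(wait=True)
        self._loop.close()
```

The caller gets a future back immediately and can poll or attach a callback, so only the pool threads ever wait on the network.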
Object names are derived from the MP ObjectKey as
<model>@<kv_rank_hex>@<chunk_hash_hex>[@<cache_salt>] and then encoded with
the standard HFBucket object-name encoding plus the optional bucket prefix.
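As a rough sketch of the naming scheme above: the function names, the argument types (an integer rank and a bytes hash), and the hex formatting are assumptions made for illustration. The percent-encoding step is only a stand-in for the HFBucket object-name encoding, which is not specified here.

```python
from typing import Optional
from urllib.parse import quote


def raw_object_name(
    model: str,
    kv_rank: int,
    chunk_hash: bytes,
    cache_salt: Optional[str] = None,
) -> str:
    """Build <model>@<kv_rank_hex>@<chunk_hash_hex>[@<cache_salt>]."""
    parts = [model, format(kv_rank, "x"), chunk_hash.hex()]
    if cache_salt:
        parts.append(cache_salt)
    return "@".join(parts)


def bucket_path(name: str, prefix: str = "") -> str:
    """Encode the name and prepend the optional bucket prefix.

    ASSUMPTION: percent-encoding stands in for the real object-name
    encoding; it keeps "@" and escapes "/" so the name stays one path segment.
    """
    encoded = quote(name, safe="@")
    return f"{prefix.strip('/')}/{encoded}" if prefix else encoded


name = raw_object_name("org/model", 31, bytes.fromhex("deadbeef"))
print(name)                       # org/model@1f@deadbeef
print(bucket_path(name, "prod"))  # prod/org%2Fmodel@1f@deadbeef
```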
Because Hugging Face batch writes are not transactional, a store task that
partially fails reconciles backend metadata so that any objects that actually
landed are still counted for usage accounting and later deletion.
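One way to picture the reconciliation step: after a failed batch, list what is actually in the bucket and record every attempted object that is present. The function below is a hypothetical sketch; list_remote_sizes stands in for whatever listing call the adapter uses, and metadata stands in for its path-size bookkeeping.

```python
from typing import Callable, Dict, Iterable, List


def reconcile_after_failed_store(
    attempted: Iterable[str],
    list_remote_sizes: Callable[[], Dict[str, int]],
    metadata: Dict[str, int],
) -> List[str]:
    """Record objects from a failed batch that still reached the backend.

    attempted: object paths the batch tried to write.
    list_remote_sizes: hypothetical callable returning {path: size_in_bytes}
        for objects currently present in the bucket.
    metadata: local path-to-size map used for usage accounting and deletion.
    Returns the paths that were found remotely.
    """
    remote = list_remote_sizes()
    landed = []
    for path in attempted:
        size = remote.get(path)
        if size is not None:
            # The write landed despite the batch error: track it so it is
            # counted in usage and can be deleted later.
            metadata[path] = size
            landed.append(path)
    return landed
```

Objects that did not land are simply left out of the metadata, so a later lookup treats them as misses.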
This is a persistent remote backend best suited to warm and cold KV cache tiers; prefer a lower-latency local adapter for the hottest cache tier.
Required fields:
bucket_handle: Bucket location in the form hf://buckets/<namespace>/<bucket>[/<prefix>].
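To make the handle format concrete, here is a small parser sketch. The names parse_bucket_handle and BucketLocation are hypothetical and not part of the adapter; the code only illustrates how the three components map onto the URL form given above.

```python
from typing import NamedTuple


class BucketLocation(NamedTuple):
    namespace: str
    bucket: str
    prefix: str  # empty string when the handle has no prefix


_SCHEME = "hf://buckets/"


def parse_bucket_handle(handle: str) -> BucketLocation:
    """Split hf://buckets/<namespace>/<bucket>[/<prefix>] into its parts."""
    if not handle.startswith(_SCHEME):
        raise ValueError(f"expected a handle starting with {_SCHEME!r}: {handle!r}")
    # Split at most twice so a multi-segment prefix stays intact.
    parts = handle[len(_SCHEME):].strip("/").split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"handle must name a namespace and a bucket: {handle!r}")
    prefix = parts[2] if len(parts) == 3 else ""
    return BucketLocation(parts[0], parts[1], prefix)


print(parse_bucket_handle("hf://buckets/my-org/lmcache-kv/prod"))
# BucketLocation(namespace='my-org', bucket='lmcache-kv', prefix='prod')
```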
Optional fields:
token_env (string, default "HF_TOKEN"): Environment variable used to resolve the Hugging Face access token.
token (string): Direct token fallback used when token_env is unset.
create_bucket_if_missing (bool, default false): Create the bucket lazily on the first store instead of requiring it to exist.
download_tmp_dir (string): Root directory for temporary load downloads.
metadata_cache_ttl_secs (float, default 30.0): TTL for the path-size metadata cache that backs lookups and usage accounting.
num_workers (int, default 4): Number of worker threads for blocking Hugging Face Hub API calls.
max_capacity_gb (float, default 0.0): Aggregate capacity used by get_usage(). A value of 0 disables aggregate eviction.
eviction (dict): Optional eviction policy; see L2AdapterConfigBase.
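The token lookup order can be sketched as below. The function name resolve_token is hypothetical, and the sketch assumes that "unset" means the named environment variable is missing or empty; the adapter's exact rule may differ.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    config: Mapping[str, object],
    environ: Mapping[str, str] = os.environ,
) -> Optional[object]:
    """Return the token from the environment, else the direct fallback."""
    env_name = config.get("token_env", "HF_TOKEN")
    value = environ.get(str(env_name)) if env_name else None
    if value:
        return value
    # ASSUMPTION: fall back to the literal "token" field only when the
    # environment variable yields nothing.
    return config.get("token")


print(resolve_token({"token": "fallback"}, environ={}))               # fallback
print(resolve_token({"token": "fallback"}, environ={"HF_TOKEN": "x"}))  # x
```

Preferring the environment variable keeps secrets out of command lines and config files, which is why the direct token field is treated as a fallback here.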
Configuration examples:
# Minimal: use an existing bucket with a token from $HF_TOKEN
--l2-adapter '{"type": "hfbucket", "bucket_handle": "hf://buckets/my-org/lmcache-kv/prod"}'
# Create the bucket on first store and bound the worker pool
--l2-adapter '{"type": "hfbucket", "bucket_handle": "hf://buckets/my-org/lmcache-kv/prod", "create_bucket_if_missing": true, "num_workers": 8}'
# Enable aggregate eviction with a capacity cap
--l2-adapter '{"type": "hfbucket", "bucket_handle": "hf://buckets/my-org/lmcache-kv/prod", "max_capacity_gb": 50, "eviction": {"eviction_policy": "LRU", "trigger_watermark": 0.9, "eviction_ratio": 0.1}}'
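When the adapter spec is generated by a launcher script rather than typed by hand, building the JSON programmatically avoids quoting mistakes. The helper below is a hypothetical sketch that uses only the standard json and shlex modules; the field names are the ones documented above.

```python
import json
import shlex


def l2_adapter_flag(bucket_handle: str, **options) -> str:
    """Return a shell-safe --l2-adapter argument for the hfbucket adapter."""
    spec = {"type": "hfbucket", "bucket_handle": bucket_handle, **options}
    # json.dumps emits true/false for Python booleans;
    # shlex.quote wraps the JSON in single quotes for the shell.
    return "--l2-adapter " + shlex.quote(json.dumps(spec))


print(l2_adapter_flag(
    "hf://buckets/my-org/lmcache-kv/prod",
    create_bucket_if_missing=True,
    num_workers=8,
))
```

Called with the arguments shown, this prints the same flag as the second example above.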