S3 Backend
Example Configurations
Basic S3 Configuration
chunk_size: 256
local_cpu: False
save_unfull_chunk: False
remote_url: "s3://your-bucket-name"
remote_serde: "naive"
blocking_timeout_secs: 100
extra_config:
  s3_region: "us-east-1"
  s3_max_io_concurrency: 64
  s3_max_inflight_reqs: 64
S3 Express One Zone
chunk_size: 256
local_cpu: False
save_unfull_chunk: False
remote_url: "s3://{BUCKET_NAME}.s3express-{AZ_ID}.{REGION}.amazonaws.com"
remote_serde: "naive"
blocking_timeout_secs: 100
extra_config:
  s3_max_io_concurrency: 64
  s3_max_inflight_reqs: 64
  s3_prefer_http2: True
  s3_region: "{REGION}"
  s3_enable_s3express: True
  s3_file_prefix: "{FILE_PREFIX}"
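The remote_url in the S3 Express One Zone example is a template with three placeholders. As a minimal sketch, the Python helper below fills them in by plain string substitution. The function name `s3express_remote_url` and the sample bucket, AZ ID, and region values are hypothetical, and the helper does not check the values against AWS.

```python
def s3express_remote_url(bucket_name: str, az_id: str, region: str) -> str:
    """Fill the S3 Express One Zone remote_url template.

    Hypothetical helper: plain substitution into
    s3://{BUCKET_NAME}.s3express-{AZ_ID}.{REGION}.amazonaws.com
    with no validation against AWS.
    """
    for name, value in (("bucket_name", bucket_name), ("az_id", az_id), ("region", region)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"s3://{bucket_name}.s3express-{az_id}.{region}.amazonaws.com"


if __name__ == "__main__":
    # Hypothetical directory bucket; AWS directory bucket names end in --<az-id>--x-s3.
    url = s3express_remote_url("my-kv-cache--use1-az4--x-s3", "use1-az4", "us-east-1")
    print(url)
    # s3://my-kv-cache--use1-az4--x-s3.s3express-use1-az4.us-east-1.amazonaws.com
```

The same {REGION} value goes into extra_config.s3_region so that the client region matches the zonal endpoint.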
CoreWeave (S3-compatible)
chunk_size: 256
local_cpu: False
max_local_cpu_size: 50
save_unfull_chunk: False
enable_async_loading: True
remote_url: "s3://test-127.cwlota.com"
remote_serde: "naive"
blocking_timeout_secs: 100
extra_config:
  s3_max_io_concurrency: 320
  s3_max_inflight_reqs: 320
  s3_prefer_http2: False
  s3_region: "US-WEST-04A"
  s3_enable_s3express: False
  save_chunk_meta: False
  s3_file_prefix: "test-2"
Note: cwlota.com is CoreWeave’s S3-compatible cloud storage endpoint, which caches data close to the GPUs for locality. Set s3_enable_s3express: False for non-AWS services.
Configuration Parameters
remote_url: S3 bucket URL (s3://bucket-name)
save_unfull_chunk: Save partial chunks (default: True, must be False for S3)
enable_async_loading: Enable asynchronous loading (default: False)
blocking_timeout_secs: Timeout for blocking operations, in seconds (default: 10)
S3-Specific (in extra_config)
s3_region: Region for the S3 client (required; for S3-compatible services, use the provider's region name, e.g. "US-WEST-04A")
s3_max_io_concurrency: Maximum number of concurrent I/O operations for the event loop group (controls AWS CRT threading)
s3_max_inflight_reqs: Maximum number of simultaneous S3 requests (the backend creates this many /dev/shm buffers and uses this value as its semaphore limit)
s3_prefer_http2: Enable HTTP/2 with ALPN negotiation (["h2", "http/1.1"])
s3_enable_s3express: Enable S3 Express One Zone support in the AWS CRT client
s3_file_prefix: Prefix for S3 object keys (e.g., with the prefix cache, keys become /cache/key_name). Avoid leading or trailing slashes.
save_chunk_meta: Whether to save chunk metadata along with the data (set to False for better performance)
The effective concurrency is bounded by the smaller of s3_max_io_concurrency and s3_max_inflight_reqs.
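The rules above can be checked before starting a server. The sketch below works on a plain Python dict that mirrors the YAML examples; the function name `check_s3_config` is hypothetical and not part of LMCache. It only encodes constraints stated in this section (save_unfull_chunk must be False, s3_region is required, s3_file_prefix has no leading or trailing slash, concurrency is bounded by the smaller limit). Because the defaults for the two concurrency limits are not documented here, the sketch assumes both must be set.

```python
from __future__ import annotations

from typing import Any


def check_s3_config(cfg: dict[str, Any]) -> tuple[list[str], int]:
    """Return (problems, effective_concurrency) for an S3 backend config dict.

    Hypothetical helper, not part of LMCache: it only encodes the rules
    documented in this section.
    """
    problems: list[str] = []
    extra = cfg.get("extra_config") or {}

    # save_unfull_chunk defaults to True, but S3 requires False.
    if cfg.get("save_unfull_chunk", True):
        problems.append("save_unfull_chunk must be False for S3")

    if not str(cfg.get("remote_url", "")).startswith("s3://"):
        problems.append("remote_url should start with s3://")

    if not extra.get("s3_region"):
        problems.append("extra_config.s3_region is required")

    prefix = str(extra.get("s3_file_prefix", ""))
    if prefix.startswith("/") or prefix.endswith("/"):
        problems.append("s3_file_prefix should have no leading/trailing slash")

    # Assumption: defaults are not documented here, so require both limits.
    io_limit = extra.get("s3_max_io_concurrency")
    inflight_limit = extra.get("s3_max_inflight_reqs")
    if io_limit is None or inflight_limit is None:
        problems.append("set both s3_max_io_concurrency and s3_max_inflight_reqs")
        return problems, 0

    # Effective concurrency is bounded by the smaller of the two limits.
    return problems, min(int(io_limit), int(inflight_limit))


if __name__ == "__main__":
    basic = {
        "save_unfull_chunk": False,
        "remote_url": "s3://your-bucket-name",
        "extra_config": {
            "s3_region": "us-east-1",
            "s3_max_io_concurrency": 64,
            "s3_max_inflight_reqs": 32,
        },
    }
    print(check_s3_config(basic))  # ([], 32)
```

Raising s3_max_io_concurrency above s3_max_inflight_reqs in this example would not change the result, since the smaller limit wins.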
/dev/shm Configuration
The S3 backend uses /dev/shm, a RAM-backed tmpfs, as its transfer buffer space, so S3 transfers go directly into RAM without touching a block device.
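To confirm that /dev/shm is actually a tmpfs on a given Linux host, a minimal Python sketch can read the /proc/mounts table. The helper name `mount_fstype` is hypothetical; it assumes the standard whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options, dump, pass).

```python
from pathlib import Path
from typing import Optional


def mount_fstype(mounts_text: str, mount_point: str = "/dev/shm") -> Optional[str]:
    """Return the filesystem type mounted at mount_point, or None.

    If the path is mounted more than once, the last entry is the one
    currently visible, so later lines overwrite earlier ones.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]
    return fstype


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():
        fstype = mount_fstype(proc_mounts.read_text())
        print(f"/dev/shm fstype: {fstype}")
        if fstype != "tmpfs":
            print("warning: /dev/shm is not mounted as tmpfs")
```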
Memory Requirements and Configuration Calculation
Calculate total memory needed:
# GB / token: the per-token KV cache size, summed across all TP workers
(GB / token) * chunk_size * s3_max_inflight_reqs + max_local_cpu_size * num_tp_workers <= available_pinned_memory
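The memory inequality above can be evaluated with a short Python sketch. Assumptions, all labeled: the helper names are hypothetical; "GB" is treated as GiB (2**30 bytes, the same base `df -h` uses); max_local_cpu_size is taken to be in GB per TP worker, as the formula implies; GB / token is estimated with the usual K plus V count per layer and KV head, assuming KV heads are sharded (not replicated) across TP workers; the model shape and pinned-memory budget are made up for illustration.

```python
def kv_gb_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype_bytes: int) -> float:
    """Per-token KV cache size in GiB for the whole model (all TP workers).

    Estimate: one K and one V vector per layer per KV head.
    Assumes KV heads are sharded across TP workers, not replicated.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes / 2**30


def required_memory_gb(gb_per_token: float, chunk_size: int, s3_max_inflight_reqs: int,
                       max_local_cpu_size: float, num_tp_workers: int) -> float:
    """Left-hand side of the memory inequality, in GiB."""
    s3_buffers = gb_per_token * chunk_size * s3_max_inflight_reqs
    local_cpu = max_local_cpu_size * num_tp_workers
    return s3_buffers + local_cpu


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16.
    per_token = kv_gb_per_token(32, 8, 128, 2)
    need = required_memory_gb(per_token, chunk_size=256, s3_max_inflight_reqs=64,
                              max_local_cpu_size=50, num_tp_workers=2)
    available_pinned_memory = 128  # GiB, hypothetical
    print(need, need <= available_pinned_memory)  # 102.0 True
```

With these numbers the S3 buffers account for only 2 GiB while the local CPU cache takes 100 GiB, so max_local_cpu_size is usually the first knob to turn when the budget is tight.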
Calculate s3_max_inflight_reqs based on /dev/shm:
s3_max_inflight_reqs <= (GB in /dev/shm) / (chunk_size_GB_per_TP) / (TP_count)
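A minimal Python sketch of the /dev/shm bound above follows. The helper names are hypothetical, sizes are in GiB, and the per-TP chunk size of 0.015625 GiB (16 MiB) is an illustrative assumption. The `shm_free_gib` helper uses free rather than total space, which is more conservative than the formula when other processes already use /dev/shm.

```python
import math
import os
import shutil


def max_inflight_from_shm(shm_gb: float, chunk_size_gb_per_tp: float, tp_count: int) -> int:
    """Largest s3_max_inflight_reqs allowed by the /dev/shm inequality."""
    if chunk_size_gb_per_tp <= 0 or tp_count <= 0:
        raise ValueError("chunk size and TP count must be positive")
    return math.floor(shm_gb / chunk_size_gb_per_tp / tp_count)


def shm_free_gib(path: str = "/dev/shm") -> float:
    """Free space on the tmpfs at path, in GiB (1024-based, like df -h)."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # Hypothetical: 256 GiB of /dev/shm, 16 MiB per chunk per TP worker, TP=2.
    print(max_inflight_from_shm(256, 0.015625, 2))  # 8192
    if os.path.isdir("/dev/shm"):
        print(max_inflight_from_shm(shm_free_gib(), 0.015625, 2))
```

To match the formula literally, replace `.free` with `.total` in `shm_free_gib`.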
Check current size:
df -h /dev/shm
Increase size (the new size lasts until the next reboot):
sudo mount -o remount,size=256G /dev/shm
Clean up LMCache files:
rm -f /dev/shm/my_shm_*
Troubleshooting
Memory:
Check: df -h /dev/shm
Fix: Increase /dev/shm size or reduce s3_max_inflight_reqs or max_local_cpu_size
Clean: rm -f /dev/shm/my_shm_*
Latency:
- Use the same region for compute and S3
- Consider S3 Express One Zone