S3 Backend
Example Configurations
Basic S3 Configuration
chunk_size: 256
local_cpu: False
save_unfull_chunk: False
remote_url: "s3://your-bucket-name"
remote_serde: "naive"
blocking_timeout_secs: 10
extra_config:
  s3_region: "us-east-1"
  s3_num_io_threads: 64
  save_chunk_meta: False
S3 Express One Zone
chunk_size: 256
local_cpu: False
save_unfull_chunk: False
remote_url: "s3://{BUCKET_NAME}.s3express-{AZ_ID}.{REGION}.amazonaws.com"
remote_serde: "naive"
blocking_timeout_secs: 10
extra_config:
  save_chunk_meta: False
  s3_num_io_threads: 64
  s3_prefer_http2: True
  s3_region: "{REGION}"
  s3_enable_s3express: True
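The remote_url in the S3 Express One Zone example is a template with three placeholders. As a minimal sketch, the helper below fills that template; the function name and the sample bucket name, AZ ID, and region are hypothetical and are not part of LMCache.

```python
def s3express_remote_url(bucket_name: str, az_id: str, region: str) -> str:
    """Fill the remote_url template from the S3 Express One Zone example.

    Hypothetical helper for illustration only; it is not an LMCache API.
    """
    for label, value in (
        ("bucket_name", bucket_name),
        ("az_id", az_id),
        ("region", region),
    ):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return f"s3://{bucket_name}.s3express-{az_id}.{region}.amazonaws.com"


# Sample values are assumptions; substitute your own bucket, AZ ID, and region.
url = s3express_remote_url("my-cache--use1-az4--x-s3", "use1-az4", "us-east-1")
print(url)
```

The same region string should also be used for s3_region in extra_config, so that the URL and the client configuration agree.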
CoreWeave (S3-compatible)
chunk_size: 256
local_cpu: False
max_local_cpu_size: 50
save_unfull_chunk: False
enable_async_loading: True
remote_url: "s3://test-127.cwlota.com"
remote_serde: "naive"
blocking_timeout_secs: 10
extra_config:
  s3_num_io_threads: 320
  s3_prefer_http2: False
  s3_region: "US-WEST-04A"
  s3_enable_s3express: False
  save_chunk_meta: False
  disable_tls: True
  aws_access_key_id: "your-access-key-id"
  aws_secret_access_key: "your-secret-access-key"
Note: cwlota.com is the endpoint for CoreWeave’s S3-compatible Cloud Storage, which caches data for GPU locality. For non-AWS services, you can set disable_tls: True.
See the joint blog post from LMCache, Cohere, and CoreWeave: https://blog.lmcache.ai/en/2025/10/29/breaking-the-memory-barrier-how-lmcache-and-coreweave-power-efficient-llm-inference-for-cohere/
Configuration Parameters
remote_url: S3 bucket URL (s3://bucket-name)
save_unfull_chunk: Whether to save partial chunks (default: False, must be False for S3)
enable_async_loading: Whether to enable asynchronous loading (default: False)
blocking_timeout_secs: Timeout in seconds (default: 10)
S3-Specific (in extra_config)
s3_region: AWS region for the S3 client (required)
s3_num_io_threads: Number of I/O threads the AWS CRT client spawns. Benefits taper off once the count exceeds the number of CPU cores. It also caps the number of outgoing requests, which helps if your S3-compatible object store sits behind a rate-limiting gateway.
s3_prefer_http2: Enable HTTP/2 with ALPN negotiation (["h2", "http/1.1"])
s3_enable_s3express: Enable S3 Express One Zone support in the AWS CRT client
save_chunk_meta: Whether to save chunk metadata in the object store along with your data (must be False for S3)
aws_access_key_id: AWS access key ID (alternatively, log in with aws configure in your environment)
aws_secret_access_key: AWS secret access key (alternatively, log in with aws configure in your environment)
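The description of s3_num_io_threads suggests two inputs when picking a value: the number of CPU cores and any request limit imposed by a gateway in front of the object store. The sketch below turns that into a starting-point heuristic. The function name and the gateway_limit parameter are hypothetical, and treating the thread count as a direct cap on concurrent requests is an assumption drawn from that description, not documented LMCache behavior.

```python
import os
from typing import Optional


def suggest_s3_io_threads(
    cpu_count: Optional[int] = None,
    gateway_limit: Optional[int] = None,
) -> int:
    """Suggest a starting value for s3_num_io_threads.

    Heuristic only (an assumption, not an LMCache rule): start at the
    number of CPU cores, then lower the value if a rate-limiting
    gateway allows fewer concurrent requests.
    """
    # Fall back to the local core count when none is supplied.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    threads = max(1, cores)
    if gateway_limit is not None:
        threads = max(1, min(threads, gateway_limit))
    return threads


print(suggest_s3_io_threads())
```

Treat the result as a baseline to benchmark against; the example configurations on this page use other values (64 and 320), so measured throughput should decide the final setting.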
Tips:
- Use the same region for compute and S3.
- Consider S3 Express One Zone, which trades redundancy for better performance.
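The constraints listed under Configuration Parameters can be checked before starting a server. The sketch below assumes the configuration has already been loaded into a Python dict (for example, by a YAML parser); check_s3_config is a hypothetical helper, not an LMCache function. Because this page does not state a default for save_chunk_meta, the sketch asks for it to be set to False explicitly.

```python
from typing import Any


def check_s3_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in an S3 backend config dict.

    Hypothetical helper; it only encodes the constraints stated on this page.
    """
    problems: list[str] = []
    extra = config.get("extra_config") or {}

    if not str(config.get("remote_url", "")).startswith("s3://"):
        problems.append("remote_url must start with s3://")
    # The documented default is False, so a missing key is accepted.
    if config.get("save_unfull_chunk", False) is not False:
        problems.append("save_unfull_chunk must be False for S3")
    if not extra.get("s3_region"):
        problems.append("extra_config.s3_region is required")
    # Assumption: require an explicit False, since the default is not stated.
    if extra.get("save_chunk_meta") is not False:
        problems.append("extra_config.save_chunk_meta must be False for S3")
    return problems


basic = {
    "remote_url": "s3://your-bucket-name",
    "save_unfull_chunk": False,
    "extra_config": {"s3_region": "us-east-1", "save_chunk_meta": False},
}
print(check_s3_config(basic))
```

An empty list means none of the documented constraints are violated; it does not confirm that the bucket exists or that credentials are valid.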