S3 Backend#

Example Configurations#

Basic S3 Configuration#

chunk_size: 256
local_cpu: False
save_unfull_chunk: False
remote_url: "s3://your-bucket-name"
remote_serde: "naive"
blocking_timeout_secs: 10
extra_config:
  s3_region: "us-east-1"
  s3_num_io_threads: 64
  save_chunk_meta: False

S3 Express One Zone#

chunk_size: 256
local_cpu: False
save_unfull_chunk: False
remote_url: "s3://{BUCKET_NAME}.s3express-{AZ_ID}.{REGION}.amazonaws.com"
remote_serde: "naive"
blocking_timeout_secs: 10
extra_config:
  save_chunk_meta: False
  s3_num_io_threads: 64
  s3_prefer_http2: True
  s3_region: "{REGION}"
  s3_enable_s3express: True
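
The remote_url above is a template with three placeholders. As a minimal sketch of how they fit together, the hypothetical helper below (not part of LMCache) fills them in with plain string formatting. The bucket name, AZ ID, and region in the usage line are illustrative values, not ones taken from this page.

```python
def s3express_remote_url(bucket_name: str, az_id: str, region: str) -> str:
    """Fill the {BUCKET_NAME}, {AZ_ID}, and {REGION} placeholders of the template."""
    # Template copied from the S3 Express One Zone example configuration.
    template = "s3://{BUCKET_NAME}.s3express-{AZ_ID}.{REGION}.amazonaws.com"
    return template.format(BUCKET_NAME=bucket_name, AZ_ID=az_id, REGION=region)


# Illustrative values only; substitute your own directory bucket, AZ ID, and region.
url = s3express_remote_url("my-bucket--use1-az4--x-s3", "use1-az4", "us-east-1")
print(url)
```

The same region string would also go into s3_region under extra_config, as in the example above.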

CoreWeave (S3-compatible)#

chunk_size: 256
local_cpu: False
max_local_cpu_size: 50
save_unfull_chunk: False
enable_async_loading: True
remote_url: "s3://test-127.cwlota.com"
remote_serde: "naive"
blocking_timeout_secs: 10
extra_config:
  s3_num_io_threads: 320
  s3_prefer_http2: False
  s3_region: "US-WEST-04A"
  s3_enable_s3express: False
  save_chunk_meta: False
  disable_tls: True
  aws_access_key_id: "your-access-key-id"
  aws_secret_access_key: "your-secret-access-key"

Note: cwlota.com is the endpoint for CoreWeave’s S3-compatible Cloud Storage, which caches data for GPU locality. You can set disable_tls: True for non-AWS services.

See the joint blog post from LMCache, Cohere, and CoreWeave: https://blog.lmcache.ai/en/2025/10/29/breaking-the-memory-barrier-how-lmcache-and-coreweave-power-efficient-llm-inference-for-cohere/

Configuration Parameters#

  • remote_url: S3 bucket URL (s3://bucket-name)

  • save_unfull_chunk: Save partial chunks (default: False, must be False for S3)

  • enable_async_loading: Enable asynchronous loading (default: False)

  • blocking_timeout_secs: Timeout, in seconds, for blocking operations (default: 10)
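
To make the documented defaults concrete, here is a small sketch that fills them in for keys a configuration leaves out. The helper name and the dict-based representation are assumptions for illustration; this is not LMCache's own config loader.

```python
# Defaults as documented in the parameter list above.
# Illustrative helper only, not LMCache's loader.
DOCUMENTED_DEFAULTS = {
    "save_unfull_chunk": False,
    "enable_async_loading": False,
    "blocking_timeout_secs": 10,
}


def with_documented_defaults(config: dict) -> dict:
    """Return a copy of config with documented defaults added for missing keys."""
    merged = dict(DOCUMENTED_DEFAULTS)
    merged.update(config)  # explicitly set values win over defaults
    return merged


config = with_documented_defaults({"remote_url": "s3://your-bucket-name"})
print(config)
```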

S3-Specific (in extra_config)#

  • s3_region: AWS region for S3 client (required)

  • s3_num_io_threads: Number of IO threads for the AWS CRT client to spawn. Benefits taper off once this exceeds the number of CPU cores. Lowering it is also a way to restrict the number of outgoing requests if your S3-compatible object store sits behind a rate-limiting gateway.

  • s3_prefer_http2: Enable HTTP/2 with ALPN negotiation (["h2", "http/1.1"])

  • s3_enable_s3express: Enable S3 Express One Zone support in AWS CRT client

  • save_chunk_meta: Whether to save chunk metadata in the object store along with your data (must be False for S3)

  • aws_access_key_id: AWS access key ID (alternatively, set up credentials in your environment with aws configure)

  • aws_secret_access_key: AWS secret access key (alternatively, set up credentials in your environment with aws configure)
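
The constraints stated on this page can be checked before starting a server. The following is a rough sketch under assumptions: check_s3_config is a hypothetical helper (not an LMCache API), the config is a plain dict shaped like the YAML examples, and because the default of save_chunk_meta is not stated here, the sketch requires it to be set to False explicitly. The thread-count check only warns, since a value above the core count can still be a deliberate choice.

```python
import os
from typing import List, Optional, Tuple


def check_s3_config(
    config: dict, cpu_cores: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a config dict shaped like the YAML examples."""
    errors: List[str] = []
    warnings: List[str] = []
    extra = config.get("extra_config") or {}

    if not str(config.get("remote_url", "")).startswith("s3://"):
        errors.append("remote_url must look like s3://bucket-name")
    if config.get("save_unfull_chunk", False):
        errors.append("save_unfull_chunk must be False for S3")
    if not extra.get("s3_region"):
        errors.append("extra_config.s3_region is required")
    # Assumption: require an explicit False, because the default is not documented here.
    if extra.get("save_chunk_meta") is not False:
        errors.append("extra_config.save_chunk_meta must be set to False for S3")

    # Benefits of extra IO threads taper off beyond the number of CPU cores.
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    threads = extra.get("s3_num_io_threads")
    if isinstance(threads, int) and threads > cores:
        warnings.append(
            f"s3_num_io_threads={threads} exceeds {cores} CPU cores; gains may taper off"
        )
    return errors, warnings


example = {
    "remote_url": "s3://your-bucket-name",
    "save_unfull_chunk": False,
    "extra_config": {
        "s3_region": "us-east-1",
        "s3_num_io_threads": 64,
        "save_chunk_meta": False,
    },
}
errors, warnings = check_s3_config(example, cpu_cores=16)
print(errors, warnings)
```

Passing cpu_cores explicitly keeps the check reproducible; leaving it out falls back to os.cpu_count() on the current machine.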

Tips:

  • Use the same region for compute and S3

  • Consider S3 Express One Zone, which trades redundancy for better performance