SageMaker Hyperpod#
Prerequisites#
Create an Amazon SageMaker HyperPod cluster with tiered storage enabled by following the instructions at:
https://docs.aws.amazon.com/sagemaker/latest/dg/managed-tier-checkpointing-setup.html
This enables the ai-toolkit daemon that provides shared memory access for LMCache.
Example Configuration#
chunk_size: 256
local_cpu: True
max_local_cpu_size: 5
remote_url: "sagemaker-hyperpod://$NODE_IP:9200"
Configuration Parameters#
SageMaker Hyperpod-Specific (in extra_config)#
sagemaker_hyperpod_bucket: Bucket name for KV storage namespace (default: “lmcache”)
sagemaker_hyperpod_shared_memory_name: Name of shared memory segment (default: “shared_memory”). Set to None to disable shared memory.
sagemaker_hyperpod_max_concurrent_requests: Maximum concurrent HTTP requests allowed in-flight at any moment (application-level throttling, default: 100, minimum: 1). This limit is per LMCache engine instance. With multiple workers (e.g., high TP), each worker creates its own engine with separate limits.
sagemaker_hyperpod_max_connections: Maximum total TCP connections in the connection pool per LMCache engine across all daemons (default: 256, minimum: 1). For typical single-daemon setups, this effectively limits connections from one engine to one daemon. With N workers per node, total connections to the daemon = N × this value.
sagemaker_hyperpod_max_connections_per_host: Maximum TCP connections per LMCache engine to a single daemon address (IP:port) (default: 128, minimum: 1). “Host” refers to the daemon’s network address, not the client machine. In today’s typical single-daemon setup, this has a similar effect to sagemaker_hyperpod_max_connections. The parameter exists to support future multi-daemon configurations, where one engine connects to multiple daemons for load balancing. With N workers per node connecting to the same daemon, total connections = N × this value. Reduce it proportionally for high TP setups (e.g., set it to 16 for 8 workers to reach ~128 total connections).
sagemaker_hyperpod_timeout_ms: Timeout for lease acquisition requests in milliseconds (default: 5000, minimum: 100)
sagemaker_hyperpod_lease_ttl_s: Server-side lease timeout in seconds (default: 30.0)
sagemaker_hyperpod_put_stream_chunk_bytes: Chunk size for streaming PUT requests in bytes (default: 65536, minimum: 1024)
sagemaker_hyperpod_use_https: Enable HTTPS instead of HTTP (default: False). Note: Ignored if remote_url already contains an http:// or https:// protocol.
save_chunk_meta: Whether to save chunk metadata with data (set False for performance)
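As a rough sketch of how these options might be combined, the fragment below extends the example configuration with an extra_config block for a node running 8 workers. The nesting under a top-level extra_config key is an assumption based on the section heading. The values shown are the documented defaults, except sagemaker_hyperpod_max_connections_per_host, which follows the 8-worker suggestion above.

```yaml
chunk_size: 256
local_cpu: True
max_local_cpu_size: 5
remote_url: "sagemaker-hyperpod://$NODE_IP:9200"
# Assumption: HyperPod options are nested under a top-level extra_config key.
extra_config:
  sagemaker_hyperpod_bucket: "lmcache"
  sagemaker_hyperpod_shared_memory_name: "shared_memory"
  # 8 workers x 16 connections each = ~128 connections to the single daemon
  sagemaker_hyperpod_max_connections_per_host: 16
  sagemaker_hyperpod_timeout_ms: 5000
  sagemaker_hyperpod_lease_ttl_s: 30.0
```

Options left out of extra_config keep their defaults, so in practice only the values that differ from the defaults need to be listed.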
Kubernetes Deployment Requirements#
Environment Variable for Node IP#
Add the NODE_IP environment variable to resolve the local node’s IP address:
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
/dev/shm Volume Configuration#
SageMaker Hyperpod requires /dev/shm for high-performance shared memory operations:
volumeMounts:
  - name: dshm
    mountPath: /dev/shm/shared_memory
    subPath: shared_memory
volumes:
  - name: dshm
    hostPath:
      path: /dev/shm
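To show where the two fragments above sit relative to each other, here is a minimal Pod manifest sketch. The Pod name, container name, and image are hypothetical placeholders. Only the env, volumeMounts, and volumes entries come from this page. In a Pod spec, env and volumeMounts belong to the container, while volumes belongs to the Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lmcache-worker                       # hypothetical name
spec:
  containers:
    - name: inference                        # hypothetical container name
      image: "your-registry/your-image:tag"  # placeholder, replace with your image
      env:
        # Exposes the host node's IP so remote_url can reference $NODE_IP
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
      volumeMounts:
        # Mounts only the shared_memory entry of the host's /dev/shm
        - name: dshm
          mountPath: /dev/shm/shared_memory
          subPath: shared_memory
  volumes:
    - name: dshm
      hostPath:
        path: /dev/shm
```

The same three entries apply unchanged inside a Deployment or StatefulSet, under spec.template.spec.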