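3FS
Overview
3FS (Fire-Flyer File System) is an AI-native distributed file system that provides high performance and low latency. It is one of the supported options for offloading KV cache from LMCache. The FSConnector backend can also work with a 3FS storage cluster, but it reaches the cluster through the FUSE interface and therefore cannot take advantage of 3FS's high performance. This dedicated backend instead accesses the cluster through 3FS's native USRBIO (User Space Ring Based IO) interface, which is what delivers the high performance.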
Configuring LMCache 3FS offloading
Pass the configuration file through LMCACHE_CONFIG_FILE=your-lmcache-config.yaml.
Example config.yaml:
# 256 tokens per KV chunk
chunk_size: 256
local_cpu: False
# Plugin name mode: base_path is read from the extra_config section
remote_storage_plugins: ["hf3fs.primary"]
# URL mode: base_path is read from the URL
#remote_url: "hf3fs:///3fs/stage/hello, /3fs/stage/world"
extra_config:
  # base_path for a single plugin instance
  remote_storage_plugin.hf3fs.primary.base_path: "/3fs/stage/dir1,/3fs/stage/dir2"
  # base_path for all plugin instances
  #hf3fs_base_path: "/3fs/stage/dir1,/3fs/stage/dir2"
  # Mount point of 3FS
  hf3fs_mount_point: "/3fs/stage"
  # Shared memory size for the Iov in the hf3fs client
  # range: [104857600 (100MB), 2147483648 (2GB)], default: 209715200 (200MB)
  hf3fs_iov_size: 209715200
  # Max number of concurrent requests that can be submitted in the Ior
  # range: [128, 1024], default: 256
  hf3fs_ior_entries: 256
  # I/O depth control. 0: no control
  # >0: issue requests in one batch only when io_depth requests are in the queue
  # <0: wait until at most -io_depth requests are in the queue, then issue them in one batch
  # range: [-128, 128], default: 0
  hf3fs_io_depth: 0
  # NUMA ID for the Ior shared memory; -1 uses the NUMA ID of the current process
  hf3fs_numa_id: -1
  # Number of I/O threads
  # range: [2, 16], default: 4
  hf3fs_io_thread_num: 4
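The ranges and defaults documented in the comments above can be checked before a server is started. The following is a minimal sketch of such a check, not part of LMCache: the function name check_hf3fs_extra_config and the two tables are hypothetical, and the values are copied from the example config.

```python
# Ranges and defaults copied from the comments in the example config.
# check_hf3fs_extra_config is a hypothetical helper, not an LMCache API.
HF3FS_RANGES = {
    "hf3fs_iov_size": (104857600, 2147483648),  # 100 MB .. 2 GB
    "hf3fs_ior_entries": (128, 1024),
    "hf3fs_io_depth": (-128, 128),
    "hf3fs_io_thread_num": (2, 16),
}

HF3FS_DEFAULTS = {
    "hf3fs_iov_size": 209715200,  # 200 MB
    "hf3fs_ior_entries": 256,
    "hf3fs_io_depth": 0,
    "hf3fs_numa_id": -1,
    "hf3fs_io_thread_num": 4,
}


def check_hf3fs_extra_config(extra_config):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    # Keys that are not set fall back to the documented defaults.
    merged = {**HF3FS_DEFAULTS, **extra_config}
    for key, (low, high) in HF3FS_RANGES.items():
        value = merged[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key} must be an integer, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{key}={value} is outside [{low}, {high}]")
    return problems
```

Calling check_hf3fs_extra_config({"hf3fs_iov_size": 50 * 1024 * 1024}) reports one problem, because 50 MB is below the documented 100 MB minimum.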
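- There are two ways to configure the hf3fs remote backend:
  - Plugin name mode, using the remote_storage_plugins parameter (recommended)
  - URL mode, using the remote_url parameter (deprecated; it will be removed in the future)
- In URL mode, base_path is part of the URL. In plugin name mode, base_path can be set in two ways:
  - remote_storage_plugin.{plugin name}.base_path, which sets base_path for a single plugin instance, e.g. remote_storage_plugin.hf3fs.primary.base_path
  - hf3fs_base_path, which sets base_path for all plugin instances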
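The lookup described above for plugin name mode can be sketched as follows. This is only an illustration: resolve_base_paths is a hypothetical helper, and the precedence it uses (the per-instance key wins over hf3fs_base_path) is an assumption that the text above does not state. The comma-separated format follows the example config.

```python
def resolve_base_paths(plugin_name, extra_config):
    """Return the list of base paths for one plugin instance.

    Assumption: the per-instance key takes precedence over the
    global hf3fs_base_path key when both are present.
    """
    per_instance_key = f"remote_storage_plugin.{plugin_name}.base_path"
    raw = extra_config.get(per_instance_key) or extra_config.get("hf3fs_base_path")
    if not raw:
        raise ValueError(f"no base_path configured for plugin {plugin_name!r}")
    # Several paths may be given, separated by commas; drop stray whitespace.
    return [path.strip() for path in raw.split(",") if path.strip()]
```

For the example config, resolve_base_paths("hf3fs.primary", extra_config) yields the two directories /3fs/stage/dir1 and /3fs/stage/dir2.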
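Installation
Prerequisites:
- A machine with at least one GPU. You can adjust the maximum model length of the vLLM instance to fit your GPU memory.
- vLLM and LMCache installed
Step 1. Install the 3FS hf3fs_py_usrbio package
The inference server needs the 3FS hf3fs_py_usrbio package. Building the package from source is recommended: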
git clone https://github.com/deepseek-ai/3fs
cd 3fs
git submodule update --init --recursive
./patches/apply.sh
# Install dependencies
pip install -e .
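After the build, a quick check that the package is visible to the Python environment used by the inference server can save a failed server start. The sketch below assumes the importable module carries the same name as the package, hf3fs_py_usrbio; if your build exposes a different module name, pass that name instead. The helper usrbio_available is hypothetical.

```python
import importlib.util


def usrbio_available(module_name="hf3fs_py_usrbio"):
    """Return True if a top-level module with this name can be found.

    Assumption: the module name matches the package name. find_spec only
    locates the module; it does not import or initialize it.
    """
    return importlib.util.find_spec(module_name) is not None
```

Run it with the same interpreter that will launch vLLM, since a package installed into a different virtual environment will not be found.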
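Step 2. Set up a 3FS storage cluster
Step 3. Deploy the 3FS FUSE client on the inference server
The inference server must run the 3FS FUSE client (the FUSE daemon provided by 3FS); otherwise it cannot access the 3FS storage cluster.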
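One way to confirm that the FUSE client is up is to check that the configured mount point is an actual mount. This is a minimal sketch using only the Python standard library; fuse_mount_ready is a hypothetical helper, and /3fs/stage is the mount point used in the example config. It verifies that something is mounted there, not that the mounted file system is 3FS.

```python
import os


def fuse_mount_ready(mount_point):
    """Return True if mount_point exists and is a mount point."""
    return os.path.isdir(mount_point) and os.path.ismount(mount_point)
```

For example, fuse_mount_ready("/3fs/stage") should return True on an inference server where the 3FS FUSE daemon is running with that mount point.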
Step 4. Start a vLLM server with 3FS offloading enabled
Create an LMCache configuration file named 3fs-offload.yaml:
# 256 tokens per KV chunk
chunk_size: 256
local_cpu: False
# support multiple paths
remote_storage_plugins: ["hf3fs.primary"]
extra_config:
  # base_path
  remote_storage_plugin.hf3fs.primary.base_path: "/3fs/stage/dir1,/3fs/stage/dir2"
  # Mount point of 3FS
  hf3fs_mount_point: "/3fs/stage"
  # Shared memory size for the Iov in the hf3fs client
  # range: [104857600 (100MB), 2147483648 (2GB)], default: 209715200 (200MB)
  hf3fs_iov_size: 209715200
  # Max number of concurrent requests that can be submitted in the Ior
  # range: [128, 1024], default: 256
  hf3fs_ior_entries: 256
  # I/O depth control. 0: no control
  # >0: issue requests in one batch only when io_depth requests are in the queue
  # <0: wait until at most -io_depth requests are in the queue, then issue them in one batch
  # range: [-128, 128], default: 0
  hf3fs_io_depth: 0
  # NUMA ID for the Ior shared memory; -1 uses the NUMA ID of the current process
  hf3fs_numa_id: -1
  # Number of I/O threads
  # range: [2, 16], default: 4
  hf3fs_io_thread_num: 4
Start vLLM:
export VLLM_USE_V1=0
export LMCACHE_USE_EXPERIMENTAL=True
export LMCACHE_LOG_LEVEL=INFO
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_ENABLE_V1_MULTIPROCESSING=1
LMCACHE_CONFIG_FILE="3fs-offload.yaml" \
vllm serve \
    meta-llama/Llama-3.1-8B-Instruct \
    --max-model-len 65536 \
    --kv-transfer-config \
    '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'
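Once the server is running, a request can be sent to exercise the offloading path. The sketch below assumes the server is reachable at vLLM's default address, http://localhost:8000, and uses its OpenAI-compatible /v1/completions endpoint; the helper names build_completion_request and send are hypothetical, and the address should be adjusted if you passed a different host or port.

```python
import json
import urllib.request


def build_completion_request(prompt,
                             base_url="http://localhost:8000",
                             model="meta-llama/Llama-3.1-8B-Instruct",
                             max_tokens=16):
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["choices"][0]["text"]
```

Calling send(build_completion_request(long_prompt)) twice with the same long prompt gives the second request a chance to reuse KV cache stored by the first; with LMCACHE_LOG_LEVEL=INFO set as above, the server log is the place to look for store and retrieve activity.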