3FS

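Overview

3FS (Fire-Flyer File System) is an AI-native distributed file system that provides high performance and low latency. It is one of the supported options for offloading the KV cache from LMCache. Although the FSConnector backend can work with a 3FS storage cluster, it accesses the cluster through the FUSE interface and therefore cannot take advantage of 3FS's high performance. This dedicated backend instead accesses the 3FS storage cluster through 3FS's native USRBIO (User Space Ring Based IO) interface, which is how it achieves high performance.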

Configure LMCache 3FS Offloading

Pass the configuration file in through LMCACHE_CONFIG_FILE=your-lmcache-config.yaml.

Example config.yaml:

# 256 Tokens per KV Chunk
chunk_size: 256
local_cpu: False

# Plugin name mode: base_path is read from the extra_config section
remote_storage_plugins: ["hf3fs.primary"]

# URL mode: base_path is read from the URL
#remote_url: "hf3fs:///3fs/stage/hello, /3fs/stage/world"

extra_config:

    # base_path, for a plugin instance
    remote_storage_plugin.hf3fs.primary.base_path: "/3fs/stage/dir1,/3fs/stage/dir2"

    # base_path, for all plugin instances
    #hf3fs_base_path: "/3fs/stage/dir1,/3fs/stage/dir2"

    # Mount point of 3FS
    hf3fs_mount_point: "/3fs/stage"

    # Shared memory size for Iov in hf3fs client,
    # range in [104857600(100MB), 2147483648(2GB)], default:209715200
    hf3fs_iov_size: 209715200 #200MB

    # Max num of concurrent requests that can be submitted in Ior
    # range in [128,1024], default: 256
    hf3fs_ior_entries: 256

    # I/O depth control. 0: no control.
    # >0: issue requests only once io_depth requests are queued, as one batch.
    # <0: wait for at most -io_depth requests to be queued, then issue them as one batch.
    # Range: [-128, 128], default: 0
    hf3fs_io_depth: 0

    # NUMA ID for Ior shared memory, -1 for current process NUMA ID.
    hf3fs_numa_id: -1

    # Number of I/O threads
    # range in [2,16], default: 4
    hf3fs_io_thread_num: 4
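
There are two ways to configure the hf3fs remote backend:
  1. Plugin name mode, using the remote_storage_plugins parameter (recommended)

  2. URL mode, using the remote_url parameter (deprecated; it will be removed in a future release)

In URL mode, base_path is part of the URL. In plugin name mode, base_path can be set in two ways:
  1. remote_storage_plugin.{plugin name}.base_path, which sets base_path for a single plugin instance.

    e.g.: remote_storage_plugin.hf3fs.primary.base_path

  2. hf3fs_base_path, which sets base_path for all plugin instances.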
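
The two ways of setting base_path in plugin name mode can be sketched as a small lookup. This is only an illustration, not LMCache's actual code: the helper name resolve_base_paths is hypothetical, and the precedence shown (the per-instance key wins over hf3fs_base_path) is an assumption that the text above does not confirm.

```python
from typing import Dict, List


def resolve_base_paths(extra_config: Dict[str, str], plugin_name: str) -> List[str]:
    """Return the base paths for one plugin instance.

    Assumption (not confirmed by the source): the per-instance key takes
    precedence over the shared hf3fs_base_path key when both are set.
    """
    per_instance_key = f"remote_storage_plugin.{plugin_name}.base_path"
    raw = extra_config.get(per_instance_key) or extra_config.get("hf3fs_base_path")
    if not raw:
        raise KeyError(f"no base_path configured for plugin {plugin_name!r}")
    # base_path may list several directories separated by commas.
    return [part.strip() for part in raw.split(",") if part.strip()]


extra_config = {
    "remote_storage_plugin.hf3fs.primary.base_path": "/3fs/stage/dir1,/3fs/stage/dir2",
    "hf3fs_base_path": "/3fs/stage/shared",
}
print(resolve_base_paths(extra_config, "hf3fs.primary"))
print(resolve_base_paths(extra_config, "hf3fs.secondary"))
```

In this sketch the first call returns the two per-instance directories, while the second falls back to the shared value because no key exists for the hypothetical hf3fs.secondary instance.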

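Installation

Prerequisites:

  • A machine with at least one GPU. You can adjust the maximum model length of the vLLM instance to fit your GPU memory.

  • vLLM and LMCache installed

Step 1. Install the 3FS hf3fs_py_usrbio package

The inference server needs the 3FS hf3fs_py_usrbio package installed. Building the package from source is recommended: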

git clone https://github.com/deepseek-ai/3fs
cd 3fs
git submodule update --init --recursive
./patches/apply.sh
# Install the build dependencies required by 3FS before running pip
pip install -e .

For details, see the 3FS build instructions.
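
After the build, a quick check that the package is visible to the Python environment used by the inference server can save a failed server start. The sketch below assumes the bindings are importable under the module name hf3fs_py_usrbio, matching the package name; adjust the name if your build exposes a different module.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumption: the module name matches the package name used in this guide.
if module_available("hf3fs_py_usrbio"):
    print("hf3fs_py_usrbio found")
else:
    print("hf3fs_py_usrbio not found; rebuild it or check the active environment")
```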

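Step 2. Set up the 3FS storage cluster

Step 3. Deploy the 3FS FUSE client on the inference server

The inference server must run the 3FS FUSE client (the FUSE daemon provided by 3FS); otherwise it cannot access the 3FS storage cluster.

Set up the 3FS FUSE client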
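
Once the FUSE client is running, it may help to confirm that the mount point is live before starting vLLM. The following is a minimal sketch using only the standard library. The helper name paths_outside_mount is hypothetical, and the requirement that every base_path lies under hf3fs_mount_point is an assumption drawn from the example configuration, not a documented rule.

```python
import os
from typing import List


def paths_outside_mount(mount_point: str, base_paths: List[str]) -> List[str]:
    """Return the base paths that do not lie under mount_point (absolute paths only)."""
    mount = os.path.normpath(mount_point)
    outside = []
    for path in base_paths:
        norm = os.path.normpath(path)
        # commonpath compares whole path components, so /3fs/stage2 is
        # not treated as being inside /3fs/stage.
        if os.path.commonpath([mount, norm]) != mount:
            outside.append(path)
    return outside


mount_point = "/3fs/stage"
base_paths = ["/3fs/stage/dir1", "/3fs/stage/dir2"]

# os.path.ismount returns False when the path is missing or not a mount point.
print("mounted:", os.path.ismount(mount_point))
print("paths outside the mount point:", paths_outside_mount(mount_point, base_paths))
```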

Step 4. Start a vLLM server with 3FS offloading enabled

Create an LMCache configuration file named 3fs-offload.yaml:

# 256 Tokens per KV Chunk
chunk_size: 256
local_cpu: False
# support multiple paths
remote_storage_plugins: ["hf3fs.primary"]

extra_config:
    # base_path
    remote_storage_plugin.hf3fs.primary.base_path: "/3fs/stage/dir1,/3fs/stage/dir2"

    # Mount point of 3FS
    hf3fs_mount_point: "/3fs/stage"

    # Shared memory size for Iov in hf3fs client,
    # range in [104857600(100MB), 2147483648(2GB)], default:209715200 (200MB)
    hf3fs_iov_size: 209715200

    # Max num of concurrent requests that can be submitted in Ior
    # range in [128,1024], default: 256
    hf3fs_ior_entries: 256

    # I/O depth control. 0: no control.
    # >0: issue requests only once io_depth requests are queued, as one batch.
    # <0: wait for at most -io_depth requests to be queued, then issue them as one batch.
    # Range: [-128, 128], default: 0
    hf3fs_io_depth: 0

    # NUMA ID for Ior shared memory, -1 for current process NUMA ID.
    hf3fs_numa_id: -1

    # Number of I/O threads
    # range in [2,16], default: 4
    hf3fs_io_thread_num: 4
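
The value ranges given in the comments above can be checked before the server is started. This is a sketch of such a check, not something LMCache provides: the helper name check_ranges is hypothetical, and the bounds are copied from the comments in the example configuration.

```python
from typing import Dict, List, Tuple

# (min, max) bounds copied from the comments in the example configuration.
RANGES: Dict[str, Tuple[int, int]] = {
    "hf3fs_iov_size": (104857600, 2147483648),
    "hf3fs_ior_entries": (128, 1024),
    "hf3fs_io_depth": (-128, 128),
    "hf3fs_io_thread_num": (2, 16),
}


def check_ranges(extra_config: Dict[str, int]) -> List[str]:
    """Return one message per configured value that falls outside its range."""
    problems = []
    for key, (low, high) in RANGES.items():
        if key not in extra_config:
            continue  # keys that are not set are skipped
        value = extra_config[key]
        if not low <= value <= high:
            problems.append(f"{key}={value} is outside [{low}, {high}]")
    return problems


print(check_ranges({"hf3fs_iov_size": 209715200, "hf3fs_ior_entries": 64}))
```

Here the second value is deliberately too small, so the returned list contains a single message about hf3fs_ior_entries.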

Start vLLM:

export VLLM_USE_V1=0
export LMCACHE_USE_EXPERIMENTAL=True
export LMCACHE_LOG_LEVEL=INFO
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_ENABLE_V1_MULTIPROCESSING=1
LMCACHE_CONFIG_FILE="3fs-offload.yaml" \
vllm serve \
    meta-llama/Llama-3.1-8B-Instruct \
    --max-model-len 65536 \
    --kv-transfer-config \
    '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'
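
One way to confirm the server is reachable is to send a completion request to vLLM's OpenAI-compatible endpoint. The sketch below only builds the request. It assumes the server listens on vLLM's default port 8000 on localhost, and the helper name build_completion_request is hypothetical. The actual network call is left commented out so the snippet runs without a live server.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": prompt,
        "max_tokens": 16,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A long prompt produces several 256-token KV chunks.
request = build_completion_request("Hello " * 1000)

# Uncomment once the server is up:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["text"])
```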