# Kimi-K2 INT4MIX model - Fastllm

Kimi-K2 INT4MIX model for Fastllm: https://github.com/ztxz16/fastllm

# Install

``` sh
pip install ftllm
```

# Download model

``` sh
ftllm download fastllm/Kimi-K2-Instruct-INT4MIX
```

# Run model

``` sh
# Assuming the model is downloaded in /root/Kimi-K2-Instruct-INT4MIX
ftllm run /root/Kimi-K2-Instruct-INT4MIX # chat mode
ftllm server /root/Kimi-K2-Instruct-INT4MIX # API server mode (default model_name = /root/Kimi-K2-Instruct-INT4MIX, port = 8080)
```

# Optimize

## Single CPU

If you are using a single CPU, set the number of threads with the -t parameter (generally CPU core count - 2).
If generation is extremely slow, the thread count may be too high; try reducing it.

For example:

``` sh
ftllm server /root/Kimi-K2-Instruct-INT4MIX -t 12
```

## Multi CPU (multi NUMA node)

On a multi-socket CPU machine, you need to enable the CUDA + NUMA heterogeneous acceleration mode.
Set the number of threads with the environment variable FASTLLM_NUMA_THREADS (typically the number of cores per NUMA node - 2).
If performance is extremely slow, the thread count may be too high; try reducing it.

For example:

``` sh
export FASTLLM_NUMA_THREADS=12 && ftllm server /root/Kimi-K2-Instruct-INT4MIX --device cuda --moe_device numa -t 1
```
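
The "cores - 2" rule of thumb from both sections can be computed rather than typed by hand. The following is a minimal sketch, not part of ftllm: the helper names `suggest_threads` and `cores_per_node` are hypothetical, and it assumes a Linux host where `nproc` and `/sys/devices/system/node` are available. Note that `nproc` counts logical CPUs, so on a machine with hyper-threading the result may be higher than the physical core count; treat the printed commands as a starting point and lower the value if performance is poor.

```sh
#!/bin/sh
# Hypothetical helper (not an ftllm command): apply the "cores - 2" rule,
# never suggesting fewer than 1 thread.
suggest_threads() {
    cores=$1
    threads=$((cores - 2))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Hypothetical helper: cores per NUMA node = total cores / node count
# (integer division; a node count below 1 is treated as 1).
cores_per_node() {
    total=$1
    nodes=$2
    if [ "$nodes" -lt 1 ]; then
        nodes=1
    fi
    echo $((total / nodes))
}

# Logical CPU count; falls back to 1 if nproc is unavailable.
total_cores=$(nproc 2>/dev/null || echo 1)

# NUMA node count from sysfs (Linux only); 0 matches means "unknown".
numa_nodes=$(( $(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l) ))

model=/root/Kimi-K2-Instruct-INT4MIX
single=$(suggest_threads "$total_cores")
per_node=$(suggest_threads "$(cores_per_node "$total_cores" "$numa_nodes")")

# Print the two commands from this README with the computed values filled in.
echo "single CPU: ftllm server $model -t $single"
echo "multi CPU:  export FASTLLM_NUMA_THREADS=$per_node && ftllm server $model --device cuda --moe_device numa -t 1"
```

For instance, `suggest_threads 14` prints `12`, and `cores_per_node 28 2` prints `14`. The script only prints the commands; copy the one that matches your hardware and run it yourself.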