DeepSeek-R1-FlagOS-Metax-BF16 provides an all-in-one deployment solution:

1. Comprehensive Integration:
   - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
   - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
   - Verified model files, available on Hugging Face ([Model Link](https://huggingface.co/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16)).
   - Pre-built Docker image for rapid deployment on Metax-C550.
2. High-Precision BF16 Checkpoints:
   - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.

We validate the execution of the DeepSeek-R1 model with a Triton-based operator library. We use a variety of Triton-implemented operation kernels—approximately 70%—to run the DeepSeek-R1 model. These kernels come from two main sources:

- Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems), an operator library for large language models implemented in Triton. You can enable FlagGems kernels by setting the environment variable `USE_FLAGGEMS`. For more details, please refer to the "How to Run Locally" section.
- Also included are Triton kernels from vLLM, including fused MoE.

We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on Metax GPUs.

| Usage       | Description                                            | Metax                                                                                                       |
| ----------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
| Basic Image | basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax` |
| Model       | model weight and configuration files                   | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16                                  |

# Evaluation Results

| MMLU (Acc.)           | 85.34 | 85.38 |
| CEVAL                 | 89.00 | 89.23 |
| AIME 2024 (Pass@1)    | 76.67 | 76.67 |
| GPQA-Diamond (Pass@1) | 70.20 | 71.72 |
| AIME 2024 (Pass@1)    | 93.20 | 93.80 |

# How to Run Locally

## 📌 Getting Started

We warmly welcome global developers to join us:

Scan the QR code below to add our WeChat group and send "FlagRelease".

![WeChat QR code]()
# License

This project and related model weights are licensed under the MIT License.

This project and related model weights are licensed under the Apache License (Version 2.0).

<p style="color: lightgrey;">If you are a contributor to this model, we invite you to promptly complete the model card in accordance with the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
# Initial Installation Steps

## 📌 Getting Started

### Environment Setup

```bash
# Download the checkpoint
pip install modelscope
modelscope download --model FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16 --local_dir <local_dir>

# Build and enter the container (perform this operation on all four machines)
docker run -it --device=/dev/dri --device=/dev/mxcd --group-add video --name flagrelease_metax --device=/dev/mem --network=host --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size '100gb' --ulimit memlock=-1 -v /usr/local/:/usr/local/ -v <CKPT_PATH>:<CKPT_PATH> flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax /bin/bash
```
### Modify the `config.json` for DeepSeek-R1-671B

Locate and remove the following block from the model's `config.json`, so that the dequantized BF16 weights are not loaded through the FP8 quantization path:

```json
"quantization_config": {
  "activation_scheme": "dynamic",
  "fmt": "e4m3",
  "quant_method": "fp8",
  "weight_block_size": [
    128,
    128
  ]
},
```
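If you prefer not to edit the file by hand, the same removal can be scripted. A minimal sketch; the `strip_quantization` helper and the example fragment are illustrative, not part of the release:

```python
import json

def strip_quantization(cfg: dict) -> dict:
    """Return a copy of the config contents without the FP8 quantization block."""
    cleaned = dict(cfg)  # shallow copy; leave the input untouched
    cleaned.pop("quantization_config", None)
    return cleaned

# Example round trip on a fragment shaped like the block shown above:
example = {
    "model_type": "deepseek_v3",
    "quantization_config": {
        "activation_scheme": "dynamic",
        "fmt": "e4m3",
        "quant_method": "fp8",
        "weight_block_size": [128, 128],
    },
}
cleaned = strip_quantization(example)
```

On a real checkpoint you would `json.load` the `config.json`, apply the helper, and `json.dump` the result back to disk.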
### Configure environment variables

```bash
# Create an `env.sh` file with the following, then source it.
# Note: GLOO_SOCKET_IF_NAME must name the network interface used for
# inter-machine communication; list interfaces with `ifconfig`.
export GLOO_SOCKET_IF_NAME=ens20np0
export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_PP_LAYER_PARTITION=16,15,15,15
export MACA_SMALL_PAGESIZE_ENABLE=1

source env.sh
```
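`VLLM_PP_LAYER_PARTITION` assigns transformer layers to the pipeline stages: one entry per stage, summing to the model's total layer count (61 for DeepSeek-R1). A small illustrative check; the `validate_pp_partition` helper is not part of the release:

```python
def validate_pp_partition(partition: str, pp_size: int, num_layers: int) -> list:
    """Parse a VLLM_PP_LAYER_PARTITION value and check it is consistent."""
    stages = [int(n) for n in partition.split(",")]
    if len(stages) != pp_size:
        raise ValueError("need one entry per pipeline-parallel stage")
    if sum(stages) != num_layers:
        raise ValueError("entries must sum to the model's layer count")
    return stages

# The value used above: 16 + 15 + 15 + 15 = 61 layers over 4 stages.
stages = validate_pp_partition("16,15,15,15", pp_size=4, num_layers=61)
```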
### Start Ray Cluster

```bash
# On the main node (first machine), run:
ray start --head --num-gpus=8

# On each other node, join the cluster using the IP and port displayed by the main node:
ray start --address='<main_node_ip:port>'

# After all nodes have started Ray, run `ray status` on the main node
# and ensure all 32 GPUs are recognized:
ray status
```

Note: if environment variables are modified, restart Ray on all nodes with `ray stop`; stop worker nodes first, then the main node.
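The 32-GPU figure follows directly from the parallelism plan used in this guide: 4 pipeline stages times 8-way tensor parallelism. A trivial illustrative check (the helper is hypothetical):

```python
def required_gpus(pipeline_parallel: int, tensor_parallel: int) -> int:
    """GPUs that `ray status` must report before `vllm serve` can start."""
    return pipeline_parallel * tensor_parallel

total = required_gpus(4, 8)  # matches the -pp 4 -tp 8 flags in the Serve step
```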
### Serve

```bash
# On the main node:
vllm serve /nfs/deepseek_r1_BF16 -pp 4 -tp 8 --trust-remote-code --distributed-executor-backend ray --dtype bfloat16 --max-model-len 4096 --swap-space 16 --gpu-memory-utilization 0.90
```

Once the model has fully loaded, use the API for inference. Test with the following request (`client.py`):
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model path>",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ]
  }'
```
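The same request can be issued from Python using only the standard library; a minimal sketch assuming the server from the Serve step and the placeholder model path from the curl example (the `chat_completion` helper is hypothetical, not part of the release):

```python
import json
import urllib.request

def chat_completion(base_url: str, model: str, messages: list) -> dict:
    """POST to vLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]
# With the server running locally:
# reply = chat_completion("http://localhost:8000", "<model path>", messages)
```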
|