yajunvicky committed
Commit 2ef34ae · verified · 1 Parent(s): 10b63f6

Update README.md

Files changed (1):
  1. README.md +87 -6

README.md CHANGED
@@ -5,7 +5,7 @@ DeepSeek-R1-FlagOS-Metax-BF16 provides an all-in-one deployment solution, enabli
  1. Comprehensive Integration:
  - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
  - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
- - Verified model files, available on ModelScope ([Model Link](https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16)).
  - Pre-built Docker image for rapid deployment on Metax-C550.
  2. High-Precision BF16 Checkpoints:
  - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.
@@ -30,7 +30,7 @@ We validate the execution of the DeepSeek-R1 model with a Triton-based operator libr

  We use a variety of Triton-implemented operation kernels—approximately 70%—to run the DeepSeek-R1 model. These kernels come from two main sources:

- - Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems). You can enable FlagGems kernels by setting the environment variable USE_FLAGGEMS. For more details, please refer to the "How to Run Locally" section.

  - Also included are Triton kernels from vLLM, including fused MoE.
@@ -43,7 +43,7 @@ We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on Metax GPU
  | | Usage | Metax |
  | ----------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
  | Basic Image | basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax` |
- | Model | model weight and configuration files | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-FP16 |

  # Evaluation Results

@@ -55,8 +55,8 @@ We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on Metax GPU
  | MMLU (Acc.) | 85.34 | 85.38 |
  | CEVAL | 89.00 | 89.23 |
  | AIME 2024 (Pass@1) | 76.67 | 76.67 |
- | GPQA-Diamond (Pass@1) | 70.20 | 71.72 |
- | MATH-500 (Pass@1) | 93.20 | 93.80 |

  # How to Run Locally
  ## 📌 Getting Started
@@ -141,8 +141,89 @@ We warmly welcome global developers to join us:
  Scan the QR code below to add our WeChat group
  send "FlagRelease"

- ![WeChat](https://cdn-uploads.huggingface.co/production/uploads/673326280dbcb3477ecc2af6/aETN9Zswqts2P9YLrizrz.png)

  # License

  This project and related model weights are licensed under the MIT License.
@@ -5,7 +5,7 @@
  1. Comprehensive Integration:
  - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
  - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
+ - Verified model files, available on Hugging Face ([Model Link](https://huggingface.co/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16)).
  - Pre-built Docker image for rapid deployment on Metax-C550.
  2. High-Precision BF16 Checkpoints:
  - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.

@@ -30,7 +30,7 @@
  We use a variety of Triton-implemented operation kernels—approximately 70%—to run the DeepSeek-R1 model. These kernels come from two main sources:

+ - Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems), an operator library for large language models implemented in Triton. You can enable FlagGems kernels by setting the environment variable USE_FLAGGEMS. For more details, please refer to the "How to Run Locally" section.

  - Also included are Triton kernels from vLLM, including fused MoE.
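As a quick illustration of the USE_FLAGGEMS switch described above, a minimal sketch (that the value `"1"` enables the kernels, and that it must be set before the engine starts, are assumptions; see "How to Run Locally" for the authoritative setup):

```python
import os

# Hypothetical illustration: export USE_FLAGGEMS in the process environment
# before the inference engine is launched, so FlagScale picks it up.
# The expected value ("1") is an assumption; check the FlagScale docs.
os.environ["USE_FLAGGEMS"] = "1"

print(os.environ["USE_FLAGGEMS"])
```

The equivalent shell form is `export USE_FLAGGEMS=1` in the same session that launches the server.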

@@ -43,7 +43,7 @@
  | | Usage | Metax |
  | ----------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
  | Basic Image | basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax` |
+ | Model | model weight and configuration files | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16 |

  # Evaluation Results

@@ -55,8 +55,8 @@
  | MMLU (Acc.) | 85.34 | 85.38 |
  | CEVAL | 89.00 | 89.23 |
  | AIME 2024 (Pass@1) | 76.67 | 76.67 |
+ | GPQA-Diamond (Pass@1) | 70.20 | 71.72 |
+ | MATH-500 (Pass@1) | 93.20 | 93.80 |

  # How to Run Locally
  ## 📌 Getting Started

@@ -141,8 +141,89 @@
  Scan the QR code below to add our WeChat group
  send "FlagRelease"

+ ![WeChat](image/group.png)

  # License

  This project and related model weights are licensed under the MIT License.
+ This project and related model weights are licensed under the Apache License (Version 2.0).
+
+ <p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card content promptly in accordance with the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
+
+ # Initial installation steps
+
+ ## 📌 Getting Started
+
+ ### Environment Setup
+
+ ```bash
+ # Download the checkpoint
+ pip install modelscope
+ modelscope download --model FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16 --local_dir <CKPT_PATH>
+
+ # Build and enter the container (perform this step on all four machines)
+ docker run -it --device=/dev/dri --device=/dev/mxcd --group-add video --name flagrelease_metax --device=/dev/mem --network=host --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size '100gb' --ulimit memlock=-1 -v /usr/local/:/usr/local/ -v <CKPT_PATH>:<CKPT_PATH> flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax /bin/bash
+ ```
+
+ ### Modify the `config.json` for DeepSeek-R1-671B
+
+ ```
+ # Locate and remove the following JSON configuration:
+ "quantization_config": {
+   "activation_scheme": "dynamic",
+   "fmt": "e4m3",
+   "quant_method": "fp8",
+   "weight_block_size": [
+     128,
+     128
+   ]
+ },
+ ```
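The manual edit above can also be scripted. A minimal sketch in Python (the helper name is hypothetical; it assumes a standard Hugging Face-style `config.json` in the checkpoint directory):

```python
import json

def strip_quantization_config(path):
    """Remove the FP8 "quantization_config" block from a config.json in place."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.pop("quantization_config", None)  # no-op if the block is already gone
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

For example, `strip_quantization_config("<CKPT_PATH>/config.json")` after the download step.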
+
+ ### Configure environment variables
+
+ ```bash
+ # Create an `env.sh` file with the following exports.
+ # GLOO_SOCKET_IF_NAME must be the network interface used for inter-machine
+ # communication; use `ifconfig` to list the interfaces.
+ export GLOO_SOCKET_IF_NAME=ens20np0
+ export VLLM_LOGGING_LEVEL=DEBUG
+ export VLLM_PP_LAYER_PARTITION=16,15,15,15
+ export MACA_SMALL_PAGESIZE_ENABLE=1
+
+ # Then load it:
+ source env.sh
+ ```
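A small sanity check on the pipeline partition above: DeepSeek-R1 has 61 transformer layers, and `VLLM_PP_LAYER_PARTITION` lists one layer count per pipeline stage, so the four entries should sum to 61. A sketch:

```python
# Sanity-check the VLLM_PP_LAYER_PARTITION value from env.sh:
# one entry per pipeline-parallel stage, summing to DeepSeek-R1's
# 61 transformer layers.
partition = [int(n) for n in "16,15,15,15".split(",")]

assert len(partition) == 4   # matches -pp 4 in the serve command below
assert sum(partition) == 61  # total hidden layers in DeepSeek-R1

print(partition)
```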
+
+ ### Start Ray Cluster
+
+ ```bash
+ # On the main node (first machine), run:
+ ray start --head --num-gpus=8
+
+ # On every other node, run the following, using the IP and port displayed
+ # by the main node:
+ ray start --address='<main_node_ip:port>'
+
+ # After all nodes have started Ray, run `ray status` on the main node and
+ # confirm that all 32 GPUs are recognized.
+ # Note: if environment variables are modified, restart Ray on all nodes
+ # (`ray stop`): stop the worker nodes first, then the main node.
+ ```
+
+ ### Serve
+
+ ```bash
+ # On the main node:
+ vllm serve /nfs/deepseek_r1_BF16 -pp 4 -tp 8 --trust-remote-code --distributed-executor-backend ray --dtype bfloat16 --max-model-len 4096 --swap-space 16 --gpu-memory-utilization 0.90
+
+ # Once the model has fully loaded, use the API for inference; test with `client.py`.
+ ```
+
+ `client.py`
+
+ ```bash
+ curl http://localhost:8000/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "<model path>",
+     "messages": [
+       {"role": "system", "content": "You are a helpful assistant."},
+       {"role": "user", "content": "Who won the world series in 2020?"}
+     ]
+   }'
+ ```
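The same request can be issued from Python. A minimal sketch of what a `client.py` might look like (the endpoint, the `<model path>` placeholder, and the function names are assumptions, following the OpenAI-compatible API that vLLM serves):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM endpoint

def build_request(model, user_message):
    """Assemble an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

def send(payload):
    """POST the payload to the running server and return the decoded JSON."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("<model path>", "Who won the world series in 2020?")
```

Calling `send(payload)` requires the vLLM server from the Serve step to be running.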