yajunvicky committed
Commit 2ef34ae · verified · 1 Parent(s): 10b63f6

Update README.md

Files changed (1):
  1. README.md +87 -6

README.md CHANGED
@@ -5,7 +5,7 @@ DeepSeek-R1-FlagOS-Metax-BF16 provides an all-in-one deployment solution, enabli
  1. Comprehensive Integration:
  - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
  - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
- - Verified model files, available on ModelScope ([Model Link](https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16)).
  - Pre-built Docker image for rapid deployment on Metax-C550.
  2. High-Precision BF16 Checkpoints:
  - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.
@@ -30,7 +30,7 @@ We validate the execution of the DeepSeek-R1 model with a Triton-based operator libr

  We use a variety of Triton-implemented operation kernels—approximately 70%—to run the DeepSeek-R1 model. These kernels come from two main sources:

- - Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems). You can enable FlagGems kernels by setting the environment variable USE_FLAGGEMS. For more details, please refer to the "How to Run Locally" section.

  - Also included are Triton kernels from vLLM, including fused MoE.
@@ -43,7 +43,7 @@ We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on Metax GPU
  | | Usage | Metax |
  | ----------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
  | Basic Image | basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax` |
- | Model | model weight and configuration files | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-FP16 |

  # Evaluation Results

@@ -55,8 +55,8 @@ We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on Metax GPU
  | MMLU (Acc.) | 85.34 | 85.38 |
  | CEVAL | 89.00 | 89.23 |
  | AIME 2024 (Pass@1) | 76.67 | 76.67 |
- | GPQA-Diamond (Pass@1) | 70.20 | 71.72 |
- | MATH-500 (Pass@1) | 93.20 | 93.80 |

  # How to Run Locally
  ## 📌 Getting Started
@@ -141,8 +141,89 @@ We warmly welcome global developers to join us:
  Scan the QR code below to add our WeChat group
  send "FlagRelease"

- ![WeChat](https://cdn-uploads.huggingface.co/production/uploads/673326280dbcb3477ecc2af6/aETN9Zswqts2P9YLrizrz.png)

  # License

  This project and related model weights are licensed under the MIT License.
@@ -5,7 +5,7 @@
  1. Comprehensive Integration:
  - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
  - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
+ - Verified model files, available on Hugging Face ([Model Link](https://huggingface.co/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16)).
  - Pre-built Docker image for rapid deployment on Metax-C550.
  2. High-Precision BF16 Checkpoints:
  - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.

@@ -30,7 +30,7 @@
  We use a variety of Triton-implemented operation kernels—approximately 70%—to run the DeepSeek-R1 model. These kernels come from two main sources:

+ - Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems), an operator library for large language models implemented in Triton. You can enable FlagGems kernels by setting the environment variable USE_FLAGGEMS. For more details, please refer to the "How to Run Locally" section.

  - Also included are Triton kernels from vLLM, including fused MoE.
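As a quick illustration of the USE_FLAGGEMS switch described above, a minimal sketch (that the value `"1"` enables the kernels, and that it must be set before the engine starts, are assumptions; see "How to Run Locally" for the authoritative setup):

```python
import os

# Hypothetical illustration: export USE_FLAGGEMS in the process environment
# before the inference engine is launched, so FlagScale picks it up.
# The expected value ("1") is an assumption; check the FlagScale docs.
os.environ["USE_FLAGGEMS"] = "1"

print(os.environ["USE_FLAGGEMS"])
```

The equivalent shell form is `export USE_FLAGGEMS=1` in the same session that launches the server.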

@@ -43,7 +43,7 @@
  | | Usage | Metax |
  | ----------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
  | Basic Image | basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax` |
+ | Model | model weight and configuration files | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16 |

  # Evaluation Results

@@ -55,8 +55,8 @@
  | MMLU (Acc.) | 85.34 | 85.38 |
  | CEVAL | 89.00 | 89.23 |
  | AIME 2024 (Pass@1) | 76.67 | 76.67 |
+ | GPQA-Diamond (Pass@1) | 70.20 | 71.72 |
+ | MATH-500 (Pass@1) | 93.20 | 93.80 |

  # How to Run Locally
  ## 📌 Getting Started

@@ -141,8 +141,89 @@
  Scan the QR code below to add our WeChat group
  send "FlagRelease"

+ ![WeChat](image/group.png)

  # License

  This project and related model weights are licensed under the MIT License.
+ This project and related model weights are licensed under the Apache License (Version 2.0).
+
+ <p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card content promptly in accordance with the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
+
+ # Initial installation steps
+
+ ## 📌 Getting Started
+
+ ### Environment Setup
+
+ ```bash
+ # Download the checkpoint
+ pip install modelscope
+ modelscope download --model FlagRelease/DeepSeek-R1-FlagOS-Metax-BF16 --local_dir <CKPT_PATH>
+
+ # Build and enter the container (perform this step on all four machines)
+ docker run -it --device=/dev/dri --device=/dev/mxcd --group-add video --name flagrelease_metax --device=/dev/mem --network=host --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size '100gb' --ulimit memlock=-1 -v /usr/local/:/usr/local/ -v <CKPT_PATH>:<CKPT_PATH> flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-metax /bin/bash
+ ```
+
+ ### Modify the `config.json` for DeepSeek-R1-671B
+
+ ```
+ # Locate and remove the following JSON configuration:
+ "quantization_config": {
+   "activation_scheme": "dynamic",
+   "fmt": "e4m3",
+   "quant_method": "fp8",
+   "weight_block_size": [
+     128,
+     128
+   ]
+ },
+ ```
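The manual edit above can also be scripted. A minimal sketch in Python (the helper name is hypothetical; it assumes a standard Hugging Face-style `config.json` in the checkpoint directory):

```python
import json

def strip_quantization_config(path):
    """Remove the FP8 "quantization_config" block from a config.json in place."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.pop("quantization_config", None)  # no-op if the block is already gone
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

For example, `strip_quantization_config("<CKPT_PATH>/config.json")` after the download step.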
+
+ ### Configure environment variables
+
+ ```bash
+ # Create an `env.sh` file with the following exports.
+ # GLOO_SOCKET_IF_NAME must be the network interface used for inter-machine
+ # communication; use `ifconfig` to list the interfaces.
+ export GLOO_SOCKET_IF_NAME=ens20np0
+ export VLLM_LOGGING_LEVEL=DEBUG
+ export VLLM_PP_LAYER_PARTITION=16,15,15,15
+ export MACA_SMALL_PAGESIZE_ENABLE=1
+
+ # Then load it:
+ source env.sh
+ ```
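A small sanity check on the pipeline partition above: DeepSeek-R1 has 61 transformer layers, and `VLLM_PP_LAYER_PARTITION` lists one layer count per pipeline stage, so the four entries should sum to 61. A sketch:

```python
# Sanity-check the VLLM_PP_LAYER_PARTITION value from env.sh:
# one entry per pipeline-parallel stage, summing to DeepSeek-R1's
# 61 transformer layers.
partition = [int(n) for n in "16,15,15,15".split(",")]

assert len(partition) == 4   # matches -pp 4 in the serve command below
assert sum(partition) == 61  # total hidden layers in DeepSeek-R1

print(partition)
```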
+
+ ### Start Ray Cluster
+
+ ```bash
+ # On the main node (first machine), run:
+ ray start --head --num-gpus=8
+
+ # On every other node, run the following, using the IP and port displayed
+ # by the main node:
+ ray start --address='<main_node_ip:port>'
+
+ # After all nodes have started Ray, run `ray status` on the main node and
+ # confirm that all 32 GPUs are recognized.
+ # Note: if environment variables are modified, restart Ray on all nodes
+ # (`ray stop`): stop the worker nodes first, then the main node.
+ ```
+
+ ### Serve
+
+ ```bash
+ # On the main node:
+ vllm serve /nfs/deepseek_r1_BF16 -pp 4 -tp 8 --trust-remote-code --distributed-executor-backend ray --dtype bfloat16 --max-model-len 4096 --swap-space 16 --gpu-memory-utilization 0.90
+
+ # Once the model has fully loaded, use the API for inference; test with `client.py`.
+ ```
+
+ `client.py`
+
+ ```bash
+ curl http://localhost:8000/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "<model path>",
+     "messages": [
+       {"role": "system", "content": "You are a helpful assistant."},
+       {"role": "user", "content": "Who won the world series in 2020?"}
+     ]
+   }'
+ ```
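The same request can be issued from Python. A minimal sketch of what a `client.py` might look like (the endpoint, the `<model path>` placeholder, and the function names are assumptions, following the OpenAI-compatible API that vLLM serves):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM endpoint

def build_request(model, user_message):
    """Assemble an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

def send(payload):
    """POST the payload to the running server and return the decoded JSON."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("<model path>", "Who won the world series in 2020?")
```

Calling `send(payload)` requires the vLLM server from the Serve step to be running.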