DeepSeek-R1-FlagOS-NVIDIA-BF16 provides an all-in-one deployment solution:

1. Comprehensive Integration:
   - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
   - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
   - Verified model files, available on Hugging Face ([Model Link](https://huggingface.co/FlagRelease/DeepSeek-R1-FlagOS-Nvidia-BF16)).
   - Pre-built Docker image for rapid deployment on NVIDIA-H100.
2. High-Precision BF16 Checkpoints:
   - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.
We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on NVIDIA GPUs.

# Bundle Download

|             | Usage                                                  | Nvidia                                                       |
| ----------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| Basic Image | Basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia` |
| Model       | Model weight and configuration files                   | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Nvidia-BF16 |

# Evaluation Results

## Benchmark Result

| Metrics               | DeepSeek-R1-H100-CUDA | DeepSeek-R1-H100-FlagOS |
| --------------------- | --------------------- | ----------------------- |
| GSM8K (EM)            | 95.75                 | 95.83                   |
| MMLU (Acc.)           | 85.34                 | 85.56                   |
| CEVAL                 | 89.00                 | 89.60                   |
| AIME 2024 (Pass@1)    | 76.67                 | 70.00                   |
| GPQA-Diamond (Pass@1) | 70.20                 | 71.21                   |
| MATH-500 (Pass@1)     | 93.20                 | 94.80                   |
| MMLU-Pro (Acc.)       | TBD                   | TBD                     |
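To quantify the comparison in the benchmark table, here is a small sketch (scores copied from the table) that prints the FlagOS-minus-CUDA delta per benchmark; the temp-file path is just an example:

```shell
# Benchmark scores copied from the table above: name cuda flagos
cat > /tmp/bench.txt <<'EOF'
GSM8K 95.75 95.83
MMLU 85.34 85.56
CEVAL 89.00 89.60
AIME2024 76.67 70.00
GPQA-Diamond 70.20 71.21
MATH-500 93.20 94.80
EOF

# Print the FlagOS - CUDA delta for each benchmark
awk '{printf "%s %+.2f\n", $1, $3 - $2}' /tmp/bench.txt
```

FlagOS tracks the CUDA baseline to within about a point on most benchmarks; AIME 2024 is the outlier.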
# How to Run Locally

## 📌 Getting Started

### Download open-source weights

```bash
pip install modelscope
modelscope download --model <Model Name> --local_dir <Cache Path>
```

### Download the FlagOS image

```bash
docker pull <IMAGE>
```

### Start the inference service

```bash
docker run -itd --name flagrelease_nv --privileged --gpus all --net=host --ipc=host --device=/dev/infiniband --shm-size 512g --ulimit memlock=-1 -v <CKPT_PATH>:<CKPT_PATH> flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia /bin/bash
docker exec -it flagrelease_nv /bin/bash
conda activate flagscale-inference
```
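Concretely, the placeholders in the steps above can be filled in as follows. The model name comes from the Bundle Download table; the local checkpoint path is an illustrative choice, not a requirement:

```shell
# Example placeholder values (model name from the Bundle Download table; paths are illustrative)
MODEL="FlagRelease/DeepSeek-R1-FlagOS-Nvidia-BF16"
CKPT_PATH="/models/deepseek_r1"
IMAGE="flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia"

# With these set, the download and pull steps become (shown commented; they need network access):
# modelscope download --model "$MODEL" --local_dir "$CKPT_PATH"
# docker pull "$IMAGE"
echo "weights -> $CKPT_PATH"
```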
### Download and install FlagGems

```bash
git clone https://github.com/FlagOpen/FlagGems.git
cd FlagGems
pip install ./
cd ../
```
### Download FlagScale and build vllm

```bash
git clone https://github.com/FlagOpen/FlagScale.git
cd FlagScale/
git checkout ae85925798358d95050773dfa66680efdb0c2b28
cd vllm
pip install .
cd ../
```
### Modify the configuration

```bash
cd FlagScale/examples/deepseek_r1/conf

# Modify the configuration in config_deepseek_r1.yaml:
defaults:
  - _self_
  - serve: deepseek_r1
experiment:
  exp_name: deepseek_r1
  exp_dir: outputs/${experiment.exp_name}
  task:
    type: serve
  deploy:
    use_fs_serve: false
  runner:
    hostfile: examples/deepseek_r1/conf/hostfile.txt # set hostfile
    docker: flagrelease_nv # set docker
    ssh_port: 22
  envs:
    CUDA_DEVICE_MAX_CONNECTIONS: 1
  cmds:
    before_start: source /root/miniconda3/bin/activate flagscale-inference && export GLOO_SOCKET_IFNAME=bond0 && export USE_FLAGGEMS=1 # Set GLOO_SOCKET_IFNAME to the name of the network interface (e.g., eth0, enp0s3) on the subnet used for inter-machine communication; check interface names and IP addresses with ifconfig.
action: run
hydra:
  run:
    dir: ${experiment.exp_dir}/hydra

# Modify the configuration in hostfile.txt
# ip slots type=xxx[optional]
# master node
x.x.x.x slots=8 type=gpu
# worker nodes
x.x.x.x slots=8 type=gpu

# Modify the configuration in serve/deepseek_r1.yaml
- serve_id: vllm_model
  engine: vllm
  engine_args:
    model: /models/deepseek_r1 # path to the DeepSeek-R1 weights
    tensor_parallel_size: 8
    pipeline_parallel_size: 4
    gpu_memory_utilization: 0.9
    max_model_len: 32768
    max_num_seqs: 256
    enforce_eager: true
    trust_remote_code: true
    enable_chunked_prefill: true

# Install FlagScale
cd FlagScale/
pip install .

# Configure passwordless container access by adding its key to the other hosts.
```

### Serve

```bash
flagscale serve deepseek_r1
```

To customize service parameters, run:

```bash
flagscale serve <MODEL_NAME> <MODEL_CONFIG_YAML>
```
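As a sanity check on the serve settings above: tensor_parallel_size 8 × pipeline_parallel_size 4 implies a 32-GPU world size, so the slots listed in hostfile.txt should add up to 32 (e.g., four 8-GPU H100 nodes). A small sketch with made-up IPs and an example temp path:

```shell
# World size implied by the parallelism settings in serve/deepseek_r1.yaml
tp=8; pp=4
world_size=$((tp * pp))
echo "world_size=$world_size"

# A hostfile in the `ip slots=N type=...` format providing that many slots (IPs are placeholders)
cat > /tmp/hostfile.txt <<'EOF'
10.0.0.1 slots=8 type=gpu
10.0.0.2 slots=8 type=gpu
10.0.0.3 slots=8 type=gpu
10.0.0.4 slots=8 type=gpu
EOF

# Tally the slots column; it should equal the world size
slots=$(awk -F'slots=' '{sum += $2 + 0} END {print sum}' /tmp/hostfile.txt)
echo "slots=$slots"
```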
# Contributing

We warmly welcome global developers to join us:

1. Submit Issues to report problems
2. Create Pull Requests to contribute code
3. Improve technical documentation
Scan the QR code below to add our WeChat group and send "FlagRelease".


# License