DeepSeek-R1-FlagOS-NVIDIA-BF16 provides an all-in-one deployment solution:

1. Comprehensive Integration:
   - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
   - Open-source inference execution code, preconfigured with all necessary software and hardware settings.
   - Verified model files, available on Hugging Face ([Model Link](https://huggingface.co/FlagRelease/DeepSeek-R1-FlagOS-Nvidia-BF16)).
   - Pre-built Docker image for rapid deployment on NVIDIA-H100.
2. High-Precision BF16 Checkpoints:
   - BF16 checkpoints dequantized from the official DeepSeek-R1 FP8 model to ensure enhanced inference accuracy and performance.
We provide dequantized model weights in bfloat16 to run DeepSeek-R1 on NVIDIA GPUs.

# Bundle Download

|             | Usage                                                  | Nvidia                                                       |
| ----------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| Basic Image | Basic software environment that supports model running | `docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia` |
| Model       | Model weight and configuration files                   | https://www.modelscope.cn/models/FlagRelease/DeepSeek-R1-FlagOS-Nvidia-BF16 |

# Evaluation Results

## Benchmark Result

| Metrics               | DeepSeek-R1-H100-CUDA | DeepSeek-R1-H100-FlagOS |
| --------------------- | --------------------- | ----------------------- |
| GSM8K (EM)            | 95.75                 | 95.83                   |
| MMLU (Acc.)           | 85.34                 | 85.56                   |
| CEVAL                 | 89.00                 | 89.60                   |
| AIME 2024 (Pass@1)    | 76.67                 | 70.00                   |
| GPQA-Diamond (Pass@1) | 70.20                 | 71.21                   |
| MATH-500 (Pass@1)     | 93.20                 | 94.80                   |
| MMLU-Pro (Acc.)       | TBD                   | TBD                     |
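To quantify the comparison in the benchmark table, here is a small sketch (scores copied from the table) that prints the FlagOS-minus-CUDA delta per benchmark; the temp-file path is just an example:

```shell
# Benchmark scores copied from the table above: name cuda flagos
cat > /tmp/bench.txt <<'EOF'
GSM8K 95.75 95.83
MMLU 85.34 85.56
CEVAL 89.00 89.60
AIME2024 76.67 70.00
GPQA-Diamond 70.20 71.21
MATH-500 93.20 94.80
EOF

# Print the FlagOS - CUDA delta for each benchmark
awk '{printf "%s %+.2f\n", $1, $3 - $2}' /tmp/bench.txt
```

FlagOS tracks the CUDA baseline to within about a point on most benchmarks; AIME 2024 is the outlier.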
# How to Run Locally

## 📌 Getting Started

### Download open-source weights

```bash
pip install modelscope
modelscope download --model <Model Name> --local_dir <Cache Path>
```

### Download the FlagOS image

```bash
docker pull <IMAGE>
```

### Start the inference service

```bash
docker run -itd --name flagrelease_nv --privileged --gpus all --net=host --ipc=host --device=/dev/infiniband --shm-size 512g --ulimit memlock=-1 -v <CKPT_PATH>:<CKPT_PATH> flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia /bin/bash
docker exec -it flagrelease_nv /bin/bash
conda activate flagscale-inference
```
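Concretely, the placeholders in the steps above can be filled in as follows. The model name comes from the Bundle Download table; the local checkpoint path is an illustrative choice, not a requirement:

```shell
# Example placeholder values (model name from the Bundle Download table; paths are illustrative)
MODEL="FlagRelease/DeepSeek-R1-FlagOS-Nvidia-BF16"
CKPT_PATH="/models/deepseek_r1"
IMAGE="flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia"

# With these set, the download and pull steps become (shown commented; they need network access):
# modelscope download --model "$MODEL" --local_dir "$CKPT_PATH"
# docker pull "$IMAGE"
echo "weights -> $CKPT_PATH"
```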
### Download and install FlagGems

```bash
git clone https://github.com/FlagOpen/FlagGems.git
cd FlagGems
pip install ./
cd ../
```
### Download FlagScale and build vllm

```bash
git clone https://github.com/FlagOpen/FlagScale.git
cd FlagScale/
git checkout ae85925798358d95050773dfa66680efdb0c2b28
cd vllm
pip install .
cd ../
```
### Modify the configuration

```bash
cd FlagScale/examples/deepseek_r1/conf

# Modify the configuration in config_deepseek_r1.yaml:
defaults:
  - _self_
  - serve: deepseek_r1
experiment:
  exp_name: deepseek_r1
  exp_dir: outputs/${experiment.exp_name}
  task:
    type: serve
  deploy:
    use_fs_serve: false
  runner:
    hostfile: examples/deepseek_r1/conf/hostfile.txt # set hostfile
    docker: flagrelease_nv # set docker
    ssh_port: 22
  envs:
    CUDA_DEVICE_MAX_CONNECTIONS: 1
  cmds:
    before_start: source /root/miniconda3/bin/activate flagscale-inference && export GLOO_SOCKET_IFNAME=bond0 && export USE_FLAGGEMS=1 # Set GLOO_SOCKET_IFNAME to the name of the network interface (e.g., eth0, enp0s3) on the subnet used for inter-machine communication; check interface names and IP addresses with ifconfig.
action: run
hydra:
  run:
    dir: ${experiment.exp_dir}/hydra

# Modify the configuration in hostfile.txt
# ip slots type=xxx[optional]
# master node
x.x.x.x slots=8 type=gpu
# worker nodes
x.x.x.x slots=8 type=gpu

# Modify the configuration in serve/deepseek_r1.yaml
- serve_id: vllm_model
  engine: vllm
  engine_args:
    model: /models/deepseek_r1 # path to the DeepSeek-R1 weights
    tensor_parallel_size: 8
    pipeline_parallel_size: 4
    gpu_memory_utilization: 0.9
    max_model_len: 32768
    max_num_seqs: 256
    enforce_eager: true
    trust_remote_code: true
    enable_chunked_prefill: true

# Install FlagScale
cd FlagScale/
pip install .

# Configure passwordless container access by adding its key to the other hosts.
```

### Serve

```bash
flagscale serve deepseek_r1
```

To customize service parameters, run:

```bash
flagscale serve <MODEL_NAME> <MODEL_CONFIG_YAML>
```
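As a sanity check on the serve settings above: tensor_parallel_size 8 × pipeline_parallel_size 4 implies a 32-GPU world size, so the slots listed in hostfile.txt should add up to 32 (e.g., four 8-GPU H100 nodes). A small sketch with made-up IPs and an example temp path:

```shell
# World size implied by the parallelism settings in serve/deepseek_r1.yaml
tp=8; pp=4
world_size=$((tp * pp))
echo "world_size=$world_size"

# A hostfile in the `ip slots=N type=...` format providing that many slots (IPs are placeholders)
cat > /tmp/hostfile.txt <<'EOF'
10.0.0.1 slots=8 type=gpu
10.0.0.2 slots=8 type=gpu
10.0.0.3 slots=8 type=gpu
10.0.0.4 slots=8 type=gpu
EOF

# Tally the slots column; it should equal the world size
slots=$(awk -F'slots=' '{sum += $2 + 0} END {print sum}' /tmp/hostfile.txt)
echo "slots=$slots"
```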
# Contributing

We warmly welcome global developers to join us:

1. Submit Issues to report problems
2. Create Pull Requests to contribute code
3. Improve technical documentation
Scan the QR code below to add our WeChat group and send "FlagRelease".


# License