Add text-generation pipeline tag and MIT license (#2), opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,6 +1,10 @@
---
library_name: transformers
+ license: mit
+ pipeline_tag: text-generation
---
+
+ ```markdown
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
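The two added metadata fields are what the Hub reads to display the license and to attach the text-generation filter and widget. A minimal sketch for checking that the merged card metadata is live, assuming the `huggingface_hub` Python client and the repo id `deepseek-ai/DeepSeek-V3` (the repo id is an assumption, not stated in this diff):

```python
# Minimal sketch: read back the model-card metadata this PR adds.
# The repo id below is an assumption; point it at the actual repository.
from huggingface_hub import model_info

info = model_info("deepseek-ai/DeepSeek-V3")
print(info.pipeline_tag)       # "text-generation" once the PR is merged
print(info.card_data.license)  # "mit" once the PR is merged
```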
@@ -44,7 +48,7 @@ library_name: transformers


<p align="center">
- <a href="https://
+ <a href="https://arxiv.org/abs/2412.19437"><b>Paper Link</b>👁️</a>
</p>


@@ -101,7 +105,8 @@ Throughout the entire training process, we did not experience any irrecoverable

</div>

-
+ > [!NOTE]
+ > The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.

To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: [How to Run Locally](#6-how-to-run-locally).

@@ -132,7 +137,7 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md

| | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 |
- | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 |
+ | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | **82.9** |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** |
@@ -154,8 +159,9 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md

</div>

-
-
+ > [!NOTE]
+ > Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
+ > For more evaluation details, please check our paper.

#### Context Window
<p align="center">
@@ -216,9 +222,11 @@ Note: All models are evaluated in a configuration that limits the output length

| Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
| DeepSeek-V3 | **85.5** | **70.0** |

- Note: English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
</div>

+ > [!NOTE]
+ > English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
+

## 5. Chat Website & API Platform
You can chat with DeepSeek-V3 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)
@@ -232,8 +240,8 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source

1. **DeepSeek-Infer Demo**: We provide a simple and lightweight demo for FP8 and BF16 inference.
2. **SGLang**: Fully supports the DeepSeek-V3 model in both BF16 and FP8 inference modes.
3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
- 4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/
+ 4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/INT8 quantization, with FP8 support coming soon.
- 5. **vLLM**: Support
+ 5. **vLLM**: Supports the DeepSeek-V3 model in FP8 and BF16 modes with tensor parallelism and pipeline parallelism.
6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.

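Several of the engines above (SGLang, LMDeploy, vLLM) can expose an OpenAI-compatible HTTP server once the model is deployed. A minimal smoke-test sketch, assuming such a server is already running; the base URL, port, API key, and model name are deployment-specific assumptions, not values from this diff:

```python
# Sketch: query an OpenAI-compatible endpoint (e.g. one served by SGLang or vLLM).
# base_url, api_key, and the model name depend on how the server was launched.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```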
@@ -246,7 +254,8 @@ cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
```

-
+ > [!NOTE]
+ > Hugging Face's Transformers does not support this model directly yet.

### 6.1 Inference with DeepSeek-Infer Demo (example only)

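For intuition about what the cast does: each FP8 tensor is dequantized with its stored inverse scale factors and re-saved in BF16. A rough sketch of that idea only, not the actual `fp8_cast_bf16.py`; the `_scale_inv` tensor naming, the 128x128 block size, and the shard filenames are assumptions:

```python
# Rough sketch of an FP8 -> BF16 cast; NOT the actual fp8_cast_bf16.py script.
# Assumes each FP8 weight "<name>" ships with a per-block inverse scale tensor
# "<name>_scale_inv" (128x128 blocks); naming and block size are assumptions.
import torch
from safetensors.torch import load_file, save_file

def dequant_fp8(weight: torch.Tensor, scale_inv: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Expand per-block scales to element granularity, then rescale to BF16."""
    scale = scale_inv.repeat_interleave(block, dim=0)[: weight.shape[0]]
    scale = scale.repeat_interleave(block, dim=1)[:, : weight.shape[1]]
    return (weight.to(torch.float32) * scale).to(torch.bfloat16)

tensors = load_file("fp8_shard.safetensors")  # illustrative filename
out = {}
for name, t in tensors.items():
    if name.endswith("_scale_inv"):
        continue  # consumed together with its matching weight below
    key = f"{name}_scale_inv"
    out[name] = dequant_fp8(t, tensors[key]) if key in tensors else t
save_file(out, "bf16_shard.safetensors")
```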
@@ -269,7 +278,7 @@ Download the model weights from HuggingFace, and put them into `/path/to/DeepSeek-V3` folder.

#### Model Weights Conversion

- Convert
+ Convert Hugging Face model weights to a specific format:

```shell
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
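For orientation (inferred from the surrounding commands, not stated in this diff): `--n-experts 256` matches DeepSeek-V3's routed-expert count, and `--model-parallel 16` equals the total number of GPUs the demo is launched on, i.e. 2 nodes x 8 processes in the `torchrun` commands below.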
@@ -280,13 +289,13 @@ python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
Then you can chat with DeepSeek-V3:

```shell
- torchrun --nnodes 2 --nproc-per-node 8
+ torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
```

Or batch inference on a given file:

```shell
- torchrun --nnodes 2 --nproc-per-node 8
+ torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
```

### 6.2 Inference with SGLang (recommended)
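These commands are run once per node: `$RANK` is the value passed to torchrun's `--node-rank` (0 on the master node, 1 on the second), `$ADDR` is the master node's address as reachable from both machines, and `$FILE` is the input file of prompts for batch mode. They are shell placeholders the user must set, not values shipped with the demo.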
@@ -336,4 +345,4 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
```

## 9. Contact
- If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.
+ If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).