Add text-generation pipeline tag and MIT license

#2 by nielsr (HF Staff), opened

Files changed (1): README.md (+22 -13)

README.md CHANGED
@@ -1,6 +1,10 @@
 ---
 library_name: transformers
+license: mit
+pipeline_tag: text-generation
 ---
+
+```markdown
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
 <!-- markdownlint-disable no-duplicate-header -->
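Once merged, these two front-matter fields surface through the Hub API. A quick verification sketch using `huggingface_hub` (the attribute names reflect my understanding of that library's current API and are not part of this PR):

```python
from huggingface_hub import model_info

# Fetch the model card metadata from the Hub (assumes the PR is merged).
info = model_info("deepseek-ai/DeepSeek-V3")
print(info.pipeline_tag)        # expected: "text-generation"
print(info.card_data.license)   # expected: "mit"
```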
@@ -44,7 +48,7 @@ library_name: transformers
 
 
 <p align="center">
-<a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf"><b>Paper Link</b>👁️</a>
+<a href="https://arxiv.org/abs/2412.19437"><b>Paper Link</b>👁️</a>
 </p>
 
 
@@ -101,7 +105,8 @@ Throughout the entire training process, we did not experience any irrecoverable
 
 </div>
 
-**NOTE: The total size of DeepSeek-V3 models on HuggingFace is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.**
+> [!NOTE]
+> The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.
 
 To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: [How_to Run_Locally](#6-how-to-run-locally).
 
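For a sense of scale, the note's figures imply roughly the following on-disk sizes (back-of-envelope arithmetic from the stated parameter counts, assuming ~1 byte per parameter in FP8 and 2 bytes in BF16, and ignoring scale tensors and metadata):

```python
# Rough storage estimate from the 671B + 14B figures in the note above.
main_params = 671e9               # Main Model weights
mtp_params = 14e9                 # Multi-Token Prediction module weights
total = main_params + mtp_params  # 685e9, the stated 685B
print(f"FP8:  ~{total * 1 / 1e12:.2f} TB")  # ~0.69 TB
print(f"BF16: ~{total * 2 / 1e12:.2f} TB")  # ~1.37 TB after fp8_cast_bf16.py
```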
@@ -132,7 +137,7 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md
 | | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 |
 | | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 |
 | | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 |
-| | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | **82.7** | **82.9** |
+| | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | **82.9** |
 | | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 |
 | | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** |
 | Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** |
@@ -154,8 +159,9 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md
 
 </div>
 
-Note: Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
-For more evaluation details, please check our paper.
+> [!NOTE]
+> Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
+> For more evaluation details, please check our paper.
 
 #### Context Window
 <p align="center">
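The "gap not exceeding 0.3" convention is easy to state precisely. A minimal sketch of the rule (my own illustration, not code from the repo):

```python
# Scores within 0.3 of the row's best are "at the same level" as the best.
def same_level_flags(scores, eps=0.3):
    best = max(scores)
    return [s >= best - eps for s in scores]

# WinoGrande row from the table above:
print(same_level_flags([86.3, 82.3, 85.2, 84.9]))  # [True, False, False, False]
```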
@@ -216,9 +222,11 @@ Note: All models are evaluated in a configuration that limits the output length
 | Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
 | DeepSeek-V3 | **85.5** | **70.0** |
 
-Note: English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
 </div>
 
+> [!NOTE]
+> English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
+
 
 ## 5. Chat Website & API Platform
 You can chat with DeepSeek-V3 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)
@@ -232,8 +240,8 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source
 1. **DeepSeek-Infer Demo**: We provide a simple and lightweight demo for FP8 and BF16 inference.
 2. **SGLang**: Fully support the DeepSeek-V3 model in both BF16 and FP8 inference modes.
 3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
-4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
-5. **vLLM**: Support DeekSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
+4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/INT8 quantization, with FP8 support coming soon.
+5. **vLLM**: Support DeepSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
 6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
 7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.
 
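For item 5, a minimal offline-inference sketch with vLLM's Python API (my illustration; the parallel size and sampling settings are assumptions to adapt to your hardware, not settings from this README):

```python
from vllm import LLM, SamplingParams

# Load DeepSeek-V3 sharded across 8 GPUs via tensor parallelism.
llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, max_tokens=200)
outputs = llm.generate(["Write a haiku about MoE models."], params)
print(outputs[0].outputs[0].text)
```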
@@ -246,7 +254,8 @@ cd inference
 python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
 ```
 
-**NOTE: Huggingface's Transformers has not been directly supported yet.**
+> [!NOTE]
+> Huggingface's Transformers has not been directly supported yet.
 
 ### 6.1 Inference with DeepSeek-Infer Demo (example only)
 
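As background on what `fp8_cast_bf16.py` does: the FP8 release stores a per-block inverse scale next to each quantized weight, and casting to BF16 rescales each block. A minimal sketch, assuming 128x128 weight blocks with a stored `weight_scale_inv` tensor as described in README_WEIGHTS.md (function and argument names are mine, not the script's):

```python
import torch

def dequant_fp8_to_bf16(weight_q: torch.Tensor, scale_inv: torch.Tensor,
                        block: int = 128) -> torch.Tensor:
    """Expand per-block inverse scales to element granularity and rescale."""
    w = weight_q.to(torch.float32)
    # scale_inv holds one entry per 128x128 block; broadcast it to full size.
    s = torch.repeat_interleave(scale_inv, block, dim=0)[: w.shape[0]]
    s = torch.repeat_interleave(s, block, dim=1)[:, : w.shape[1]]
    return (w * s).to(torch.bfloat16)
```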
@@ -269,7 +278,7 @@ Download the model weights from HuggingFace, and put them into `/path/to/DeepSeek-V3` folder.
 
 #### Model Weights Conversion
 
-Convert HuggingFace model weights to a specific format:
+Convert Hugging Face model weights to a specific format:
 
 ```shell
 python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
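The `--n-experts 256 --model-parallel 16` pair implies 16 routed experts per model-parallel rank. A sketch of that partition arithmetic (my illustration, not the converter's actual code):

```python
n_experts, model_parallel = 256, 16
per_rank = n_experts // model_parallel  # 16 experts per rank
for rank in range(model_parallel):
    owned = range(rank * per_rank, (rank + 1) * per_rank)
    print(f"rank {rank:2d} holds experts {owned.start}..{owned.stop - 1}")
```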
@@ -280,13 +289,13 @@ python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
 Then you can chat with DeepSeek-V3:
 
 ```shell
-torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
+torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
 ```
 
 Or batch inference on a given file:
 
 ```shell
-torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
+torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
 ```
 
 ### 6.2 Inference with SGLang (recommended)
@@ -336,4 +345,4 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
 ```
 
 ## 9. Contact
-If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
+If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).