Add text-generation pipeline tag and MIT license (#2), opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,6 +1,10 @@
---
library_name: transformers
+ license: mit
+ pipeline_tag: text-generation
---
+
+ ```markdown
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
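The two added metadata fields are what the Hub reads to display the license and to attach the text-generation filter and widget. A minimal sketch for checking that the merged card metadata is live, assuming the `huggingface_hub` Python client and the repo id `deepseek-ai/DeepSeek-V3` (the repo id is an assumption, not stated in this diff):

```python
# Minimal sketch: read back the model-card metadata this PR adds.
# The repo id below is an assumption; point it at the actual repository.
from huggingface_hub import model_info

info = model_info("deepseek-ai/DeepSeek-V3")
print(info.pipeline_tag)       # "text-generation" once the PR is merged
print(info.card_data.license)  # "mit" once the PR is merged
```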
@@ -44,7 +48,7 @@ library_name: transformers


<p align="center">
- <a href="https://
+ <a href="https://arxiv.org/abs/2412.19437"><b>Paper Link</b>👁️</a>
</p>


@@ -101,7 +105,8 @@ Throughout the entire training process, we did not experience any irrecoverable

</div>

-
+ > [!NOTE]
+ > The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.

To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: [How to Run Locally](#6-how-to-run-locally).

@@ -132,7 +137,7 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md

| | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 |
- | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 |
+ | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | **82.9** |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** |
@@ -154,8 +159,9 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md

</div>

-
-
+ > [!NOTE]
+ > Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
+ > For more evaluation details, please check our paper.

#### Context Window
<p align="center">
@@ -216,9 +222,11 @@ Note: All models are evaluated in a configuration that limits the output length

| Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
| DeepSeek-V3 | **85.5** | **70.0** |

- Note: English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
</div>

+ > [!NOTE]
+ > English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
+

## 5. Chat Website & API Platform
You can chat with DeepSeek-V3 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)
@@ -232,8 +240,8 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source

1. **DeepSeek-Infer Demo**: We provide a simple and lightweight demo for FP8 and BF16 inference.
2. **SGLang**: Fully supports the DeepSeek-V3 model in both BF16 and FP8 inference modes.
3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
- 4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/
+ 4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/INT8 quantization, with FP8 support coming soon.
- 5. **vLLM**: Support
+ 5. **vLLM**: Supports the DeepSeek-V3 model in FP8 and BF16 modes with tensor parallelism and pipeline parallelism.
6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.

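Several of the engines above (SGLang, LMDeploy, vLLM) can expose an OpenAI-compatible HTTP server once the model is deployed. A minimal smoke-test sketch, assuming such a server is already running; the base URL, port, API key, and model name are deployment-specific assumptions, not values from this diff:

```python
# Sketch: query an OpenAI-compatible endpoint (e.g. one served by SGLang or vLLM).
# base_url, api_key, and the model name depend on how the server was launched.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```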
@@ -246,7 +254,8 @@ cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
```

-
+ > [!NOTE]
+ > Hugging Face's Transformers does not support this model directly yet.

### 6.1 Inference with DeepSeek-Infer Demo (example only)

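For intuition about what the cast does: each FP8 tensor is dequantized with its stored inverse scale factors and re-saved in BF16. A rough sketch of that idea only, not the actual `fp8_cast_bf16.py`; the `_scale_inv` tensor naming, the 128x128 block size, and the shard filenames are assumptions:

```python
# Rough sketch of an FP8 -> BF16 cast; NOT the actual fp8_cast_bf16.py script.
# Assumes each FP8 weight "<name>" ships with a per-block inverse scale tensor
# "<name>_scale_inv" (128x128 blocks); naming and block size are assumptions.
import torch
from safetensors.torch import load_file, save_file

def dequant_fp8(weight: torch.Tensor, scale_inv: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Expand per-block scales to element granularity, then rescale to BF16."""
    scale = scale_inv.repeat_interleave(block, dim=0)[: weight.shape[0]]
    scale = scale.repeat_interleave(block, dim=1)[:, : weight.shape[1]]
    return (weight.to(torch.float32) * scale).to(torch.bfloat16)

tensors = load_file("fp8_shard.safetensors")  # illustrative filename
out = {}
for name, t in tensors.items():
    if name.endswith("_scale_inv"):
        continue  # consumed together with its matching weight below
    key = f"{name}_scale_inv"
    out[name] = dequant_fp8(t, tensors[key]) if key in tensors else t
save_file(out, "bf16_shard.safetensors")
```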
@@ -269,7 +278,7 @@ Download the model weights from HuggingFace, and put them into `/path/to/DeepSeek-V3` folder.

#### Model Weights Conversion

- Convert
+ Convert Hugging Face model weights to a specific format:

```shell
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
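For orientation (inferred from the surrounding commands, not stated in this diff): `--n-experts 256` matches DeepSeek-V3's routed-expert count, and `--model-parallel 16` equals the total number of GPUs the demo is launched on, i.e. 2 nodes x 8 processes in the `torchrun` commands below.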
@@ -280,13 +289,13 @@ python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
Then you can chat with DeepSeek-V3:

```shell
- torchrun --nnodes 2 --nproc-per-node 8
+ torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
```

Or batch inference on a given file:

```shell
- torchrun --nnodes 2 --nproc-per-node 8
+ torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
```

### 6.2 Inference with SGLang (recommended)
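These commands are run once per node: `$RANK` is the value passed to torchrun's `--node-rank` (0 on the master node, 1 on the second), `$ADDR` is the master node's address as reachable from both machines, and `$FILE` is the input file of prompts for batch mode. They are shell placeholders the user must set, not values shipped with the demo.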
@@ -336,4 +345,4 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
```

## 9. Contact
- If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.
+ If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).