---
license: mit
---
# DeepSeek-V3.2-Retro

This repository hosts the model weights for **DeepSeek-V3.2-Retro**. For instructions and details, please refer to the [GitHub repository](https://github.com/zhejianglab/DeepSeek-V3.2-Retro).

## 1. Introduction
[DeepSeek-V3.2](https://huggingface.co/deepseek-ai/DeepSeek-V3.2) introduces the DeepSeek Sparse Attention (DSA) architecture, representing a significant architectural evolution over [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) and [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1). However, as of now, an official open-source implementation compatible with Ampere-series GPUs has not been released.

To address this gap, we introduce **DeepSeek-V3.2-Retro**, targeting the following user groups:

- Ampere GPU users who do not have access to Hopper or Blackwell architectures.
- Users of general-purpose GPU platforms where DSA is not yet supported.

Key features of **DeepSeek-V3.2-Retro** include:

- Removal of the DSA modules from the original V3.2 architecture.
- Conversion of model parameters and computation to the BF16 data format.
- Broad compatibility: runs on any hardware platform that supports the V3 architecture.
- Validated performance: achieves results on multiple benchmarks that are close to the [officially reported numbers](https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf).
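
A note on the BF16 conversion above: bfloat16 keeps float32's 8-bit exponent and truncates the mantissa to 7 bits, so the conversion preserves dynamic range while reducing precision. The following is a minimal, purely illustrative sketch of that relationship (not the actual conversion code used to produce the weights):

```python
import struct

def bf16_round_trip(x: float) -> float:
    """Truncate a float32 value to bfloat16 precision, then expand back.

    bfloat16 is simply the top 16 bits of an IEEE-754 float32:
    1 sign bit + 8 exponent bits + 7 mantissa bits.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= 0xFFFF0000  # keep sign, exponent, and top 7 mantissa bits
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y

# Same scale, fewer significant digits:
# bf16_round_trip(3.141592653589793) -> 3.140625
```

Because the exponent field is unchanged, no rescaling is needed when upcasting BF16 weights back to float32 for computation.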

## 2. Performance Evaluation
As our primary target scenario is reasoning-oriented usage, we report accuracy on several representative benchmarks with the thinking feature enabled. Baseline metrics are taken from the corresponding official technical reports for consistency.

<div align="center">

| Benchmark | [DeepSeek-V3.2-Retro](https://github.com/zhejianglab/DeepSeek-V3.2-Retro) | [DeepSeek-V3.2-Thinking](https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf) |
| :---: | :---: | :---: |
| MMLU-Pro | 86.4 | 85.0 |
| GPQA Diamond | 82.12 | 82.4 |
| AIME 2025 | 93.67 | 93.1 |
| LiveCodeBench | 80.72 | 83.3 |

</div>

In addition, we evaluate inference efficiency. Using SGLang v0.5.6 under identical settings, we observe that the output throughput (in tokens/s) of DeepSeek-V3.2-Retro is on par with that of DeepSeek-V3.1.

<div align="center">

| Model | Output Throughput (qps=512, input=1k, output=10k) |
| :---: | :---: |
| [DeepSeek-V3.2-Retro](https://github.com/zhejianglab/DeepSeek-V3.2-Retro) | 2510.27 |
| [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) | 2515.34 |

</div>

These results indicate that removing the DSA structure and reverting to a V3-compatible architecture introduces no noticeable regression in either reasoning accuracy or inference throughput on Ampere-class hardware.

## 3. Model Download
The DeepSeek-V3.2-Retro model is available for download from [Hugging Face](https://huggingface.co/ZhejiangLab/DeepSeek-V3.2-Retro) and [ModelScope](https://modelscope.cn/models/zhejianglab/DeepSeek-V3.2-Retro). Please ensure that you have at least 1.5 TB of available disk space before downloading the model.

<div align="center">

| **Model** | **Total Params** | **Hugging Face** | **ModelScope** |
|:---------:|:----------------:|:----------------:|:--------------:|
| DeepSeek-V3.2-Retro | 684 B | [🤗 Hugging Face](https://huggingface.co/ZhejiangLab/DeepSeek-V3.2-Retro) | [🤖 ModelScope](https://modelscope.cn/models/zhejianglab/DeepSeek-V3.2-Retro) |

</div>
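
Given the 1.5 TB disk-space requirement, it can help to verify free space programmatically before fetching the weights. Below is a hedged sketch, assuming the `huggingface_hub` package is installed; the repo id matches the table above, and the `download_weights` helper is illustrative, not part of any official tooling:

```python
import shutil

REQUIRED_BYTES = int(1.5 * 1024**4)  # >= 1.5 TB free, per the note above

def has_enough_space(path: str = ".", required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required

def download_weights(local_dir: str = "DeepSeek-V3.2-Retro") -> str:
    """Fetch the full repository snapshot from Hugging Face (very large)."""
    from huggingface_hub import snapshot_download  # deferred: optional dependency
    if not has_enough_space("."):
        raise RuntimeError("Need ~1.5 TB of free disk space before downloading.")
    return snapshot_download(
        repo_id="ZhejiangLab/DeepSeek-V3.2-Retro",
        local_dir=local_dir,
    )

# download_weights()  # uncomment to start the (very large) download
```

The ModelScope client exposes a similar snapshot-download entry point if you prefer that mirror.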

## 4. Quickstart

We strongly recommend using SGLang for efficient inference of DeepSeek-series models. Below are example configurations for SGLang serving on four 8×A100 nodes.

### SGLang

#### Using Docker (Recommended)

```bash
# Pull the latest image on all four nodes, and ensure RDMA network connectivity
# between them. Available tags: https://hub.docker.com/r/lmsysorg/sglang/tags
docker pull lmsysorg/sglang:latest
```

#### Launch Command

```bash
# The --enable-dp-attention and --ep-size flags below boost throughput in
# high-QPS scenarios, and MTP speculative decoding (NEXTN) speeds up decoding.
# node 1
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 30000 --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head

# node 2
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 1 --trust-remote-code --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head

# node 3
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 2 --trust-remote-code --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head

# node 4
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 3 --trust-remote-code --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head
```
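
Once all four servers are up, node 1 (the only rank launched with `--host`/`--port`) serves an OpenAI-compatible API on port 30000. A minimal standard-library sketch for querying it; the host, port, and model name below are assumptions taken from the launch commands above, so adjust them to your deployment:

```python
import json
import urllib.request

def build_request(prompt: str, host: str = "10.0.0.1", port: int = 30000):
    """Build an OpenAI-style chat-completions request for the serving node."""
    payload = {
        "model": "DeepSeek-V3.2-Retro",  # assumed served model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# print(chat("Why is the sky blue?"))  # requires the servers to be running
```

Any OpenAI-compatible client SDK pointed at the same base URL should work equally well.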

## 5. License
This repository and the model weights are licensed under the MIT License, following the license of DeepSeek-V3.2. In addition, since this model is derived from DeepSeek-V3.2, your use of it shall also comply with the terms and conditions applicable to DeepSeek-V3.2.

## 6. Contact
If you have any questions, please raise an [issue](https://github.com/zhejianglab/DeepSeek-V3.2-Retro/issues) or contact us at opensource@zhejianglab.org.