---
license: mit
---

# DeepSeek-V3.2-Retro

This repository hosts the model weights for **DeepSeek-V3.2-Retro**. For usage instructions and further details, please refer to the [GitHub repository](https://github.com/zhejianglab/DeepSeek-V3.2-Retro).

## 1. Introduction

[DeepSeek-V3.2](https://huggingface.co/deepseek-ai/DeepSeek-V3.2) introduces the DeepSeek Sparse Attention (DSA) architecture, a significant architectural evolution over [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) and [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1). However, no official open-source implementation compatible with Ampere-series GPUs has been released so far.

To address this gap, we introduce **DeepSeek-V3.2-Retro**, targeting the following user groups:

- Ampere GPU users who do not have access to Hopper or Blackwell architectures.
- Users of general-purpose GPU platforms where DSA is not yet supported.

Key features of **DeepSeek-V3.2-Retro** include:

- Removal of the DSA modules from the original V3.2 architecture.
- Conversion of model parameters and computation to the BF16 data format.
- Broad compatibility: runs on any hardware platform that supports the V3 architecture.
- Validated performance: achieves results on multiple benchmarks that are close to the [officially reported results](https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf).
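
For intuition on the BF16 conversion: BF16 keeps FP32's sign bit and 8-bit exponent but truncates the 23-bit mantissa to 7 bits. The actual weight conversion presumably uses a framework cast (e.g. `tensor.to(torch.bfloat16)` in PyTorch); the sketch below only illustrates the bit-level effect with round-to-nearest-even, using the standard library (NaN handling omitted):

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for a float, rounding to nearest even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round the top 16 bits: add a bias that implements ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Expand a BF16 bit pattern back to a Python float (exact: BF16 is a prefix of FP32)."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# BF16 keeps the full FP32 exponent range but only ~3 decimal digits of precision.
print(bf16_bits_to_fp32(fp32_to_bf16_bits(1.0)))         # 1.0 survives exactly
print(bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265)))  # 3.140625, the nearest BF16 value
```

This is why BF16 is a convenient target for weights originally trained in mixed precision: the dynamic range matches FP32, so conversion never overflows, only loses mantissa bits.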

## 2. Performance Evaluation
Since our primary target scenario is reasoning-oriented usage, we report accuracy on several representative benchmarks with the thinking feature enabled. All baseline metrics are taken from the corresponding official technical reports for consistency.

<div align="center">

| Benchmark | [DeepSeek-V3.2-Retro](https://github.com/zhejianglab/DeepSeek-V3.2-Retro) | [DeepSeek-V3.2-Thinking](https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf) |
| :---: | :---: | :---: |
| MMLU-Pro | 86.4 | 85.0 |
| GPQA Diamond | 82.12 | 82.4 |
| AIME 2025 | 93.67 | 93.1 |
| LiveCodeBench | 80.72 | 83.3 |

</div>

In addition, we evaluate inference efficiency. Using SGLang v0.5.6 under identical settings, the output throughput of DeepSeek-V3.2-Retro is on par with that of DeepSeek-V3.1. Output throughput is reported in tokens/s.

<div align="center">

| Model | Output Throughput (qps=512, input=1k, output=10k) |
| :---: | :---: |
| [DeepSeek-V3.2-Retro](https://github.com/zhejianglab/DeepSeek-V3.2-Retro) | 2510.27 |
| [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) | 2515.34 |

</div>
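
Output throughput in a benchmark like this is conventionally the total number of generated tokens divided by wall-clock time across all concurrent requests. A minimal sketch (the helper name and the example numbers are illustrative, not the benchmark harness):

```python
def output_throughput(total_output_tokens: int, wall_seconds: float) -> float:
    """Tokens generated per second, aggregated over all concurrent requests."""
    return total_output_tokens / wall_seconds

# e.g. 512 concurrent requests, ~10k output tokens each, finishing in ~2040 s
print(round(output_throughput(512 * 10_000, 2040.0), 2))  # 2509.8
```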

These results indicate that removing the DSA modules and reverting to a V3-compatible architecture introduces no noticeable regression in either reasoning accuracy or inference throughput on Ampere-class hardware.

## 3. Model Download
The DeepSeek-V3.2-Retro model is available for download from [Hugging Face](https://huggingface.co/ZhejiangLab/DeepSeek-V3.2-Retro) and [ModelScope](https://modelscope.cn/models/zhejianglab/DeepSeek-V3.2-Retro). Please ensure that you have at least 1.5 TB of available disk space before downloading the model.

<div align="center">

| **Model** | **Total Params** | **Hugging Face** | **ModelScope** |
|:---------:|:----------------:|:----------------:|:--------------:|
| DeepSeek-V3.2-Retro | 684 B | [🤗 Hugging Face](https://huggingface.co/ZhejiangLab/DeepSeek-V3.2-Retro) | [🤖 ModelScope](https://modelscope.cn/models/zhejianglab/DeepSeek-V3.2-Retro) |

</div>
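
Given the 1.5 TB requirement, it is worth checking free space programmatically before starting the download. A small stdlib-only sketch (the helper name and the target path are our own choices, not part of any official tooling):

```python
import shutil

def has_free_space(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes

# ~1.5 TB for the full BF16 checkpoint, per the note above
NEEDED = int(1.5 * 10**12)

ok = has_free_space(".", NEEDED)
print("enough space" if ok else "free up disk space before downloading")
```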

## 4. Quickstart

We strongly recommend using SGLang for efficient inference of the DeepSeek series models. Below are example configurations for SGLang serving on four 8×A100 nodes (32 GPUs in total, matching `--tp 32`).

### SGLang

#### Using Docker (Recommended)

```bash
# Pull the latest image on all four nodes, and ensure RDMA network connectivity between them.
# Available tags: https://hub.docker.com/r/lmsysorg/sglang/tags
docker pull lmsysorg/sglang:latest
```

#### Launch Command

```bash
# For high-QPS scenarios, --enable-dp-attention and --ep-size boost throughput,
# and MTP (NEXTN speculative decoding) boosts decoding speed; all are enabled below.

# node 1
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 30000 --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head

# node 2
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 1 --trust-remote-code --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head

# node 3
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 2 --trust-remote-code --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head

# node 4
python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3.2-Retro --tp 32 --dist-init-addr 10.0.0.1:5000 --nnodes 4 --node-rank 3 --trust-remote-code --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-dp-attention --dp 8 --ep-size 32 --enable-dp-lm-head
```
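
Once the servers are up, the rank-0 node (node 1 above) exposes an OpenAI-compatible HTTP API on port 30000. A minimal client sketch using only the standard library; the endpoint URL and model name here are assumptions derived from the launch flags above, so adjust them to your deployment:

```python
import json
import urllib.request

SERVER_URL = "http://10.0.0.1:30000/v1/chat/completions"  # node 1's --host/--port

def chat_request(prompt: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the SGLang server."""
    payload = {
        "model": "DeepSeek-V3.2-Retro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Uncomment to query a running server:
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```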

## 5. License
This repository and the model weights are licensed under the MIT License, following the license of DeepSeek-V3.2. In addition, users of DeepSeek-V3.2-Retro shall also comply with the terms and conditions of the original DeepSeek-V3.2 model.

## 6. Contact
If you have any questions, please raise an [issue](https://github.com/zhejianglab/DeepSeek-V3.2-Retro/issues) or contact us at opensource@zhejianglab.org.
|