Update README.md
README.md (changed)
@@ -25,7 +25,7 @@ This model continues to employ a hybrid architecture that combines linear attent

<div style="display: flex; justify-content: center;">
<div style="text-align: center;">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/PHRg8ipzJtr0p6sojAa5T.png" width="
+<img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/PHRg8ipzJtr0p6sojAa5T.png" width="800">
<p style="margin-top: 8px; font-size: 14px;"><strong>Figure 1:</strong> Hybrid Linear Model Architecture</p>
</div>
</div>

@@ -36,12 +36,12 @@ To better demonstrate our model's reasoning capabilities, we compared it with th

<div style="display: flex; justify-content: center;">
<div style="text-align: center;">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/RcHlh5PriRuOLsErG8RjK.webp" width="
+<img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/RcHlh5PriRuOLsErG8RjK.webp" width="100%">
<p style="margin-top: 8px; font-size: 14px;"><strong>Figure 2:</strong> Model Performance Comparison</p>
</div>
</div>

-## Linear Attention, Highly Sparse
+## Linear Attention, Highly Sparse, High-Speed Generation

Thanks to its hybrid attention mechanism and highly sparse MoE architecture, Ring-mini-linear-2.0 achieves near-linear time complexity and constant space complexity, resulting in outstanding inference efficiency. To fully demonstrate this advantage, we conducted a head-to-head comparison between our model and top-tier competitors of similar size or performance.
The results are remarkable. In the prefill stage, Ring-mini-linear-2.0's performance is exceptional; when the context length exceeds 256k, its throughput is over 12 times higher than that of Qwen3-8B. Furthermore, in the high-concurrency decode stage, its capabilities are even more pronounced. For generation lengths exceeding 32k, its throughput easily surpasses 12 times that of Qwen3-8B.
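To make the "constant space complexity" claim in the paragraph above concrete, the sketch below contrasts standard softmax attention during decoding, whose KV cache grows with every generated token, with a generic kernelized linear-attention update that keeps a fixed d×d state per head. It is a minimal illustration under assumed toy settings (single head, a simple positive feature map, random vectors) and is not taken from Ring-mini-linear-2.0's actual kernels or this repository.

```python
# Minimal sketch: growing KV cache (softmax attention) vs. constant-size state
# (kernelized linear attention) during autoregressive decoding.
# Illustrative only; not the model's real implementation.
import numpy as np

d = 64          # head dimension (assumed for illustration)
steps = 1000    # number of decode steps to simulate
rng = np.random.default_rng(0)

# --- Softmax attention: every past key/value pair must be kept. ---
k_cache, v_cache = [], []
for _ in range(steps):
    q = rng.standard_normal(d)
    k = rng.standard_normal(d)
    v = rng.standard_normal(d)
    k_cache.append(k)                    # cache grows by one entry per token
    v_cache.append(v)
    K = np.stack(k_cache)                # (t, d) -- size depends on history length t
    V = np.stack(v_cache)                # (t, d)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out_softmax = weights @ V            # O(t * d) work and memory at step t

# --- Linear attention: a fixed (d, d) state summarizes the whole prefix. ---
phi = lambda x: np.maximum(x, 0.0) + 1.0  # simple positive feature map (assumed)
S = np.zeros((d, d))                      # constant-size state, independent of t
z = np.zeros(d)                           # running normalizer
for _ in range(steps):
    q = rng.standard_normal(d)
    k = rng.standard_normal(d)
    v = rng.standard_normal(d)
    S += np.outer(phi(k), v)              # rank-1 update, O(d^2) per step
    z += phi(k)
    out_linear = (phi(q) @ S) / (phi(q) @ z + 1e-6)  # O(d^2), no scan over history

print("softmax cache entries after decoding:", len(k_cache))  # grows with steps
print("linear-attention state shape:", S.shape)               # stays (d, d)
```

In a hybrid stack, only the layers that still use softmax attention pay the growing-cache cost, while the linear-attention layers keep per-step decode cost and state size flat; that difference is consistent with the long-generation throughput gap over Qwen3-8B described above.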