caizhi1 committed
Commit 63fcfbd · verified · 1 Parent(s): 1b5976d

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -25,7 +25,7 @@ This model continues to employ a hybrid architecture that combines linear attent
 
 <div style="display: flex; justify-content: center;">
   <div style="text-align: center;">
-    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/PHRg8ipzJtr0p6sojAa5T.png" width="1000">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/PHRg8ipzJtr0p6sojAa5T.png" width="800">
     <p style="margin-top: 8px; font-size: 14px;"><strong>Figure 1:</strong> Hybrid Linear Model Architecture</p>
   </div>
 </div>
@@ -36,12 +36,12 @@ To better demonstrate our model's reasoning capabilities, we compared it with th
 
 <div style="display: flex; justify-content: center;">
   <div style="text-align: center;">
-    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/RcHlh5PriRuOLsErG8RjK.webp" width="1000">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/RcHlh5PriRuOLsErG8RjK.webp" width="100%">
     <p style="margin-top: 8px; font-size: 14px;"><strong>Figure 2:</strong> Model Performance Comparison </p>
   </div>
 </div>
 
-## Linear Attention, Highly SparseHigh-Speed Generation
+## Linear Attention, Highly Sparse, High-Speed Generation
 
 Thanks to its hybrid attention mechanism and highly sparse MoE architecture, Ring-mini-linear-2.0 achieves near-linear time complexity and constant space complexity, resulting in outstanding inference efficiency. To fully demonstrate this advantage, we conducted a head-to-head comparison between our model and top-tier competitors of similar size or performance.
 The results are remarkable. In the prefill stage, Ring-mini-linear-2.0's performance is exceptional; when the context length exceeds 256k, its throughput is over 12 times higher than that of Qwen3-8B. Furthermore, in the high-concurrency decode stage, its capabilities are even more pronounced. For generation lengths exceeding 32k, its throughput easily surpasses 12 times that of Qwen3-8B.
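
For context on the "near-linear time, constant space" claim in the README text above, the sketch below illustrates the general idea behind linear-attention decoding: keeping a fixed-size recurrent state instead of a growing KV cache. This is a minimal toy example of linear attention in general, not Ring-mini-linear-2.0's actual kernel; the feature map `phi`, the head dimension, and all names are illustrative assumptions.

```python
import numpy as np

d = 64  # head dimension (illustrative only)

# Constant-size recurrent state: a d x d matrix plus a d-vector normalizer.
# Its size never grows with the number of generated tokens, unlike a softmax
# attention KV cache.
S = np.zeros((d, d))
z = np.zeros(d)

def phi(x):
    # A simple positive feature map; actual linear-attention variants differ.
    return np.maximum(x, 0.0) + 1e-6

def decode_step(q, k, v):
    """One autoregressive step: fold (k, v) into the state, then read with q.
    Per-step cost is O(d^2) regardless of sequence length, so total decode
    time grows roughly linearly with length while memory stays constant."""
    global S, z
    S += np.outer(phi(k), v)   # accumulate key-value outer products
    z += phi(k)                # accumulate the normalizer
    num = phi(q) @ S           # weighted value readout, shape (d,)
    den = phi(q) @ z
    return num / den

# Simulate a long generation: the state stays d*d + d floats throughout.
rng = np.random.default_rng(0)
for _ in range(10_000):
    q, k, v = rng.standard_normal((3, d))
    out = decode_step(q, k, v)
```

In contrast, softmax attention must retain every past key and value, so its decode-time cost and memory grow with sequence length, which is why the gap reported above widens at long contexts and generation lengths.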