caizhi1 committed
Commit 63fcfbd · verified · 1 Parent(s): 1b5976d

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -25,7 +25,7 @@ This model continues to employ a hybrid architecture that combines linear attent
 
 <div style="display: flex; justify-content: center;">
   <div style="text-align: center;">
-    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/PHRg8ipzJtr0p6sojAa5T.png" width="1000">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/PHRg8ipzJtr0p6sojAa5T.png" width="800">
     <p style="margin-top: 8px; font-size: 14px;"><strong>Figure 1:</strong> Hybrid Linear Model Architecture</p>
   </div>
 </div>
@@ -36,12 +36,12 @@ To better demonstrate our model's reasoning capabilities, we compared it with th
 
 <div style="display: flex; justify-content: center;">
   <div style="text-align: center;">
-    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/RcHlh5PriRuOLsErG8RjK.webp" width="1000">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/68d20104a6f8ea66da0cb447/RcHlh5PriRuOLsErG8RjK.webp" width="100%">
     <p style="margin-top: 8px; font-size: 14px;"><strong>Figure 2:</strong> Model Performance Comparison </p>
   </div>
 </div>
 
-## Linear Attention, Highly SparseHigh-Speed Generation
+## Linear Attention, Highly Sparse, High-Speed Generation
 
 Thanks to its hybrid attention mechanism and highly sparse MoE architecture, Ring-mini-linear-2.0 achieves near-linear time complexity and constant space complexity, resulting in outstanding inference efficiency. To fully demonstrate this advantage, we conducted a head-to-head comparison between our model and top-tier competitors of similar size or performance.
 The results are remarkable. In the prefill stage, Ring-mini-linear-2.0's performance is exceptional; when the context length exceeds 256k, its throughput is over 12 times higher than that of Qwen3-8B. Furthermore, in the high-concurrency decode stage, its capabilities are even more pronounced. For generation lengths exceeding 32k, its throughput easily surpasses 12 times that of Qwen3-8B.
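
For context on the "near-linear time, constant space" claim in the README text above, the sketch below illustrates the general idea behind linear-attention decoding: keeping a fixed-size recurrent state instead of a growing KV cache. This is a minimal toy example of linear attention in general, not Ring-mini-linear-2.0's actual kernel; the feature map `phi`, the head dimension, and all names are illustrative assumptions.

```python
import numpy as np

d = 64  # head dimension (illustrative only)

# Constant-size recurrent state: a d x d matrix plus a d-vector normalizer.
# Its size never grows with the number of generated tokens, unlike a softmax
# attention KV cache.
S = np.zeros((d, d))
z = np.zeros(d)

def phi(x):
    # A simple positive feature map; actual linear-attention variants differ.
    return np.maximum(x, 0.0) + 1e-6

def decode_step(q, k, v):
    """One autoregressive step: fold (k, v) into the state, then read with q.
    Per-step cost is O(d^2) regardless of sequence length, so total decode
    time grows roughly linearly with length while memory stays constant."""
    global S, z
    S += np.outer(phi(k), v)   # accumulate key-value outer products
    z += phi(k)                # accumulate the normalizer
    num = phi(q) @ S           # weighted value readout, shape (d,)
    den = phi(q) @ z
    return num / den

# Simulate a long generation: the state stays d*d + d floats throughout.
rng = np.random.default_rng(0)
for _ in range(10_000):
    q, k, v = rng.standard_normal((3, d))
    out = decode_step(q, k, v)
```

In contrast, softmax attention must retain every past key and value, so its decode-time cost and memory grow with sequence length, which is why the gap reported above widens at long contexts and generation lengths.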