mack-williams/Light-Forcing

mack-williams committed 17 days ago

Commit 9fcd27b (verified) · 1 parent: c87d042

Update README.md

Files changed (1)

README.md +0 -28

@@ -24,14 +24,6 @@ tags:
 (📧 denotes corresponding author.)
-https://github.com/user-attachments/assets/2daa9f17-329e-4019-8f14-68ac2c467592
-<em>
-    (Results on Self Forcing 1.3B. Left: Dense Attention. Right: 1.3x acceleration using Light Forcing)
-</em>
 </div>
 ### 💡 Why Light Forcing
@@ -44,8 +36,6 @@ https://github.com/user-attachments/assets/2daa9f17-329e-4019-8f14-68ac2c467592
 ### 🧾 Introduction
 Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit prior knowledge in earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. Such two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention in quality (e.g., 84.5 on VBench) and efficiency (e.g., 1.2-1.3x end-to-end speedup). Combined with other efficient solutions, Light Forcing further achieves a 2.0-3.0x end-to-end speedup across diverse GPUs (e.g., 27.4 FPS on RTX 5090 and 33.9 FPS on H100).
-<img src="assets/framework.png" width="90%" ></img>
 ## ✨ Quick Start
 ### Environment
@@ -135,15 +125,6 @@ python inference.py \
     <th>+Efficient kernel<br>(RoPE, RMSNorm, etc.)</th>
     <th>+Light VAE</th>
   </tr>
-  <tr>
-    <td>Video</td>
-    <td>5 seconds</td>
-    <td><video src="https://github.com/user-attachments/assets/59988b4e-e31e-4924-a4de-5d5cee2c4266" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/9219bb6f-d1d8-4837-816b-ac29a77e13f4" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/7c374e46-de08-4a45-a006-d7adf8fc61cd" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/a4e064ad-7d61-4120-a6f3-f8a68be8ccab" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/7682c7ae-6ffc-4305-a51f-5569d7e8338a" width="100%" controls loop></video></td>
-  </tr>
   <tr>
     <td>Latency</td>
     <td>5 seconds</td>
@@ -171,15 +152,6 @@ python inference.py \
     <td>15.8G</td>
     <td>12.7G</td>
   </tr>
-  <tr>
-    <td>Video</td>
-    <td>15 seconds</td>
-    <td><video src="https://github.com/user-attachments/assets/746bb4ce-fa46-46eb-9de5-b7925976d1de" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/9d8fcdd7-cb81-4539-ab45-acff5a754e7f" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/b5d79bc5-969a-4125-b15c-32b7e3db327a" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/daa46c3f-20c2-4920-8536-bf580f2d15e1" width="100%" controls loop></video></td>
-    <td><video src="https://github.com/user-attachments/assets/2e5fc265-4ba4-481e-acfe-f716cb8f4e5f" width="100%" controls loop></video></td>
-  </tr>
   <tr>
     <td>Latency</td>
     <td>15 seconds</td>
