HongyuanTao committed (verified)
Commit 214835f · Parent(s): 2714845

Update README.md

Files changed (1)
  1. README.md (+45 −3)

README.md CHANGED
@@ -1,3 +1,45 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - vision-language-model
+ - image-text-to-text
+ - linear-attention
+ - gated-deltanet
+ - infinitevl
+ - multimodal
+ base_model: Qwen/Qwen2.5-VL-3B-Instruct
+ pipeline_tag: image-text-to-text
+ ---
+
+ <div align="center">
+
+ # InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input VLMs
+
+ <a href="https://arxiv.org/abs/YOUR_ARXIV_ID"><img src="https://img.shields.io/badge/Paper-ArXiv-b31b1b.svg" alt="Paper"></a>
+ <a href="https://github.com/YOUR_USERNAME/InfiniteVL"><img src="https://img.shields.io/badge/GitHub-Code-black" alt="Code"></a>
+ <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License"></a>
+
+ </div>
+
+ ## 📖 Introduction
+
+ **InfiniteVL** is a linear-complexity Vision-Language Model (VLM) developed by **Huazhong University of Science and Technology (HUST)** and **Horizon Robotics**.
+
+ Traditional Transformer-based VLMs suffer from quadratic computational complexity ($O(N^2)$) and ever-growing KV-cache memory usage. **InfiniteVL** addresses this by synergizing **Sliding Window Attention (SWA)** with **Gated DeltaNet**, enabling **unlimited input tokens** and **real-time streaming**.
+
+ ### Key Features
+
+ * **🚀 Linear Complexity ($O(N)$):** Reduces per-token latency by **3.6×** compared to Qwen2.5-VL-3B.
+ * **📉 Constant Memory:** Maintains fixed GPU memory usage (~9 GB) regardless of sequence length.
+ * **⚡ Real-Time Streaming:** Sustains a stable **24 FPS** throughput for long-video understanding on a single RTX 4090.
+ * **🧠 Hybrid Architecture:** 75% Gated DeltaNet (global context) + 25% SWA (local detail).
+
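+ The constant-memory behavior of the linear-attention layers can be illustrated with a minimal sketch. This is **not** the InfiniteVL implementation — the shapes, gate values, and the `step` helper below are illustrative assumptions — but it shows the core idea of a gated delta-rule recurrence: instead of a KV cache that grows with every token, the layer carries one fixed-size state matrix that is updated in place.

```python
# Minimal sketch (illustrative, not the InfiniteVL code) of a gated
# delta-rule recurrence: a fixed-size state matrix S replaces the
# growing KV cache, so memory stays constant for any sequence length.
import numpy as np

d_k, d_v = 4, 4           # head dimensions (illustrative)
S = np.zeros((d_v, d_k))  # recurrent state -- never grows with sequence length

def step(S, q, k, v, alpha, beta):
    """One token of a gated delta-rule update:
    S <- alpha * (S - beta * (S @ k) k^T) + beta * v k^T, then o = S q."""
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    return S, S @ q

rng = np.random.default_rng(0)
for t in range(1000):                  # 1000 tokens, same memory as 1 token
    q, k, v = rng.standard_normal((3, d_k))
    k /= np.linalg.norm(k)             # normalized key, as in delta-rule layers
    S, o = step(S, q, k, v, alpha=0.9, beta=0.5)

print(S.shape)  # state stays (4, 4) regardless of sequence length
```

+ A standard attention layer would instead accumulate 1000 key/value pairs here; the SWA layers in the hybrid stack cap that growth with a fixed window, so the whole model's memory footprint is bounded.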
+ ![Performance Comparison](teaser.png)
+
+ ## 🛠️ Requirements
+
+ To use InfiniteVL, install the linear-attention kernels:
+
+ ```bash
44
+ pip install transformers torch
45
+ pip install fla # Flash Linear Attention