Aryan3108/SparseVLM

vision-language-model

inference-optimization


Aryan3108 committed 27 days ago

Commit 14c17cc · verified · 1 Parent(s): 45c83c9

Update README.md

Files changed (1)

README.md +8 -8


@@ -77,10 +77,10 @@ Measured on **NVIDIA A100-SXM4-40GB**, Qwen2.5-VL-7B-Instruct, bfloat16, SDPA at
 | Config | Tokens kept | Time | Speedup | Output quality |
 |---|---|---|---|---|
-| Baseline | 16320 (100%) | 9738ms | 1.00× | ✅ Identifies Fuji, Milky Way, snow cap, star colors |
-| SparseVLM 50% | 8192 | 9441ms | 1.03× | ✅ Same quality |
-| SparseVLM 25% | 4080 | 9297ms | 1.05× | ✅ All key details preserved |
-| SparseVLM 10% | 1632 | 9425ms | 1.03× | ✅ Still correctly describes scene |
 > **Key result:** Full 4K image (16K tokens) runs without OOM. Without SparseVLM's hook-based scoring, the 16K-token image requires materialising a 15GB attention matrix and crashes. The scorer computes only the text→visual submatrix (35 × 16320 = 32MB instead of 15GB).
@@ -174,10 +174,10 @@ remove_hooks(state)
 | Model | Status |
 |---|---|
-| Qwen/Qwen2.5-VL-7B-Instruct | ✅ Tested |
-| Qwen/Qwen2.5-VL-3B-Instruct | ✅ Should work |
-| Qwen/Qwen2.5-VL-72B-Instruct | ✅ Should work |
-| Qwen/Qwen2-VL-* | ✅ Legacy support |
 ---

 | Config | Tokens kept | Time | Speedup | Output quality |
 |---|---|---|---|---|
+| Baseline | 16320 (100%) | 9738ms | 1.00× | Identifies Fuji, Milky Way, snow cap, star colors |
+| SparseVLM 50% | 8192 | 9441ms | 1.03× | Same quality |
+| SparseVLM 25% | 4080 | 9297ms | 1.05× | All key details preserved |
+| SparseVLM 10% | 1632 | 9425ms | 1.03× | Still correctly describes scene |
 > **Key result:** Full 4K image (16K tokens) runs without OOM. Without SparseVLM's hook-based scoring, the 16K-token image requires materialising a 15GB attention matrix and crashes. The scorer computes only the text→visual submatrix (35 × 16320 = 32MB instead of 15GB).
 | Model | Status |
 |---|---|
+| Qwen/Qwen2.5-VL-7B-Instruct | Tested |
+| Qwen/Qwen2.5-VL-3B-Instruct | Should work |
+| Qwen/Qwen2.5-VL-72B-Instruct | Should work |
+| Qwen/Qwen2-VL-* | Legacy support |
 ---
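
The README does not say how the 32MB and 15GB figures in the Key result note were derived. One reading that reproduces both is sketched below. It assumes the scores are stored in bfloat16 (2 bytes each) for all query heads of a single layer, and that the model has 28 attention heads. The head count and the per-layer scope are assumptions, not something the diff confirms; the token counts (35 text, 16320 visual) come from the note and the benchmark table.

```python
# Assumptions (not stated in the README): scores are held in bfloat16
# (2 bytes each) for every query head of ONE layer, and the model has
# 28 attention heads.
BYTES_PER_SCORE = 2
NUM_HEADS = 28

VISUAL_TOKENS = 16320  # "Tokens kept" for the baseline row
TEXT_TOKENS = 35       # text length quoted in the Key result note


def attention_bytes(num_queries: int, num_keys: int) -> int:
    """Bytes needed to materialise a (heads, queries, keys) score tensor."""
    return NUM_HEADS * num_queries * num_keys * BYTES_PER_SCORE


# Full visual x visual block versus the text -> visual block the scorer needs.
full = attention_bytes(VISUAL_TOKENS, VISUAL_TOKENS)
sub = attention_bytes(TEXT_TOKENS, VISUAL_TOKENS)

print(f"full matrix:            {full / 1e9:.1f} GB")  # 14.9 GB
print(f"text->visual submatrix: {sub / 1e6:.1f} MB")   # 32.0 MB
print(f"reduction:              {full / sub:.0f}x")    # 466x
```

Under these assumptions the saving is simply the ratio of query counts, 16320 / 35, because the key dimension, the head count and the dtype are the same in both tensors.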