menglc/SliMM-DeepStackE-Qwen2VL-2B

menglc committed on Dec 14, 2024

Commit d72dba2 · verified · 1 Parent(s): b2a7ef6

update readme

Files changed (1)

README.md +3 -3

@@ -89,12 +89,12 @@ print(output_text)
 | Model                                                    | MMMU (Val) | ChartQA (Test) | AI2D (test) | DocVQA (val)
 |----------------------------------------------------------|------------|----------------|-------------|-------------|
-|Qwen2VL-2B (official evaluation)                          |41.1        | 73.5           |74.7         |90.1         |
+|Qwen2VL-2B (official evaluation)                          |41.1        | 73.5           |74.7         |90.1*         |
 |Qwen2VL-2B (our evaluation, 1024 max vistokens to LLM)    |39.4        | 75.6           |70.7         |90.4         |
 |SliMM-DeepStackE-Qwen2VL-0.5B (256 max vistokens to LLM)  |40.7        | 74.5           |74.7         |85.4         |
 |SliMM-DeepStackE-Qwen2VL-0.5B (400 max vistokens to LLM)  |41.2        | 76.8           |74.9         |88.0         |
+ <code>*</code> indicates the performance on DocVQA test set
 <p align="left">
     <img src="https://cdn-uploads.huggingface.co/production/uploads/64d852a4bab152b2470bf96e/dtVzPkcIp40oH8sg7MG_u.png" alt="Trade-off between N Vistokens for LLM and Acc" style="width:500px;" >  <br>