Update README.md

README.md CHANGED
@@ -18,31 +18,31 @@ model-index:
     metrics:
     - type: GQA
       name: GQA
-      value: 0.
+      value: 0.531
     - type: MME Cog.
       name: MME Cog.
-      value:
+      value: 236
     - type: MME Per.
       name: MME Per.
-      value:
+      value: 1130
     - type: MM-Vet
       name: MM-Vet
-      value:
+      value: 17.7
     - type: POPE Acc.
       name: POPE Acc.
-      value: 0.
+      value: 0.850
     - type: POPE F1
       name: POPE F1
       value: 0.839
     - type: VQAv2
       name: VQAv2
-      value:
+      value: 70.7
     - type: MMVP
       name: MMVP
-      value: 0.
+      value: 0.287
     - type: ScienceQA Image
       name: ScienceQA Image
-      value: 0.
+      value: 0.564
 library_name: transformers
 pipeline_tag: image-text-to-text
 ---
@@ -165,8 +165,8 @@ Performance of LLaVA-Gemma models across seven benchmarks. Highlighted box indic
 
 | LM Backbone | Vision Model | Pretrained Connector | GQA   | MME cognition | MME perception | MM-Vet | POPE accuracy | POPE F1 | VQAv2 | ScienceQA Image | MMVP  |
 | ----------- | ------------ | -------------------- | ----- | ------------- | -------------- | ------ | ------------- | ------- | ----- | --------------- | ----- |
-| gemma-2b-it | CLIP | Yes | 0.531 | 236 | 1130 | 17.7 | 0.850 | <mark>0.839</mark> | 70.65 | 0.564 | 0.287 |
-
+| **gemma-2b-it** | CLIP | Yes | 0.531 | 236 | 1130 | 17.7 | 0.850 | <mark>0.839</mark> | 70.65 | 0.564 | 0.287 |
+| gemma-2b-it | CLIP | No | 0.481 | 248 | 935 | 13.1 | 0.784 | 0.762 | 61.74 | 0.549 | 0.180 |
 | gemma-2b-it | DinoV2 | Yes | <mark>0.587</mark> | 307 | <mark>1133</mark> | <mark>19.1</mark> | <mark>0.853</mark> | 0.838 | <mark>71.37</mark> | 0.555 | 0.227 |
 | gemma-2b-it | DinoV2 | No | 0.501 | <mark>309</mark> | 959 | 14.5 | 0.793 | 0.772 | 61.65 | 0.568 | 0.180 |
 |             |              |                      |       |               |                |        |               |         |       |                 |       |
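For reference, the `+` lines above produce the following metrics block in the model card's YAML front matter. This is a sketch of the resulting state only: the indentation is approximate, and the enclosing `model-index` keys (e.g. `results`) are not shown in the diff, so they are omitted here.

```yaml
metrics:
- type: GQA
  name: GQA
  value: 0.531
- type: MME Cog.
  name: MME Cog.
  value: 236
- type: MME Per.
  name: MME Per.
  value: 1130
- type: MM-Vet
  name: MM-Vet
  value: 17.7
- type: POPE Acc.
  name: POPE Acc.
  value: 0.850
- type: POPE F1
  name: POPE F1
  value: 0.839
- type: VQAv2
  name: VQAv2
  value: 70.7
- type: MMVP
  name: MMVP
  value: 0.287
- type: ScienceQA Image
  name: ScienceQA Image
  value: 0.564
```

These values match the benchmark table's first (CLIP, pretrained-connector) row, with VQAv2 rounded from 70.65 to 70.7.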