Update README.md

README.md CHANGED
@@ -18,31 +18,31 @@ model-index:
     metrics:
     - type: GQA
       name: GQA
-      value: 0.
+      value: 0.531
     - type: MME Cog.
       name: MME Cog.
-      value:
+      value: 236
     - type: MME Per.
       name: MME Per.
-      value:
+      value: 1130
     - type: MM-Vet
       name: MM-Vet
-      value:
+      value: 17.7
     - type: POPE Acc.
       name: POPE Acc.
-      value: 0.
+      value: 0.850
     - type: POPE F1
       name: POPE F1
       value: 0.839
     - type: VQAv2
       name: VQAv2
-      value:
+      value: 70.7
     - type: MMVP
       name: MMVP
-      value: 0.
+      value: 0.287
     - type: ScienceQA Image
       name: ScienceQA Image
-      value: 0.
+      value: 0.564
 library_name: transformers
 pipeline_tag: image-text-to-text
 ---
@@ -165,8 +165,8 @@ Performance of LLaVA-Gemma models across seven benchmarks. Highlighted box indic
 
 | LM Backbone | Vision Model | Pretrained Connector | GQA   | MME cognition | MME perception | MM-Vet | POPE accuracy | POPE F1 | VQAv2 | ScienceQA Image | MMVP  |
 | ----------- | ------------ | -------------------- | ----- | ------------- | -------------- | ------ | ------------- | ------- | ----- | --------------- | ----- |
-| gemma-2b-it | CLIP | Yes | 0.531 | 236 | 1130 | 17.7 | 0.850 | <mark>0.839</mark> | 70.65 | 0.564 | 0.287 |
-
+| **gemma-2b-it** | CLIP | Yes | 0.531 | 236 | 1130 | 17.7 | 0.850 | <mark>0.839</mark> | 70.65 | 0.564 | 0.287 |
+| gemma-2b-it | CLIP | No | 0.481 | 248 | 935 | 13.1 | 0.784 | 0.762 | 61.74 | 0.549 | 0.180 |
 | gemma-2b-it | DinoV2 | Yes | <mark>0.587</mark> | 307 | <mark>1133</mark> | <mark>19.1</mark> | <mark>0.853</mark> | 0.838 | <mark>71.37</mark> | 0.555 | 0.227 |
 | gemma-2b-it | DinoV2 | No | 0.501 | <mark>309</mark> | 959 | 14.5 | 0.793 | 0.772 | 61.65 | 0.568 | 0.180 |
 |             |              |                      |       |               |                |        |               |         |       |                 |       |
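For reference, the `+` lines above produce the following metrics block in the model card's YAML front matter. This is a sketch of the resulting state only: the indentation is approximate, and the enclosing `model-index` keys (e.g. `results`) are not shown in the diff, so they are omitted here.

```yaml
metrics:
- type: GQA
  name: GQA
  value: 0.531
- type: MME Cog.
  name: MME Cog.
  value: 236
- type: MME Per.
  name: MME Per.
  value: 1130
- type: MM-Vet
  name: MM-Vet
  value: 17.7
- type: POPE Acc.
  name: POPE Acc.
  value: 0.850
- type: POPE F1
  name: POPE F1
  value: 0.839
- type: VQAv2
  name: VQAv2
  value: 70.7
- type: MMVP
  name: MMVP
  value: 0.287
- type: ScienceQA Image
  name: ScienceQA Image
  value: 0.564
```

These values match the benchmark table's first (CLIP, pretrained-connector) row, with VQAv2 rounded from 70.65 to 70.7.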