## 📊 Evaluation
A collection of 10 benchmarks:
| Model | VQAv2 | GQA | VizWiz | SQA | TextVQA | POPE | MME | MM-Bench | MM-Bench-cn | MM-Vet |
|:-----------------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:----------:|:--------:|:-----------:|:--------:|
| LLaVA-1.5-7b | 78.5 | 62.0 | **50.0** | 66.8 | 58.2 | 85.9 | **1510.7** | 64.3 | 58.3 | 31.1 |
| Spatial-LLaVA-7b | **79.7** | **62.7** | 48.7 | **68.7** | **58.5** | **87.2** | 1472.7 | **67.8** | **60.7** | **31.6** |
|:-----------------------:|:------------------------:|:----------------------------:|:--------------------------:|:------------------:|:-------------------:|:----------------------:|
| LLaVA-1.5-7b | 12.90 / 1.06 | 10.68 / 2.03 | 20.79 / 0.94 | **24.19 / 0.50** | 14.29 / 5.27 | 10.23 / 58.33 |
| Spatial-LLaVA-7b | **24.19 / 0.57** | **14.56 / 0.62** | **41.58 / 0.42** | 22.58 / 1.12 | **18.25 / 2.92** | **20.45 / 56.47** |
## 🙏 Acknowledgements
We thank Liu Haotian et al. for the LLaVA pretraining script, weights, and the LLaVA-v1.5 mixture dataset; the teams behind CLEVR, TextCaps, VisualMRC, and VQAv2 (via "HuggingFaceM4/the_cauldron"); remyxai for OpenSpaces; Anjie Cheng et al. for Spatial-Bench and its data pipeline; Google for OpenImages; and Hugging Face for their datasets infrastructure.