Add model card for VaseVL

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +72 -0
README.md ADDED
---
license: cc-by-nc-nd-4.0
pipeline_tag: image-text-to-text
tags:
- multimodal
- vqa
- cultural-heritage
---

# <img src="https://raw.githubusercontent.com/AIGeeksGroup/VaseVL/main/images/vasevl_logo-cropped.svg" alt="logo" width="30"/> VaseVL: Multimodal Agent for Ancient Greek Pottery

This repository contains the **VaseVL** model, an SFT-then-RL system designed for robust, expert-level reasoning on ancient Greek pottery. It was presented in the paper [VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery](https://huggingface.co/papers/2509.17191).

The code and associated resources for this project are available on GitHub: [https://github.com/AIGeeksGroup/VaseVQA](https://github.com/AIGeeksGroup/VaseVQA).

## Introduction

Analyzing cultural-heritage artifacts remains challenging for MLLMs: general models lack domain expertise, and SFT often overfits superficial patterns, yielding brittle reasoning for authentication and historical attribution. This raises the question of how to equip MLLMs with robust, expert-level reasoning for ancient Greek pottery.

We present VaseVL, an SFT-then-RL system that turns evaluation into supervision: we construct a taxonomy of question types, probe the SFT model to localize type-specific performance gaps, and optimize with type-conditioned, compositionality-oriented rewards targeting those gaps. We also release VaseVQA, a comprehensive benchmark of 31,773 images designed to probe deep understanding.

Experiments show state-of-the-art results on style classification and historical attribution, with marked gains in compositional robustness over SFT-only baselines, validating diagnosis-guided, taxonomy-conditioned reward engineering and providing a reusable resource for future research.
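The diagnosis-then-reward loop described above can be sketched in a few lines of Python. Everything below is illustrative: the question-type labels, the gap metric (1 − per-type accuracy), and the linear reward boost are assumptions for exposition, not the paper's exact formulation.

```python
from collections import defaultdict

# Hypothetical question taxonomy for the benchmark (illustrative labels only).
QUESTION_TYPES = ["style", "attribution", "dating", "iconography"]

def localize_gaps(eval_records):
    """Compute per-type accuracy of the SFT model to find weak question types.

    eval_records: list of (question_type, is_correct) pairs.
    Returns a dict mapping each observed type to its accuracy.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for qtype, correct in eval_records:
        totals[qtype] += 1
        hits[qtype] += int(correct)
    return {t: hits[t] / totals[t] for t in totals}

def type_conditioned_reward(qtype, base_reward, accuracies, boost=2.0):
    """Scale a sample's reward by how weak the model is on its question type.

    Weak types (low accuracy -> large gap) are up-weighted so RL targets them.
    """
    gap = 1.0 - accuracies.get(qtype, 0.0)
    return base_reward * (1.0 + boost * gap)

# Toy probe: the SFT model is perfect on "style" but weak on "attribution".
records = [("style", True), ("style", True),
           ("attribution", False), ("attribution", True)]
acc = localize_gaps(records)
weighted = type_conditioned_reward("attribution", 1.0, acc)  # up-weighted reward
```

In this sketch, a correct "attribution" answer earns twice the base reward (gap 0.5 with boost 2.0), while "style" rewards are left unscaled, steering the RL phase toward the diagnosed gap.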

<center class='img'>
<img title="VaseVL Pipeline" src="https://raw.githubusercontent.com/AIGeeksGroup/VaseVL/main/images/vasevqa_example.png" width="100%">
</center>

## Deploy VaseVL Demo UI Locally

To get the full experience of the VaseVL UI, you need to deploy it locally by following the steps below:

1. **Clone the VaseVL repository to your local machine.**

   ```bash
   git clone https://github.com/AIGeeksGroup/VaseVL.git
   ```

2. **Navigate to the `ui` directory, which contains the front-end source code.**

   ```bash
   cd VaseVL/ui
   ```

3. **Install all required Node.js dependencies.**

   ```bash
   npm install
   ```

4. **Build the UI project for production.**

   ```bash
   npm run build
   ```

5. **Start the local server to launch the VaseVL Demo UI.**

   ```bash
   npm run start
   ```

Once the server starts, you can access the VaseVL Demo UI in your browser, by default at `http://localhost:1717/projects/1743242682314/playground`.
53
+
54
+ <center class ='img'>
55
+ <img title="VaseVL UI Example" src="https://github.com/AIGeeksGroup/VaseVL/blob/main/images/website_example.png" width="100%">
56
+ </center>

## License

Our data and model are released under an NC-ND (Non-Commercial, No-Derivatives) license: they are for non-commercial use only, and modifying the data to build other datasets is not permitted. This corresponds to the [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) license.

## Citation

If you use any content of this repository in your work, please cite the following paper:
```bibtex
@article{ge2025vasevqa,
  title={VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery},
  author={Ge, Jinchao and Cheng, Tengfei and Wu, Biao and Zhang, Zeyu and Huang, Shiya and Bishop, Judith and Shepherd, Gillian and Fang, Meng and Chen, Ling and Zhao, Yang},
  journal={arXiv preprint arXiv:2509.17191},
  year={2025}
}
```