nielsr (HF Staff) committed
Commit 3419559 · verified · 1 parent: 7bb727d

Add model card: Include essential metadata, paper, and code links


This PR adds a comprehensive model card for the `Vision-Zero` model.

It includes:
- The `pipeline_tag: image-text-to-text` for discoverability on the Hugging Face Hub.
- `library_name: transformers` as indicated by the model's `config.json` and `tokenizer_config.json`, enabling the automated "How to use" widget.
- A link to the paper: [Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play](https://huggingface.co/papers/2509.25541).
- A link to the official GitHub repository: https://github.com/wangqinsi1/Vision-Zero.
- A concise summary of the model's capabilities, an overview image, and citation information.

This update makes the model more discoverable and provides clearer guidance for users.

Files changed (1)
  1. README.md +36 -0
README.md ADDED
@@ -0,0 +1,36 @@
+ ---
+ license: cc-by-nc-4.0
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
+
+ This repository contains the `Vision-Zero` model, presented in the paper [Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play](https://huggingface.co/papers/2509.25541).
+
+ Vision-Zero is a domain-agnostic framework that enhances the reasoning capabilities of Vision-Language Models (VLMs) through competitive visual games. In its strategic self-play setup, models generate their own training data without human annotation, which enables gameplay from arbitrary images and improves generalization across diverse domains such as synthetic scenes, charts, and real-world images. The framework also features Iterative Self-Play Policy Optimization (Iterative-SPO) to sustain performance gains. Despite training on label-free data, Vision-Zero achieves state-of-the-art results on reasoning, chart question answering, and vision-centric understanding tasks.
+
+ ![Overview](https://github.com/wangqinsi1/Vision-Zero/raw/main/self-play-taste.png)
+
+ **Paper:** https://huggingface.co/papers/2509.25541
+ **Code:** https://github.com/wangqinsi1/Vision-Zero
+
+ ## Usage
+
+ For detailed installation, training, and usage instructions, please refer to the official [GitHub repository](https://github.com/wangqinsi1/Vision-Zero).
+
+ ## Citation
+
+ If you find Vision-Zero useful in your research, please cite our paper:
+
+ ```bibtex
+ @misc{wang2025visionzeroscalablevlmselfimprovement,
+       title={Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play},
+       author={Qinsi Wang and Bo Liu and Tianyi Zhou and Jing Shi and Yueqian Lin and Yiran Chen and Hai Helen Li and Kun Wan and Wentian Zhao},
+       year={2025},
+       eprint={2509.25541},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2509.25541},
+ }
+ ```
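Since the card sets `library_name: transformers` and `pipeline_tag: image-text-to-text`, inference via the `transformers` pipeline API should work. The sketch below is an assumption, not taken from the official repository: the checkpoint id is a placeholder, and the message format follows the generic image-text-to-text pipeline convention. For the authors' supported workflow, follow the GitHub repository instructions.

```python
# Hedged sketch (not from the official repo): querying an image-text-to-text
# checkpoint through the transformers pipeline API, as suggested by the card's
# `library_name` / `pipeline_tag` metadata. The checkpoint id is a placeholder.

def build_messages(image_url: str, question: str) -> list[dict]:
    """Assemble a chat-style multimodal prompt in the nested format
    the image-text-to-text pipeline accepts."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_messages(
    "https://example.com/chart.png",
    "What trend does this chart show?",
)

# Actual inference (requires downloading the model weights):
# from transformers import pipeline
# pipe = pipeline("image-text-to-text", model="<vision-zero-checkpoint>")
# print(pipe(text=messages, max_new_tokens=128))
```

The commented-out call mirrors how the Hub's automated "How to use" widget (enabled by this PR's metadata) typically invokes such models.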