Add model card for Visual Jigsaw
This PR adds a model card for the Visual Jigsaw model, linking it to the paper [Visual Jigsaw Post-Training Improves MLLMs](https://huggingface.co/papers/2509.25190).
The update includes:
* Adding the `license: apache-2.0`.
* Setting the `pipeline_tag: image-text-to-text`.
* Adding `library_name: transformers` as the model is compatible with the library.
* Including a direct link to the project page and the GitHub repository.
* A brief description of the model.
This will improve discoverability and provide essential information for users.
README.md (ADDED)
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# Visual Jigsaw Post-Training Improves MLLMs

Visual Jigsaw is a generic self-supervised post-training framework designed to strengthen visual understanding in MLLMs. It is formulated as a general ordering task: visual inputs are partitioned and shuffled, and the model must reconstruct the visual information by producing the correct permutation in natural language. We provide instantiations of Visual Jigsaw across three visual modalities: images, videos, and 3D data.
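
The ordering objective described above can be sketched in a few lines. This is an illustrative toy sketch, not the authors' training code: the function names and the space-separated answer format are assumptions, and the real pipeline operates on image patches, video clips, or 3D points rather than bare indices.

```python
import random

def make_jigsaw_task(num_pieces, seed=None):
    """Shuffle piece indices 0..n-1 and return the presentation order plus
    the ground-truth answer: each shown piece's original index, as text."""
    rng = random.Random(seed)
    shown = list(range(num_pieces))
    rng.shuffle(shown)                      # order in which pieces are presented
    answer = " ".join(str(i) for i in shown)  # e.g. "2 0 3 1"
    return shown, answer

def check_answer(shown, predicted):
    """A prediction is correct iff it names each shown piece's original index."""
    return predicted.split() == [str(i) for i in shown]

# Demo: a 4-piece task with a fixed seed is reproducible and self-consistent.
shown, gt = make_jigsaw_task(4, seed=0)
assert check_answer(shown, gt)
```

During post-training, the shuffled pieces are fed to the MLLM alongside an instruction, and the model is rewarded for emitting the correct permutation string.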

* **Paper:** [Visual Jigsaw Post-Training Improves MLLMs](https://huggingface.co/papers/2509.25190)
* **Project Page:** https://penghao-wu.github.io/visual_jigsaw/
* **Code:** https://github.com/penghao-wu/visual_jigsaw

<p align="center">
  <img src="https://github.com/penghao-wu/visual_jigsaw/raw/main/assets/overview.png" alt="Overview of Visual Jigsaw" width="700"/>
</p>

## License

This project is released under the Apache-2.0 license.

## Citation

If you find this project helpful for your research, please consider citing our paper:

```bibtex
@article{visual_jigsaw,
  author  = {Wu, Penghao and Zhang, Yushan and Diao, Haiwen and Li, Bo and Lu, Lewei and Liu, Ziwei},
  title   = {Visual Jigsaw Post-Training Improves MLLMs},
  journal = {arXiv preprint arXiv:2509.25190},
  year    = {2025}
}
```