# VTool-R1

Model weights for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use".

[Paper](https://arxiv.org/pdf/2505.19255) | [Project Page](https://vtool-r1.github.io/) | [Hugging Face](https://huggingface.co/VTOOL)

- [Chart 3B](https://huggingface.co/VTOOL/VTOOL-R1-3B-V3-F)
- [Chart 7B](https://huggingface.co/VTOOL/VTOOL-R1-7B-F)
- [Chart 32B](https://huggingface.co/VTOOL/VTOOL-R1-32B-F)

We are training improved versions of our Table models; they will be released soon.

- Table 3B (coming soon)
- Table 7B (coming soon)
- Table 32B (coming soon)

If you find our project helpful, please cite:

<pre style="background-color: auto; padding: 0.8rem 1rem 0.4rem 1rem; border-radius: 8px; overflow-x: auto; font-size: 0.9rem;">
@misc{wu2025vtoolr1vlmslearnthink,
      title={VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use},
      author={Mingyuan Wu and Jingcheng Yang and Jize Jiang and Meitang Li and Kaizhuo Yan and Hanchao Yu and Minjia Zhang and Chengxiang Zhai and Klara Nahrstedt},
      year={2025},
      eprint={2505.19255},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.19255},
}