---
license: apache-2.0
datasets:
- hitsmy/AdaReasoner-TC-Randomized
- hitsmy/AdaReasoner-TG-Data-Randomized
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
tags:
- agent
---
<div align="center">
<img src="docs/logo.png" alt="Logo" width="300">
<h1 align="center">Dynamic Tool Orchestration for Iterative Visual Reasoning</h1>
<a href="#">
<img src="https://img.shields.io/badge/Paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper">
</a>
<a href="https://github.com/ssmisya/AdaReasoner/tree/main/docs">
<img src="https://img.shields.io/badge/Docs-1f6feb?style=for-the-badge&logo=readthedocs&logoColor=white" alt="Docs">
</a>
<a href="https://huggingface.co/collections/hitsmy/adareasoner">
<img src="https://img.shields.io/badge/Data%20%26%20Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Data & Model">
</a>
<a href="https://adareasoner.github.io">
<img src="https://img.shields.io/badge/Homepage-2ea44f?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Homepage">
</a>
<a href="https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo">
<img src="https://img.shields.io/badge/Demo-FF7C00?style=for-the-badge&logo=gradio&logoColor=white" alt="Demo">
</a>
<a href="https://www.youtube.com/watch?v=AtBoJYW_yDA">
<img src="https://img.shields.io/badge/Video-FF0000?style=for-the-badge&logo=youtube&logoColor=white" alt="Video">
</a>
</div>
---
## 📋 Model Description
**AdaReasoner-7B** is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This card describes the **AdaReasoner-7B-Randomized** variant.
We provide three variants of AdaReasoner-7B, each optimized for different use cases:
| Model | Description | Hugging Face |
|------|-------------|--------------|
| **AdaReasoner-7B-Randomized** | Trained with the *adaptive learning* method, enabling strong generalization to **unseen tools and tasks**. Designed for open-ended and evolving tool environments where adaptability is required. | [🤗 Link](https://huggingface.co/AdaReasoner/AdaReasoner-7B-Randomized/) |
| **AdaReasoner-7B-Non-Randomized** | Trained **without adaptive learning**, providing **more stable and reliable performance on known tools and tasks**, but limited generalization to unseen tools or task settings. | [🤗 Link](https://huggingface.co/AdaReasoner/AdaReasoner-7B-Non-Randomized) |
| **AdaReasoner-VSP-7B** | Task-specialized model trained **exclusively on the Visual Spatial Planning (VSP) task**, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. | [🤗 Link](https://huggingface.co/AdaReasoner/AdaReasoner-VSP-7B) |
**Key Differences:**
- **Randomized**: Trained with the adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
- **Non-Randomized**: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
- **VSP-7B**: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks
## 🚀 Quick Start
AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM.
However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment.
To fully evaluate or utilize its tool-planning behavior, we recommend using [AdaEval](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval) provided in our repository for batch inference and evaluation, or trying the [Demo](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo) interface for interactive, single-instance GUI-based reasoning.
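For single-turn use, a request to an OpenAI-compatible vLLM endpoint can be sketched as below. This is an illustrative sketch, not part of the AdaReasoner codebase: the served model name, endpoint URL, image URL, and question are assumptions you should replace with your own deployment's values.

```python
def build_request(model: str, image_url: str, question: str) -> dict:
    """Assemble an OpenAI-style chat-completion payload for one
    image-text-to-text turn (field names follow the OpenAI-compatible
    API that vLLM serves)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # One image plus one text prompt in a single user turn
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 512,
    }

# Sending the request requires a running server, e.g. started with
# `vllm serve AdaReasoner/AdaReasoner-7B-Randomized` (model id assumed):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   reply = client.chat.completions.create(**payload)
#   print(reply.choices[0].message.content)

payload = build_request(
    "AdaReasoner/AdaReasoner-7B-Randomized",   # assumed served model name
    "https://example.com/map.png",             # placeholder image URL
    "Plan a safe path from S to G.",           # placeholder question
)
print(payload["messages"][0]["content"][1]["text"])
```

Note that a single-turn request like this exercises only the model's base VQA ability; the tool-planning behavior described above requires the multi-turn harness from the repository.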
## 🎯 Capabilities
The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding:
- **Visual Spatial Planning**
Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations.
- **Compositional Visual Reasoning (Jigsaw)**
Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding.
- **GUI Question Answering (GUIQA)**
Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference.
- **General Visual Question Answering (General VQA)**
Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.
## 🛠️ Tool Integration
For full tool-augmented inference capabilities, please refer to the [AdaReasoner repository](https://github.com/ssmisya/AdaReasoner) which includes:
- Tool Server deployment
- AdaEval evaluation framework
- Complete inference pipeline
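To make the interaction pattern concrete, the harness-side loop can be sketched as below. Everything here is hypothetical: the `<tool>`/`<obs>`/`<answer>` tags, the stub model, and the stub tool are illustrative stand-ins, not the project's actual protocol or tool server API; see the repository for the real implementation.

```python
import re

# Regex for a hypothetical inline tool-call format: <tool>name(args)</tool>
CALL_RE = re.compile(r"<tool>(\w+)\((.*?)\)</tool>")

def stub_model(history: list[str]) -> str:
    # Stand-in for the VLM: request one tool call, then answer
    # once an observation has been fed back.
    if not any("<obs>" in turn for turn in history):
        return "I need to zoom in. <tool>zoom(region=top_left)</tool>"
    return "<answer>The top-left cell is a wall.</answer>"

def stub_tool(name: str, args: str) -> str:
    # Stand-in for the tool server: return a canned observation.
    return f"[{name} output for {args}]"

def run_loop(max_turns: int = 5) -> str:
    """Iterate model <-> tools until the model stops emitting tool calls."""
    history: list[str] = ["<question>Describe the top-left cell.</question>"]
    for _ in range(max_turns):
        reply = stub_model(history)
        history.append(reply)
        match = CALL_RE.search(reply)
        if match is None:          # no tool call -> treat as final answer
            return reply
        result = stub_tool(match.group(1), match.group(2))
        history.append(f"<obs>{result}</obs>")  # feed observation back
    return history[-1]

print(run_loop())
```

The key design point this loop illustrates is that the model, not the harness, decides when and which tools to invoke on each turn, which is what the adaptive-learning training is meant to make robust to unseen tools.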
## 📊 Performance
Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.
## 🔧 Technical Details
- **Base Architecture**: Qwen 2.5 VL 7B Instruct
- **Training Method**: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
- **Context Length**: Supports extended context spanning multiple tool interactions
- **Modalities**: Text + Vision
## 📚 Citation
If you use this model in your research, please cite:
```bibtex
@article{adareasoner2024,
  title   = {Dynamic Tool Orchestration for Iterative Visual Reasoning},
  author  = {AdaReasoner Team},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2024}
}
```
## 📄 License
Apache 2.0
## 🤝 Acknowledgments
This model is part of the AdaReasoner project. For more information, visit our [GitHub repository](https://github.com/ssmisya/AdaReasoner).
## 📧 Contact
For questions and feedback, please open an issue in our [GitHub repository](https://github.com/ssmisya/AdaReasoner).