Upload folder using huggingface_hub
- .gitattributes +1 -0
- README.md +109 -0
- docs/logo.png +3 -0
- model-00001-of-00004.safetensors +2 -2
- model-00002-of-00004.safetensors +2 -2
- model-00003-of-00004.safetensors +2 -2
- model-00004-of-00004.safetensors +2 -2
- model.safetensors.index.json +0 -0
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+docs/logo.png filter=lfs diff=lfs merge=lfs -text
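The added attribute line routes `docs/logo.png` through Git LFS alongside the existing patterns. As a rough illustration (not Git's actual matcher), the sketch below approximates this pattern matching with Python's `fnmatch`; real gitattributes glob semantics differ in edge cases such as `**` and leading-slash anchoring.

```python
from fnmatch import fnmatch

# Patterns from the .gitattributes above that route files through Git LFS.
LFS_PATTERNS = ["*.zst", "*tfevents*", "tokenizer.json", "docs/logo.png"]

def is_lfs_tracked(path: str) -> bool:
    # Approximation of gitattributes matching: a pattern without '/'
    # matches the basename anywhere in the tree; a pattern containing
    # '/' matches against the full path.
    basename = path.rsplit("/", 1)[-1]
    return any(
        fnmatch(path if "/" in pat else basename, pat)
        for pat in LFS_PATTERNS
    )

assert is_lfs_tracked("logs/run1.tfevents.123")
assert is_lfs_tracked("docs/logo.png")
assert not is_lfs_tracked("README.md")
```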
README.md
ADDED
@@ -0,0 +1,109 @@
<div align="center">
<img src="docs/logo.png" alt="Logo" width="300">
<h1 align="center">Dynamic Tool Orchestration for Iterative Visual Reasoning</h1>

<a href="#">
<img src="https://img.shields.io/badge/Paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper">
</a>
<a href="https://github.com/ssmisya/AdaReasoner/tree/main/docs">
<img src="https://img.shields.io/badge/Docs-1f6feb?style=for-the-badge&logo=readthedocs&logoColor=white" alt="Docs">
</a>
<a href="https://huggingface.co/collections/hitsmy/adareasoner">
<img src="https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Data">
</a>
<a href="https://adareasoner.github.io/">
<img src="https://img.shields.io/badge/Homepage-2ea44f?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Homepage">
</a>
<a href="https://huggingface.co/hitsmy/AdaReasoner-7B">
<img src="https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Model">
</a>
<a href="https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo">
<img src="https://img.shields.io/badge/Demo-FF7C00?style=for-the-badge&logo=gradio&logoColor=white" alt="Demo">
</a>

</div>

---
## 📋 Model Description

**AdaReasoner-7B** is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This repository hosts the **AdaReasoner-7B-Randomized** variant.
We provide three variants of AdaReasoner-7B, each optimized for a different use case:

| Model | Description | Hugging Face |
|-------|-------------|--------------|
| **AdaReasoner-7B-Randomized** | Trained with the *adaptive learning* method, enabling strong generalization to **unseen tools and tasks**. Designed for open-ended and evolving tool environments where adaptability is required. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-7B-Randomized) |
| **AdaReasoner-7B-Non-Randomized** | Trained **without adaptive learning**, providing **more stable and reliable performance on known tools and tasks**, but limited generalization to unseen tools or task settings. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-7B-Non-Randomized) |
| **AdaReasoner-VSP-7B** | Task-specialized model trained **exclusively on the Visual Spatial Planning (VSP) task**, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-VSP-7B) |
**Key Differences:**

- **Randomized**: Trained with the adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
- **Non-Randomized**: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
- **VSP-7B**: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks
## 🚀 Quick Start

AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM. However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment. To fully evaluate or utilize its tool-planning behavior, we recommend using [AdaEval](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval), provided in our repository, for batch inference and evaluation, or trying the [Demo](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo) interface for interactive, single-instance GUI-based reasoning.
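As a rough sketch of what such a tool-interaction loop looks like, the snippet below stubs out the model and a single `zoom` tool. The `<tool_call>`/`<tool_result>` markup and the `zoom` tool are illustrative assumptions, not the actual format AdaReasoner emits; the real protocol is defined by AdaEval and the Tool Server in our repository.

```python
import json
import re

# Hypothetical tool-call markup for illustration only.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def run_episode(generate, tools, prompt, max_turns=4):
    """Iterative loop: generate -> execute requested tool -> feed result back.

    `generate(history)` stands in for single-turn model inference (e.g. a
    vLLM call); `tools` maps tool names to Python callables.
    """
    history = prompt
    for _ in range(max_turns):
        reply = generate(history)
        history += reply
        m = TOOL_CALL_RE.search(reply)
        if m is None:  # no tool requested: treat reply as the final answer
            return history
        call = json.loads(m.group(1))
        result = tools[call["name"]](**call["arguments"])
        history += f"<tool_result>{json.dumps(result)}</tool_result>"
    return history

# Toy stub: first turn requests a crop, second turn answers.
replies = iter([
    '<tool_call>{"name": "zoom", "arguments": {"box": [0, 0, 100, 100]}}</tool_call>',
    "The sign reads: STOP",
])
out = run_episode(
    generate=lambda h: next(replies),
    tools={"zoom": lambda box: {"cropped": box}},
    prompt="What does the sign say? ",
)
assert out.endswith("The sign reads: STOP")
```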
| 54 |
+
|
| 55 |
+
## 🎯 Capabilities
|
| 56 |
+
|
| 57 |
+
The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding:
|
| 58 |
+
- **Visual Spatial Planning**
|
| 59 |
+
Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations.
|
| 60 |
+
- **Compositional Visual Reasoning (Jigsaw)**
|
| 61 |
+
Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding.
|
| 62 |
+
- **GUI Question Answering (GUIQA)**
|
| 63 |
+
Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference.
|
| 64 |
+
- **General Visual Question Answering (General VQA)**
|
| 65 |
+
Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.
|
| 66 |
+
|
## 🛠️ Tool Integration

For full tool-augmented inference capabilities, please refer to the [AdaReasoner repository](https://github.com/ssmisya/AdaReasoner), which includes:

- Tool Server deployment
- AdaEval evaluation framework
- Complete inference pipeline
## 📊 Performance

Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.

## 🔧 Technical Details

- **Base Architecture**: Qwen 2.5 VL 7B Instruct
- **Training Method**: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
- **Context Length**: Extended context supporting multiple tool interactions
- **Modalities**: Text + Vision
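For intuition on the Tool GRPO stage: the core of GRPO is a group-relative advantage, where rewards from a group of rollouts sampled for the same prompt are normalized by the group's own mean and standard deviation, removing the need for a learned value model. A minimal sketch of that normalization (illustrative only, not our training code):

```python
from statistics import mean, stdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: A_i = (r_i - mean(r)) / (std(r) + eps).

    Each rollout's reward is normalized against its own group, so
    above-average rollouts get positive advantages and below-average
    ones get negative advantages.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, a group of 4 sampled tool-use rollouts with scalar rewards:
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
assert abs(sum(adv)) < 1e-9          # advantages are zero-mean per group
assert adv[0] > 0 and adv[1] < 0     # above-/below-average rollouts
```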
## 📚 Citation
|
| 87 |
+
|
| 88 |
+
If you use this model in your research, please cite:
|
| 89 |
+
|
| 90 |
+
```bibtex
|
| 91 |
+
@article{adareasoner2024,
|
| 92 |
+
title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
|
| 93 |
+
author={AdaReasoner Team},
|
| 94 |
+
journal={arXiv preprint arXiv:XXXX.XXXXX},
|
| 95 |
+
year={2024}
|
| 96 |
+
}
|
| 97 |
+
```
|
## 📄 License

Apache 2.0

## 🤝 Acknowledgments

This model is part of the AdaReasoner project. For more information, visit our [GitHub repository](https://github.com/ssmisya/AdaReasoner).

## 📧 Contact

For questions and feedback, please open an issue in our [GitHub repository](https://github.com/ssmisya/AdaReasoner).
docs/logo.png
ADDED
Git LFS Details
model-00001-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e1973c3a320dc134a744ccde54b18d989fc07cc099aecbbf31887849b7dc1904
+size 4929434944
model-00002-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:15d302e20f3efa12f5d7581f95fc69a835d54a256776bc2d8f8a82996ed6d3c8
+size 4915523392
model-00003-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3578c00179db25315e0a784fb069fd15c18426aef594a9d07012b7705e5135e9
+size 4074022040
model-00004-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e9ab23d0816164a4b9c7e207589d21f228c5378e2c818dfa210fb6886f78b46b
+size 2665434168
model.safetensors.index.json
CHANGED
The diff for this file is too large to render. See raw diff.
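Each pointer above records a shard's `oid sha256:` and `size` per the Git LFS v1 pointer format, so a downloaded shard can be verified locally. A minimal sketch that rebuilds the pointer text for a file:

```python
import hashlib
import tempfile

def lfs_pointer(path):
    """Build Git LFS (spec v1) pointer text for a local file, streaming
    the file in 1 MiB chunks to compute its sha256 and size."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{h.hexdigest()}\n"
        f"size {size}\n"
    )

# Demo on a tiny temp file; for a real shard, compare the output against
# the oid/size lines in the diffs above.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    tmp = f.name
ptr = lfs_pointer(tmp)
```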