Image-Text-to-Text
Safetensors
English
qwen2_5_vl
agent
conversational
hitsmy commited on
Commit
b905f92
·
verified ·
1 Parent(s): b3d101f

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
 
 
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
+ docs/logo.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,109 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <div align="center">
2
+ <img src="docs/logo.png" alt="Logo" width="300">
3
+ <h1 align="center">Dynamic Tool Orchestration for Iterative Visual Reasoning</h1>
4
+
5
+ <a href="#">
6
+ <img src="https://img.shields.io/badge/Paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper">
7
+ </a>
8
+ <a href="https://github.com/ssmisya/AdaReasoner/tree/main/docs">
9
+ <img src="https://img.shields.io/badge/Docs-1f6feb?style=for-the-badge&logo=readthedocs&logoColor=white" alt="Docs">
10
+ </a>
11
+ <a href="https://huggingface.co/collections/hitsmy/adareasoner">
12
+ <img src="https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Data">
13
+ </a>
14
+ <a href="https://https://adareasoner.github.io/">
15
+ <img src="https://img.shields.io/badge/Homepage-2ea44f?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Homepage">
16
+ </a>
17
+ <a href="https://huggingface.co/hitsmy/AdaReasoner-7B">
18
+ <img src="https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Model">
19
+ </a>
20
+ <a href="https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo">
21
+ <img src="https://img.shields.io/badge/Demo-FF7C00?style=for-the-badge&logo=gradio&logoColor=white" alt="Demo">
22
+ </a>
23
+
24
+ </div>
25
+
26
+
27
+ ---
28
+
29
+ ## 📋 Model Description
30
+
31
+ **AdaReasoner-7B** is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This model is AdaReasoner-7B-Randomized.
32
+
33
+
34
+ We provide three variants of AdaReasoner-7B, each optimized for different use cases:
35
+
36
+ | Model | Description | Hugging Face |
37
+ |------|-------------|--------------|
38
+ | **AdaReasoner-7B-Randomized** | Trained with the *adaptive learning* method, enabling strong generalization to **unseen tools and tasks**. Designed for open-ended and evolving tool environments where adaptability is required. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-7B-Randomized) |
39
+ | **AdaReasoner-7B-Non-Randomized** | Trained **without adaptive learning**, providing **more stable and reliable performance on known tools and tasks**, but limited generalization to unseen tools or task settings. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-7B-Non-Randomized) |
40
+ | **AdaReasoner-VSP-7B** | Task-specialized model trained **exclusively on the Visual Spatial Planning (VSP) task**, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-VSP-7B) |
41
+
42
+
43
+
44
+ **Key Differences:**
45
+ - **Randomized**: Trained with adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
46
+ - **Non-Randomized**: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
47
+ - **VSP-7B**: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks
48
+
49
+ ## 🚀 Quick Start
50
+
51
+ AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM.
52
+ However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment.
53
+ To fully evaluate or utilize its tool-planning behavior, we recommend using [AdaEval](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval) provided in our repository for batch inference and evaluation, or trying the [Demo](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo) interface for interactive, single-instance GUI-based reasoning.
54
+
55
+ ## 🎯 Capabilities
56
+
57
+ The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding:
58
+ - **Visual Spatial Planning**
59
+ Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations.
60
+ - **Compositional Visual Reasoning (Jigsaw)**
61
+ Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding.
62
+ - **GUI Question Answering (GUIQA)**
63
+ Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference.
64
+ - **General Visual Question Answering (General VQA)**
65
+ Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.
66
+
67
+ ## 🛠️ Tool Integration
68
+
69
+ For full tool-augmented inference capabilities, please refer to the [AdaReasoner repository](https://github.com/ssmisya/AdaReasoner) which includes:
70
+
71
+ - Tool Server deployment
72
+ - AdaEval evaluation framework
73
+ - Complete inference pipeline
74
+
75
+ ## 📊 Performance
76
+
77
+ Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.
78
+
79
+ ## 🔧 Technical Details
80
+
81
+ - **Base Architecture**: Qwen 2.5 VL 7B Instruct
82
+ - **Training Method**: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
83
+ - **Context Length**: Support for extended context with multiple tool interactions
84
+ - **Modalities**: Text + Vision
85
+
86
+ ## 📚 Citation
87
+
88
+ If you use this model in your research, please cite:
89
+
90
+ ```bibtex
91
+ @article{adareasoner2024,
92
+ title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
93
+ author={AdaReasoner Team},
94
+ journal={arXiv preprint arXiv:XXXX.XXXXX},
95
+ year={2024}
96
+ }
97
+ ```
98
+
99
+ ## 📄 License
100
+
101
+ Apache 2.0
102
+
103
+ ## 🤝 Acknowledgments
104
+
105
+ This model is part of the AdaReasoner project. For more information, visit our [GitHub repository](https://github.com/ssmisya/AdaReasoner).
106
+
107
+ ## 📧 Contact
108
+
109
+ For questions and feedback, please open an issue in our [GitHub repository](https://github.com/ssmisya/AdaReasoner).
docs/logo.png ADDED

Git LFS Details

  • SHA256: f63fdbdc7aad0940f49bcf65aae36d0931ab52c3d5718dc9933c83b9a5fea5b3
  • Pointer size: 131 Bytes
  • Size of remote file: 143 kB
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a6008bb0888133db96fbfc4fe6f116404500631c631616ceb6ef80beafba5fac
3
- size 4292527792
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e1973c3a320dc134a744ccde54b18d989fc07cc099aecbbf31887849b7dc1904
3
+ size 4929434944
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7bbd2bbd935e280faa3f0fa41b1445fdd8f0c70683c825ce7bf52b8b87f6a268
3
- size 4971952072
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:15d302e20f3efa12f5d7581f95fc69a835d54a256776bc2d8f8a82996ed6d3c8
3
+ size 4915523392
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5a66736ba187d06c2c780848f88cb6e870941a5b3abcab97a543e1e048a8e464
3
- size 4438207856
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3578c00179db25315e0a784fb069fd15c18426aef594a9d07012b7705e5135e9
3
+ size 4074022040
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:be5ee14615044c15ecc2b2aa6df099a86505b40882eedba2b7eed80b60778f16
3
- size 2881726816
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9ab23d0816164a4b9c7e207589d21f228c5378e2c818dfa210fb6886f78b46b
3
+ size 2665434168
model.safetensors.index.json CHANGED
The diff for this file is too large to render. See raw diff