Upload folder using huggingface_hub

Browse files

Files changed (8) hide show

.gitattributes +1 -0
README.md +109 -0
docs/logo.png +3 -0
model-00001-of-00004.safetensors +2 -2
model-00002-of-00004.safetensors +2 -2
model-00003-of-00004.safetensors +2 -2
model-00004-of-00004.safetensors +2 -2
model.safetensors.index.json +0 -0

.gitattributes CHANGED Viewed

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text

 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+docs/logo.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,109 @@

+<div align="center">
+  <img src="docs/logo.png" alt="Logo" width="300">
+  <h1 align="center">Dynamic Tool Orchestration for Iterative Visual Reasoning</h1>
+  <a href="#">
+    <img src="https://img.shields.io/badge/Paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper">
+  </a>
+  <a href="https://github.com/ssmisya/AdaReasoner/tree/main/docs">
+    <img src="https://img.shields.io/badge/Docs-1f6feb?style=for-the-badge&logo=readthedocs&logoColor=white" alt="Docs">
+  </a>
+  <a href="https://huggingface.co/collections/hitsmy/adareasoner">
+    <img src="https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Data">
+  </a>
+  <a href="https://https://adareasoner.github.io/">
+    <img src="https://img.shields.io/badge/Homepage-2ea44f?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Homepage">
+  </a>
+  <a href="https://huggingface.co/hitsmy/AdaReasoner-7B">
+    <img src="https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Model">
+  </a>
+  <a href="https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo">
+  <img src="https://img.shields.io/badge/Demo-FF7C00?style=for-the-badge&logo=gradio&logoColor=white" alt="Demo">
+  </a>
+</div>
+---
+## 📋 Model Description
+**AdaReasoner-7B** is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This model is AdaReasoner-7B-Randomized.
+We provide three variants of AdaReasoner-7B, each optimized for different use cases:
+| Model | Description | Hugging Face |
+|------|-------------|--------------|
+| **AdaReasoner-7B-Randomized** | Trained with the *adaptive learning* method, enabling strong generalization to **unseen tools and tasks**. Designed for open-ended and evolving tool environments where adaptability is required. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-7B-Randomized) |
+| **AdaReasoner-7B-Non-Randomized** | Trained **without adaptive learning**, providing **more stable and reliable performance on known tools and tasks**, but limited generalization to unseen tools or task settings. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-7B-Non-Randomized) |
+| **AdaReasoner-VSP-7B** | Task-specialized model trained **exclusively on the Visual Spatial Planning (VSP) task**, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. | [🤗 Link](https://huggingface.co/hitsmy/AdaReasoner-VSP-7B) |
+**Key Differences:**
+- **Randomized**: Trained with adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
+- **Non-Randomized**: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
+- **VSP-7B**: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks
+## 🚀 Quick Start
+AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM.
+However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment.
+To fully evaluate or utilize its tool-planning behavior, we recommend using [AdaEval](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval) provided in our repository for batch inference and evaluation, or trying the [Demo](https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo) interface for interactive, single-instance GUI-based reasoning.
+## 🎯 Capabilities
+The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding:
+-	**Visual Spatial Planning**
+Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations.
+-	**Compositional Visual Reasoning (Jigsaw)**
+Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding.
+-	**GUI Question Answering (GUIQA)**
+Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference.
+-	**General Visual Question Answering (General VQA)**
+Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.
+## 🛠️ Tool Integration
+For full tool-augmented inference capabilities, please refer to the [AdaReasoner repository](https://github.com/ssmisya/AdaReasoner) which includes:
+- Tool Server deployment
+- AdaEval evaluation framework
+- Complete inference pipeline
+## 📊 Performance
+Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.
+## 🔧 Technical Details
+- **Base Architecture**: Qwen 2.5 VL 7B Instruct
+- **Training Method**: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
+- **Context Length**: Support for extended context with multiple tool interactions
+- **Modalities**: Text + Vision
+## 📚 Citation
+If you use this model in your research, please cite:
+```bibtex
+@article{adareasoner2024,
+  title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
+  author={AdaReasoner Team},
+  journal={arXiv preprint arXiv:XXXX.XXXXX},
+  year={2024}
+}
+```
+## 📄 License
+Apache 2.0
+## 🤝 Acknowledgments
+This model is part of the AdaReasoner project. For more information, visit our [GitHub repository](https://github.com/ssmisya/AdaReasoner).
+## 📧 Contact
+For questions and feedback, please open an issue in our [GitHub repository](https://github.com/ssmisya/AdaReasoner).

docs/logo.png ADDED Viewed

Git LFS Details

SHA256: f63fdbdc7aad0940f49bcf65aae36d0931ab52c3d5718dc9933c83b9a5fea5b3
Pointer size: 131 Bytes
Size of remote file: 143 kB

model-00001-of-00004.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6008bb0888133db96fbfc4fe6f116404500631c631616ceb6ef80beafba5fac
-size 4292527792

 version https://git-lfs.github.com/spec/v1
+oid sha256:e1973c3a320dc134a744ccde54b18d989fc07cc099aecbbf31887849b7dc1904
+size 4929434944

model-00002-of-00004.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7bbd2bbd935e280faa3f0fa41b1445fdd8f0c70683c825ce7bf52b8b87f6a268
-size 4971952072

 version https://git-lfs.github.com/spec/v1
+oid sha256:15d302e20f3efa12f5d7581f95fc69a835d54a256776bc2d8f8a82996ed6d3c8
+size 4915523392

model-00003-of-00004.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5a66736ba187d06c2c780848f88cb6e870941a5b3abcab97a543e1e048a8e464
-size 4438207856

 version https://git-lfs.github.com/spec/v1
+oid sha256:3578c00179db25315e0a784fb069fd15c18426aef594a9d07012b7705e5135e9
+size 4074022040

model-00004-of-00004.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:be5ee14615044c15ecc2b2aa6df099a86505b40882eedba2b7eed80b60778f16
-size 2881726816

 version https://git-lfs.github.com/spec/v1
+oid sha256:e9ab23d0816164a4b9c7e207589d21f228c5378e2c818dfa210fb6886f78b46b
+size 2665434168

model.safetensors.index.json CHANGED Viewed

The diff for this file is too large to render. See raw diff