	<div align="center">
	<img src="docs/logo.png" alt="Logo" width="300">
	<h1 align="center">Dynamic Tool Orchestration for Iterative Visual Reasoning</h1>

	<a href="#">
	<img src="https://img.shields.io/badge/Paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper">
	</a>
	<a href="https://github.com/ssmisya/AdaReasoner/tree/main/docs">
	<img src="https://img.shields.io/badge/Docs-1f6feb?style=for-the-badge&logo=readthedocs&logoColor=white" alt="Docs">
	</a>
	<a href="https://huggingface.co/collections/hitsmy/adareasoner">
	<img src="https://img.shields.io/badge/Data%20%26%20Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" alt="Data & Model">
	</a>
	<a href="https://adareasoner.github.io">
	<img src="https://img.shields.io/badge/Homepage-2ea44f?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Homepage">
	</a>

	<a href="https://github.com/ssmisya/AdaReasoner/tree/main/tool_server/tf_eval/demo">
	<img src="https://img.shields.io/badge/Demo-FF7C00?style=for-the-badge&logo=gradio&logoColor=white" alt="Demo">
	</a>
	<a href="https://www.youtube.com/watch?v=AtBoJYW_yDA">
	<img src="https://img.shields.io/badge/Video-FF0000?style=for-the-badge&logo=youtube&logoColor=white" alt="Video">
	</a>

	</div>


	## 🔔 Important Note on Model Status

	The models released on this page belong to the AdaReasoner-TC series and are not the final RL fine-tuned models.
	They are trained with Tool Cold Start (TC) supervised fine-tuning only and are intended for analysis, ablation, and reproducibility.

	For the RL fine-tuned versions, see [Data & models](https://github.com/ssmisya/AdaReasoner/tree/main/docs/data_models.md).

	## 📋 Model Description

	AdaReasoner-7B is a vision-language model trained for dynamic tool orchestration in iterative visual reasoning.

	The AdaReasoner-TC series is trained with TC (Tool Cold Start) supervised fine-tuning only, without subsequent RL fine-tuning.

	We provide two variants of AdaReasoner-TC-7B, each suited to a different use case:

	| Model | Description | Hugging Face |
	|------|-------------|--------------|
	| AdaReasoner-TC-7B-Randomized | Trained with the adaptive learning method, enabling strong generalization to unseen tools and tasks. Designed for open-ended and evolving tool environments where adaptability is required. | [🤗 Link](https://huggingface.co/AdaReasoner/AdaReasoner-TC-7B-Randomized) |
	| AdaReasoner-TC-7B-Non-Randomized | Trained without adaptive learning, providing more stable and reliable performance on known tools and tasks, but limited generalization to unseen tools or task settings. | [🤗 Link](https://huggingface.co/AdaReasoner/AdaReasoner-TC-7B-Non-Randomized) |




	Key Differences:
	- Randomized: Trained with the adaptive learning method, enabling zero-shot generalization to novel tools and task configurations.
	- Non-Randomized: Trained without adaptive learning, offering more predictable behavior on familiar tools but limited generalization to unseen ones.
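
The trade-off above can be turned into a small selection helper. The sketch below is only an illustration: the repository ids are copied from the table, but `REPO_IDS`, `choose_variant`, and `download_checkpoint` are hypothetical names that are not part of the AdaReasoner codebase, and the one-flag selection rule is a simplification of the descriptions above. Downloading relies on `snapshot_download` from `huggingface_hub` and assumes network access.

```python
"""Sketch: choose and download an AdaReasoner-TC-7B checkpoint."""
from typing import Optional

# Repository ids copied from the model table above.
REPO_IDS = {
    "randomized": "AdaReasoner/AdaReasoner-TC-7B-Randomized",
    "non-randomized": "AdaReasoner/AdaReasoner-TC-7B-Non-Randomized",
}


def choose_variant(expects_unseen_tools: bool) -> str:
    """Hypothetical rule of thumb, simplified from the descriptions above."""
    return "randomized" if expects_unseen_tools else "non-randomized"


def download_checkpoint(variant: str, local_dir: Optional[str] = None) -> str:
    """Download the checkpoint files and return their local path.

    Requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_IDS[variant], local_dir=local_dir)


if __name__ == "__main__":
    variant = choose_variant(expects_unseen_tools=False)
    print(REPO_IDS[variant])
    # Uncomment to fetch the weights (a large download):
    # print(download_checkpoint(variant))
```

This only fetches the files. How the checkpoint is loaded and served with tools is not described on this page, so see the Docs and Demo links above for that.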



	## 📊 Performance

	Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.


	## 📚 Citation

	If you use this model in your research, please cite:

	```bibtex
	@article{adareasoner2024,
	title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
	author={AdaReasoner Team},
	journal={arXiv preprint arXiv:XXXX.XXXXX},
	year={2024}
	}
	```

	## 📄 License

	Apache 2.0

	## 🤝 Acknowledgments

	This model is part of the AdaReasoner project. For more information, visit our [GitHub repository](https://github.com/ssmisya/AdaReasoner).

	## 📧 Contact

	For questions and feedback, please open an issue in our [GitHub repository](https://github.com/ssmisya/AdaReasoner).