---
license: apache-2.0
datasets:
- violetcliff/SmartHome-Bench
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# DeepIntuit

## Model Description

**DeepIntuit** is a reasoning-enhanced video understanding model designed for **open-instance video classification**. Instead of directly mapping visual features to labels, the model learns to generate **intrinsic reasoning traces** that guide the final classification decision, improving robustness under large intra-class variation.

The model is introduced in:

**From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification**

📄 Paper: [https://arxiv.org/abs/2603.10300](https://arxiv.org/abs/2603.10300)

💻 Code: [https://github.com/BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)

---

## Training Pipeline

DeepIntuit is trained through a three-stage pipeline:

1. **Cold Start Alignment**
   Supervised training to initialize structured reasoning generation.

2. **Reasoning Refinement (GRPO)**
   Reinforcement learning improves reasoning quality and prediction consistency.

3. **Intuitive Calibration**
   A lightweight classifier is trained on generated reasoning traces for stable prediction.

---

## Intended Use

DeepIntuit is designed for research on:

* video understanding
* open-instance video classification
* reasoning-enhanced multimodal learning
* safety-sensitive video analysis

## Citation

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  year={2026}
}
```
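
## Example: Group-Relative Advantages in GRPO (Illustrative Sketch)

The Reasoning Refinement stage of the training pipeline uses GRPO. As a rough illustration of the general idea behind GRPO, and not the DeepIntuit training code, the sketch below scores a group of sampled responses for one input and normalizes each reward against the group mean and standard deviation. The reward function, the label names, and the helper names (`toy_reward`, `group_relative_advantages`) are assumptions made for this example; this model card does not specify the actual reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps).
    The eps term avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(predicted_label, true_label):
    # Hypothetical reward: 1.0 for a correct final label, 0.0 otherwise.
    return 1.0 if predicted_label == true_label else 0.0


# Four hypothetical sampled predictions for the same video.
sampled_labels = ["class_a", "class_b", "class_a", "class_a"]
rewards = [toy_reward(p, "class_a") for p in sampled_labels]
advantages = group_relative_advantages(rewards)

for label, adv in zip(sampled_labels, advantages):
    print(f"{label}: advantage = {adv:+.3f}")
```

Responses that beat the group average receive a positive advantage and the rest receive a negative one, so no separate value network is needed to form a baseline. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.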