---
license: apache-2.0
datasets:
- violetcliff/SmartHome-Bench
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# DeepIntuit

## Model Description

**DeepIntuit** is a reasoning-enhanced video understanding model designed for **open-instance video classification**. Instead of directly mapping visual features to labels, the model learns to generate **intrinsic reasoning traces** that guide the final classification decision, improving robustness under large intra-class variation.

The model is introduced in:

**From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification**
📄 Paper: [https://arxiv.org/abs/2603.10300](https://arxiv.org/abs/2603.10300)
💻 Code: [https://github.com/BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)

---

## Training Pipeline

DeepIntuit is trained through a three-stage pipeline:

1. **Cold Start Alignment**
   Supervised training to initialize structured reasoning generation.

2. **Reasoning Refinement (GRPO)**
   Reinforcement learning improves reasoning quality and prediction consistency.

3. **Intuitive Calibration**
   A lightweight classifier is trained on generated reasoning traces for stable prediction.
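
For readers unfamiliar with stage 2, the core idea of GRPO (Group Relative Policy Optimization) is to score each sampled response relative to the other responses drawn for the same input, rather than against a learned value function. The sketch below illustrates only that group-relative normalization step. It is not taken from the DeepIntuit codebase: the function name `group_relative_advantages`, the `eps` constant, and the example reward values are assumptions made for illustration, and this model card does not specify the actual reward design. Responses that beat their group's mean reward receive a positive advantage and are reinforced, while below-average ones are discouraged.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same input.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up rewards for four sampled reasoning traces on one video,
    # e.g. 1.0 when the final label is correct and 0.0 otherwise.
    rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```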

---

## Intended Use

DeepIntuit is designed for research on:

* video understanding
* open-instance video classification
* reasoning-enhanced multimodal learning
* safety-sensitive video analysis

## Citation

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  year={2026}
}
```