| --- |
| base_model: |
| - Qwen/Qwen2.5-VL-7B-Instruct |
| datasets: |
| - violetcliff/SmartHome-Bench |
| license: apache-2.0 |
| pipeline_tag: video-classification |
| library_name: transformers |
| --- |
| |
| # DeepIntuit |
|
|
| ## Model Description |
|
|
| **DeepIntuit** is a reasoning-enhanced video understanding model designed for **open-instance video classification**. Instead of directly mapping visual features to labels, the model learns to generate **intrinsic reasoning traces** that guide the final classification decision, improving robustness under large intra-class variation. |
|
|
| The model is introduced in: |
|
|
| **From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification** |
| Paper: [https://arxiv.org/abs/2603.10300](https://arxiv.org/abs/2603.10300) |
| Code: [https://github.com/BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit) |
| Project Page: [https://bwgzk-keke.github.io/DeepIntuit/](https://bwgzk-keke.github.io/DeepIntuit/) |
|
|
| --- |
|
|
| ## Training Pipeline |
|
|
| DeepIntuit is trained through a three-stage pipeline: |
|
|
| 1. **Cold Start Alignment** |
| Supervised training to initialize structured reasoning generation. |
|
|
| 2. **Reasoning Refinement (GRPO)** |
| Reinforcement learning with Group Relative Policy Optimization (GRPO) improves reasoning quality and prediction consistency. |
|
|
| 3. **Intuitive Calibration** |
| A lightweight classifier is trained on generated reasoning traces for stable prediction. |
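
To make the second stage more concrete, here is a minimal sketch of the group-relative advantage that gives GRPO its name: several reasoning traces are sampled for the same video, and each reward is normalized against the mean and standard deviation of its own group. The binary reward, the `normal`/`abnormal` labels, and the helper names are assumptions chosen for illustration; the reward design actually used to train DeepIntuit is described in the paper and may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def label_reward(predicted, target):
    """Hypothetical reward: 1.0 when the trace ends in the correct label."""
    return 1.0 if predicted == target else 0.0


# Four sampled traces for one video; labels are illustrative placeholders.
predictions = ["abnormal", "normal", "abnormal", "abnormal"]
rewards = [label_reward(p, "abnormal") for p in predictions]
advantages = group_relative_advantages(rewards)

for pred, adv in zip(predictions, advantages):
    print(f"{pred:>8}: advantage = {adv:+.3f}")
```

Traces that beat the group average receive a positive advantage and are reinforced, while the one incorrect trace receives a negative advantage. When every trace in a group earns the same reward, all advantages are zero and that group contributes no learning signal.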
|
|
| --- |
|
|
| ## Intended Use |
|
|
| DeepIntuit is designed for research on: |
|
|
| * video understanding |
| * open-instance video classification |
| * reasoning-enhanced multimodal learning |
| * safety-sensitive video analysis |
|
|
| ## Sample Usage |
|
|
| To run inference with the code in the [official repository](https://github.com/BWGZK-keke/DeepIntuit), run the following from a local clone, replacing `path_to_video.mp4` with the path to your video: |
|
|
| ```bash |
| cd stage2_model |
| python inference.py \ |
| --model_path BWGZK/DeepIntuit \ |
| --video_path path_to_video.mp4 |
| ``` |
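
For more than one video, the same command can be driven from Python. The sketch below is a hypothetical wrapper, not part of the official repository. It assumes `inference.py` accepts only the two flags shown above, that it is run from the `stage2_model` directory of a local clone, and that the videos are `.mp4` files in a single folder.

```python
import subprocess
from pathlib import Path


def build_command(video_path, model_path="BWGZK/DeepIntuit"):
    """Assemble the inference command using only the two documented flags."""
    return [
        "python", "inference.py",
        "--model_path", model_path,
        "--video_path", str(video_path),
    ]


def run_on_directory(video_dir, workdir="stage2_model"):
    """Run inference on every .mp4 in video_dir (hypothetical helper)."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        # Absolute paths stay valid after switching into workdir.
        subprocess.run(build_command(video.resolve()), cwd=workdir, check=True)


print(build_command("path_to_video.mp4"))
```

Calling `run_on_directory("videos")` from the repository root would process each clip in turn and stop at the first failure, because `check=True` raises on a non-zero exit code.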
|
|
| --- |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{zhang2026deepintuit, |
| title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification}, |
| author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di}, |
| journal={arXiv preprint arXiv:2603.10300}, |
| year={2026} |
| } |
| ``` |