---
license: apache-2.0
datasets:
- violetcliff/SmartHome-Bench
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
# DeepIntuit
## Model Description
**DeepIntuit** is a reasoning-enhanced video understanding model designed for **open-instance video classification**. Instead of directly mapping visual features to labels, the model learns to generate **intrinsic reasoning traces** that guide the final classification decision, improving robustness under large intra-class variation.
The model is introduced in:

**From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification**

📄 Paper: [https://arxiv.org/abs/2603.10300](https://arxiv.org/abs/2603.10300)

💻 Code: [https://github.com/BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)

---
## Training Pipeline
DeepIntuit is trained through a three-stage pipeline:
1. **Cold Start Alignment**
   Supervised training to initialize structured reasoning generation.
2. **Reasoning Refinement (GRPO)**
   Reinforcement learning with Group Relative Policy Optimization (GRPO) improves reasoning quality and prediction consistency.
3. **Intuitive Calibration**
   A lightweight classifier is trained on generated reasoning traces for stable prediction.
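
As a rough illustration of the second stage, GRPO-style training typically scores a group of responses sampled for the same input and normalizes each reward against the group's mean and standard deviation. The sketch below shows only that normalization step in plain Python. The reward function, the label names, and the 0.1 bonus weight are hypothetical and are not taken from the DeepIntuit code or paper; responses that score above the group average receive positive advantages, and the rest receive negative ones.

```python
from statistics import mean, pstdev


def toy_reward(predicted_label: str, true_label: str, has_reasoning: bool) -> float:
    """Hypothetical reward: 1.0 for a correct label, plus 0.1 if a reasoning trace is present."""
    correct = 1.0 if predicted_label == true_label else 0.0
    format_bonus = 0.1 if has_reasoning else 0.0
    return correct + format_bonus


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of responses sampled for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division defined when every response earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical sampled responses for one video whose true label is "abnormal".
samples = [
    ("abnormal", True),
    ("normal", True),
    ("abnormal", False),
    ("normal", False),
]
rewards = [toy_reward(label, "abnormal", has_trace) for label, has_trace in samples]
advantages = group_relative_advantages(rewards)
print(advantages)
```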
---
## Intended Use
DeepIntuit is designed for research on:
* video understanding
* open-instance video classification
* reasoning-enhanced multimodal learning
* safety-sensitive video analysis
## Citation
```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  journal={arXiv preprint arXiv:2603.10300},
  year={2026}
}
```