---
license: apache-2.0
datasets:
- violetcliff/SmartHome-Bench
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# DeepIntuit

## Model Description

**DeepIntuit** is a reasoning-enhanced video understanding model designed for **open-instance video classification**. Instead of directly mapping visual features to labels, the model learns to generate **intrinsic reasoning traces** that guide the final classification decision, improving robustness under large intra-class variation.

The model is introduced in:

**From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification**
📄 Paper: [https://arxiv.org/abs/2603.10300](https://arxiv.org/abs/2603.10300)
💻 Code: [https://github.com/BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)

---

## Training Pipeline

DeepIntuit is trained through a three-stage pipeline:

1. **Cold Start Alignment**
   Supervised training to initialize structured reasoning generation.

2. **Reasoning Refinement (GRPO)**
   Reinforcement learning with Group Relative Policy Optimization (GRPO) improves reasoning quality and prediction consistency.

3. **Intuitive Calibration**
   A lightweight classifier is trained on generated reasoning traces for stable prediction.
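
As a rough illustration of the second stage, the sketch below shows the group-relative advantage computation that gives GRPO its name: several reasoning traces are sampled for the same input, each is scored, and every reward is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages`, the binary correctness reward, and the `eps` constant are assumptions made for this example; the model card does not specify the reward design or training code used for DeepIntuit.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same input.

    Traces that score above the group mean get a positive advantage,
    traces below it get a negative one. `eps` avoids division by zero
    when every response in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reasoning traces sampled for one video:
# 1.0 if the trace ended in the correct label, 0.0 otherwise (an assumed reward).
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Because the baseline comes from the group itself, this formulation needs no separate learned value model. When every trace in a group receives the same reward, all advantages are zero and that group contributes no learning signal.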

---

## Intended Use

DeepIntuit is designed for research on:

* video understanding
* open-instance video classification
* reasoning-enhanced multimodal learning
* safety-sensitive video analysis


## Citation

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  journal={arXiv preprint arXiv:2603.10300},
  year={2026}
}
```