---
pipeline_tag: video-classification
library_name: transformers
tags:
- video-reasoning
- VLM
- reinforcement-learning
---

# DeepIntuit

[DeepIntuit](https://bwgzk-keke.github.io/DeepIntuit/) is a progressive framework for **open-instance video classification** that evolves models from simple feature imitation to intrinsic reasoning.

- **Paper:** [From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification](https://huggingface.co/papers/2603.10300)
- **Repository:** [BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)
- **Project Page:** [https://bwgzk-keke.github.io/DeepIntuit/](https://bwgzk-keke.github.io/DeepIntuit/)

## Model Description

DeepIntuit bridges the gap between traditional video encoders and the generalization capabilities of vision-language models (VLMs). Instead of predicting labels directly from visual features, it uses a three-step pipeline: a cold start followed by two training stages.

1.  **Cold-start supervised alignment:** Initializes reasoning capability using supervised traces generated by a teacher model.
2.  **Intrinsic reasoning refinement (Stage 1):** Refines the reasoning ability using **Group Relative Policy Optimization (GRPO)** reinforcement learning to enhance coherence.
3.  **Intuitive calibration (Stage 2):** Trains a classifier on the intrinsic reasoning traces to ensure stable knowledge transfer and accurate classification results.

This approach decouples reasoning generation from the final decision, with the aim of improving robustness when intra-class variation is large.
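
The GRPO step named in the list above scores each sampled reasoning trace relative to the other traces drawn for the same input, rather than against a learned value function. The following is a minimal sketch of that group-relative advantage, not the DeepIntuit training code: the function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar per reasoning trace sampled for the same video.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four traces sampled for one video
# (for example, 1.0 when the trace ends in the correct label, else 0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Traces that score above the group mean receive a positive advantage and are reinforced; traces below it are discouraged. When every trace in a group earns the same reward, all advantages are zero and that group contributes no learning signal.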

## Installation

The repository contains separate environments for each stage. For inference using the final model, set up the stage 2 environment:

```bash
git clone https://github.com/BWGZK-keke/DeepIntuit.git
cd DeepIntuit/stage2_model
pip install -r requirements.txt
```

## Sample Usage

After setting up the environment, run inference with the following command from the official repository. The `cd` step assumes you start in the repository root; skip it if you are already inside `stage2_model`:

```bash
cd stage2_model
python inference.py \
  --model_path BWGZK/DeepIntuit \
  --video_path path_to_your_video.mp4
```
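
To classify several videos, the command above can be wrapped in a small script. This is a sketch under stated assumptions: it uses only the two flags shown above, the helper names `build_inference_command` and `run_on_directory` are hypothetical, and it assumes `inference.py` handles one video per invocation and is run from the `stage2_model` directory.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(video_path, model_path="BWGZK/DeepIntuit"):
    """Build the argument list for one inference.py call (flags from the README)."""
    return [
        sys.executable, "inference.py",
        "--model_path", model_path,
        "--video_path", str(video_path),
    ]


def run_on_directory(video_dir, cwd="stage2_model"):
    """Run inference on every .mp4 in `video_dir`, one process per video.

    `cwd` is assumed to be the stage2_model directory relative to the repo root.
    Video paths are made absolute so they still resolve after the directory change.
    """
    for video in sorted(Path(video_dir).glob("*.mp4")):
        subprocess.run(build_inference_command(video.resolve()), cwd=cwd, check=True)


if __name__ == "__main__":
    # Example: python batch_infer.py path/to/videos
    if len(sys.argv) > 1:
        run_on_directory(sys.argv[1])
```

With `check=True`, the loop stops at the first video for which `inference.py` exits with an error, which makes failures visible instead of silently skipping files.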

## Citation

If you find this work useful, please cite:

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  journal={arXiv preprint arXiv:2603.10300},
  year={2026}
}
```