Add model card for DeepIntuit

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +61 -0
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ pipeline_tag: video-classification
+ library_name: transformers
+ tags:
+ - video-reasoning
+ - VLM
+ - reinforcement-learning
+ ---
+ ---
+
+ # DeepIntuit
+
+ [DeepIntuit](https://bwgzk-keke.github.io/DeepIntuit/) is a progressive framework for **open-instance video classification** that evolves models from simple feature imitation to intrinsic reasoning.
+
+ - **Paper:** [From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification](https://huggingface.co/papers/2603.10300)
+ - **Repository:** [BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)
+ - **Project Page:** [https://bwgzk-keke.github.io/DeepIntuit/](https://bwgzk-keke.github.io/DeepIntuit/)
+
+ ## Model Description
+
+ DeepIntuit bridges the gap between traditional video encoders and the generalization capabilities of vision-language models (VLMs). Instead of predicting labels directly from visual features, it uses a three-step reasoning pipeline (a cold start followed by two numbered stages):
+
+ 1. **Cold-start supervised alignment:** Initializes reasoning capability using supervised traces generated by a teacher model.
+ 2. **Intrinsic reasoning refinement (Stage 1):** Refines reasoning with **Group Relative Policy Optimization (GRPO)**, a reinforcement learning method, to improve coherence.
+ 3. **Intuitive calibration (Stage 2):** Trains a classifier on the intrinsic reasoning traces for stable knowledge transfer and accurate classification.
+
+ This design decouples reasoning generation from the final decision, which is intended to improve robustness when intra-class variation is large.
+
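To make the GRPO step in the list above concrete, here is a minimal sketch of the group-relative advantage that gives the method its name: each sampled response is scored against the other responses drawn for the same input. The function name, the example rewards, and the use of the population standard deviation are illustrative assumptions; this is not taken from the DeepIntuit training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group sampled for the same input.

    Illustrative helper (hypothetical name): subtract the group mean and
    divide by the group standard deviation, with eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is an equally valid choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reasoning traces sampled for one video
# (e.g. 1.0 when the trace ends in the correct label, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Traces that score above the group mean receive a positive advantage and are reinforced; those below receive a negative one. No separate value network is needed, because the group itself serves as the baseline.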
+ ## Installation
+
+ The repository contains a separate environment for each stage. For inference with the final model, set up the Stage 2 environment:
+
+ ```bash
+ git clone https://github.com/BWGZK-keke/DeepIntuit.git
+ cd DeepIntuit/stage2_model
+ pip install -r requirements.txt
+ ```
+
+ ## Sample Usage
+
+ After setting up the environment, run inference with the following command from the official repository:
+
+ ```bash
+ # From the repository root; skip this cd if you are already in stage2_model after installation
+ cd stage2_model
+ python inference.py \
+   --model_path BWGZK/DeepIntuit \
+   --video_path path_to_your_video.mp4
+ ```
+
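For more than one video, the command above can be assembled programmatically. The following is a minimal sketch that only builds the argument list, using the `--model_path` and `--video_path` flags shown above; the helper names and the `*.mp4` glob pattern are hypothetical, and whether `inference.py` accepts other options is not assumed here.

```python
import shlex
from pathlib import Path


def build_inference_command(video_path, model_path="BWGZK/DeepIntuit"):
    """Return the argv list for one inference run (hypothetical helper)."""
    return [
        "python", "inference.py",
        "--model_path", model_path,
        "--video_path", str(video_path),
    ]


def commands_for_directory(video_dir, pattern="*.mp4"):
    """Build one command per matching video, in sorted order (hypothetical helper)."""
    return [build_inference_command(p) for p in sorted(Path(video_dir).glob(pattern))]


cmd = build_inference_command("path_to_your_video.mp4")
print(shlex.join(cmd))

# To execute a command, one option (assuming the repository layout above) is:
#   subprocess.run(cmd, cwd="stage2_model", check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when video paths contain spaces.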
+ ## Citation
+
+ If you find this work useful, please cite:
+
+ ```bibtex
+ @article{zhang2026deepintuit,
+   title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
+   author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
+   journal={arXiv preprint arXiv:2603.10300},
+   year={2026}
+ }
+ ```