Add model card for DeepIntuit
Hi! I'm Niels from the Hugging Face community science team. I noticed that this repository was missing a model card and metadata. I've opened this PR to add a description of the project, links to the paper and code, and the necessary metadata tags. This will help users find and use your model more effectively on the Hub.
README.md
ADDED
@@ -0,0 +1,61 @@
| 1 |
+
---
|
| 2 |
+
pipeline_tag: video-classification
|
| 3 |
+
library_name: transformers
|
| 4 |
+
tags:
|
| 5 |
+
- video-reasoning
|
| 6 |
+
- VLM
|
| 7 |
+
- reinforcement-learning
|
| 8 |
+
---
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# DeepIntuit
|
| 12 |
+
|
| 13 |
+
[DeepIntuit](https://bwgzk-keke.github.io/DeepIntuit/) is a progressive framework for **open-instance video classification** that evolves models from simple feature imitation to intrinsic reasoning.
|
| 14 |
+
|
| 15 |
+
- **Paper:** [From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification](https://huggingface.co/papers/2603.10300)
|
| 16 |
+
- **Repository:** [BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)
|
| 17 |
+
- **Project Page:** [https://bwgzk-keke.github.io/DeepIntuit/](https://bwgzk-keke.github.io/DeepIntuit/)
|
| 18 |
+
|
| 19 |
+
## Model Description
|
| 20 |
+
|
| 21 |
+
DeepIntuit bridges the gap between traditional video encoders and the generalization capabilities of vision-language models (VLMs). Instead of predicting labels directly from visual features, it uses a three-stage pipeline:
|
| 22 |
+
|
| 23 |
+
1. **Cold-start supervised alignment:** Initializes reasoning capability using supervised traces generated by a teacher model.
|
| 24 |
+
2. **Intrinsic reasoning refinement (Stage 1):** Refines the reasoning ability with reinforcement learning via **Group Relative Policy Optimization (GRPO)** to make the generated reasoning traces more coherent.
|
| 25 |
+
3. **Intuitive calibration (Stage 2):** Trains a classifier on the intrinsic reasoning traces to ensure stable knowledge transfer and accurate classification results.
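
The core idea behind GRPO in step 2 can be illustrated with a minimal sketch: several reasoning traces are sampled for the same input, and each trace's reward is normalized against the mean and standard deviation of its own group. The reward values, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; this is not the DeepIntuit training code, and the reward design used by the authors is not described in this card.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per reasoning trace sampled for the
    same input. `eps` (an assumed constant) avoids division by zero when
    all rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; a choice made for this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four traces sampled for one video.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Traces that score above their group's average receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.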
|
| 26 |
+
|
| 27 |
+
This approach decouples reasoning generation from the final classification decision, with the aim of improving robustness when intra-class variation is large.
|
| 28 |
+
|
| 29 |
+
## Installation
|
| 30 |
+
|
| 31 |
+
The repository contains separate environments for each stage. For inference using the final model, set up the stage 2 environment:
|
| 32 |
+
|
| 33 |
+
```bash
|
| 34 |
+
git clone https://github.com/BWGZK-keke/DeepIntuit.git
|
| 35 |
+
cd DeepIntuit/stage2_model
|
| 36 |
+
pip install -r requirements.txt
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
+
## Sample Usage
|
| 40 |
+
|
| 41 |
+
After setting up the environment, you can run inference using the following command provided in the official repository:
|
| 42 |
+
|
| 43 |
+
```bash
|
| 44 |
+
cd stage2_model  # from the repository root; skip if you are already in stage2_model
|
| 45 |
+
python inference.py \
|
| 46 |
+
--model_path BWGZK/DeepIntuit \
|
| 47 |
+
--video_path path_to_your_video.mp4
|
| 48 |
+
```
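
If you prefer to call the script from Python, for example to loop over several videos, the command above can be wrapped as in the sketch below. Only the `inference.py` script name, the `--model_path` and `--video_path` flags, and the `stage2_model` directory come from the command above; the helper names `build_inference_command` and `run_inference` are hypothetical, and the format of the script's output is not documented here, so the sketch returns raw stdout without parsing it.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(model_path, video_path, script="inference.py"):
    """Build the argument list for the repository's inference script."""
    return [
        sys.executable, script,
        "--model_path", model_path,
        "--video_path", str(video_path),
    ]


def run_inference(model_path, video_path, cwd="stage2_model"):
    """Run the script from `cwd` (assumed to be the stage 2 directory).

    Returns the script's raw stdout; its format is not assumed here.
    """
    # Resolve to an absolute path because the subprocess runs in `cwd`.
    video = Path(video_path).resolve()
    if not video.is_file():
        raise FileNotFoundError(f"video not found: {video}")
    result = subprocess.run(
        build_inference_command(model_path, video),
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout


# Building the command has no side effects, so it is safe to inspect.
cmd = build_inference_command("BWGZK/DeepIntuit", "path_to_your_video.mp4")
```

Passing the arguments as a list avoids shell quoting problems with video paths that contain spaces.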
|
| 49 |
+
|
| 50 |
+
## Citation
|
| 51 |
+
|
| 52 |
+
If you find this work useful, please cite:
|
| 53 |
+
|
| 54 |
+
```bibtex
|
| 55 |
+
@article{zhang2026deepintuit,
|
| 56 |
+
title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
|
| 57 |
+
author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
|
| 58 |
+
journal={arXiv preprint arXiv:2603.10300},
|
| 59 |
+
year={2026}
|
| 60 |
+
}
|
| 61 |
+
```
|