---
pipeline_tag: video-classification
library_name: transformers
tags:
- video-reasoning
- VLM
- reinforcement-learning
---

# DeepIntuit

[DeepIntuit](https://bwgzk-keke.github.io/DeepIntuit/) is a progressive framework for **open-instance video classification** that evolves models from simple feature imitation to intrinsic reasoning.

- **Paper:** [From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification](https://huggingface.co/papers/2603.10300)
- **Repository:** [BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)
- **Project Page:** [https://bwgzk-keke.github.io/DeepIntuit/](https://bwgzk-keke.github.io/DeepIntuit/)

## Model Description

DeepIntuit bridges the gap between traditional video encoders and the generalization capabilities of vision-language models (VLMs). Instead of predicting labels directly from visual features, it uses a three-stage reasoning pipeline:

1. **Cold-start supervised alignment:** Initializes reasoning capability using supervised traces generated by a teacher model.
2. **Intrinsic reasoning refinement (Stage 1):** Refines reasoning with **Group Relative Policy Optimization (GRPO)** reinforcement learning to enhance coherence.
3. **Intuitive calibration (Stage 2):** Trains a classifier on the intrinsic reasoning traces to ensure stable knowledge transfer and accurate classification.

This approach decouples reasoning generation from final decision-making, significantly improving robustness in scenarios with large intra-class variation.

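The GRPO step in Stage 1 scores a group of sampled reasoning traces and normalizes each trace's reward against the group's statistics, so no separate value network is needed. The group-relative advantage computation at the core of GRPO can be sketched as follows (an illustrative, generic sketch; the `grpo_advantages` function and the reward values are hypothetical, not the repository's implementation):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one group of sampled traces.

    Each trace's advantage is its reward, centered on the group mean
    and scaled by the group's (population) standard deviation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: three reasoning traces scored by a reward function.
advs = grpo_advantages([1.0, 0.5, 0.0])
```

Because the normalization is relative within each group, above-average traces get positive advantages and below-average traces negative ones, and the advantages in a group sum to zero.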
## Installation

The repository provides a separate environment for each stage. For inference with the final model, set up the Stage 2 environment:

```bash
git clone https://github.com/BWGZK-keke/DeepIntuit.git
cd DeepIntuit/stage2_model
pip install -r requirements.txt
```

## Sample Usage

After setting up the environment, run inference from the `stage2_model` directory using the command provided in the official repository:

```bash
python inference.py \
    --model_path BWGZK/DeepIntuit \
    --video_path path_to_your_video.mp4
```

## Citation

If you find this work useful, please cite:

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  journal={arXiv preprint arXiv:2603.10300},
  year={2026}
}
```