---
pipeline_tag: video-classification
library_name: transformers
tags:
- video-reasoning
- VLM
- reinforcement-learning
---
# DeepIntuit
DeepIntuit is a progressive framework for open-instance video classification that evolves models from simple feature imitation to intrinsic reasoning.
- **Paper:** From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification
- **Repository:** [BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)
- **Project Page:** https://bwgzk-keke.github.io/DeepIntuit/
## Model Description
DeepIntuit bridges the gap between traditional video encoders and the generalization capabilities of vision-language models (VLMs). Instead of directly predicting labels from visual features, it uses a three-stage reasoning pipeline:

1. **Cold-start supervised alignment:** Initializes reasoning capability using supervised traces generated by a teacher model.
2. **Intrinsic reasoning refinement (Stage 1):** Refines reasoning ability with Group Relative Policy Optimization (GRPO) reinforcement learning to enhance coherence.
3. **Intuitive calibration (Stage 2):** Trains a classifier on the intrinsic reasoning traces to ensure stable knowledge transfer and accurate classification.
This approach decouples reasoning generation from final decision-making, significantly improving robustness in scenarios with vast intra-class variations.
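The core idea behind GRPO in Stage 1 is to score several sampled reasoning traces for the same video as a *group* and normalize each trace's reward against the group statistics, replacing a learned value critic. A minimal sketch of this group-relative advantage computation (the reward values below are illustrative, not from the paper):

```python
import statistics


def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize each trace's reward against
    its group's mean and standard deviation, so traces better than the
    group average get positive advantage and worse ones negative."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All traces scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Rewards for a group of G=4 reasoning traces sampled for one video.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Advantages sum to zero within each group, so the policy update pushes probability mass toward the traces that outperform their own group rather than toward an absolute reward target.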
## Installation

The repository contains separate environments for each stage. For inference with the final model, set up the Stage 2 environment:

```bash
git clone https://github.com/BWGZK-keke/DeepIntuit.git
cd DeepIntuit/stage2_model
pip install -r requirements.txt
```
## Sample Usage

After setting up the environment, you can run inference with the following command from the official repository:

```bash
cd stage2_model
python inference.py \
    --model_path BWGZK/DeepIntuit \
    --video_path path_to_your_video.mp4
```
## Citation

If you find this work useful, please cite:

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  journal={arXiv preprint arXiv:2603.10300},
  year={2026}
}
```