---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- violetcliff/SmartHome-Bench
license: apache-2.0
pipeline_tag: video-classification
library_name: transformers
---
# DeepIntuit

## Model Description
DeepIntuit is a reasoning-enhanced video understanding model designed for open-instance video classification. Instead of directly mapping visual features to labels, the model learns to generate intrinsic reasoning traces that guide the final classification decision, improving robustness under large intra-class variation.

The model is introduced in *From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification*:

- 📄 Paper: https://arxiv.org/abs/2603.10300
- 💻 Code: https://github.com/BWGZK-keke/DeepIntuit
- 🏠 Project Page: https://bwgzk-keke.github.io/DeepIntuit/

## Training Pipeline
DeepIntuit is trained through a three-stage pipeline:

1. **Cold Start Alignment**: supervised training initializes structured reasoning generation.
2. **Reasoning Refinement (GRPO)**: reinforcement learning with Group Relative Policy Optimization improves reasoning quality and prediction consistency.
3. **Intuitive Calibration**: a lightweight classifier is trained on the generated reasoning traces to produce stable predictions.
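
The Reasoning Refinement stage relies on GRPO. The following minimal sketch shows the two ingredients GRPO is generally known for; it is not DeepIntuit's training code. First, advantages are computed by standardizing rewards within a group of responses sampled for the same input, instead of using a learned value model. Second, those advantages are plugged into a PPO-style clipped objective. The reward function `label_reward`, the labels, and the group size are hypothetical, and the paper's reward design may differ. For brevity, the sketch uses a sequence-level probability ratio and omits GRPO's KL penalty toward a reference policy.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one group of
    responses sampled for the same video/prompt (no learned critic)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped objective averaged over the group (to be maximized).

    logp_new / logp_old: sequence log-probabilities of each sampled response
    under the current policy and the policy that generated the samples.
    The KL penalty toward a reference model is omitted in this sketch.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=np.float64) - np.asarray(logp_old, dtype=np.float64))
    adv = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.minimum(unclipped, clipped).mean())


def label_reward(predicted_label, true_label):
    """Hypothetical reward: 1.0 if the label extracted from a reasoning
    trace matches the ground truth, else 0.0."""
    return 1.0 if predicted_label == true_label else 0.0


# Four hypothetical reasoning traces sampled for one video, each ending in a label.
preds = ["normal", "abnormal", "normal", "abnormal"]
rewards = [label_reward(p, "normal") for p in preds]
adv = group_relative_advantages(rewards)
print(adv.round(3))  # [ 1. -1.  1. -1.]
```

Because the advantages are relative within the group, traces that reach the correct label are pushed up and the others pushed down. If every sample in a group receives the same reward, the group contributes no gradient.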

## Intended Use
DeepIntuit is designed for research on:
- video understanding
- open-instance video classification
- reasoning-enhanced multimodal learning
- safety-sensitive video analysis

## Sample Usage
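
To run inference with the code from the official repository (https://github.com/BWGZK-keke/DeepIntuit), change into its `stage2_model` directory and run `inference.py`: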

```bash
cd stage2_model
python inference.py \
  --model_path BWGZK/DeepIntuit \
  --video_path path_to_video.mp4
```
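
To classify many clips, one option is a small Python driver that calls `inference.py` once per file. This is a minimal sketch and is not part of the official repository. It assumes that it is run from the `stage2_model` directory and that `inference.py` takes only the `--model_path` and `--video_path` flags shown above. It also assumes the script prints its prediction to standard output. The helper names `build_command` and `run_folder` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

MODEL_PATH = "BWGZK/DeepIntuit"
VIDEO_SUFFIXES = {".mp4"}  # assumption: extend if inference.py accepts other formats


def build_command(video_path, model_path=MODEL_PATH, script="inference.py"):
    """Build the same command line as the shell example, for one video."""
    return [
        sys.executable, script,
        "--model_path", model_path,
        "--video_path", str(video_path),
    ]


def run_folder(video_dir, model_path=MODEL_PATH):
    """Run inference.py once per video in video_dir (call from stage2_model).

    Returns {file name: stdout on success, stderr on failure}.
    """
    results = {}
    for video in sorted(Path(video_dir).iterdir()):
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue  # skip non-video files
        proc = subprocess.run(
            build_command(video, model_path), capture_output=True, text=True
        )
        results[video.name] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results
```

Each subprocess reloads the model from scratch, which is simple but slow. For large batches, a loop inside a single process that keeps the model loaded would likely be cheaper, but that would require adapting `inference.py` itself.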

## Citation

```bibtex
@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  journal={arXiv preprint arXiv:2603.10300},
  year={2026}
}
```