	---
	base_model:
	- Qwen/Qwen2.5-VL-7B-Instruct
	datasets:
	- violetcliff/SmartHome-Bench
	license: apache-2.0
	pipeline_tag: video-classification
	library_name: transformers
	---

	# DeepIntuit

	## Model Description

	DeepIntuit is a reasoning-enhanced video understanding model designed for open-instance video classification. Instead of directly mapping visual features to labels, the model learns to generate intrinsic reasoning traces that guide the final classification decision, improving robustness under large intra-class variation.

	The model is introduced in the paper "From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification":

	* 📄 Paper: [https://arxiv.org/abs/2603.10300](https://arxiv.org/abs/2603.10300)
	* 💻 Code: [https://github.com/BWGZK-keke/DeepIntuit](https://github.com/BWGZK-keke/DeepIntuit)
	* 🏠 Project Page: [https://bwgzk-keke.github.io/DeepIntuit/](https://bwgzk-keke.github.io/DeepIntuit/)

	---

	## Training Pipeline

	DeepIntuit is trained through a three-stage pipeline:

	1. Cold Start Alignment
	   Supervised training initializes structured reasoning generation.

	2. Reasoning Refinement (GRPO)
	   Reinforcement learning with Group Relative Policy Optimization (GRPO) improves reasoning quality and prediction consistency.

	3. Intuitive Calibration
	   A lightweight classifier is trained on the generated reasoning traces to stabilize the final prediction.
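
The group-relative idea behind the GRPO stage can be illustrated with a minimal sketch. It is not the authors' training code: the 0/1 correctness reward, the `normal`/`abnormal` labels, and the helper names are assumptions made for illustration, and only the advantage normalization is shown (no policy update, clipping, or KL term). Several responses are sampled for the same video, each is scored, and every reward is normalized against the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def correctness_reward(predicted_label, true_label):
    """Hypothetical reward: 1.0 when the predicted label is correct, else 0.0."""
    return 1.0 if predicted_label == true_label else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    A_i = (r_i - mean(r)) / (std(r) + eps)
    The population standard deviation is used here; eps avoids division
    by zero when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical sampled predictions for one video whose true label is "abnormal".
sampled_labels = ["abnormal", "normal", "abnormal", "normal"]
rewards = [correctness_reward(p, "abnormal") for p in sampled_labels]
advantages = group_relative_advantages(rewards)

for label, adv in zip(sampled_labels, advantages):
    print(f"{label:>8}: advantage = {adv:+.3f}")
```

Responses that beat their group's average receive a positive advantage and are reinforced, while below-average responses are discouraged. When all samples in a group score the same, every advantage is zero and that group contributes no learning signal.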

	---

	## Intended Use

	DeepIntuit is designed for research on:

	* video understanding
	* open-instance video classification
	* reasoning-enhanced multimodal learning
	* safety-sensitive video analysis

	## Sample Usage

	Run inference with the code provided in the [official repository](https://github.com/BWGZK-keke/DeepIntuit):

	```bash
	cd stage2_model
	python inference.py \
	  --model_path BWGZK/DeepIntuit \
	  --video_path path_to_video.mp4
	```
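
To classify several videos, the same command can be assembled from Python. The sketch below only builds and prints the command lines. The `--model_path` and `--video_path` flags come from the command above, while the helper name and the video paths are hypothetical.

```python
import shlex


def build_inference_command(video_path, model_path="BWGZK/DeepIntuit"):
    """Return the argument list for one inference.py call (hypothetical helper)."""
    return [
        "python", "inference.py",
        "--model_path", model_path,
        "--video_path", str(video_path),
    ]


# Hypothetical video files; replace with real paths.
videos = ["clips/front_door.mp4", "clips/kitchen.mp4"]
commands = [build_inference_command(v) for v in videos]

for cmd in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))

# To execute instead of printing, each command could be run from the
# stage2_model directory, for example:
#   subprocess.run(cmd, cwd="stage2_model", check=True)
```

Keeping each command as an argument list rather than a single string avoids shell-quoting problems when a video path contains spaces.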

	---

	## Citation

	```bibtex
	@article{zhang2026deepintuit,
	  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
	  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
	  journal={arXiv preprint arXiv:2603.10300},
	  year={2026}
	}
	```