DeepIntuit

Model Description

DeepIntuit is a reasoning-enhanced video understanding model designed for open-instance video classification. Instead of directly mapping visual features to labels, the model learns to generate intrinsic reasoning traces that guide the final classification decision, improving robustness under large intra-class variation.

The model is introduced in:

From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification

📄 Paper: https://arxiv.org/abs/2603.10300
💻 Code: https://github.com/BWGZK-keke/DeepIntuit


Training Pipeline

DeepIntuit is trained through a three-stage pipeline:

  1. Cold Start Alignment: Supervised training to initialize structured reasoning generation.

  2. Reasoning Refinement (GRPO): Reinforcement learning improves reasoning quality and prediction consistency.

  3. Intuitive Calibration: A lightweight classifier is trained on generated reasoning traces for stable prediction.
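
As a rough illustration of the second stage, GRPO (Group Relative Policy Optimization) scores each sampled output relative to the other outputs drawn for the same input, rather than using a learned value function. The sketch below shows only that group-relative normalization step. The function name, the binary reward values, the `eps` constant, and the use of the population standard deviation are assumptions for illustration; this card does not specify DeepIntuit's actual reward design or training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so traces
    that score above the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reasoning traces sampled for one video:
# 1.0 if the trace ends in the correct label, 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
```

In this toy group, the three correct traces receive positive advantages and the incorrect one a negative advantage. When every trace in a group earns the same reward, all advantages are zero and the group contributes no learning signal.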


Intended Use

DeepIntuit is designed for research on:

  • video understanding
  • open-instance video classification
  • reasoning-enhanced multimodal learning
  • safety-sensitive video analysis

Citation

@article{zhang2026deepintuit,
  title={From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification},
  author={Zhang, Ke and Zhao, Xiangchen and Tian, Yunjie and Zheng, Jiayu and Patel, Vishal M and Fu, Di},
  year={2026}
}