Apply for a GPU community grant: Academic project

by sming256 - opened Jan 10

Owner Jan 10

https://ivul-kaust.github.io/projects/videoauto-r1/

We propose VideoAuto-R1, a video understanding framework that adopts a "reason-when-necessary" strategy. During training, our approach follows a Thinking Once, Answering Twice paradigm: the model first generates an initial answer, then performs reasoning, and finally outputs a reviewed answer. Both answers are supervised via verifiable rewards. During inference, the model uses the confidence score of the initial answer to determine whether to proceed with reasoning.
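For readers who want a concrete picture of the inference-time gating described above, here is a minimal sketch. It is not taken from the VideoAuto-R1 code base: the function names, the use of a length-normalized token probability as the confidence score, and the 0.9 threshold are all assumptions made for illustration.

```python
import math
from typing import Sequence


def answer_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean probability of the initial answer tokens.

    Assumption: confidence is derived from per-token log-probabilities.
    The actual scoring rule used by the project may differ.
    """
    if not token_logprobs:
        return 0.0  # no answer tokens: treat as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_reason(token_logprobs: Sequence[float], threshold: float = 0.9) -> bool:
    """Return True when the initial answer is not confident enough,
    meaning the model should go on to reason and give a reviewed answer.

    The threshold value is hypothetical.
    """
    return answer_confidence(token_logprobs) < threshold


# A confident initial answer is accepted without reasoning.
confident = [math.log(0.99), math.log(0.98)]
# An uncertain initial answer triggers the reasoning pass.
uncertain = [math.log(0.5)]

print(should_reason(confident))
print(should_reason(uncertain))
```

With a gate of this kind, easy questions are answered from the initial answer alone, and only low-confidence cases pay the cost of generating a reasoning trace.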

Our models and demo are hosted on HuggingFace. We kindly request additional GPU resources to ensure a smooth and responsive demo experience for users.

sming256

Owner Jan 10

Project Page: https://ivul-kaust.github.io/projects/videoauto-r1/
GitHub: https://github.com/IVUL-KAUST/VideoAuto-R1/
Tweet: https://x.com/shuming96/status/2009833088264876235
