QiWang98/VideoRFT-SFT-3B


Improve model card: Add `transformers` support, update `pipeline_tag`, and add descriptive content with usage

#1

by nielsr HF Staff - opened Oct 15, 2025

base: refs/heads/main ← from: refs/pr/1


This PR expands the model card for VideoRFT and links it to the paper "VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning".

Key updates include:

- Adding `library_name: transformers` to enable the "Use in Transformers" widget, as the model is fully compatible with the library.
- Updating `pipeline_tag` from `visual-question-answering` to `video-text-to-text` to better reflect the model's capabilities in video reasoning and understanding.
- Populating the main content section with a detailed overview (including the abstract), methodology, dataset information, installation/training/evaluation instructions, and a runnable "Quick Inference Code" example, all extracted from the project's GitHub README.
- Ensuring all relevant links (paper, GitHub, datasets) are correctly included.
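
For readers who want to see what the two metadata changes amount to, here is a minimal sketch in plain Python. It assumes the model card's YAML front matter is a flat list of `key: value` lines; the helper `update_front_matter` and the sample card text are hypothetical illustrations, not code from this PR or from the Hub libraries.

```python
def update_front_matter(readme: str, updates: dict[str, str]) -> str:
    """Set flat `key: value` entries in a README's YAML front matter.

    Assumption: the front matter holds only simple top-level keys.
    Nested or multi-line YAML values are passed through untouched.
    """
    lines = readme.split("\n")
    rest = [line.strip() for line in lines[1:]]
    # Front matter is the block between a leading '---' and the next '---'.
    if lines and lines[0].strip() == "---" and "---" in rest:
        end = 1 + rest.index("---")
        header, body = lines[1:end], lines[end + 1:]
    else:
        header, body = [], lines

    remaining = dict(updates)
    new_header = []
    for line in header:
        key = line.split(":", 1)[0].strip()
        is_top_level = not line.startswith((" ", "-", "#"))
        if is_top_level and key in remaining:
            # Overwrite an existing key in place.
            new_header.append(f"{key}: {remaining.pop(key)}")
        else:
            new_header.append(line)
    # Append keys that were not present before.
    new_header.extend(f"{k}: {v}" for k, v in remaining.items())
    return "\n".join(["---", *new_header, "---", *body])


# Hypothetical card text; only the old pipeline_tag value comes from the PR.
sample = "---\npipeline_tag: visual-question-answering\n---\n# VideoRFT\n"
updated = update_front_matter(
    sample,
    {"library_name": "transformers", "pipeline_tag": "video-text-to-text"},
)
print(updated)
```

In practice, a YAML-aware tool is a safer choice than line-based editing, since real model cards often contain nested metadata such as dataset lists.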

This comprehensive update aims to improve discoverability and usability for researchers and developers on the Hugging Face Hub.

Commit d9663d9b: Improve model card: Add `transformers` support, update `pipeline_tag`, and add descriptive content with usage

QiWang98 changed pull request status to merged Oct 21, 2025
