How to use QiWang98/VideoRFT-SFT-3B with Transformers:

    # Load model directly
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained("QiWang98/VideoRFT-SFT-3B")
    model = AutoModelForImageTextToText.from_pretrained("QiWang98/VideoRFT-SFT-3B")
Improve model card: Add `transformers` support, update `pipeline_tag`, and add descriptive content with usage
#1
by nielsr HF Staff - opened
This PR significantly enhances the model card for VideoRFT, linking it to the paper "VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning".
Key updates include:
- Adding `library_name: transformers` to enable the "Use in Transformers" widget, as the model is fully compatible with the library.
- Updating `pipeline_tag` from `visual-question-answering` to `video-text-to-text` to better reflect the model's capabilities in video reasoning and understanding.
- Populating the main content section with a detailed overview (including the abstract), methodology, dataset information, installation/training/evaluation instructions, and a runnable "Quick Inference Code" example, all extracted from the project's GitHub README.
- Ensuring all relevant links (paper, GitHub, datasets) are correctly included.
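The two metadata changes listed above live in the YAML front matter at the top of the model card's README. As a minimal sketch of what that edit amounts to, the snippet below rewrites flat `key: value` pairs in a front-matter block using only the standard library. The helper name `update_front_matter` and the "before" card text are hypothetical and for illustration only; just the `library_name` and `pipeline_tag` values come from this PR, and a real card may contain nested YAML that this simple line-based approach does not handle.

```python
def update_front_matter(card_text: str, updates: dict) -> str:
    """Set flat `key: value` pairs in a model card's YAML front matter.

    Hypothetical helper: handles only top-level scalar keys, not nested YAML.
    """
    lines = card_text.splitlines()
    # Front matter, when present, is delimited by a leading and a closing '---'.
    end = None
    if lines and lines[0].strip() == "---":
        end = next(
            (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
        )
    if end is None:
        # No front matter found: prepend a new block holding the updates.
        header = [f"{k}: {v}" for k, v in updates.items()]
        return "\n".join(["---", *header, "---", *lines]) + "\n"

    remaining = dict(updates)
    header = []
    for line in lines[1:end]:
        key = line.split(":", 1)[0].strip()
        is_top_level = not line.startswith((" ", "-"))
        if is_top_level and key in remaining:
            # Overwrite an existing key in place.
            header.append(f"{key}: {remaining.pop(key)}")
        else:
            header.append(line)
    # Append keys that were not already present.
    header.extend(f"{k}: {v}" for k, v in remaining.items())
    return "\n".join(["---", *header, "---", *lines[end + 1:]]) + "\n"


# Hypothetical "before" card; only the old pipeline_tag value is taken from the PR text.
old_card = """---
language: en
pipeline_tag: visual-question-answering
---
# VideoRFT
"""

new_card = update_front_matter(
    old_card,
    {"library_name": "transformers", "pipeline_tag": "video-text-to-text"},
)
print(new_card)
```

In practice a YAML-aware tool would be a safer choice than line-based editing; the sketch is only meant to make the shape of the metadata change concrete.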
This comprehensive update aims to improve discoverability and usability for researchers and developers on the Hugging Face Hub.
QiWang98 changed pull request status to merged