Video-Text-to-Text
Transformers
Safetensors
English
qwen2_5_vl
image-text-to-text
text-generation-inference
Instructions to use QiWang98/VideoRFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use QiWang98/VideoRFT with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("QiWang98/VideoRFT") model = AutoModelForMultimodalLM.from_pretrained("QiWang98/VideoRFT") - Notebooks
- Google Colab
- Kaggle
Improve model card for VideoRFT with metadata and comprehensive content
#1
by nielsr HF Staff - opened
This PR significantly improves the model card for VideoRFT by:
- Updating the
pipeline_tagfromvisual-question-answeringtovideo-text-to-textto accurately reflect the model's capabilities in video reasoning and text generation. - Adding
library_name: transformersas the model is compatible with the Hugging Face Transformers library, enabling the automated "How to use" widget. - Populating the content section with a detailed description, including the paper abstract, methodology, dataset information, installation instructions, training and inference guidance, a runnable Python usage example, and the full citation information.
- Ensuring direct links to the associated paper and the GitHub repository are prominently displayed.
These enhancements make the model card more informative and user-friendly for the Hugging Face community.
QiWang98 changed pull request status to merged