Instructions to use QiWang98/VideoRFT-SFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use QiWang98/VideoRFT-SFT with Transformers:
```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("QiWang98/VideoRFT-SFT")
model = AutoModelForImageTextToText.from_pretrained("QiWang98/VideoRFT-SFT")
```
- Notebooks
- Google Colab
- Kaggle
Improve model card: update pipeline tag, add library name, paper details & content (#1)
Opened by nielsr (HF Staff)
This PR significantly enhances the model card by:
- Updating the `pipeline_tag` from `visual-question-answering` to `video-text-to-text` to better reflect the model's comprehensive video reasoning capabilities.
- Adding `library_name: transformers` to enable the automated "Use in Transformers" widget, as the model's configuration files and GitHub requirements demonstrate compatibility with the library.
- Populating the content with the paper's abstract (as "Overview"), methodology, dataset details, setup instructions, training and evaluation guides, and acknowledgements, all sourced directly from the project's GitHub README.
- Ensuring that the official paper link and GitHub repository link are prominently displayed.
- Carefully linking all images to their raw GitHub URLs for proper rendering on the Hub.
This update provides a much richer and more accurate overview for users, improving discoverability and ease of use.
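For illustration, the two metadata changes described in this PR would sit in the YAML front matter at the top of the model card's README. The sketch below renders only those two fields; `render_front_matter` is a hypothetical helper written for this example, and the real card may well carry other fields (license, datasets, and so on) that are not shown here.

```python
# Minimal sketch: render the two metadata fields this PR mentions as
# YAML front matter. The helper name is hypothetical, and only flat
# string values are handled.
def render_front_matter(metadata: dict) -> str:
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Values taken from the PR description; other card fields are omitted.
card_metadata = {
    "pipeline_tag": "video-text-to-text",
    "library_name": "transformers",
}

front_matter = render_front_matter(card_metadata)
print(front_matter)
```

The Hub reads these fields from the block delimited by `---` lines at the very start of the README, so the rendered text would go above the card's prose content.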
QiWang98 changed pull request status to merged