Improve model card: Add pipeline tag, library name, and comprehensive details from GitHub
#1 by nielsr (HF Staff)
This PR significantly enhances the model card for the Reason-RFT project by:
Adding Metadata:
- `pipeline_tag: image-text-to-text` is added, as the model is a Visual Language Model (VLM) designed for visual reasoning, taking images and text as input to generate text. This improves discoverability on the Hugging Face Hub.
- `library_name: transformers` is added, as evidenced by the model's architecture (`Qwen2VLForConditionalGeneration`) and components (`Qwen2Tokenizer`, `Qwen2VLProcessor`) found in the `config.json` and `tokenizer_config.json` files. This will enable automated code snippets for easy usage.
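For illustration, the two metadata fields above live in the YAML front matter at the top of the model card's `README.md`. The following is a minimal sketch of what that block looks like once rendered. The helper `add_front_matter` is hypothetical and not part of any Hugging Face library. It assumes the README has no existing front matter and that every value is a plain scalar needing no YAML quoting.

```python
def add_front_matter(readme: str, metadata: dict[str, str]) -> str:
    """Prepend a YAML front-matter block to a model card.

    Hypothetical helper for illustration only. Assumes `readme` has no
    existing front matter and that values are simple unquoted scalars.
    """
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    # Blank line between the front matter and the card body.
    return "\n".join(lines) + "\n\n" + readme.lstrip("\n")


# The two fields this PR adds.
metadata = {
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}

title = (
    "# Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning "
    "of Vision Language Models"
)
card = add_front_matter(title + "\n", metadata)
print(card)
```

In practice the front matter is usually edited by hand or through the Hub's metadata UI. The sketch only shows the resulting layout.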
Updating Content for Clarity and Completeness:
- The main title has been updated to `# Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models` to align with the paper and official GitHub repository title.
- The "News" and "Citation" sections have been updated with the latest and most comprehensive information available from the project's GitHub README, including recent announcements and additional relevant citations.
- Detailed "RoadMap", "Pipeline", "General Visual Reasoning Tasks" (including Setup, Dataset Preparation, Training, and Evaluation instructions), and "Embodied Visual Reasoning Tasks" sections have been integrated from the GitHub README. These provide extensive usage guidance and project context, replacing the generic "Usage" link.
- Malformed HTML in the header links (`<p align="center"> ... </p>`) has been corrected for better rendering and validity.
These changes provide a more informative, up-to-date, and user-friendly model card for the community.