Improve model card: Add pipeline tag, library name, and comprehensive details from GitHub

by nielsr HF Staff - opened Oct 7, 2025

This PR significantly enhances the model card for the Reason-RFT project by:

- Adding Metadata:
  - `pipeline_tag: image-text-to-text` is added, as the model is a Visual Language Model (VLM) designed for visual reasoning, taking images and text as input to generate text. This improves discoverability on the Hugging Face Hub.
  - `library_name: transformers` is added, as evidenced by the model's architecture (`Qwen2VLForConditionalGeneration`) and components (`Qwen2Tokenizer`, `Qwen2VLProcessor`) found in the `config.json` and `tokenizer_config.json` files. This will enable automated code snippets for easy usage.
- Updating Content for Clarity and Completeness:
  - The main title has been updated to `# Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models` to align with the paper and official GitHub repository title.
  - The "News" and "Citation" sections have been updated with the latest and most comprehensive information available from the project's GitHub README, including recent announcements and additional relevant citations.
  - Detailed "RoadMap", "Pipeline", "General Visual Reasoning Tasks" (including Setup, Dataset Preparation, Training, and Evaluation instructions), and "Embodied Visual Reasoning Tasks" sections have been integrated from the GitHub README. These provide extensive usage guidance and project context, replacing the generic "Usage" link.
  - Malformed HTML in the header links (`<p align="center"> ... </p>`) has been corrected for better rendering and validity.

These changes provide a more informative, up-to-date, and user-friendly model card for the community.
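
For readers unfamiliar with model card metadata, the two keys described above live in the YAML front matter at the top of the card's README. The following is a minimal sketch of that change, not the tooling used for this PR: `add_card_metadata` is a hypothetical helper, the sample card text and `existing_key` entry are placeholders, and only flat `key: value` pairs are handled.

```python
from __future__ import annotations


def add_card_metadata(readme: str, metadata: dict[str, str]) -> str:
    """Return `readme` with `metadata` present in its YAML front matter.

    Hypothetical helper for illustration only; it handles flat
    `key: value` pairs and does not parse nested YAML.
    """
    new_lines = [f"{key}: {value}" for key, value in metadata.items()]
    lines = readme.split("\n")
    if lines and lines[0] == "---" and "---" in lines[1:]:
        # Existing front matter: keep its entries, but drop any key
        # that is about to be set so it is not duplicated.
        end = lines.index("---", 1)
        kept = [
            ln for ln in lines[1:end]
            if ln.split(":", 1)[0].strip() not in metadata
        ]
        body = lines[end + 1:]
    else:
        # No front matter yet: create a new block above the card text.
        kept, body = [], lines
    return "\n".join(["---", *kept, *new_lines, "---", *body])


# Placeholder card text (assumption, not the real README contents).
card = "---\nexisting_key: existing_value\n---\n# Title\n"
updated = add_card_metadata(
    card,
    {"pipeline_tag": "image-text-to-text", "library_name": "transformers"},
)
print(updated)
```

The two values passed in are the ones named in this PR; everything else in the sketch is illustrative.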
