Improve model card: Add pipeline tag, library name, paper, code, abstract, image, and usage

#1 opened by nielsr (HF Staff)

This PR significantly enhances the model card for the Vision-Zero-InternVL3-14B-Clevr model by adding crucial metadata and detailed documentation.

Specifically, it includes:

  • pipeline_tag: image-text-to-text: This accurately categorizes the model's functionality as a Vision-Language Model, improving its discoverability on the Hugging Face Hub.
  • library_name: transformers: Evidence from the config.json (e.g., transformers_version, architectures) suggests compatibility with the transformers library, enabling automated code snippets for users.
  • license: cc-by-nc-4.0: A Creative Commons Attribution-NonCommercial 4.0 license, commonly used for research releases.
  • Paper Link: A direct link to the paper Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
  • GitHub Repository: A link to the official GitHub repository: https://github.com/wangqinsi1/Vision-Zero.
  • Abstract: The full paper abstract is included for a comprehensive overview.
  • Overview Image: The main overview image from the GitHub README is included for visual context.
  • Quick Start (Inference): A usage section with setup instructions and a Python inference snippet, taken directly from the GitHub README.
  • Citation: The BibTeX citation for the paper is also included.
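Taken together, the metadata bullets above correspond to a YAML front-matter block at the top of the model card's README.md. A minimal sketch with the values this PR adds (field ordering is illustrative):

```yaml
---
pipeline_tag: image-text-to-text
library_name: transformers
license: cc-by-nc-4.0
---
```

The Hub reads this block to categorize the model, surface it in task filters, and generate library-specific code snippets.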

These additions will greatly improve the discoverability, usability, and overall documentation of the model on the Hugging Face Hub.
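For orientation only, here is a hedged sketch of what an image-text-to-text inference snippet with transformers typically looks like; the model id, image URL, and prompt are placeholders, not the PR's actual Quick Start code, which readers should take from the model card or the GitHub README.

```python
# A hedged sketch only: the model id, image URL, and question below are
# placeholders, not the official Quick Start snippet from the README.
from typing import Any


def build_messages(image_url: str, question: str) -> list[dict[str, Any]]:
    """Chat-style input for an image-text-to-text pipeline: one user turn
    carrying an image reference and a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_inference(model_id: str, image_url: str, question: str) -> str:
    # Requires `pip install transformers` plus an image backend (e.g. Pillow);
    # the import is deferred because the dependency is heavyweight.
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model=model_id)
    out = pipe(text=build_messages(image_url, question), max_new_tokens=64)
    return out[0]["generated_text"]


messages = build_messages("https://example.com/scene.png", "How many objects are there?")
print(messages[0]["role"])  # user
```

Separating message construction from the pipeline call keeps the prompt format easy to inspect; the actual call downloads the checkpoint from the Hub on first use.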
