Improve model card: Add metadata, links, overview, and citation

#1, opened by nielsr (HF Staff)

This PR enhances the model card by adding key metadata and comprehensive information:

  • Adds pipeline_tag: image-text-to-text to correctly categorize the model for multimodal tasks.
  • Adds library_name: transformers. The model architectures (llava_llama and AnchorLlava) and the transformers_version field in config.json indicate compatibility with the transformers library, and this tag enables the "How to use" widget.
  • Includes a direct link to the paper: Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens.
  • Provides links to the official project page (https://wakalsprojectpage.github.io/comt-website) and the GitHub repository (https://github.com/Wakals/CoMT) for easy access to more resources.
  • Expands the "Model Description" with a detailed overview of CoVT's methodology and benefits, derived from the paper's abstract and the GitHub README.
  • Embeds relevant demo images from the GitHub repository to visually illustrate the model's capabilities.
  • Adds a BibTeX citation for the paper.
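
For reviewers who want to see what the two metadata additions amount to, here is a minimal sketch of the YAML front-matter block that would sit at the top of the README. It is an illustration, not the exact diff of this PR: the helper names `build_front_matter` and `prepend_front_matter` and the placeholder README text are hypothetical. Only the two keys and their values come from the description above.

```python
def build_front_matter(metadata: dict) -> str:
    """Render a flat dict of string values as a YAML front-matter block."""
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def prepend_front_matter(readme_text: str, metadata: dict) -> str:
    """Put the metadata block at the top of a README that has none yet."""
    if readme_text.startswith("---"):
        raise ValueError("README already has a front-matter block")
    return build_front_matter(metadata) + "\n" + readme_text


# The two keys this PR adds; values are taken from the PR description.
metadata = {
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}

# Placeholder README body (hypothetical), standing in for the real model card.
card = prepend_front_matter("# Model Description\n", metadata)
print(card)
```

The sketch handles only flat string values, which is enough for these two keys. A real card with lists or nested fields would be better served by a proper YAML library.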

Please review and merge if these improvements align with your expectations.
