Wakals/CoVT-LLaVA-13B-depth


Improve model card: Add metadata, links, overview, and citation

#1

by nielsr HF Staff - opened Nov 26, 2025

base: refs/heads/main ← from: refs/pr/1


This PR enhances the model card by adding key metadata and comprehensive information:

- Adds pipeline_tag: image-text-to-text so the model is categorized correctly for multimodal tasks.
- Adds library_name: transformers. The model architecture (llava_llama and AnchorLlava) and the transformers_version field in config.json indicate compatibility with the transformers library, and this tag enables the "How to use" widget.
- Includes a direct link to the paper: Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens.
- Provides links to the official project page (https://wakalsprojectpage.github.io/comt-website) and the GitHub repository (https://github.com/Wakals/CoMT).
- Expands the "Model Description" with an overview of CoVT's methodology and benefits, derived from the paper's abstract and the GitHub README.
- Embeds demo images from the GitHub repository to illustrate the model's capabilities.
- Adds a BibTeX citation for the paper.
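
For reference, the two metadata additions listed above belong in the YAML front matter at the top of the model card's README.md. The following is a minimal sketch, assuming the front matter holds only these two keys; the actual diff is not reproduced in this discussion and may contain other fields.

```yaml
---
# Values taken from the PR description; any other keys in the real
# front matter are not shown here.
pipeline_tag: image-text-to-text
library_name: transformers
---
```

The Hub reads pipeline_tag to file the model under a task category, and library_name to decide which code snippet the "How to use" widget shows.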

Please review and merge if these improvements align with your expectations.

Improve model card: Add metadata, links, overview, and citation (862faebd)


Ready to merge

This branch is ready to get merged automatically.
