Add comprehensive model card for Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

#1
by nielsr HF Staff - opened

This PR expands the model card for the Conan model with metadata, links, and documentation so that the model is easier to discover and use on the Hugging Face Hub.

Key additions include:

  • Metadata: pipeline_tag: video-text-to-text to enable the inference widget for video reasoning; library_name: transformers so the Hub can generate a code snippet automatically (once an explicit usage example is available); base_model: Qwen2.5-VL-7B-Instruct; datasets: Conan-91K; and descriptive tags (multimodal, video, reasoning, qwen).
  • Paper Link: A direct link to the research paper Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence.
  • Abstract: The full abstract from the paper for a detailed overview of the model's approach and findings.
  • Project Description: A summary of Conan's capabilities, derived from the paper's introduction.
  • Teaser Image: The teaser image from the GitHub repository, which illustrates the model's performance.
  • GitHub Repository Link: A link to the official GitHub repository for the code and additional resources.
  • Evaluation Details: Information on the Conan-Eval toolkit and supported benchmarks, extracted from the GitHub README.
  • Citation: A BibTeX entry for easy referencing.
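
For reference, the metadata bullet above would correspond to YAML front matter at the top of the model card's README.md, roughly as sketched below. This is an illustration assembled from the values listed in this description, not a copy of the actual diff. The Hub generally expects full repo ids (namespace/name) for base_model and datasets; the namespaces are not stated here, so the names are left exactly as written.

```yaml
---
# Sketch only: values taken from the PR description above.
# base_model and datasets normally need full Hub repo ids (namespace/name);
# the namespaces are not given in this description, so they are omitted.
pipeline_tag: video-text-to-text
library_name: transformers
base_model: Qwen2.5-VL-7B-Instruct
datasets:
  - Conan-91K
tags:
  - multimodal
  - video
  - reasoning
  - qwen
---
```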

This update makes the model card more informative and user-friendly.

Ready to merge
This branch is ready to get merged automatically.