DAMO-NLP-SG/VideoLLaMA3-7B-Image

Visual Question Answering

videollama3_qwen2

text-generation

large-language-model

video-language-model

Add any-to-any pipeline tag, paper link

#1

by nielsr HF Staff - opened Jan 24, 2025

base: refs/heads/main ← from: refs/pr/1

README.md +2 -3

@@ -15,18 +15,17 @@ language:
 - en
 metrics:
 - accuracy
-pipeline_tag: visual-question-answering
+pipeline_tag: any-to-any
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
 ---
 <p align="center">
     <img src="https://cdn-uploads.huggingface.co/production/uploads/626938b16f8f86ad21deb989/tt5KYnAUmQlHtfB1-Zisl.png" width="150" style="margin-bottom: 0.2;"/>
 <p>
-<h3 align="center"><a href="https://arxiv.org/abs/2501.13106">VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding</a></h3>
+<h3 align="center"><a href="https://huggingface.co/papers/2501.13106">VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding</a></h3>
 <h5 align="center">