Video-Text-to-Text
Transformers
Safetensors
English
videollama3_qwen2
text-generation
multi-modal
large-language-model
video-language-model
custom_code
Instructions to use DAMO-NLP-SG/VideoLLaMA3-7B with libraries, inference providers, notebooks, and local apps:

- Libraries
  - Transformers

How to use DAMO-NLP-SG/VideoLLaMA3-7B with Transformers:

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-SG/VideoLLaMA3-7B", trust_remote_code=True, dtype="auto")
```

- Notebooks
  - Google Colab
  - Kaggle
Add pipeline tag #1

Opened by nielsr (HF Staff)
README.md (changed)
````diff
@@ -15,13 +15,12 @@ language:
 - en
 metrics:
 - accuracy
-pipeline_tag:
+pipeline_tag: any-to-any
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
 - DAMO-NLP-SG/VideoLLaMA3-7B-Image
 ---
 
-
 <p align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/626938b16f8f86ad21deb989/tt5KYnAUmQlHtfB1-Zisl.png" width="150" style="margin-bottom: 0.2;"/>
 <p>
@@ -139,4 +138,12 @@ If you find VideoLLaMA useful for your research and applications, please cite us
 year = {2023},
 url = {https://arxiv.org/abs/2306.02858}
 }
-```
+```
+
+## 👍 Acknowledgement
+Our VideoLLaMA3 is built on top of [**SigLip**](https://huggingface.co/google/siglip-so400m-patch14-384) and [**Qwen2.5**](https://github.com/QwenLM/Qwen2.5). We also learned a lot from the implementation of [**LLaVA-OneVision**](https://github.com/LLaVA-VL/LLaVA-NeXT), [**InternVL2**](https://internvl.github.io/blog/2024-07-02-InternVL-2.0/), and [**Qwen2VL**](https://github.com/QwenLM/Qwen2-VL). Besides, our VideoLLaMA3 benefits from tons of open-source efforts. We sincerely appreciate these efforts and compile a list in [ACKNOWLEDGEMENT.md](https://github.com/DAMO-NLP-SG/VideoLLaMA3/blob/main/ACKNOWLEDGEMENT.md) to express our gratitude. If your work is used in VideoLLaMA3 but not mentioned in either this repo or the technical report, feel free to let us know :heart:.
+
+## 🔒 License
+
+This project is released under the Apache 2.0 license as found in the LICENSE file.
+The service is a research preview intended for **non-commercial use ONLY**, subject to the model Licenses of Qwen, Terms of Use of the data generated by OpenAI and Gemini, and Privacy Practices of ShareGPT. Please get in touch with us if you find any potential violations.
````
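The first hunk of this diff replaces a bare `pipeline_tag:` (which YAML reads as null, so no task was declared) with `pipeline_tag: any-to-any`, the field the Hub uses to classify a model's task. The sketch below illustrates that difference with a hypothetical, standard-library-only helper, `parse_front_matter`. It assumes a simplified front-matter layout like the excerpt in this diff (top-level keys, flat `- item` lists, no nesting or quoting). The sample text is reconstructed from the first hunk, with `language:` taken from its header; the real README's earlier keys are omitted. This is not how the Hub itself parses cards; for real cards, the `huggingface_hub` library's `ModelCard` class is the more robust route.

```python
# Hypothetical, simplified front-matter reader for illustration only.
# Assumes: top-level "key: value" lines and flat "- item" lists; NOT a full YAML parser.


def parse_front_matter(readme_text: str) -> dict:
    """Return top-level keys from a README's YAML front matter (simplified)."""
    lines = readme_text.splitlines()
    # Front matter is only recognized when the very first line is '---'.
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # A list item belongs to the most recent key; start a list if needed.
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A bare "key:" is YAML null, represented here as None.
            meta[current_key] = value or None
    return meta


# Excerpt of the front matter before the PR (reconstructed from the first hunk).
before = """---
language:
- en
metrics:
- accuracy
pipeline_tag:
base_model:
- Qwen/Qwen2.5-7B-Instruct
- DAMO-NLP-SG/VideoLLaMA3-7B-Image
---
"""
# The same excerpt after the PR's one-line change.
after = before.replace("pipeline_tag:\n", "pipeline_tag: any-to-any\n")

print(parse_front_matter(before)["pipeline_tag"])  # -> None
print(parse_front_matter(after)["pipeline_tag"])   # -> any-to-any
```

Treating a bare key as `None` mirrors YAML's null semantics, which is why the pre-PR card effectively declared no pipeline tag at all.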