nielsr (HF Staff) committed
Commit 3bdfde1 · verified · 1 Parent(s): f0aecb4

Improve model card: Add pipeline tag, library name, and complete citation


This PR enhances the model card for AVoCaDO by:
- Adding `pipeline_tag: video-text-to-text` to improve discoverability on the Hub, accurately reflecting that the model generates text descriptions from video input.
- Specifying `library_name: transformers` in the metadata, as evidence from `config.json` (`"transformers_version": "4.52.4"` and `Qwen2_5OmniForConditionalGeneration` architecture) confirms compatibility with the Hugging Face Transformers library, enabling an automated usage widget.
- Completing the `Citation` section with the appropriate BibTeX entry for the paper.

These updates provide more comprehensive information for users and integrate the model more effectively within the Hugging Face ecosystem.
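With `library_name: transformers` set, the Hub can surface a usage snippet along these lines. This is a minimal sketch, not code from the model card: the class name `Qwen2_5OmniForConditionalGeneration` comes from `config.json` (transformers >= 4.52), while the repo id and loading arguments are illustrative assumptions.

```python
# Hypothetical loading sketch for AVoCaDO via Hugging Face Transformers.
# The repo id below is an assumption for illustration; the class name is
# taken from the architecture listed in config.json.
MODEL_ID = "AVoCaDO-Captioner/AVoCaDO"  # hypothetical repo id


def load_captioner(model_id: str = MODEL_ID):
    """Load the model and its processor; requires transformers >= 4.52
    and enough accelerator memory for a 7B model."""
    from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration

    model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor
```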

Files changed (1)
  1. README.md (+10, -3)
README.md CHANGED

````diff
@@ -1,11 +1,13 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen2.5-Omni-7B
+license: apache-2.0
 tags:
 - audiovisual
 - video
 - captioner
+pipeline_tag: video-text-to-text
+library_name: transformers
 ---
 
 # AVoCaDO: An <u>A</u>udio<u>V</u>isual Vide<u>o</u> <u>Ca</u>ptioner <u>D</u>riven by Temporal <u>O</u>rchestration
@@ -17,7 +19,7 @@ tags:
 </p>
 
 ## ✨ Overview
-Audiovisual video captioning aims to generate semantically rich descriptions with temporal alignment between visual and auditory events, thereby benefiting both video understanding and generation. We introduce <b>AVoCaDO</b>, a powerful audiovisual video captioner driven by the temporal orchestration between audio and visual modalities. Experimental results demonstrate that AVoCaDO significantly outperforms existing open-source models across four audiovisual video captioning benchmarks, and also achieves competitive performance under visual-only settings.
+Audiovisual video captioning aims to generate semantically rich descriptions with temporal alignment between visual and auditory events, thereby benefiting both video understanding and generation. We introduce **AVoCaDO**, a powerful audiovisual video captioner driven by the temporal orchestration between audio and visual modalities. Experimental results demonstrate that AVoCaDO significantly outperforms existing open-source models across four audiovisual video captioning benchmarks, and also achieves competitive performance under visual-only settings.
 
 ## 🚀 Getting Started
 Please refer to our [Github repository](https://github.com/AVoCaDO-Captioner/AVoCaDO) for more details.
@@ -27,5 +29,10 @@ Please refer to our [Github repository](https://github.com/AVoCaDO-Captioner/AVo
 If you find our work helpful for your research, please consider giving a star ⭐ and citing our paper. We appreciate your support!
 
 ```bibtex
-todo
+@article{wu2025avocado,
+  author = {Zhiyong Wu and Zichen Ding and Zhenyu Wu and Yian Wang and Peng Li and Chengyou Jia and Zicheng Zhang and Paul Pu Liang and Hu Xu and Hyunwoo J. Kim and Lemeng Wu and Chenchen Zhu and Paul Pu Liang and Mohit Bansal and Liheng Chen},
+  title = {AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration},
+  journal = {arXiv preprint arXiv:2510.10395},
+  year = {2025},
+}
 ```
````