AVoCaDO-Captioner committed on
Commit 4ae5db1 · verified · 1 Parent(s): cc4da4b

Create README.md

# AVoCaDO: An <u>A</u>udio<u>V</u>isual Vide<u>o</u> <u>Ca</u>ptioner <u>D</u>riven by Temporal <u>O</u>rchestration

<p align="left">
<a href="https://avocado-captioner.github.io/"><img src="https://img.shields.io/badge/Project%20webpage-558b2f?style=for-the-badge"></a>
<a href="https://github.com/AVoCaDO-Captioner/AVoCaDO"><img src="https://img.shields.io/badge/Github-db8905?style=for-the-badge"></a>
<a href="https://arxiv.org/abs/todo"><img src="https://img.shields.io/badge/arXiv-red?style=for-the-badge"></a>
</p>

## ✨ Overview
Audiovisual video captioning aims to generate semantically rich descriptions in which visual and auditory events are temporally aligned, benefiting both video understanding and video generation. We introduce <b>AVoCaDO</b>, an audiovisual video captioner driven by the temporal orchestration between the audio and visual modalities. Experimental results demonstrate that AVoCaDO significantly outperforms existing open-source models across four audiovisual video captioning benchmarks and achieves competitive performance in visual-only settings.

## 🚀 Getting Started
Please refer to our [GitHub repository](https://github.com/AVoCaDO-Captioner/AVoCaDO) for more details.

## ✒️ Citation

If you find our work helpful for your research, please consider giving a star ⭐ and citing our paper. We appreciate your support!

```bibtex
todo
```

Files changed (1)
  1. README.md +9 -0
README.md ADDED
@@ -0,0 +1,9 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-Omni-7B
+ tags:
+ - audiovisual
+ - video
+ - captioner
+ ---