# AVoCaDO: An <u>A</u>udio<u>V</u>isual Vide<u>o</u> <u>Ca</u>ptioner <u>D</u>riven by Temporal <u>O</u>rchestration
<p align="left">
<a href="https://avocado-captioner.github.io/"><img src="https://img.shields.io/badge/Project%20webpage-558b2f?style=for-the-badge"></a>
<a href="https://github.com/AVoCaDO-Captioner/AVoCaDO"><img src="https://img.shields.io/badge/Github-db8905?style=for-the-badge"></a>
<a href="https://arxiv.org/abs/todo"><img src="https://img.shields.io/badge/arXiv-red?style=for-the-badge"></a>
</p>
## ✨ Overview
Audiovisual video captioning aims to generate semantically rich descriptions with temporal alignment between visual and auditory events, thereby benefiting both video understanding and generation. We introduce <b>AVoCaDO</b>, a powerful audiovisual video captioner driven by the temporal orchestration between audio and visual modalities. Experimental results demonstrate that AVoCaDO significantly outperforms existing open-source models across four audiovisual video captioning benchmarks, and also achieves competitive performance under visual-only settings.
## 🚀 Getting Started
Please refer to our [GitHub repository](https://github.com/AVoCaDO-Captioner/AVoCaDO) for more details.
## ✒️ Citation
If you find our work helpful for your research, please consider giving a star ⭐ and citing our paper. We appreciate your support!
```bibtex
todo
```