# AVoCaDO: An AudioVisual Video Captioner Driven by Temporal Orchestration


<a href="https://avocado-captioner.github.io/"><img src="https://img.shields.io/badge/Project%20webpage-558b2f?style=for-the-badge"></a>
<a href="https://github.com/AVoCaDO-Captioner/AVoCaDO"><img src="https://img.shields.io/badge/Github-db8905?style=for-the-badge"></a>
<a href="https://arxiv.org/abs/todo"><img src="https://img.shields.io/badge/arXiv-red?style=for-the-badge"></a>


## ✨ Overview
Audiovisual video captioning aims to generate semantically rich descriptions with temporal alignment between visual and auditory events, thereby benefiting both video understanding and generation. We introduce AVoCaDO, a powerful audiovisual video captioner driven by the temporal orchestration between audio and visual modalities. Experimental results demonstrate that AVoCaDO significantly outperforms existing open-source models across four audiovisual video captioning benchmarks, and also achieves competitive performance under visual-only settings.

## 🚀 Getting Started
Please refer to our [GitHub repository](https://github.com/AVoCaDO-Captioner/AVoCaDO) for more details.

## ✒️ Citation

If you find our work helpful for your research, please consider giving a star ⭐ and citing our paper. We appreciate your support!

```bibtex
todo
```

---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Omni-7B
tags:
- audiovisual
- video
- captioner
---