---
title: ASID-Caption
emoji: 🦉
colorFrom: indigo
colorTo: gray
sdk: static
pinned: false
---
# ASID-Caption
We build **ASID-Caption**, a data-and-model suite for **fine-grained audiovisual video understanding**.
Our goal is to move beyond “one video → one generic caption”. We provide **attribute-structured supervision** and **quality-verified annotations** so that models can produce descriptions that are **more complete, more controllable, and more temporally consistent**, covering both **visual content** and **audio cues**.
## What we release
- **ASID-1M**: a large-scale collection of **attribute-structured** audiovisual instructions with both *single-attribute* and *all-attributes* training formats.
- **ASID-Verify**: a scalable curation pipeline that generates, ensembles, verifies, and refines annotations to improve semantic and temporal consistency.
- **ASID-Captioner**: Qwen2.5-Omni-based audiovisual captioning models fine-tuned on ASID-1M.
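
As a rough illustration of the two training formats mentioned above, the sketch below turns one attribute-structured annotation into *single-attribute* samples plus one *all-attributes* sample. The function name `build_samples`, the attribute names, the prompt wording, and the record fields are all hypothetical and are not taken from the actual ASID-1M schema.

```python
from typing import Dict, List


def build_samples(video_id: str, attributes: Dict[str, str]) -> List[dict]:
    """Expand one annotated video into instruction samples.

    NOTE: hypothetical layout for illustration only; the real ASID-1M
    fields and prompts may differ.
    """
    samples = []

    # Single-attribute format: one instruction per attribute.
    for name, text in attributes.items():
        samples.append({
            "video": video_id,
            "format": "single-attribute",
            "instruction": f"Describe the {name} of this video.",
            "response": text,
        })

    # All-attributes format: one instruction covering every attribute.
    combined = "\n".join(f"{name}: {text}" for name, text in attributes.items())
    samples.append({
        "video": video_id,
        "format": "all-attributes",
        "instruction": "Describe this video, covering: " + ", ".join(attributes) + ".",
        "response": combined,
    })
    return samples


if __name__ == "__main__":
    # Assumed example attributes, one visual and one audio.
    example = {"scene": "a small kitchen", "speech": "a man explains a recipe"}
    for sample in build_samples("video_0001", example):
        print(sample["format"], "|", sample["instruction"])
```

Under these assumptions, a video with *N* annotated attributes yields *N* single-attribute samples and one all-attributes sample.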
## Research interests
- Video understanding & video captioning
- Audio-visual learning
- Multimodal LLMs / instruction tuning
- Data curation, verification, and quality control