---
title: ASID-Caption
emoji: 🦉
colorFrom: indigo
colorTo: gray
sdk: static
pinned: false
---

# ASID-Caption

We build **ASID-Caption**, a data-and-model suite for **fine-grained audiovisual video understanding**. Our goal is to move beyond “one video → one generic caption” by providing **attribute-structured supervision** and **quality-verified annotations**, enabling models to produce **more complete, more controllable, and more temporally consistent** descriptions that cover both **visual content** and **audio cues**.

## What we release

- **ASID-1M**: a large-scale collection of **attribute-structured** audiovisual instructions with both *single-attribute* and *all-attributes* training formats.
- **ASID-Verify**: a scalable curation pipeline that generates, ensembles, verifies, and refines annotations to improve semantic and temporal consistency.
- **ASID-Captioner**: Qwen2.5-Omni-based audiovisual captioning models fine-tuned on ASID-1M.

## Research interests

- Video understanding & video captioning
- Audio-visual learning
- Multimodal LLMs / instruction tuning
- Data curation, verification, and quality control
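As a rough illustration of the *single-attribute* vs *all-attributes* training formats mentioned above, the sketch below shows what such instruction records might look like. All field names, attribute names, and prompt wordings here are our assumptions for illustration only, not the released ASID-1M schema.

```python
# Hypothetical sketch of the two ASID-1M training formats.
# Field names ("video", "format", "instruction", "response") and the
# example attributes are assumptions, not the actual released schema.

def make_single_attribute(video_id, attribute, caption):
    """One instruction targeting a single attribute (e.g. audio events)."""
    return {
        "video": video_id,
        "format": "single-attribute",
        "instruction": f"Describe the {attribute} of this video.",
        "response": caption,
    }

def make_all_attributes(video_id, attribute_captions):
    """One instruction whose response covers every attribute, structured."""
    return {
        "video": video_id,
        "format": "all-attributes",
        "instruction": "Describe this video across all attributes.",
        "response": "\n".join(
            f"[{attr}] {cap}" for attr, cap in attribute_captions.items()
        ),
    }

single = make_single_attribute(
    "clip_0001", "audio events", "A door slams, then footsteps approach."
)
full = make_all_attributes(
    "clip_0001",
    {
        "audio events": "A door slams, then footsteps approach.",
        "camera motion": "Slow pan from left to right.",
    },
)
print(single["instruction"])
print(full["response"])
```

The single-attribute form supports controllable, per-attribute captioning, while the all-attributes form trains the model to produce one complete, structured description.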