---
title: ASID-Caption
emoji: 🦉
colorFrom: indigo
colorTo: gray
sdk: static
pinned: false
---
# ASID-Caption

We build **ASID-Caption**, a data-and-model suite for **fine-grained audiovisual video understanding**.

Our goal is to move beyond "one video → one generic caption" by providing **attribute-structured supervision** and **quality-verified annotations**, enabling models to produce **more complete, more controllable, and more temporally consistent** descriptions that cover both **visual content** and **audio cues**.
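To make "attribute-structured" concrete, here is a minimal sketch of what one annotation record might look like. The attribute names and field layout are illustrative assumptions, not the released ASID-1M schema:

```python
# Hypothetical attribute-structured annotation record.
# Attribute names and field layout are illustrative assumptions,
# not the released ASID-1M schema.
record = {
    "video_id": "example_0001",
    "attributes": {
        "subjects": "An owl perched on a mossy branch.",
        "actions":  "The owl turns its head and ruffles its feathers.",
        "scene":    "A dim forest clearing at dusk.",
        "audio":    "Wind through leaves; a distant owl call answers.",
        "temporal": "The head turn happens before the answering call.",
    },
    "verified": True,  # passed the quality-verification stage
}
```

Keeping each attribute in its own field is what makes both the single-attribute and all-attributes training formats described below possible.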
## What we release

- **ASID-1M**: a large-scale collection of **attribute-structured** audiovisual instructions in both *single-attribute* and *all-attributes* training formats (sketched after this list).
- **ASID-Verify**: a scalable curation pipeline that generates, ensembles, verifies, and refines annotations to improve their semantic and temporal consistency (see the pipeline sketch below).
- **ASID-Captioner**: audiovisual captioning models built on Qwen2.5-Omni and fine-tuned on ASID-1M (an inference sketch follows).
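A hedged sketch of how one attribute-structured record could expand into the two ASID-1M training formats named above; the prompt wording and helper names are hypothetical, assuming the record layout shown earlier:

```python
# Hypothetical expansion of one record into the two training formats.
# Prompt wording and helper names are assumptions, not the released format.
def to_single_attribute_samples(record):
    """One (instruction, answer) pair per attribute."""
    return [
        {"instruction": f"Describe the {name} of this video.", "answer": text}
        for name, text in record["attributes"].items()
    ]

def to_all_attributes_sample(record):
    """One pair whose answer covers every attribute in a fixed order."""
    answer = " ".join(record["attributes"][name] for name in record["attributes"])
    return {
        "instruction": "Describe this video in detail, covering all attributes.",
        "answer": answer,
    }
```

The single-attribute format supports controllable, per-aspect querying at inference time, while the all-attributes format trains the model to produce one complete description.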
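The four stages named for ASID-Verify (generate, ensemble, verify, refine) suggest a loop like the following. This is a minimal sketch; every callable here is a hypothetical placeholder for a model or consistency checker not specified in this card:

```python
# Hedged sketch of the generate -> ensemble -> verify -> refine loop that
# ASID-Verify is described as running. The annotators, verifier, and
# refiner callables are hypothetical placeholders.
def asid_verify(video, annotators, verifier, refiner, max_rounds=2):
    # 1. Generate: draft annotations from several captioning models.
    drafts = [annotate(video) for annotate in annotators]
    # 2. Ensemble: merge the drafts into one candidate annotation.
    candidate = merge_drafts(drafts)
    # 3. Verify / 4. Refine: re-check semantic and temporal consistency,
    #    refining until the candidate passes or the round budget runs out.
    for _ in range(max_rounds):
        report = verifier(video, candidate)
        if report["consistent"]:
            return candidate
        candidate = refiner(video, candidate, report)
    return None  # discard annotations that never pass verification

def merge_drafts(drafts):
    # Placeholder merge: keep the longest draft as the candidate.
    return max(drafts, key=len)
```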
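Since ASID-Captioner models are Qwen2.5-Omni fine-tunes, inference presumably follows the standard Qwen2.5-Omni recipe. A minimal sketch, assuming a recent `transformers` with Qwen2.5-Omni support and the `qwen-omni-utils` helper package; the base-model checkpoint id is shown because this card does not give an ASID-Captioner repository id, and `clip.mp4` is a placeholder input:

```python
# Hedged inference sketch following the public Qwen2.5-Omni recipe;
# swap in an ASID-Captioner checkpoint id once released.
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info  # pip install qwen-omni-utils

ckpt = "Qwen/Qwen2.5-Omni-7B"  # placeholder base model, not an ASID repo
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained(ckpt)

conversation = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "clip.mp4"},  # placeholder video path
        {"type": "text", "text": "Describe this video, covering both "
                                 "visual content and audio cues."},
    ],
}]
text = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
# Extract audio/image/video inputs, keeping the video's audio track.
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
inputs = processor(
    text=text, audio=audios, images=images, videos=videos,
    return_tensors="pt", padding=True, use_audio_in_video=True,
).to(model.device)

# return_audio=False: we only want the text caption, not synthesized speech.
text_ids = model.generate(**inputs, use_audio_in_video=True, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```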
## Research interests
| - Video understanding & video captioning | |
| - Audio-visual learning | |
| - Multimodal LLMs / instruction tuning | |
| - Data curation, verification, and quality control | |