Spaces: AudioVisual-Caption / README

lyhisme committed on Feb 11 · Commit 1d43211 (verified) · 1 parent: e5d9030

Update README.md

Files changed (1): README.md

@@ -1,10 +1,36 @@
 ---
-title: README
-emoji: 😻
-colorFrom: red
+title: ASID-Caption
+emoji: 🦉
+colorFrom: indigo
 colorTo: gray
 sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.
+# ASID-Caption
+We build **ASID-Caption**, a data-and-model suite for **fine-grained audiovisual video understanding**.
+Our goal is to move beyond “one video → one generic caption” by providing **attribute-structured supervision** and **quality-verified annotations**, enabling models to produce **more complete, more controllable, and more temporally consistent** descriptions that cover both **visual content** and **audio cues**.
+## What we release
+- **ASID-1M**: a large-scale collection of **attribute-structured** audiovisual instructions with both *single-attribute* and *all-attributes* training formats.
+- **ASID-Verify**: a scalable curation pipeline that generates, ensembles, verifies, and refines annotations to improve semantic and temporal consistency.
+- **ASID-Captioner**: Qwen2.5-Omni-based audiovisual captioning models fine-tuned on ASID-1M.
+## Research interests
+- Video understanding & video captioning
+- Audio-visual learning
+- Multimodal LLMs / instruction tuning
+- Data curation, verification, and quality control
+## Links
+- **Dataset (ASID-1M):** https://huggingface.co/datasets/AudioVisual-Caption/ASID-1M
+- **Models (ASID-Captioner):** https://huggingface.co/AudioVisual-Caption
+## Contact
+For questions, issues, or takedown requests, please open a **Discussion** under the corresponding dataset/model page.
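
The ASID-1M entry above distinguishes *single-attribute* and *all-attributes* training formats, but it does not spell out the record layout. Here is a minimal sketch, assuming each video carries one caption per attribute, of how both formats could be derived from the same record. The attribute names, the prompt wording, and the `Sample`, `single_attribute_samples`, and `all_attributes_sample` helpers are hypothetical and are not taken from the ASID-1M release.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical attribute set; the README does not list ASID-1M's actual attributes.
ATTRIBUTES = ("subject", "action", "scene", "audio")


@dataclass
class Sample:
    """One instruction-tuning example tied to a source video (hypothetical layout)."""
    video_id: str
    instruction: str
    target: str


def single_attribute_samples(video_id: str, captions: dict[str, str]) -> list[Sample]:
    """Emit one sample per annotated attribute, each asking about that attribute only."""
    return [
        Sample(video_id, f"Describe the {attr} in this video.", captions[attr])
        for attr in ATTRIBUTES
        if attr in captions  # skip attributes this record has no caption for
    ]


def all_attributes_sample(video_id: str, captions: dict[str, str]) -> Sample:
    """Emit one sample whose target covers every annotated attribute in a fixed order."""
    present = [attr for attr in ATTRIBUTES if attr in captions]
    instruction = "Describe this video, covering: " + ", ".join(present) + "."
    target = "\n".join(f"{attr}: {captions[attr]}" for attr in present)
    return Sample(video_id, instruction, target)


if __name__ == "__main__":
    record = {
        "subject": "A man in a red jacket",
        "action": "strums an acoustic guitar",
        "audio": "Guitar chords with light applause",
    }
    for sample in single_attribute_samples("vid_001", record):
        print(sample)
    print(all_attributes_sample("vid_001", record))
```

Keeping a fixed attribute order in the all-attributes target is one way to make outputs easier to control and compare. The README does not say whether ASID-1M does this.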
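
ASID-Verify is described only by its stages (generate, ensemble, verify, refine) and its goal of semantic and temporal consistency. The following sketch illustrates one kind of temporal check and repair that such a stage could perform. It assumes a hypothetical annotation format of timestamped caption segments measured in seconds. `Segment`, `temporal_issues`, and `refine` are illustrative names and do not come from any released ASID code. The actual pipeline may verify and refine annotations differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical timestamped caption segment; times are in seconds."""
    start: float
    end: float
    text: str


def temporal_issues(segments: list[Segment], duration: float) -> list[str]:
    """Return human-readable problems; an empty list means the segments pass."""
    issues: list[str] = []
    for i, seg in enumerate(segments):
        # Each span must be non-empty and lie inside [0, duration].
        if not 0.0 <= seg.start < seg.end <= duration:
            issues.append(f"segment {i} has invalid span [{seg.start}, {seg.end}]")
        # Segments must be in order and must not overlap their predecessor.
        if i > 0 and seg.start < segments[i - 1].end:
            issues.append(f"segment {i} starts before segment {i - 1} ends")
    return issues


def refine(segments: list[Segment], duration: float) -> list[Segment]:
    """Clip spans to the video, drop empty ones, sort by start, and trim overlaps."""
    clipped = [Segment(max(0.0, s.start), min(duration, s.end), s.text) for s in segments]
    clipped = sorted((s for s in clipped if s.start < s.end), key=lambda s: s.start)
    refined: list[Segment] = []
    for seg in clipped:
        if refined and seg.start < refined[-1].end:
            # Trim the later segment so it begins where the previous one ends.
            seg = Segment(refined[-1].end, seg.end, seg.text)
            if seg.start >= seg.end:
                continue  # fully contained in the previous segment, so drop it
        refined.append(seg)
    return refined


if __name__ == "__main__":
    raw = [
        Segment(4.0, 7.0, "dog barks"),
        Segment(-1.0, 3.0, "door opens"),
        Segment(6.0, 12.0, "music starts"),
        Segment(2.0, 2.0, "empty"),
    ]
    print(temporal_issues(raw, duration=10.0))
    fixed = refine(raw, duration=10.0)
    print(fixed)
    print(temporal_issues(fixed, duration=10.0))
```

Trimming an overlapping segment so that it starts where the previous one ends is only one repair policy. Merging the two segments, or dropping the later one, would be equally plausible choices.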