ASID-Caption

community

https://asid-caption.github.io/

AI & ML interests

Video Understanding, Audio-Visual, Multimodal LLMs, Video Captioning, Instruction Tuning, Dataset Curation, Qwen-based, Open-source, Fully-Open-MLLMs

Papers

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

lyhisme

submitted a paper to Daily Papers 3 months ago

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Paper • 2603.22847 • Published Mar 24 • 26

lyhisme

updated 2 models 4 months ago

AudioVisual-Caption/ASID-Captioner-7B

Image-Text-to-Text • 9B • Updated Mar 11 • 18 • 7

AudioVisual-Caption/ASID-Captioner-3B

Image-Text-to-Text • 5B • Updated Mar 11 • 8 • 37

lyhisme

updated a dataset 4 months ago

AudioVisual-Caption/ASID-1M

Viewer • Updated Mar 11 • 241k • 986 • 85

lyhisme

updated a Space 4 months ago

ASID-Caption

🦉

lyhisme

submitted a paper to Daily Papers 5 months ago

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

Paper • 2602.13013 • Published Feb 13 • 55

lyhisme

published 2 models 5 months ago

AudioVisual-Caption/ASID-Captioner-3B

Image-Text-to-Text • 5B • Updated Mar 11 • 8 • 37

AudioVisual-Caption/ASID-Captioner-7B

Image-Text-to-Text • 9B • Updated Mar 11 • 18 • 7

lyhisme

published a Space 5 months ago

ASID-Caption

🦉

lyhisme

published a dataset 5 months ago

AudioVisual-Caption/ASID-1M

Viewer • Updated Mar 11 • 241k • 986 • 85

lyhisme

authored 5 papers 9 months ago

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

Paper • 2509.18056 • Published Sep 22, 2025 • 27

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Paper • 2406.00670 • Published Jun 2, 2024

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Paper • 2412.06244 • Published Dec 9, 2024

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models

Paper • 2508.01548 • Published Aug 3, 2025 • 14

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment

Paper • 2508.08811 • Published Aug 12, 2025 • 2

Team members 1
