Integrated Vision Language Lab

university

https://www.ivllab.kaist.ac.kr/ivylab-ivllab

Recent Activity

arkimjh authored a paper about 1 month ago

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

arkimjh submitted a paper about 1 month ago

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

arkimjh authored a paper about 1 month ago

DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes

authored a paper about 1 month ago

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

Paper • 2605.15764 • Published May 15 • 4

submitted a paper to Daily Papers about 1 month ago

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

Paper • 2605.15764 • Published May 15 • 4

authored 3 papers about 1 month ago

DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes

Paper • 2505.23179 • Published May 29, 2025 • 1

STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

Paper • 2603.27593 • Published Mar 29 • 12

Narrative-Driven Paper-to-Slide Generation via ArcDeck

Paper • 2604.11969 • Published Apr 13 • 7

submitted a paper to Daily Papers 2 months ago

Narrative-Driven Paper-to-Slide Generation via ArcDeck

Paper • 2604.11969 • Published Apr 13 • 7

authored a paper 3 months ago

STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

Paper • 2603.27593 • Published Mar 29 • 12

submitted a paper to Daily Papers 3 months ago

STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

Paper • 2603.27593 • Published Mar 29 • 12

authored a paper about 1 year ago

ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding

Paper • 2506.01274 • Published Jun 2, 2025 • 4

updated a dataset over 1 year ago

IVLLab/SceneWalk

Viewer • Updated Dec 9, 2024 • 1.29M • 29 • 4

authored 3 papers over 1 year ago

CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

Paper • 2406.01920 • Published Jun 4, 2024 • 1

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

Paper • 2411.16173 • Published Nov 25, 2024 • 9

Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing

Paper • 2411.19460 • Published Nov 29, 2024 • 11

in IVLLab/MultiDialog over 1 year ago

Test valid Freq and Test valid Rare

#6 opened over 1 year ago by

in IVLLab/SceneWalk over 1 year ago

How to get the video?

#2 opened over 1 year ago by

authored a paper over 1 year ago

Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing

Paper • 2411.19460 • Published Nov 29, 2024 • 11

authored 2 papers over 1 year ago

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

Paper • 2411.16173 • Published Nov 25, 2024 • 9

Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing

Paper • 2411.19460 • Published Nov 29, 2024 • 11

authored a paper over 1 year ago

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

Paper • 2411.16173 • Published Nov 25, 2024 • 9