GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions Paper • 2605.15764 • Published May 15 • 4
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes Paper • 2505.23179 • Published May 29, 2025 • 1
STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding Paper • 2603.27593 • Published Mar 29 • 12
ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding Paper • 2506.01274 • Published Jun 2, 2025 • 4
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models Paper • 2406.01920 • Published Jun 4, 2024 • 1
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis Paper • 2411.16173 • Published Nov 25, 2024 • 9
Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing Paper • 2411.19460 • Published Nov 29, 2024 • 11