MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models Paper • 2601.21181 • Published Jan 29
Narrative-Driven Paper-to-Slide Generation via ArcDeck Paper • 2604.11969 • Published 25 days ago
STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding Paper • 2603.27593 • Published Mar 29
Look Every Frame All at Once: Video-Ma²mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing Paper • 2411.19460 • Published Nov 29, 2024
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis Paper • 2411.16173 • Published Nov 25, 2024
VideoGPT+ Collection VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding • 10 items • Updated Jun 11, 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models Paper • 2406.01920 • Published Jun 4, 2024
What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models Paper • 2403.13513 • Published Mar 20, 2024