Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence Paper • 2606.15932 • Published 10 days ago • 27
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 128
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting Paper • 2504.15921 • Published Apr 22, 2025 • 7
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning Paper • 2503.11495 • Published Mar 14, 2025 • 14
CoS: Chain-of-Shot Prompting for Long Video Understanding Paper • 2502.06428 • Published Feb 10, 2025 • 10
INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation Paper • 2501.18753 • Published Jan 30, 2025 • 3