ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning Paper • 2512.02835 • Published 10 days ago • 9
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Paper • 2511.20649 • Published 16 days ago • 45
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published 16 days ago • 150
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published 28 days ago • 15
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published 25 days ago • 25
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published 28 days ago • 111
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30 • 117
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation Paper • 2510.19592 • Published Oct 22 • 12
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 39
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20 • 38
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published Aug 19 • 59