TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers Paper • 2601.14133 • Published 13 days ago • 60
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models Paper • 2601.11404 • Published 17 days ago • 25
Article NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI 28 days ago • 60
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 19 days ago • 32
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published 19 days ago • 51
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 27
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published 19 days ago • 25
3AM: Segment Anything with Geometric Consistency in Videos Paper • 2601.08831 • Published 20 days ago • 34
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 25 days ago • 214
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 29 days ago • 44
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting Paper • 2512.20927 • Published Dec 24, 2025 • 16
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 97
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published Dec 17, 2025 • 33
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published Dec 23, 2025 • 50
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published Dec 18, 2025 • 46
Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment Paper • 2512.04356 • Published Dec 4, 2025 • 10
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action Paper • 2511.22134 • Published Nov 27, 2025 • 22