Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image Paper • 2512.05044 • Published Dec 4, 2025 • 17
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning Paper • 2512.02835 • Published Dec 2, 2025 • 10
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28, 2025 • 78
SmolVLM2: Bringing Video Understanding to Every Device Article • orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 337
Exploring the Potential of Encoder-free Architectures in 3D LMMs Paper • 2502.09620 • Published Feb 13, 2025 • 26
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model Paper • 2501.15830 • Published Jan 27, 2025 • 13