SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation Paper • 2602.02402 • Published 3 days ago • 28
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams Paper • 2601.02281 • Published about 1 month ago • 33
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published Dec 18, 2025 • 75
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization Paper • 2511.23002 • Published Nov 28, 2025 • 26
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation Paper • 2512.08186 • Published Dec 9, 2025 • 22
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Paper • 2512.03000 • Published Dec 2, 2025 • 37
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation Paper • 2510.08551 • Published Oct 9, 2025 • 33
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7, 2025 • 48
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering Paper • 2506.23329 • Published Jun 29, 2025 • 8
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent Paper • 2506.17612 • Published Jun 21, 2025 • 64
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5, 2025 • 82
SpatialLM: Training Large Language Models for Structured Indoor Modeling Paper • 2506.07491 • Published Jun 9, 2025 • 50
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS Paper • 2311.17245 • Published Nov 28, 2023 • 2
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? Paper • 2503.12349 • Published Mar 16, 2025 • 44
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding Paper • 2412.18450 • Published Dec 24, 2024 • 36
Large Spatial Model: End-to-end Unposed Images to Semantic 3D Paper • 2410.18956 • Published Oct 24, 2024 • 1