OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 27 days ago • 343
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents Paper • 2601.16973 • Published Jan 23, 2026 • 40
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1, 2026 • 131
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation Paper • 2512.21252 • Published Dec 24, 2025 • 35
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 72
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Paper • 2511.08633 • Published Nov 9, 2025 • 55
UniREditBench: A Unified Reasoning-based Image Editing Benchmark Paper • 2511.01295 • Published Nov 3, 2025 • 39
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3, 2025 • 32
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21, 2025 • 37
ObjectMover: Generative Object Movement with Video Prior Paper • 2503.08037 • Published Mar 11, 2025 • 5
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20, 2025 • 64
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24, 2025 • 92
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling Paper • 2505.14521 • Published May 20, 2025 • 11