Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 18 days ago • 127
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published Mar 24 • 92
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 155
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published Feb 27 • 41
Reliable and Responsible Foundation Models: A Comprehensive Survey Paper • 2602.08145 • Published Feb 4 • 8
Inference-time Physics Alignment of Video Generative Models with Latent World Models Paper • 2601.10553 • Published Jan 15 • 13
EasyV2V: A High-quality Instruction-based Video Editing Framework Paper • 2512.16920 • Published Dec 18, 2025 • 18
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published Dec 8, 2025 • 46
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 29
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 29
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 29 • 2
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 171