Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 12 days ago • 68
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published Mar 24 • 91
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 153
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published Feb 27 • 41
Reliable and Responsible Foundation Models: A Comprehensive Survey Paper • 2602.08145 • Published Feb 4 • 8
Inference-time Physics Alignment of Video Generative Models with Latent World Models Paper • 2601.10553 • Published Jan 15 • 13
EasyV2V: A High-quality Instruction-based Video Editing Framework Paper • 2512.16920 • Published Dec 18, 2025 • 18
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published Dec 8, 2025 • 46
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 29
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 29
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 29 • 2
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 170
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30, 2025 • 43
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 105
Running on Zero Agents Featured 824 Qwen Image Edit ✒ 824 Edit images based on your written instructions