Composition of Memory Experts for Diffusion World Models Paper • 2605.18813 • Published about 1 month ago • 2
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control Paper • 2412.11198 • Published Dec 15, 2024 • 2
Rethinking Visual Intelligence: Insights from Video Pretraining Paper • 2510.24448 • Published Oct 28, 2025 • 7
Communication-Inspired Tokenization for Structured Image Representations Paper • 2602.20731 • Published Feb 24 • 4
World Model Self-Distillation: Training World Models to Solve General Tasks Paper • 2606.12072 • Published 2 days ago • 6
From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models Paper • 2506.07280 • Published Jun 10, 2025 • 1
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published Oct 14, 2025 • 115
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 171
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models Paper • 2506.00996 • Published Jun 1, 2025 • 40