Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models Paper • 2606.25041 • Published 3 days ago • 46
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation Paper • 2605.14333 • Published May 14 • 35
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision Paper • 2605.05781 • Published May 7 • 5
Meta-CoT: Enhancing Granularity and Generalization in Image Editing Paper • 2604.24625 • Published Apr 27 • 26
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models Paper • 2604.25636 • Published Apr 28 • 24
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models Paper • 2604.25636 • Published Apr 28 • 24
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation Paper • 2511.16671 • Published Nov 20, 2025 • 16
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models Paper • 2305.16223 • Published May 25, 2023
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing Paper • 2406.08850 • Published Jun 13, 2024