Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published Dec 16, 2024 • 23
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published Dec 12, 2024 • 32
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 48
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 28
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published Dec 20, 2024 • 23
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published Jan 13, 2025 • 18
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30, 2025 • 27
Improved Training Technique for Latent Consistency Models Paper • 2502.01441 • Published Feb 3, 2025 • 8
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12, 2025 • 38
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13, 2025 • 9
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Paper • 2502.18302 • Published Feb 25, 2025 • 5
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published Feb 28, 2025 • 26
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification Paper • 2503.02537 • Published Mar 4, 2025 • 12
Autoregressive Image Generation with Randomized Parallel Decoding Paper • 2503.10568 • Published Mar 13, 2025 • 9
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation Paper • 2503.10618 • Published Mar 13, 2025 • 19
Neighboring Autoregressive Modeling for Efficient Visual Generation Paper • 2503.10696 • Published Mar 12, 2025 • 8
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20, 2025 • 73
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models Paper • 2503.18352 • Published Mar 24, 2025 • 7
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30, 2025 • 94
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance Paper • 2504.06232 • Published Apr 8, 2025 • 13
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10, 2025 • 50
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11, 2025 • 46
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL Paper • 2504.11455 • Published Apr 15, 2025 • 14
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published Apr 14, 2025 • 22
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1, 2025 • 44
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Paper • 2506.06276 • Published Jun 6, 2025 • 27
Improving Progressive Generation with Decomposable Flow Matching Paper • 2506.19839 • Published Jun 24, 2025 • 8
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training Paper • 2510.11712 • Published Oct 13, 2025 • 31
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions Paper • 2511.06876 • Published Nov 10, 2025 • 28
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 234
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Paper • 2511.19365 • Published Nov 24, 2025 • 66
OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published Nov 25, 2025 • 12
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published Feb 2 • 46
RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment Paper • 2603.00483 • Published Feb 28 • 3
DREAM: Where Visual Understanding Meets Text-to-Image Generation Paper • 2603.02667 • Published Mar 3 • 6
OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens Paper • 2603.02138 • Published Mar 2 • 151
Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion Paper • 2603.14645 • Published Mar 15 • 5
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training Paper • 2603.16139 • Published Mar 17 • 33
Representation Alignment for Just Image Transformers is not Easier than You Think Paper • 2603.14366 • Published Mar 15 • 13
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published Mar 31 • 49
FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching Paper • 2604.06757 • Published Apr 8 • 10