daily papers
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Paper • 2312.04557 • Published • 13

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper • 2312.04410 • Published • 15

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper • 2312.04461 • Published • 62

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Paper • 2401.02955 • Published • 23

Denoising Vision Transformers
Paper • 2401.02957 • Published • 31

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Paper • 2312.16272 • Published • 7

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Paper • 2312.16486 • Published • 7

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
Paper • 2411.07126 • Published • 30

Motion Control for Enhanced Complex Action Video Generation
Paper • 2411.08328 • Published • 5

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 31

Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper • 2410.05954 • Published • 40

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Paper • 2412.04432 • Published • 16

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 46

Mind the Time: Temporally-Controlled Multi-Event Video Generation
Paper • 2412.05263 • Published • 10

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Paper • 2412.01169 • Published • 13

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 56

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
Paper • 2412.15216 • Published • 5

MotiF: Making Text Count in Image Animation with Motion Focal Loss
Paper • 2412.16153 • Published • 6

Large Motion Video Autoencoding with Cross-modal Video VAE
Paper • 2412.17805 • Published • 24

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
Paper • 2501.09503 • Published • 14

Do generative video models learn physical principles from watching videos?
Paper • 2501.09038 • Published • 34

Small Models Struggle to Learn from Strong Reasoners
Paper • 2502.12143 • Published • 39

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Paper • 2503.21755 • Published • 33

Efficient Generative Model Training via Embedded Representation Warmup
Paper • 2504.10188 • Published • 12

Video-As-Prompt: Unified Semantic Control for Video Generation
Paper • 2510.20888 • Published • 50

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Paper • 2511.09611 • Published • 71

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
Paper • 2511.13704 • Published • 44

Back to Basics: Let Denoising Generative Models Denoise
Paper • 2511.13720 • Published • 70

DiP: Taming Diffusion Models in Pixel Space
Paper • 2511.18822 • Published • 29

Diffusion Transformers with Representation Autoencoders
Paper • 2510.11690 • Published • 170

PixelDiT: Pixel Diffusion Transformers for Image Generation
Paper • 2511.20645 • Published • 35

OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
Paper • 2603.24458 • Published • 4

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation
Paper • 2603.23500 • Published • 35

Manifold-Aware Exploration for Reinforcement Learning in Video Generation
Paper • 2603.21872 • Published • 33