AIGC and 3D
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning • arXiv:2306.07967 • 26
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation • arXiv:2306.07954 • 113
TryOnDiffusion: A Tale of Two UNets • arXiv:2306.08276 • 75
Seeing the World through Your Eyes • arXiv:2306.09348 • 34
DreamHuman: Animatable 3D Avatars from Text • arXiv:2306.09329 • 17
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation • arXiv:2306.09864 • 15
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing • arXiv:2306.10012 • 37
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • arXiv:2306.16928 • 41
DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation • arXiv:2306.12422 • 13
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing • arXiv:2306.14435 • 21
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • arXiv:2306.16934 • 32
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors • arXiv:2306.17843 • 44
Generate Anything Anywhere in Any Scene • arXiv:2306.17154 • 23
DisCo: Disentangled Control for Referring Human Dance Generation in Real World • arXiv:2307.00040 • 26
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance • arXiv:2307.00522 • 33
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • arXiv:2307.01952 • 90
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models • arXiv:2307.02421 • 35
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation • arXiv:2307.06942 • 24
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • arXiv:2307.03869 • 24
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • arXiv:2307.04725 • 65
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • arXiv:2307.06949 • 52
DreamTeacher: Pretraining Image Backbones with Deep Generative Models • arXiv:2307.07487 • 21
Text2Layer: Layered Image Generation using Latent Diffusion Model • arXiv:2307.09781 • 16
FABRIC: Personalizing Diffusion Models with Iterative Feedback • arXiv:2307.10159 • 32
TokenFlow: Consistent Diffusion Features for Consistent Video Editing • arXiv:2307.10373 • 58
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • arXiv:2307.11410 • 17
Interpolating between Images with Diffusion Models • arXiv:2307.12560 • 21
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation • arXiv:2308.00906 • 15
ConceptLab: Creative Generation using Diffusion Prior Constraints • arXiv:2308.02669 • 25
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose • arXiv:2308.03610 • 25
3D Gaussian Splatting for Real-Time Radiance Field Rendering • arXiv:2308.04079 • 195
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • arXiv:2308.06721 • 36
Dual-Stream Diffusion Net for Text-to-Video Generation • arXiv:2308.08316 • 25
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans • arXiv:2308.08545 • 35
MVDream: Multi-view Diffusion for 3D Generation • arXiv:2308.16512 • 106
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • arXiv:2309.00398 • 23
CityDreamer: Compositional Generative Model of Unbounded 3D Cities • arXiv:2309.00610 • 21
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • arXiv:2309.05793 • 51
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • arXiv:2309.06380 • 33
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • arXiv:2309.15103 • 43
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • arXiv:2309.15807 • 34
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • arXiv:2309.15818 • 19
Text-to-3D using Gaussian Splatting • arXiv:2309.16585 • 32
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • arXiv:2309.16653 • 48
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • 61
Conditional Diffusion Distillation • arXiv:2310.01407 • 20
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • 79
Aligning Text-to-Image Diffusion Models with Reward Backpropagation • arXiv:2310.03739 • 22
MotionDirector: Motion Customization of Text-to-Video Diffusion Models • arXiv:2310.08465 • 16
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • arXiv:2310.08529 • 18
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion • arXiv:2310.08579 • 16
4K4D: Real-Time 4D View Synthesis at 4K Resolution • arXiv:2310.11448 • 40
Wonder3D: Single Image to 3D using Cross-Domain Diffusion • arXiv:2310.15008 • 22
Matryoshka Diffusion Models • arXiv:2310.15111 • 45
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design • arXiv:2310.15144 • 14
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation • arXiv:2310.16656 • 53
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior • arXiv:2310.16818 • 33
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images • arXiv:2310.16825 • 36
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • arXiv:2310.19512 • 16
Beyond U: Making Diffusion Models Faster & Lighter • arXiv:2310.20092 • 12
De-Diffusion Makes Text a Strong Cross-Modal Interface • arXiv:2311.00618 • 23
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • arXiv:2311.04145 • 34
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • arXiv:2311.05556 • 87
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model • arXiv:2311.06214 • 33
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • arXiv:2311.07885 • 40
Instant3D: Instant Text-to-3D Generation • arXiv:2311.08403 • 47
Drivable 3D Gaussian Avatars • arXiv:2311.08581 • 47
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • arXiv:2311.09217 • 22
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • arXiv:2311.09257 • 47
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • arXiv:2311.10093 • 58
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture • arXiv:2311.10123 • 18
SelfEval: Leveraging the discriminative nature of generative models for evaluation • arXiv:2311.10708 • 17
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • 25
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • 27
Make Pixels Dance: High-Dynamic Video Generation • arXiv:2311.10982 • 68
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • arXiv:2311.11243 • 16
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching • arXiv:2311.11284 • 20
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • 19
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • arXiv:2311.12052 • 32
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • arXiv:2311.12092 • 22
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation • arXiv:2311.12229 • 26
Diffusion Model Alignment Using Direct Preference Optimization • arXiv:2311.12908 • 49
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • arXiv:2311.13073 • 58
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • arXiv:2311.13231 • 28
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes • arXiv:2311.13384 • 53
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • arXiv:2311.13600 • 47
VideoBooth: Diffusion-based Video Generation with Image Prompts • arXiv:2312.00777 • 24
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence • arXiv:2312.02087 • 22
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation • arXiv:2312.02201 • 35
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • arXiv:2312.02238 • 27
FaceStudio: Put Your Face Everywhere in Seconds • arXiv:2312.02663 • 32
DiffiT: Diffusion Vision Transformers for Image Generation • arXiv:2312.02139 • 15
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models • arXiv:2312.00845 • 39
DeepCache: Accelerating Diffusion Models for Free • arXiv:2312.00858 • 23
Analyzing and Improving the Training Dynamics of Diffusion Models • arXiv:2312.02696 • 33
Orthogonal Adaptation for Modular Customization of Diffusion Models • arXiv:2312.02432 • 14
LivePhoto: Real Image Animation with Text-guided Motion Control • arXiv:2312.02928 • 18
Fine-grained Controllable Video Generation via Object Appearance and Context • arXiv:2312.02919 • 13
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • arXiv:2312.03641 • 22
Controllable Human-Object Interaction Synthesis • arXiv:2312.03913 • 23
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators • arXiv:2312.03793 • 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • arXiv:2312.04461 • 62
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image • arXiv:2312.04543 • 22
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • arXiv:2312.04410 • 15
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • arXiv:2312.05107 • 39
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • arXiv:2312.04557 • 13
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors • arXiv:2312.04963 • 17
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior • arXiv:2312.06655 • 24
Photorealistic Video Generation with Diffusion Models • arXiv:2312.06662 • 24
FreeInit: Bridging Initialization Gap in Video Diffusion Models • arXiv:2312.07537 • 27
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • arXiv:2312.07536 • 18
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing • arXiv:2312.07409 • 23
Clockwork Diffusion: Efficient Generation With Model-Step Distillation • arXiv:2312.08128 • 13
VideoLCM: Video Latent Consistency Model • arXiv:2312.09109 • 23
Mosaic-SDF for 3D Generative Models • arXiv:2312.09222 • 17
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • arXiv:2312.09767 • 27
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models • arXiv:2312.09608 • 16
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection • arXiv:2312.09252 • 12
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • arXiv:2312.11392 • 20
Rich Human Feedback for Text-to-Image Generation • arXiv:2312.10240 • 20
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation • arXiv:2312.12491 • 75
InstructVideo: Instructing Video Diffusion Models with Human Feedback • arXiv:2312.12490 • 19
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis • arXiv:2312.13834 • 26
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation • arXiv:2312.13578 • 29
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models • arXiv:2312.13913 • 24
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models • arXiv:2312.14091 • 17
DreamTuner: Single Image is Enough for Subject-Driven Generation • arXiv:2312.13691 • 27
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models • arXiv:2312.13763 • 10
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models • arXiv:2312.13964 • 19
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes • arXiv:2312.15430 • 28
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos • arXiv:2312.15770 • 15
Unsupervised Universal Image Segmentation • arXiv:2312.17243 • 20
DreamGaussian4D: Generative 4D Gaussian Splatting • arXiv:2312.17142 • 19
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis • arXiv:2312.17681 • 19
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • arXiv:2401.01256 • 22
Image Sculpting: Precise Object Editing with 3D Geometry Control • arXiv:2401.01702 • 20
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • arXiv:2401.04468 • 49
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models • arXiv:2401.05252 • 49
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes • arXiv:2401.05335 • 29
PALP: Prompt Aligned Personalization of Text-to-Image Models • arXiv:2401.06105 • 50
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation • arXiv:2401.05675 • 24
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering • arXiv:2401.06003 • 25
InstantID: Zero-shot Identity-Preserving Generation in Seconds • arXiv:2401.07519 • 57
Towards A Better Metric for Text-to-Video Generation • arXiv:2401.07781 • 15
UniVG: Towards UNIfied-modal Video Generation • arXiv:2401.09084 • 17
GARField: Group Anything with Radiance Fields • arXiv:2401.09419 • 21
Quantum Denoising Diffusion Models • arXiv:2401.07049 • 15
DiffusionGPT: LLM-Driven Text-to-Image Generation System • arXiv:2401.10061 • 32
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens • arXiv:2401.09985 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data • arXiv:2401.10891 • 62
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs • arXiv:2401.11708 • 30
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models • arXiv:2401.11739 • 17
Synthesizing Moving People with 3D Control • arXiv:2401.10889 • 12
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers • arXiv:2401.11605 • 23
Lumiere: A Space-Time Diffusion Model for Video Generation • arXiv:2401.12945 • 87
Large-scale Reinforcement Learning for Diffusion Models • arXiv:2401.12244 • 29
Deconstructing Denoising Diffusion Models for Self-Supervised Learning • arXiv:2401.14404 • 18
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All • arXiv:2401.13795 • 68
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • arXiv:2401.15977 • 39
StableIdentity: Inserting Anybody into Anywhere at First Sight • arXiv:2401.15975 • 18
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation • arXiv:2401.17053 • 33
Advances in 3D Generation: A Survey • arXiv:2401.17807 • 19
Anything in Any Scene: Photorealistic Video Object Insertion • arXiv:2401.17509 • 17
ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields • arXiv:2401.17895 • 16
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • arXiv:2402.00769 • 22
Boximator: Generating Rich and Controllable Motions for Video Synthesis • arXiv:2402.01566 • 27
Training-Free Consistent Text-to-Image Generation • arXiv:2402.03286 • 67
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation • arXiv:2402.05054 • 29
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation • arXiv:2402.04324 • 26
Magic-Me: Identity-Specific Video Customized Diffusion • arXiv:2402.09368 • 31
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation • arXiv:2402.10210 • 35
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization • arXiv:2402.09812 • 16
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting • arXiv:2402.10259 • 15
arXiv:2402.13144 • 100
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction • arXiv:2402.12712 • 18
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis • arXiv:2402.14797 • 21
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition • arXiv:2402.15504 • 21
Multi-LoRA Composition for Image Generation • arXiv:2402.16843 • 31
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • arXiv:2402.17177 • 88
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model • arXiv:2402.17412 • 23
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising • arXiv:2402.18842 • 15
AtomoVideo: High Fidelity Image-to-Video Generation • arXiv:2403.01800 • 23
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • arXiv:2403.01779 • 30
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis • arXiv:2403.03206 • 71
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models • arXiv:2403.02084 • 15
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters • arXiv:2403.02677 • 18
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation • arXiv:2403.04692 • 40
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models • arXiv:2403.05438 • 20
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion • arXiv:2403.05121 • 23
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment • arXiv:2403.05135 • 45
V3D: Video Diffusion Models are Effective 3D Generators • arXiv:2403.06738 • 30
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis • arXiv:2403.08764 • 36
Video Editing via Factorized Diffusion Distillation • arXiv:2403.09334 • 22
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control • arXiv:2403.09055 • 26
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion • arXiv:2403.12008 • 20
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing • arXiv:2403.12032 • 15
LightIt: Illumination Modeling and Control for Diffusion Models • arXiv:2403.10615 • 18
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation • arXiv:2403.12015 • 70
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation • arXiv:2403.12365 • 11
AnimateDiff-Lightning: Cross-Model Diffusion Distillation • arXiv:2403.12706 • 18
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS • arXiv:2403.13806 • 18
DreamReward: Text-to-3D Generation with Human Preference • arXiv:2403.14613 • 37
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks • arXiv:2403.14468 • 27
ReNoise: Real Image Inversion Through Iterative Noising • arXiv:2403.14602 • 21
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition • arXiv:2403.14148 • 21
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation • arXiv:2403.14621 • 16
FlashFace: Human Image Personalization with High-fidelity Identity Preservation • arXiv:2403.17008 • 22
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation • arXiv:2403.16990 • 25
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions • arXiv:2403.16627 • 22
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction • arXiv:2403.18795 • 20
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion • arXiv:2403.18818 • 28
EgoLifter: Open-world 3D Segmentation for Egocentric Perception • arXiv:2403.18118 • 12
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling • arXiv:2403.19655 • 19
Getting it Right: Improving Spatial Consistency in Text-to-Image Models • arXiv:2404.01197 • 31
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes • arXiv:2404.00987 • 23
CosmicMan: A Text-to-Image Foundation Model for Humans • arXiv:2404.01294 • 17
Segment Any 3D Object with Language • arXiv:2404.02157 • 2
CameraCtrl: Enabling Camera Control for Text-to-Video Generation • arXiv:2404.02101 • 24
3D Congealing: 3D-Aware Image Alignment in the Wild • arXiv:2404.02125 • 10
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction • arXiv:2404.02905 • 74
On the Scalability of Diffusion-based Text-to-Image Generation • arXiv:2404.02883 • 19
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation • arXiv:2404.02733 • 22
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models • arXiv:2404.02747 • 13
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching • arXiv:2404.03653 • 35
PointInfinity: Resolution-Invariant Point Diffusion Models • arXiv:2404.03566 • 16
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition • arXiv:2404.02514 • 11
Robust Gaussian Splatting • arXiv:2404.04211 • 9
ByteEdit: Boost, Comply and Accelerate Generative Image Editing • arXiv:2404.04860 • 25
UniFL: Improve Stable Diffusion via Unified Feedback Learning • arXiv:2404.05595 • 24
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators • arXiv:2404.05014 • 33
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing • arXiv:2404.05717 • 26
Aligning Diffusion Models by Optimizing Human Utility • arXiv:2404.04465 • 15
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion • arXiv:2404.04544 • 23
DATENeRF: Depth-Aware Text-based Editing of NeRFs • arXiv:2404.04526 • 10
Hash3D: Training-free Acceleration for 3D Generation • arXiv:2404.06091 • 13
Revising Densification in Gaussian Splatting • arXiv:2404.06109 • 9
Reconstructing Hand-Held Objects in 3D • arXiv:2404.06507 • 6
Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion • arXiv:2404.06429 • 7
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting • arXiv:2404.06903 • 21
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion • arXiv:2404.07199 • 27
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback • arXiv:2404.07987 • 48
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models • arXiv:2404.07724 • 14
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model • arXiv:2404.09967 • 21
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing • arXiv:2404.09990 • 14
EdgeFusion: On-Device Text-to-Image Generation • arXiv:2404.11925 • 23
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation • arXiv:2404.13026 • 24
Does Gaussian Splatting need SFM Initialization? • arXiv:2404.12547 • 9
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis • arXiv:2404.13686 • 29
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models • arXiv:2404.14507 • 23
PuLID: Pure and Lightning ID Customization via Contrastive Alignment • arXiv:2404.16022 • 25
Interactive3D: Create What You Want by Interactive 3D Generation • arXiv:2404.16510 • 21
NeRF-XL: Scaling NeRFs with Multiple GPUs • arXiv:2404.16221 • 15
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings • arXiv:2404.16820 • 17
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving • arXiv:2404.16771 • 19
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections • arXiv:2404.16845 • 7
Stylus: Automatic Adapter Selection for Diffusion Models • arXiv:2404.18928 • 15
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation • arXiv:2404.19427 • 74
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model • arXiv:2404.19759 • 27
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting • arXiv:2404.19702 • 20
SAGS: Structure-Aware 3D Gaussian Splatting • arXiv:2404.19149 • 14
Paint by Inpaint: Learning to Add Image Objects by Removing Them First • arXiv:2404.18212 • 30
Spectrally Pruned Gaussian Fields with Neural Compensation • arXiv:2405.00676 • 10
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation • arXiv:2405.01434 • 56
Customizing Text-to-Image Models with a Single Image Pair • arXiv:2405.01536 • 22
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning • arXiv:2405.08054 • 25
Compositional Text-to-Image Generation with Dense Blob Representations • arXiv:2405.08246 • 17
CAT3D: Create Anything in 3D with Multi-View Diffusion Models • arXiv:2405.10314 • 47
Toon3D: Seeing Cartoons from a New Perspective • arXiv:2405.10320 • 22
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion • arXiv:2405.09874 • 20
FIFO-Diffusion: Generating Infinite Videos from Text without Training • arXiv:2405.11473 • 56
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching • arXiv:2405.11252 • 16
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control • arXiv:2405.12970 • 25
Diffusion for World Modeling: Visual Details Matter in Atari • arXiv:2405.12399 • 30
ReVideo: Remake a Video with Motion and Content Control • arXiv:2405.13865 • 25
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models • arXiv:2405.16537 • 17
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer • arXiv:2405.17405 • 16
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels • arXiv:2405.16822 • 13
Part123: Part-aware 3D Reconstruction from a Single-view Image • arXiv:2405.16888 • 12
arXiv:2405.18407 • 48
GFlow: Recovering 4D World from Monocular Video • arXiv:2405.18426 • 17
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting • arXiv:2405.18424 • 9
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback • arXiv:2405.18750 • 21
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model • arXiv:2405.20222 • 11
Learning Temporally Consistent Video Depth from Video Diffusion Priors • arXiv:2406.01493 • 23
I4VGen: Image as Stepping Stone for Text-to-Video Generation • arXiv:2406.02230 • 18
Guiding a Diffusion Model with a Bad Version of Itself • arXiv:2406.02507 • 17
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion • arXiv:2406.03184 • 21
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step • arXiv:2406.04314 • 30
SF-V: Single Forward Video Generation Model • arXiv:2406.04324 • 24
VideoTetris: Towards Compositional Text-to-Video Generation • arXiv:2406.04277 • 25
pOps: Photo-Inspired Diffusion Operators • arXiv:2406.01300 • 17
GenAI Arena: An Open Evaluation Platform for Generative Models • arXiv:2406.04485 • 22
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • arXiv:2406.06525 • 71
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis • arXiv:2406.06216 • 23
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement • arXiv:2406.05649 • 12
Zero-shot Image Editing with Reference Imitation • arXiv:2406.07547 • 33
An Image is Worth 32 Tokens for Reconstruction and Generation • arXiv:2406.07550 • 60
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing • arXiv:2406.06523 • 53
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • arXiv:2406.05338 • 41
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • arXiv:2406.04338 • 39
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation • arXiv:2406.08392 • 21
Hierarchical Patch Diffusion Models for High-Resolution Video Generation • arXiv:2406.07792 • 16
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation • arXiv:2406.07686 • 17
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • arXiv:2406.05132 • 30
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models • arXiv:2406.09416 • 29
DiTFastAttn: Attention Compression for Diffusion Transformer Models • arXiv:2406.08552 • 25
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts • arXiv:2406.09162 • 14
Make It Count: Text-to-Image Generation with an Accurate Number of Objects • arXiv:2406.10210 • 78
Training-free Camera Control for Video Generation • arXiv:2406.10126 • 13
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors • arXiv:2406.12459 • 12