aigc
OmnimatteRF: Robust Omnimatte with 3D Background Modeling • arXiv:2309.07749 • 7 upvotes
AudioSR: Versatile Audio Super-resolution at Scale • arXiv:2309.07314 • 29 upvotes
Generative Image Dynamics • arXiv:2309.07906 • 55 upvotes
MagiCapture: High-Resolution Multi-Concept Portrait Customization • arXiv:2309.06895 • 28 upvotes
Text-Guided Generation and Editing of Compositional 3D Avatars • arXiv:2309.07125 • 6 upvotes
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models • arXiv:2309.06933 • 14 upvotes
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • arXiv:2309.05793 • 51 upvotes
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • arXiv:2309.06380 • 33 upvotes
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation • arXiv:2309.00908 • 6 upvotes
Diffusion Generative Inverse Design • arXiv:2309.02040 • 5 upvotes
Dual-Stream Diffusion Net for Text-to-Video Generation • arXiv:2308.08316 • 25 upvotes
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing • arXiv:2308.07926 • 29 upvotes
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer • arXiv:2308.06873 • 28 upvotes
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • arXiv:2308.06721 • 36 upvotes
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining • arXiv:2308.05734 • 38 upvotes
3D Gaussian Splatting for Real-Time Radiance Field Rendering • arXiv:2308.04079 • 198 upvotes
ConceptLab: Creative Generation using Diffusion Prior Constraints • arXiv:2308.02669 • 25 upvotes
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing • arXiv:2308.03280 • 8 upvotes
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies • arXiv:2308.01546 • 19 upvotes
Computational Long Exposure Mobile Photography • arXiv:2308.01379 • 4 upvotes
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization • arXiv:2307.15199 • 13 upvotes
Interpolating between Images with Diffusion Models • arXiv:2307.12560 • 21 upvotes
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • arXiv:2307.11410 • 17 upvotes
Text2Layer: Layered Image Generation using Latent Diffusion Model • arXiv:2307.09781 • 16 upvotes
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • arXiv:2307.06949 • 52 upvotes
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models • arXiv:2307.06925 • 12 upvotes
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation • arXiv:2307.06350 • 7 upvotes
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • arXiv:2307.04725 • 65 upvotes
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • arXiv:2307.03869 • 24 upvotes
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • arXiv:2307.01952 • 90 upvotes
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • arXiv:2306.16928 • 41 upvotes
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • arXiv:2306.16934 • 32 upvotes
Generate Anything Anywhere in Any Scene • arXiv:2306.17154 • 23 upvotes
FoleyGen: Visually-Guided Audio Generation • arXiv:2309.10537 • 8 upvotes
FreeU: Free Lunch in Diffusion U-Net • arXiv:2309.11497 • 66 upvotes
DreamLLM: Synergistic Multimodal Comprehension and Creation • arXiv:2309.11499 • 60 upvotes
ProPainter: Improving Propagation and Transformer for Video Inpainting • arXiv:2309.03897 • 28 upvotes
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • arXiv:2309.15103 • 43 upvotes
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning • arXiv:2309.15091 • 35 upvotes
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models • arXiv:2309.14717 • 46 upvotes
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • arXiv:2309.15818 • 19 upvotes
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • arXiv:2309.15807 • 34 upvotes
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • arXiv:2309.16653 • 48 upvotes
Qwen Technical Report • arXiv:2309.16609 • 38 upvotes
CCEdit: Creative and Controllable Video Editing via Diffusion Models • arXiv:2309.16496 • 9 upvotes
RealFill: Reference-Driven Generation for Authentic Image Completion • arXiv:2309.16668 • 15 upvotes
Deep Geometrized Cartoon Line Inbetweening • arXiv:2309.16643 • 26 upvotes
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation • arXiv:2309.16429 • 11 upvotes
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • 61 upvotes
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • 79 upvotes
Aligning Text-to-Image Diffusion Models with Reward Backpropagation • arXiv:2310.03739 • 22 upvotes
UniAudio: An Audio Foundation Model Toward Universal Audio Generation • arXiv:2310.00704 • 21 upvotes
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation • arXiv:2310.08541 • 18 upvotes
MotionDirector: Motion Customization of Text-to-Video Diffusion Models • arXiv:2310.08465 • 16 upvotes
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation • arXiv:2310.07697 • 1 upvote
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing • arXiv:2310.05922 • 4 upvotes
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • arXiv:2309.00398 • 23 upvotes
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation • arXiv:2309.03549 • 6 upvotes
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • arXiv:2310.08529 • 18 upvotes
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model • arXiv:2310.09520 • 11 upvotes
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models • arXiv:2310.07653 • 2 upvotes
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens • arXiv:2310.02239 • 2 upvotes
4K4D: Real-Time 4D View Synthesis at 4K Resolution • arXiv:2310.11448 • 40 upvotes
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V • arXiv:2310.11441 • 29 upvotes
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation • arXiv:2310.10769 • 9 upvotes
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models • arXiv:2310.11440 • 17 upvotes
Kosmos-G: Generating Images in Context with Multimodal Large Language Models • arXiv:2310.02992 • 4 upvotes
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models • arXiv:2310.11954 • 25 upvotes
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation • arXiv:2310.13119 • 13 upvotes
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics • arXiv:2310.13268 • 18 upvotes
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling • arXiv:2310.15169 • 10 upvotes
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • arXiv:2310.19512 • 16 upvotes
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • arXiv:2311.04145 • 34 upvotes
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • arXiv:2311.05556 • 87 upvotes
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text • arXiv:2311.07446 • 29 upvotes
Music ControlNet: Multiple Time-varying Controls for Music Generation • arXiv:2311.07069 • 45 upvotes
ChatAnything: Facetime Chat with LLM-Enhanced Personas • arXiv:2311.06772 • 35 upvotes
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • arXiv:2311.07885 • 40 upvotes
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models • arXiv:2311.06783 • 28 upvotes
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying • arXiv:2311.09578 • 16 upvotes
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • arXiv:2311.10093 • 58 upvotes
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • arXiv:2311.09257 • 47 upvotes
Single-Image 3D Human Digitization with Shape-Guided Diffusion • arXiv:2311.09221 • 22 upvotes
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • arXiv:2311.09217 • 22 upvotes
Drivable 3D Gaussian Avatars • arXiv:2311.08581 • 47 upvotes
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • 25 upvotes
MVDream: Multi-view Diffusion for 3D Generation • arXiv:2308.16512 • 106 upvotes
Make Pixels Dance: High-Dynamic Video Generation • arXiv:2311.10982 • 68 upvotes
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • arXiv:2311.11243 • 16 upvotes
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • arXiv:2311.11501 • 37 upvotes
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • 19 upvotes
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • arXiv:2311.12052 • 32 upvotes
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning • arXiv:2311.12631 • 14 upvotes
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • arXiv:2311.12092 • 22 upvotes
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • 27 upvotes
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • arXiv:2311.13600 • 47 upvotes
LEDITS++: Limitless Image Editing using Text-to-Image Models • arXiv:2311.16711 • 25 upvotes
MoMask: Generative Masked Modeling of 3D Human Motions • arXiv:2312.00063 • 18 upvotes
Hierarchical Masked 3D Diffusion Model for Video Outpainting • arXiv:2309.02119 • 13 upvotes
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures • arXiv:2312.02963 • 10 upvotes
DragVideo: Interactive Drag-style Video Editing • arXiv:2312.02216 • 12 upvotes
LivePhoto: Real Image Animation with Text-guided Motion Control • arXiv:2312.02928 • 18 upvotes
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • arXiv:2312.02238 • 27 upvotes
FaceStudio: Put Your Face Everywhere in Seconds • arXiv:2312.02663 • 32 upvotes
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • arXiv:2312.03641 • 22 upvotes
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions • arXiv:2312.03611 • 8 upvotes
Context Diffusion: In-Context Aware Image Generation • arXiv:2312.03584 • 15 upvotes
Kandinsky 3.0 Technical Report • arXiv:2312.03511 • 45 upvotes
Photorealistic Video Generation with Diffusion Models • arXiv:2312.06662 • 24 upvotes
CCM: Adding Conditional Controls to Text-to-Image Consistency Models • arXiv:2312.06971 • 12 upvotes
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • arXiv:2312.07536 • 18 upvotes
FreeInit: Bridging Initialization Gap in Video Diffusion Models • arXiv:2312.07537 • 27 upvotes
StarVector: Generating Scalable Vector Graphics Code from Images • arXiv:2312.11556 • 38 upvotes
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • arXiv:2312.11392 • 20 upvotes
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance • arXiv:2312.11396 • 11 upvotes
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • arXiv:2312.09767 • 27 upvotes
VideoLCM: Video Latent Consistency Model • arXiv:2312.09109 • 23 upvotes
InstructVideo: Instructing Video Diffusion Models with Human Feedback • arXiv:2312.12490 • 19 upvotes
VideoPoet: A Large Language Model for Zero-Shot Video Generation • arXiv:2312.14125 • 47 upvotes
Generative Multimodal Models are In-Context Learners • arXiv:2312.13286 • 36 upvotes
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models • arXiv:2312.16693 • 14 upvotes
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • arXiv:2401.01256 • 22 upvotes
Instruct-Imagen: Image Generation with Multi-modal Instruction • arXiv:2401.01952 • 32 upvotes
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation • arXiv:2401.00896 • 15 upvotes
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • arXiv:2401.04468 • 49 upvotes
URHand: Universal Relightable Hands • arXiv:2401.05334 • 25 upvotes
Object-Centric Diffusion for Efficient Video Editing • arXiv:2401.05735 • 10 upvotes
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • arXiv:2401.15977 • 39 upvotes
StableIdentity: Inserting Anybody into Anywhere at First Sight • arXiv:2401.15975 • 18 upvotes
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • arXiv:2402.00769 • 22 upvotes
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • arXiv:2402.17177 • 88 upvotes
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • arXiv:2403.01779 • 30 upvotes
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation • arXiv:2503.20672 • 14 upvotes