aigc
OmnimatteRF: Robust Omnimatte with 3D Background Modeling • arXiv:2309.07749 • 7 upvotes
AudioSR: Versatile Audio Super-resolution at Scale • arXiv:2309.07314 • 29 upvotes
Generative Image Dynamics • arXiv:2309.07906 • 55 upvotes
MagiCapture: High-Resolution Multi-Concept Portrait Customization • arXiv:2309.06895 • 28 upvotes
Text-Guided Generation and Editing of Compositional 3D Avatars • arXiv:2309.07125 • 6 upvotes
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models • arXiv:2309.06933 • 14 upvotes
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • arXiv:2309.05793 • 51 upvotes
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • arXiv:2309.06380 • 33 upvotes
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation • arXiv:2309.00908 • 6 upvotes
Diffusion Generative Inverse Design • arXiv:2309.02040 • 5 upvotes
Dual-Stream Diffusion Net for Text-to-Video Generation • arXiv:2308.08316 • 25 upvotes
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing • arXiv:2308.07926 • 29 upvotes
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer • arXiv:2308.06873 • 28 upvotes
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • arXiv:2308.06721 • 36 upvotes
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining • arXiv:2308.05734 • 38 upvotes
3D Gaussian Splatting for Real-Time Radiance Field Rendering • arXiv:2308.04079 • 198 upvotes
ConceptLab: Creative Generation using Diffusion Prior Constraints • arXiv:2308.02669 • 25 upvotes
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing • arXiv:2308.03280 • 8 upvotes
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies • arXiv:2308.01546 • 19 upvotes
Computational Long Exposure Mobile Photography • arXiv:2308.01379 • 4 upvotes
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization • arXiv:2307.15199 • 13 upvotes
Interpolating between Images with Diffusion Models • arXiv:2307.12560 • 21 upvotes
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • arXiv:2307.11410 • 17 upvotes
Text2Layer: Layered Image Generation using Latent Diffusion Model • arXiv:2307.09781 • 16 upvotes
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • arXiv:2307.06949 • 52 upvotes
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models • arXiv:2307.06925 • 12 upvotes
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation • arXiv:2307.06350 • 7 upvotes
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • arXiv:2307.04725 • 65 upvotes
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • arXiv:2307.03869 • 24 upvotes
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • arXiv:2307.01952 • 90 upvotes
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • arXiv:2306.16928 • 41 upvotes
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • arXiv:2306.16934 • 32 upvotes
Generate Anything Anywhere in Any Scene • arXiv:2306.17154 • 23 upvotes
FoleyGen: Visually-Guided Audio Generation • arXiv:2309.10537 • 8 upvotes
FreeU: Free Lunch in Diffusion U-Net • arXiv:2309.11497 • 66 upvotes
DreamLLM: Synergistic Multimodal Comprehension and Creation • arXiv:2309.11499 • 60 upvotes
ProPainter: Improving Propagation and Transformer for Video Inpainting • arXiv:2309.03897 • 28 upvotes
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • arXiv:2309.15103 • 43 upvotes
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning • arXiv:2309.15091 • 35 upvotes
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models • arXiv:2309.14717 • 46 upvotes
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • arXiv:2309.15818 • 19 upvotes
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • arXiv:2309.15807 • 34 upvotes
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • arXiv:2309.16653 • 48 upvotes
Qwen Technical Report • arXiv:2309.16609 • 38 upvotes
CCEdit: Creative and Controllable Video Editing via Diffusion Models • arXiv:2309.16496 • 9 upvotes
RealFill: Reference-Driven Generation for Authentic Image Completion • arXiv:2309.16668 • 15 upvotes
Deep Geometrized Cartoon Line Inbetweening • arXiv:2309.16643 • 26 upvotes
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation • arXiv:2309.16429 • 11 upvotes
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • 61 upvotes
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • 79 upvotes
Aligning Text-to-Image Diffusion Models with Reward Backpropagation • arXiv:2310.03739 • 22 upvotes
UniAudio: An Audio Foundation Model Toward Universal Audio Generation • arXiv:2310.00704 • 21 upvotes
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation • arXiv:2310.08541 • 18 upvotes
MotionDirector: Motion Customization of Text-to-Video Diffusion Models • arXiv:2310.08465 • 16 upvotes
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation • arXiv:2310.07697 • 1 upvote
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing • arXiv:2310.05922 • 4 upvotes
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • arXiv:2309.00398 • 23 upvotes
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation • arXiv:2309.03549 • 6 upvotes
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • arXiv:2310.08529 • 18 upvotes
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model • arXiv:2310.09520 • 11 upvotes
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models • arXiv:2310.07653 • 2 upvotes
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens • arXiv:2310.02239 • 2 upvotes
4K4D: Real-Time 4D View Synthesis at 4K Resolution • arXiv:2310.11448 • 40 upvotes
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V • arXiv:2310.11441 • 29 upvotes
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation • arXiv:2310.10769 • 9 upvotes
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models • arXiv:2310.11440 • 17 upvotes
Kosmos-G: Generating Images in Context with Multimodal Large Language Models • arXiv:2310.02992 • 4 upvotes
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models • arXiv:2310.11954 • 25 upvotes
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation • arXiv:2310.13119 • 13 upvotes
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics • arXiv:2310.13268 • 18 upvotes
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling • arXiv:2310.15169 • 10 upvotes
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • arXiv:2310.19512 • 16 upvotes
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • arXiv:2311.04145 • 34 upvotes
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • arXiv:2311.05556 • 87 upvotes
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text • arXiv:2311.07446 • 29 upvotes
Music ControlNet: Multiple Time-varying Controls for Music Generation • arXiv:2311.07069 • 45 upvotes
ChatAnything: Facetime Chat with LLM-Enhanced Personas • arXiv:2311.06772 • 35 upvotes
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • arXiv:2311.07885 • 40 upvotes
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models • arXiv:2311.06783 • 28 upvotes
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying • arXiv:2311.09578 • 16 upvotes
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • arXiv:2311.10093 • 58 upvotes
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • arXiv:2311.09257 • 47 upvotes
Single-Image 3D Human Digitization with Shape-Guided Diffusion • arXiv:2311.09221 • 22 upvotes
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • arXiv:2311.09217 • 22 upvotes
Drivable 3D Gaussian Avatars • arXiv:2311.08581 • 47 upvotes
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • 25 upvotes
MVDream: Multi-view Diffusion for 3D Generation • arXiv:2308.16512 • 106 upvotes
Make Pixels Dance: High-Dynamic Video Generation • arXiv:2311.10982 • 68 upvotes
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • arXiv:2311.11243 • 16 upvotes
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • arXiv:2311.11501 • 37 upvotes
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • 19 upvotes
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • arXiv:2311.12052 • 32 upvotes
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning • arXiv:2311.12631 • 14 upvotes
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • arXiv:2311.12092 • 22 upvotes
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • 27 upvotes
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • arXiv:2311.13600 • 47 upvotes
LEDITS++: Limitless Image Editing using Text-to-Image Models • arXiv:2311.16711 • 25 upvotes
MoMask: Generative Masked Modeling of 3D Human Motions • arXiv:2312.00063 • 18 upvotes
Hierarchical Masked 3D Diffusion Model for Video Outpainting • arXiv:2309.02119 • 13 upvotes
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures • arXiv:2312.02963 • 10 upvotes
DragVideo: Interactive Drag-style Video Editing • arXiv:2312.02216 • 12 upvotes
LivePhoto: Real Image Animation with Text-guided Motion Control • arXiv:2312.02928 • 18 upvotes
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • arXiv:2312.02238 • 27 upvotes
FaceStudio: Put Your Face Everywhere in Seconds • arXiv:2312.02663 • 32 upvotes
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • arXiv:2312.03641 • 22 upvotes
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions • arXiv:2312.03611 • 8 upvotes
Context Diffusion: In-Context Aware Image Generation • arXiv:2312.03584 • 15 upvotes
Kandinsky 3.0 Technical Report • arXiv:2312.03511 • 45 upvotes
Photorealistic Video Generation with Diffusion Models • arXiv:2312.06662 • 24 upvotes
CCM: Adding Conditional Controls to Text-to-Image Consistency Models • arXiv:2312.06971 • 12 upvotes
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • arXiv:2312.07536 • 18 upvotes
FreeInit: Bridging Initialization Gap in Video Diffusion Models • arXiv:2312.07537 • 27 upvotes
StarVector: Generating Scalable Vector Graphics Code from Images • arXiv:2312.11556 • 38 upvotes
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • arXiv:2312.11392 • 20 upvotes
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance • arXiv:2312.11396 • 11 upvotes
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • arXiv:2312.09767 • 27 upvotes
VideoLCM: Video Latent Consistency Model • arXiv:2312.09109 • 23 upvotes
InstructVideo: Instructing Video Diffusion Models with Human Feedback • arXiv:2312.12490 • 19 upvotes
VideoPoet: A Large Language Model for Zero-Shot Video Generation • arXiv:2312.14125 • 47 upvotes
Generative Multimodal Models are In-Context Learners • arXiv:2312.13286 • 36 upvotes
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models • arXiv:2312.16693 • 14 upvotes
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • arXiv:2401.01256 • 22 upvotes
Instruct-Imagen: Image Generation with Multi-modal Instruction • arXiv:2401.01952 • 32 upvotes
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation • arXiv:2401.00896 • 15 upvotes
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • arXiv:2401.04468 • 49 upvotes
URHand: Universal Relightable Hands • arXiv:2401.05334 • 25 upvotes
Object-Centric Diffusion for Efficient Video Editing • arXiv:2401.05735 • 10 upvotes
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • arXiv:2401.15977 • 39 upvotes
StableIdentity: Inserting Anybody into Anywhere at First Sight • arXiv:2401.15975 • 18 upvotes
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • arXiv:2402.00769 • 22 upvotes
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • arXiv:2402.17177 • 88 upvotes
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • arXiv:2403.01779 • 30 upvotes
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation • arXiv:2503.20672 • 14 upvotes