AIGC and 3D
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning • arXiv:2306.07967 • 26
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation • arXiv:2306.07954 • 113
TryOnDiffusion: A Tale of Two UNets • arXiv:2306.08276 • 75
Seeing the World through Your Eyes • arXiv:2306.09348 • 34
DreamHuman: Animatable 3D Avatars from Text • arXiv:2306.09329 • 17
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation • arXiv:2306.09864 • 15
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing • arXiv:2306.10012 • 37
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • arXiv:2306.16928 • 41
DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation • arXiv:2306.12422 • 13
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing • arXiv:2306.14435 • 21
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • arXiv:2306.16934 • 32
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors • arXiv:2306.17843 • 44
Generate Anything Anywhere in Any Scene • arXiv:2306.17154 • 23
DisCo: Disentangled Control for Referring Human Dance Generation in Real World • arXiv:2307.00040 • 26
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance • arXiv:2307.00522 • 33
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • arXiv:2307.01952 • 90
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models • arXiv:2307.02421 • 35
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation • arXiv:2307.06942 • 24
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • arXiv:2307.03869 • 24
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • arXiv:2307.04725 • 65
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • arXiv:2307.06949 • 52
DreamTeacher: Pretraining Image Backbones with Deep Generative Models • arXiv:2307.07487 • 21
Text2Layer: Layered Image Generation using Latent Diffusion Model • arXiv:2307.09781 • 16
FABRIC: Personalizing Diffusion Models with Iterative Feedback • arXiv:2307.10159 • 32
TokenFlow: Consistent Diffusion Features for Consistent Video Editing • arXiv:2307.10373 • 58
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • arXiv:2307.11410 • 17
Interpolating between Images with Diffusion Models • arXiv:2307.12560 • 21
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation • arXiv:2308.00906 • 15
ConceptLab: Creative Generation using Diffusion Prior Constraints • arXiv:2308.02669 • 25
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose • arXiv:2308.03610 • 25
3D Gaussian Splatting for Real-Time Radiance Field Rendering • arXiv:2308.04079 • 195
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • arXiv:2308.06721 • 36
Dual-Stream Diffusion Net for Text-to-Video Generation • arXiv:2308.08316 • 25
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans • arXiv:2308.08545 • 35
MVDream: Multi-view Diffusion for 3D Generation • arXiv:2308.16512 • 106
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • arXiv:2309.00398 • 23
CityDreamer: Compositional Generative Model of Unbounded 3D Cities • arXiv:2309.00610 • 21
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • arXiv:2309.05793 • 51
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • arXiv:2309.06380 • 33
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • arXiv:2309.15103 • 43
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • arXiv:2309.15807 • 34
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • arXiv:2309.15818 • 19
Text-to-3D using Gaussian Splatting • arXiv:2309.16585 • 32
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • arXiv:2309.16653 • 48
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • 61
Conditional Diffusion Distillation • arXiv:2310.01407 • 20
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • 79
Aligning Text-to-Image Diffusion Models with Reward Backpropagation • arXiv:2310.03739 • 22
MotionDirector: Motion Customization of Text-to-Video Diffusion Models • arXiv:2310.08465 • 16
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • arXiv:2310.08529 • 18
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion • arXiv:2310.08579 • 16
4K4D: Real-Time 4D View Synthesis at 4K Resolution • arXiv:2310.11448 • 40
Wonder3D: Single Image to 3D using Cross-Domain Diffusion • arXiv:2310.15008 • 22
Matryoshka Diffusion Models • arXiv:2310.15111 • 45
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design • arXiv:2310.15144 • 14
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation • arXiv:2310.16656 • 53
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior • arXiv:2310.16818 • 33
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images • arXiv:2310.16825 • 36
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • arXiv:2310.19512 • 16
Beyond U: Making Diffusion Models Faster & Lighter • arXiv:2310.20092 • 12
De-Diffusion Makes Text a Strong Cross-Modal Interface • arXiv:2311.00618 • 23
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • arXiv:2311.04145 • 34
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • arXiv:2311.05556 • 87
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model • arXiv:2311.06214 • 33
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • arXiv:2311.07885 • 40
Instant3D: Instant Text-to-3D Generation • arXiv:2311.08403 • 47
Drivable 3D Gaussian Avatars • arXiv:2311.08581 • 47
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • arXiv:2311.09217 • 22
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • arXiv:2311.09257 • 47
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • arXiv:2311.10093 • 58
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture • arXiv:2311.10123 • 18
SelfEval: Leveraging the discriminative nature of generative models for evaluation • arXiv:2311.10708 • 17
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • 25
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • 27
Make Pixels Dance: High-Dynamic Video Generation • arXiv:2311.10982 • 68
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • arXiv:2311.11243 • 16
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching • arXiv:2311.11284 • 20
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • 19
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • arXiv:2311.12052 • 32
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • arXiv:2311.12092 • 22
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation • arXiv:2311.12229 • 26
Diffusion Model Alignment Using Direct Preference Optimization • arXiv:2311.12908 • 49
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • arXiv:2311.13073 • 58
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • arXiv:2311.13231 • 28
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes • arXiv:2311.13384 • 53
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • arXiv:2311.13600 • 47
VideoBooth: Diffusion-based Video Generation with Image Prompts • arXiv:2312.00777 • 24
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence • arXiv:2312.02087 • 22
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation • arXiv:2312.02201 • 35
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • arXiv:2312.02238 • 27
FaceStudio: Put Your Face Everywhere in Seconds • arXiv:2312.02663 • 32
DiffiT: Diffusion Vision Transformers for Image Generation • arXiv:2312.02139 • 15
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models • arXiv:2312.00845 • 39
DeepCache: Accelerating Diffusion Models for Free • arXiv:2312.00858 • 23
Analyzing and Improving the Training Dynamics of Diffusion Models • arXiv:2312.02696 • 33
Orthogonal Adaptation for Modular Customization of Diffusion Models • arXiv:2312.02432 • 14
LivePhoto: Real Image Animation with Text-guided Motion Control • arXiv:2312.02928 • 18
Fine-grained Controllable Video Generation via Object Appearance and Context • arXiv:2312.02919 • 13
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • arXiv:2312.03641 • 22
Controllable Human-Object Interaction Synthesis • arXiv:2312.03913 • 23
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators • arXiv:2312.03793 • 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • arXiv:2312.04461 • 62
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image • arXiv:2312.04543 • 22
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • arXiv:2312.04410 • 15
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • arXiv:2312.05107 • 39
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • arXiv:2312.04557 • 13
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors • arXiv:2312.04963 • 17
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior • arXiv:2312.06655 • 24
Photorealistic Video Generation with Diffusion Models • arXiv:2312.06662 • 24
FreeInit: Bridging Initialization Gap in Video Diffusion Models • arXiv:2312.07537 • 27
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • arXiv:2312.07536 • 18
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing • arXiv:2312.07409 • 23
Clockwork Diffusion: Efficient Generation With Model-Step Distillation • arXiv:2312.08128 • 13
VideoLCM: Video Latent Consistency Model • arXiv:2312.09109 • 23
Mosaic-SDF for 3D Generative Models • arXiv:2312.09222 • 17
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • arXiv:2312.09767 • 27
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models • arXiv:2312.09608 • 16
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection • arXiv:2312.09252 • 12
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • arXiv:2312.11392 • 20
Rich Human Feedback for Text-to-Image Generation • arXiv:2312.10240 • 20
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation • arXiv:2312.12491 • 75
InstructVideo: Instructing Video Diffusion Models with Human Feedback • arXiv:2312.12490 • 19
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis • arXiv:2312.13834 • 26
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation • arXiv:2312.13578 • 29
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models • arXiv:2312.13913 • 24
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models • arXiv:2312.14091 • 17
DreamTuner: Single Image is Enough for Subject-Driven Generation • arXiv:2312.13691 • 27
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models • arXiv:2312.13763 • 10
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models • arXiv:2312.13964 • 19
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes • arXiv:2312.15430 • 28
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos • arXiv:2312.15770 • 15
Unsupervised Universal Image Segmentation • arXiv:2312.17243 • 20
DreamGaussian4D: Generative 4D Gaussian Splatting • arXiv:2312.17142 • 19
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis • arXiv:2312.17681 • 19
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • arXiv:2401.01256 • 22
Image Sculpting: Precise Object Editing with 3D Geometry Control • arXiv:2401.01702 • 20
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • arXiv:2401.04468 • 49
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models • arXiv:2401.05252 • 49
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes • arXiv:2401.05335 • 29
PALP: Prompt Aligned Personalization of Text-to-Image Models • arXiv:2401.06105 • 50
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation • arXiv:2401.05675 • 24
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering • arXiv:2401.06003 • 25
InstantID: Zero-shot Identity-Preserving Generation in Seconds • arXiv:2401.07519 • 57
Towards A Better Metric for Text-to-Video Generation • arXiv:2401.07781 • 15
UniVG: Towards UNIfied-modal Video Generation • arXiv:2401.09084 • 17
GARField: Group Anything with Radiance Fields • arXiv:2401.09419 • 21
Quantum Denoising Diffusion Models • arXiv:2401.07049 • 15
DiffusionGPT: LLM-Driven Text-to-Image Generation System • arXiv:2401.10061 • 32
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens • arXiv:2401.09985 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data • arXiv:2401.10891 • 62
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs • arXiv:2401.11708 • 30
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models • arXiv:2401.11739 • 17
Synthesizing Moving People with 3D Control • arXiv:2401.10889 • 12
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers • arXiv:2401.11605 • 23
Lumiere: A Space-Time Diffusion Model for Video Generation • arXiv:2401.12945 • 87
Large-scale Reinforcement Learning for Diffusion Models • arXiv:2401.12244 • 29
Deconstructing Denoising Diffusion Models for Self-Supervised Learning • arXiv:2401.14404 • 18
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All • arXiv:2401.13795 • 68
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • arXiv:2401.15977 • 39
StableIdentity: Inserting Anybody into Anywhere at First Sight • arXiv:2401.15975 • 18
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation • arXiv:2401.17053 • 33
Advances in 3D Generation: A Survey • arXiv:2401.17807 • 19
Anything in Any Scene: Photorealistic Video Object Insertion • arXiv:2401.17509 • 17
ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields • arXiv:2401.17895 • 16
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • arXiv:2402.00769 • 22
Boximator: Generating Rich and Controllable Motions for Video Synthesis • arXiv:2402.01566 • 27
Training-Free Consistent Text-to-Image Generation • arXiv:2402.03286 • 67
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation • arXiv:2402.05054 • 29
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation • arXiv:2402.04324 • 26
Magic-Me: Identity-Specific Video Customized Diffusion • arXiv:2402.09368 • 31
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation • arXiv:2402.10210 • 35
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization • arXiv:2402.09812 • 16
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting • arXiv:2402.10259 • 15
arXiv:2402.13144 • 100
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction • arXiv:2402.12712 • 18
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis • arXiv:2402.14797 • 21
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition • arXiv:2402.15504 • 21
Multi-LoRA Composition for Image Generation • arXiv:2402.16843 • 31
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • arXiv:2402.17177 • 88
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model • arXiv:2402.17412 • 23
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising • arXiv:2402.18842 • 15
AtomoVideo: High Fidelity Image-to-Video Generation • arXiv:2403.01800 • 23
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • arXiv:2403.01779 • 30
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis • arXiv:2403.03206 • 71
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models • arXiv:2403.02084 • 15
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters • arXiv:2403.02677 • 18
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation • arXiv:2403.04692 • 40
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models • arXiv:2403.05438 • 20
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion • arXiv:2403.05121 • 23
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment • arXiv:2403.05135 • 45
V3D: Video Diffusion Models are Effective 3D Generators • arXiv:2403.06738 • 30
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis • arXiv:2403.08764 • 36
Video Editing via Factorized Diffusion Distillation • arXiv:2403.09334 • 22
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control • arXiv:2403.09055 • 26
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion • arXiv:2403.12008 • 20
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing • arXiv:2403.12032 • 15
LightIt: Illumination Modeling and Control for Diffusion Models • arXiv:2403.10615 • 18
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation • arXiv:2403.12015 • 70
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation • arXiv:2403.12365 • 11
AnimateDiff-Lightning: Cross-Model Diffusion Distillation • arXiv:2403.12706 • 18
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS • arXiv:2403.13806 • 18
DreamReward: Text-to-3D Generation with Human Preference • arXiv:2403.14613 • 37
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks • arXiv:2403.14468 • 27
ReNoise: Real Image Inversion Through Iterative Noising • arXiv:2403.14602 • 21
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition • arXiv:2403.14148 • 21
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation • arXiv:2403.14621 • 16
FlashFace: Human Image Personalization with High-fidelity Identity Preservation • arXiv:2403.17008 • 22
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation • arXiv:2403.16990 • 25
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions • arXiv:2403.16627 • 22
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction • arXiv:2403.18795 • 20
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion • arXiv:2403.18818 • 28
EgoLifter: Open-world 3D Segmentation for Egocentric Perception • arXiv:2403.18118 • 12
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling • arXiv:2403.19655 • 19
Getting it Right: Improving Spatial Consistency in Text-to-Image Models • arXiv:2404.01197 • 31
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes • arXiv:2404.00987 • 23
CosmicMan: A Text-to-Image Foundation Model for Humans • arXiv:2404.01294 • 17
Segment Any 3D Object with Language • arXiv:2404.02157 • 2
CameraCtrl: Enabling Camera Control for Text-to-Video Generation • arXiv:2404.02101 • 24
3D Congealing: 3D-Aware Image Alignment in the Wild • arXiv:2404.02125 • 10
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction • arXiv:2404.02905 • 74
On the Scalability of Diffusion-based Text-to-Image Generation • arXiv:2404.02883 • 19
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation • arXiv:2404.02733 • 22
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models • arXiv:2404.02747 • 13
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching • arXiv:2404.03653 • 35
PointInfinity: Resolution-Invariant Point Diffusion Models • arXiv:2404.03566 • 16
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition • arXiv:2404.02514 • 11
Robust Gaussian Splatting • arXiv:2404.04211 • 9
ByteEdit: Boost, Comply and Accelerate Generative Image Editing • arXiv:2404.04860 • 25
UniFL: Improve Stable Diffusion via Unified Feedback Learning • arXiv:2404.05595 • 24
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators • arXiv:2404.05014 • 33
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing • arXiv:2404.05717 • 26
Aligning Diffusion Models by Optimizing Human Utility • arXiv:2404.04465 • 15
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion • arXiv:2404.04544 • 23
DATENeRF: Depth-Aware Text-based Editing of NeRFs • arXiv:2404.04526 • 10
Hash3D: Training-free Acceleration for 3D Generation • arXiv:2404.06091 • 13
Revising Densification in Gaussian Splatting • arXiv:2404.06109 • 9
Reconstructing Hand-Held Objects in 3D • arXiv:2404.06507 • 6
Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion • arXiv:2404.06429 • 7
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting • arXiv:2404.06903 • 21
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion • arXiv:2404.07199 • 27
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback • arXiv:2404.07987 • 48
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models • arXiv:2404.07724 • 14
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model • arXiv:2404.09967 • 21
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing • arXiv:2404.09990 • 14
EdgeFusion: On-Device Text-to-Image Generation • arXiv:2404.11925 • 23
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation • arXiv:2404.13026 • 24
Does Gaussian Splatting need SFM Initialization? • arXiv:2404.12547 • 9
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis • arXiv:2404.13686 • 29
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models • arXiv:2404.14507 • 23
PuLID: Pure and Lightning ID Customization via Contrastive Alignment • arXiv:2404.16022 • 25
Interactive3D: Create What You Want by Interactive 3D Generation • arXiv:2404.16510 • 21
NeRF-XL: Scaling NeRFs with Multiple GPUs • arXiv:2404.16221 • 15
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings • arXiv:2404.16820 • 17
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving • arXiv:2404.16771 • 19
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections • arXiv:2404.16845 • 7
Stylus: Automatic Adapter Selection for Diffusion Models • arXiv:2404.18928 • 15
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation • arXiv:2404.19427 • 74
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model • arXiv:2404.19759 • 27
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting • arXiv:2404.19702 • 20
SAGS: Structure-Aware 3D Gaussian Splatting • arXiv:2404.19149 • 14
Paint by Inpaint: Learning to Add Image Objects by Removing Them First • arXiv:2404.18212 • 30
Spectrally Pruned Gaussian Fields with Neural Compensation • arXiv:2405.00676 • 10
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation • arXiv:2405.01434 • 56
Customizing Text-to-Image Models with a Single Image Pair • arXiv:2405.01536 • 22
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning • arXiv:2405.08054 • 25
Compositional Text-to-Image Generation with Dense Blob Representations • arXiv:2405.08246 • 17
CAT3D: Create Anything in 3D with Multi-View Diffusion Models • arXiv:2405.10314 • 47
Toon3D: Seeing Cartoons from a New Perspective • arXiv:2405.10320 • 22
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion • arXiv:2405.09874 • 20
FIFO-Diffusion: Generating Infinite Videos from Text without Training • arXiv:2405.11473 • 56
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching • arXiv:2405.11252 • 16
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control • arXiv:2405.12970 • 25
Diffusion for World Modeling: Visual Details Matter in Atari • arXiv:2405.12399 • 30
ReVideo: Remake a Video with Motion and Content Control • arXiv:2405.13865 • 25
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models • arXiv:2405.16537 • 17
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer • arXiv:2405.17405 • 16
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels • arXiv:2405.16822 • 13
Part123: Part-aware 3D Reconstruction from a Single-view Image • arXiv:2405.16888 • 12
arXiv:2405.18407 • 48
GFlow: Recovering 4D World from Monocular Video • arXiv:2405.18426 • 17
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting • arXiv:2405.18424 • 9
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback • arXiv:2405.18750 • 21
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model • arXiv:2405.20222 • 11
Learning Temporally Consistent Video Depth from Video Diffusion Priors • arXiv:2406.01493 • 23
I4VGen: Image as Stepping Stone for Text-to-Video Generation • arXiv:2406.02230 • 18
Guiding a Diffusion Model with a Bad Version of Itself • arXiv:2406.02507 • 17
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion • arXiv:2406.03184 • 21
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step • arXiv:2406.04314 • 30
SF-V: Single Forward Video Generation Model • arXiv:2406.04324 • 24
VideoTetris: Towards Compositional Text-to-Video Generation • arXiv:2406.04277 • 25
pOps: Photo-Inspired Diffusion Operators • arXiv:2406.01300 • 17
GenAI Arena: An Open Evaluation Platform for Generative Models • arXiv:2406.04485 • 22
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • arXiv:2406.06525 • 71
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis • arXiv:2406.06216 • 23
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement • arXiv:2406.05649 • 12
Zero-shot Image Editing with Reference Imitation • arXiv:2406.07547 • 33
An Image is Worth 32 Tokens for Reconstruction and Generation • arXiv:2406.07550 • 60
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing • arXiv:2406.06523 • 53
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • arXiv:2406.05338 • 41
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • arXiv:2406.04338 • 39
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation • arXiv:2406.08392 • 21
Hierarchical Patch Diffusion Models for High-Resolution Video Generation • arXiv:2406.07792 • 16
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation • arXiv:2406.07686 • 17
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • arXiv:2406.05132 • 30
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models • arXiv:2406.09416 • 29
DiTFastAttn: Attention Compression for Diffusion Transformer Models • arXiv:2406.08552 • 25
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts • arXiv:2406.09162 • 14
Make It Count: Text-to-Image Generation with an Accurate Number of Objects • arXiv:2406.10210 • 78
Training-free Camera Control for Video Generation • arXiv:2406.10126 • 13
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors • arXiv:2406.12459 • 12