Music - a ChaangHaan Collection

ChaangHaan's Collections

Music

updated Jan 4, 2024

aMUSEd: An Open MUSE Reproduction

Paper • 2401.01808 • Published Jan 3, 2024 • 31
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Paper • 2401.00604 • Published Dec 31, 2023 • 6
LARP: Language-Agent Role Play for Open-World Games

Paper • 2312.17653 • Published Dec 24, 2023 • 33
Learning Vision from Models Rivals Learning Vision from Data

Paper • 2312.17742 • Published Dec 28, 2023 • 16
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 31
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web

Paper • 2312.16457 • Published Dec 27, 2023 • 15
InsActor: Instruction-driven Physics-based Characters

Paper • 2312.17135 • Published Dec 28, 2023 • 10
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

Paper • 2312.16486 • Published Dec 27, 2023 • 7
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Paper • 2312.16272 • Published Dec 26, 2023 • 7
Prompt Expansion for Adaptive Text-to-Image Generation

Paper • 2312.16720 • Published Dec 27, 2023 • 5
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 61
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

Paper • 2312.15430 • Published Dec 24, 2023 • 28
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20
LangSplat: 3D Language Gaussian Splatting

Paper • 2312.16084 • Published Dec 26, 2023 • 16
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Paper • 2312.16145 • Published Dec 26, 2023 • 10
Supervised Knowledge Makes Large Language Models Better In-context Learners

Paper • 2312.15918 • Published Dec 26, 2023 • 9
VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Paper • 2312.14233 • Published Dec 21, 2023 • 16
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 20
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Paper • 2312.14878 • Published Dec 22, 2023 • 15
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

Paper • 2312.14385 • Published Dec 22, 2023 • 7
Shai: A large language model for asset management

Paper • 2312.14203 • Published Dec 21, 2023 • 6
LLM4VG: Large Language Models Evaluation for Video Grounding

Paper • 2312.14206 • Published Dec 21, 2023 • 3
DreamTuner: Single Image is Enough for Subject-Driven Generation

Paper • 2312.13691 • Published Dec 21, 2023 • 27
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Paper • 2312.13913 • Published Dec 21, 2023 • 24
Time is Encoded in the Weights of Finetuned Language Models

Paper • 2312.13401 • Published Dec 20, 2023 • 20
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Paper • 2312.13964 • Published Dec 21, 2023 • 19
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Paper • 2312.14091 • Published Dec 21, 2023 • 17
TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Paper • 2312.13789 • Published Dec 21, 2023 • 15
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Paper • 2312.13980 • Published Dec 21, 2023 • 14
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

Paper • 2312.13469 • Published Dec 20, 2023 • 11
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Paper • 2312.13763 • Published Dec 21, 2023 • 10
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors

Paper • 2312.13324 • Published Dec 20, 2023 • 11
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Paper • 2312.13314 • Published Dec 20, 2023 • 8
HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs

Paper • 2312.14140 • Published Dec 21, 2023 • 7
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Paper • 2312.12456 • Published Dec 16, 2023 • 45
Generative Multimodal Models are In-Context Learners

Paper • 2312.13286 • Published Dec 20, 2023 • 36
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Paper • 2312.13252 • Published Dec 20, 2023 • 27
InstructVideo: Instructing Video Diffusion Models with Human Feedback

Paper • 2312.12490 • Published Dec 19, 2023 • 19
Cached Transformers: Improving Transformers with Differentiable Memory Cache

Paper • 2312.12742 • Published Dec 20, 2023 • 13
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

Paper • 2312.13271 • Published Dec 20, 2023 • 5
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 264
StarVector: Generating Scalable Vector Graphics Code from Images

Paper • 2312.11556 • Published Dec 17, 2023 • 38
3D-LFM: Lifting Foundation Model

Paper • 2312.11894 • Published Dec 19, 2023 • 15
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Paper • 2312.11666 • Published Dec 18, 2023 • 13
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Paper • 2312.12423 • Published Dec 19, 2023 • 13
MixRT: Mixed Neural Representations For Real-Time NeRF Rendering

Paper • 2312.11841 • Published Dec 19, 2023 • 11
Tracking Any Object Amodally

Paper • 2312.12433 • Published Dec 19, 2023 • 12
FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

Paper • 2312.11537 • Published Dec 15, 2023 • 8
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

Paper • 2312.11595 • Published Dec 18, 2023 • 6
Text-Conditioned Resampler For Long Form Video Understanding

Paper • 2312.11897 • Published Dec 19, 2023 • 6
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation

Paper • 2312.11532 • Published Dec 15, 2023 • 6
Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

Paper • 2312.11535 • Published Dec 15, 2023 • 7
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

Paper • 2312.12030 • Published Dec 19, 2023 • 6
VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 22
Rich Human Feedback for Text-to-Image Generation

Paper • 2312.10240 • Published Dec 15, 2023 • 20
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Paper • 2312.11392 • Published Dec 18, 2023 • 20
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Paper • 2312.11370 • Published Dec 18, 2023 • 20
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

Paper • 2312.10763 • Published Dec 17, 2023 • 19
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Paper • 2312.11461 • Published Dec 18, 2023 • 20
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising

Paper • 2312.10899 • Published Dec 18, 2023 • 15
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Paper • 2312.11396 • Published Dec 18, 2023 • 11
Cascade Speculative Drafting for Even Faster LLM Inference

Paper • 2312.11462 • Published Dec 18, 2023 • 10
Silkie: Preference Distillation for Large Visual Language Models

Paper • 2312.10665 • Published Dec 17, 2023 • 11
VidToMe: Video Token Merging for Zero-Shot Video Editing

Paper • 2312.10656 • Published Dec 17, 2023 • 11
ProTIP: Progressive Tool Retrieval Improves Planning

Paper • 2312.10332 • Published Dec 16, 2023 • 8
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models

Paper • 2312.10835 • Published Dec 17, 2023 • 7
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Paper • 2312.11459 • Published Dec 18, 2023 • 6
GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Paper • 2312.11458 • Published Dec 18, 2023 • 5
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper • 2312.10003 • Published Dec 15, 2023 • 44
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Paper • 2312.09767 • Published Dec 15, 2023 • 27
MobileSAMv2: Faster Segment Anything to Everything

Paper • 2312.09579 • Published Dec 15, 2023 • 24
Point Transformer V3: Simpler, Faster, Stronger

Paper • 2312.10035 • Published Dec 15, 2023 • 23
Weight subcloning: direct initialization of transformers using larger pretrained ones

Paper • 2312.09299 • Published Dec 14, 2023 • 18
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Paper • 2312.09608 • Published Dec 15, 2023 • 16
Self-Evaluation Improves Selective Generation in Large Language Models

Paper • 2312.09300 • Published Dec 14, 2023 • 16
Stable Score Distillation for High-Quality 3D Generation

Paper • 2312.09305 • Published Dec 14, 2023 • 10
Faithful Persona-based Conversational Dataset Generation with Large Language Models

Paper • 2312.10007 • Published Dec 15, 2023 • 11
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 48
TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 40
CogAgent: A Visual Language Model for GUI Agents

Paper • 2312.08914 • Published Dec 14, 2023 • 31
VideoLCM: Video Latent Consistency Model

Paper • 2312.09109 • Published Dec 14, 2023 • 23
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20
Pixel Aligned Language Models

Paper • 2312.09237 • Published Dec 14, 2023 • 16
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

Paper • 2312.08889 • Published Dec 13, 2023 • 15
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 12
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection

Paper • 2312.09252 • Published Dec 14, 2023 • 12
Holodeck: Language Guided Generation of 3D Embodied AI Environments

Paper • 2312.09067 • Published Dec 14, 2023 • 15
LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Paper • 2312.09256 • Published Dec 14, 2023 • 10
General Object Foundation Model for Images and Videos at Scale

Paper • 2312.09158 • Published Dec 14, 2023 • 11
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Paper • 2312.08754 • Published Dec 14, 2023 • 11
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Paper • 2312.09251 • Published Dec 14, 2023 • 10
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

Paper • 2312.09246 • Published Dec 14, 2023 • 8
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 41
Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Paper • 2312.08361 • Published Dec 13, 2023 • 27
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Paper • 2312.07661 • Published Dec 12, 2023 • 18
Foundation Models in Robotics: Applications, Challenges, and the Future

Paper • 2312.07843 • Published Dec 13, 2023 • 16
Invariant Graph Transformer

Paper • 2312.07859 • Published Dec 13, 2023 • 9
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Paper • 2312.08344 • Published Dec 13, 2023 • 13
ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields

Paper • 2312.08136 • Published Dec 13, 2023 • 6
FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 27
VILA: On Pre-training for Visual Language Models

Paper • 2312.07533 • Published Dec 12, 2023 • 21
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Paper • 2312.07536 • Published Dec 12, 2023 • 18
Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 11
CCM: Adding Conditional Controls to Text-to-Image Consistency Models

Paper • 2312.06971 • Published Dec 12, 2023 • 12
Steering Llama 2 via Contrastive Activation Addition

Paper • 2312.06681 • Published Dec 9, 2023 • 14
Honeybee: Locality-enhanced Projector for Multimodal LLM

Paper • 2312.06742 • Published Dec 11, 2023 • 13
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

Paper • 2312.07231 • Published Dec 12, 2023 • 10
PEEKABOO: Interactive Video Generation via Masked-Diffusion

Paper • 2312.07509 • Published Dec 12, 2023 • 11
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming

Paper • 2312.06908 • Published Dec 12, 2023 • 8
LLM360: Towards Fully Transparent Open-Source LLMs

Paper • 2312.06550 • Published Dec 11, 2023 • 57
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Paper • 2312.06655 • Published Dec 11, 2023 • 24
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Paper • 2312.06109 • Published Dec 11, 2023 • 21
Context Tuning for Retrieval Augmented Generation

Paper • 2312.05708 • Published Dec 9, 2023 • 16
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"

Paper • 2312.06571 • Published Dec 11, 2023 • 13
Efficient Quantization Strategies for Latent Diffusion Models

Paper • 2312.05431 • Published Dec 9, 2023 • 11
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Paper • 2312.06353 • Published Dec 11, 2023 • 7
Evaluation of Large Language Models for Decision Making in Autonomous Driving

Paper • 2312.06351 • Published Dec 11, 2023 • 6
Using Captum to Explain Generative Language Models

Paper • 2312.05491 • Published Dec 9, 2023 • 4
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

Paper • 2312.05605 • Published Dec 9, 2023 • 4
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models

Paper • 2312.05107 • Published Dec 8, 2023 • 39
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Paper • 2312.04655 • Published Dec 7, 2023 • 21
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Paper • 2312.04963 • Published Dec 7, 2023 • 17
Customizing Motion in Text-to-Video Diffusion Models

Paper • 2312.04966 • Published Dec 7, 2023 • 11
PathFinder: Guided Search over Multi-Step Reasoning Paths

Paper • 2312.05180 • Published Dec 8, 2023 • 10
MVDD: Multi-View Depth Diffusion Models

Paper • 2312.04875 • Published Dec 8, 2023 • 10
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Paper • 2312.04916 • Published Dec 8, 2023 • 7
Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Paper • 2312.04837 • Published Dec 8, 2023 • 3
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper • 2312.03818 • Published Dec 6, 2023 • 34
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Paper • 2312.04474 • Published Dec 7, 2023 • 34
Controllable Human-Object Interaction Synthesis

Paper • 2312.03913 • Published Dec 6, 2023 • 23
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 62
Pearl: A Production-ready Reinforcement Learning Agent

Paper • 2312.03814 • Published Dec 6, 2023 • 15
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Paper • 2312.04410 • Published Dec 7, 2023 • 15
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Paper • 2312.04557 • Published Dec 7, 2023 • 13
NeRFiller: Completing Scenes via Generative 3D Inpainting

Paper • 2312.04560 • Published Dec 7, 2023 • 13
Large Language Models for Mathematicians

Paper • 2312.04556 • Published Dec 7, 2023 • 12
Gen2Det: Generate to Detect

Paper • 2312.04566 • Published Dec 7, 2023 • 10
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Paper • 2312.04483 • Published Dec 7, 2023 • 7
Efficient Monotonic Multihead Attention

Paper • 2312.04515 • Published Dec 7, 2023 • 8
Generating Illustrated Instructions

Paper • 2312.04552 • Published Dec 7, 2023 • 9
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Paper • 2312.03849 • Published Dec 6, 2023 • 8
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Paper • 2312.03491 • Published Dec 6, 2023 • 34
Relightable Gaussian Codec Avatars

Paper • 2312.03704 • Published Dec 6, 2023 • 32
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians

Paper • 2312.03029 • Published Dec 5, 2023 • 27
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Paper • 2312.03641 • Published Dec 6, 2023 • 22
Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Paper • 2312.03209 • Published Dec 6, 2023 • 21
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Paper • 2312.03461 • Published Dec 6, 2023 • 17
Context Diffusion: In-Context Aware Image Generation

Paper • 2312.03584 • Published Dec 6, 2023 • 15
LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Paper • 2312.03079 • Published Dec 5, 2023 • 15
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Paper • 2312.03611 • Published Dec 6, 2023 • 8
MagicStick: Controllable Video Editing via Control Handle Transformations

Paper • 2312.03047 • Published Dec 5, 2023 • 11
Self-conditioned Image Generation via Generating Representations

Paper • 2312.03701 • Published Dec 6, 2023 • 9
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

Paper • 2312.03664 • Published Dec 6, 2023 • 11
Language-Informed Visual Concept Learning

Paper • 2312.03587 • Published Dec 6, 2023 • 8
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Paper • 2312.02238 • Published Dec 4, 2023 • 27
LivePhoto: Real Image Animation with Text-guided Motion Control

Paper • 2312.02928 • Published Dec 5, 2023 • 18
Describing Differences in Image Sets with Natural Language

Paper • 2312.02974 • Published Dec 5, 2023 • 15
Orthogonal Adaptation for Modular Customization of Diffusion Models

Paper • 2312.02432 • Published Dec 5, 2023 • 14
DragVideo: Interactive Drag-style Video Editing

Paper • 2312.02216 • Published Dec 3, 2023 • 12
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Paper • 2312.02963 • Published Dec 5, 2023 • 10
Fine-grained Controllable Video Generation via Object Appearance and Context

Paper • 2312.02919 • Published Dec 5, 2023 • 13
ReconFusion: 3D Reconstruction with Diffusion Priors

Paper • 2312.02981 • Published Dec 5, 2023 • 10
Training Chain-of-Thought via Latent-Variable Inference

Paper • 2312.02179 • Published Nov 28, 2023 • 9
Alchemist: Parametric Control of Material Properties with Diffusion Models

Paper • 2312.02970 • Published Dec 5, 2023 • 9
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 14
GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Paper • 2312.02980 • Published Dec 5, 2023 • 9
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions

Paper • 2312.02772 • Published Dec 5, 2023 • 7
Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 82
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 39
DeepCache: Accelerating Diffusion Models for Free

Paper • 2312.00858 • Published Dec 1, 2023 • 23
Nash Learning from Human Feedback

Paper • 2312.00886 • Published Dec 1, 2023 • 18
DiffiT: Diffusion Vision Transformers for Image Generation

Paper • 2312.02139 • Published Dec 4, 2023 • 15
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Paper • 2312.02155 • Published Dec 4, 2023 • 14
Object Recognition as Next Token Prediction

Paper • 2312.02142 • Published Dec 4, 2023 • 13
GIVT: Generative Infinite-Vocabulary Transformers

Paper • 2312.02116 • Published Dec 4, 2023 • 12
Segment Any 3D Gaussians

Paper • 2312.00860 • Published Dec 1, 2023 • 10
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Paper • 2312.00849 • Published Dec 1, 2023 • 12
Style Aligned Image Generation via Shared Attention

Paper • 2312.02133 • Published Dec 4, 2023 • 11
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Paper • 2312.01409 • Published Dec 3, 2023 • 10
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Paper • 2312.01407 • Published Dec 3, 2023 • 8
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

Paper • 2312.01663 • Published Dec 4, 2023 • 6
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 150
Merlin: Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 27
VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper • 2312.00777 • Published Dec 1, 2023 • 24
SeaLLMs -- Large Language Models for Southeast Asia

Paper • 2312.00738 • Published Dec 1, 2023 • 25
MoMask: Generative Masked Modeling of 3D Human Motions

Paper • 2312.00063 • Published Nov 29, 2023 • 18
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Paper • 2312.00093 • Published Nov 30, 2023 • 17
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

Paper • 2312.00079 • Published Nov 30, 2023 • 17
Dolphins: Multimodal Language Model for Driving

Paper • 2312.00438 • Published Dec 1, 2023 • 15
Instruction-tuning Aligns LLMs to the Human Brain

Paper • 2312.00575 • Published Dec 1, 2023 • 15
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

Paper • 2312.00330 • Published Dec 1, 2023 • 13
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Paper • 2312.00109 • Published Nov 30, 2023 • 12
PyNeRF: Pyramidal Neural Radiance Fields

Paper • 2312.00252 • Published Nov 30, 2023 • 11
Towards Accurate Differential Diagnosis with Large Language Models

Paper • 2312.00164 • Published Nov 30, 2023 • 11
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Paper • 2312.00451 • Published Dec 1, 2023 • 12
Text-Guided 3D Face Synthesis -- From Generation to Editing

Paper • 2312.00375 • Published Dec 1, 2023 • 11
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

Paper • 2312.00085 • Published Nov 30, 2023 • 9
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

Paper • 2311.13073 • Published Nov 22, 2023 • 58
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 53
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Paper • 2311.13600 • Published Nov 22, 2023 • 47
Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 49
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Paper • 2311.13231 • Published Nov 22, 2023 • 28
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

Paper • 2311.13435 • Published Nov 22, 2023 • 18
Visual In-Context Prompting

Paper • 2311.13601 • Published Nov 22, 2023 • 18
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

Paper • 2311.13141 • Published Nov 22, 2023 • 16
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

Paper • 2311.12052 • Published Nov 18, 2023 • 32
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Paper • 2311.12198 • Published Nov 20, 2023 • 22
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Paper • 2311.12229 • Published Nov 20, 2023 • 25
Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 119
Make Pixels Dance: High-Dynamic Video Generation

Paper • 2311.10982 • Published Nov 18, 2023 • 68
Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 77
System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 43
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

Paper • 2311.11501 • Published Nov 20, 2023 • 37
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Paper • 2311.10794 • Published Nov 17, 2023 • 27
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

Paper • 2311.11243 • Published Nov 19, 2023 • 16
Drivable 3D Gaussian Avatars

Paper • 2311.08581 • Published Nov 14, 2023 • 47
GRIM: GRaph-based Interactive narrative visualization for gaMes

Paper • 2311.09213 • Published Nov 15, 2023 • 13
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

Paper • 2311.08469 • Published Nov 14, 2023 • 11
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

Paper • 2311.09180 • Published Nov 15, 2023 • 8
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster

Paper • 2311.08263 • Published Nov 14, 2023 • 16
