multimodal
DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper
• 2309.11499
• Published
• 60
FoleyGen: Visually-Guided Audio Generation
Paper
• 2309.10537
• Published
• 8
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper
• 2310.11441
• Published
• 29
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
• 2311.10093
• Published
• 58
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Paper
• 2311.10702
• Published
• 19
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
Paper
• 2311.11243
• Published
• 16
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Paper
• 2311.10794
• Published
• 27
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Paper
• 2311.12092
• Published
• 22
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
• 2311.13600
• Published
• 47
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper
• 2312.02432
• Published
• 14
FaceStudio: Put Your Face Everywhere in Seconds
Paper
• 2312.02663
• Published
• 32
Fine-grained Controllable Video Generation via Object Appearance and Context
Paper
• 2312.02919
• Published
• 13
Generating Illustrated Instructions
Paper
• 2312.04552
• Published
• 9
PALP: Prompt Aligned Personalization of Text-to-Image Models
Paper
• 2401.06105
• Published
• 50