Papers - Image - Fine-tuning
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper
• 2401.00908
• Published
• 189
Visual Instruction Tuning
Paper
• 2304.08485
• Published
• 21
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper
• 2403.09622
• Published
• 17
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
• 2401.12945
• Published
• 87
Model Stock: All we need is just a few fine-tuned models
Paper
• 2403.19522
• Published
• 13
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper
• 2404.01197
• Published
• 31
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper
• 2404.03653
• Published
• 35
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper
• 2404.03673
• Published
• 15
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper
• 2301.07093
• Published
• 4
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper
• 2404.12803
• Published
• 30
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper
• 2404.13686
• Published
• 29
Capabilities of Gemini Models in Medicine
Paper
• 2404.18416
• Published
• 25
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Paper
• 1707.02968
• Published
• 1
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper
• 2404.05669
• Published
• 1
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Paper
• 2203.03897
• Published
• 1
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper
• 2411.10440
• Published
• 129
DETRs Beat YOLOs on Real-time Object Detection
Paper
• 2304.08069
• Published
• 15