Ambroser53's Collections: Vision
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Paper
• 2401.13313
• Published • 5
What matters when building vision-language models?
Paper
• 2405.02246
• Published • 104
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper
• 2405.20204
• Published • 37
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper
• 2401.09417
• Published • 62
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Paper
• 2406.12275
• Published • 31
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Paper
• 2406.13923
• Published • 25
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper
• 2406.14491
• Published • 96
ColPali: Efficient Document Retrieval with Vision Language Models
Paper
• 2407.01449
• Published • 51
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Paper
• 2407.12594
• Published • 19