MM_LLM

• OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models (arXiv:2308.01390)
• Med-Flamingo: a Multimodal Medical Few-shot Learner (arXiv:2307.15189)
• BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs (arXiv:2307.08581)
• GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest (arXiv:2307.03601)
• Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language (arXiv:2306.16410)
• ImageBind-LLM: Multi-modality Instruction Tuning (arXiv:2309.03905)
• NExT-GPT: Any-to-Any Multimodal LLM (arXiv:2309.05519)
• Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities (arXiv:2311.05698)
• CogAgent: A Visual Language Model for GUI Agents (arXiv:2312.08914)
• Alpha-CLIP: A CLIP Model Focusing on Wherever You Want (arXiv:2312.03818)