Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 138
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 75
Text-to-Image Base Models Collection All text-to-image open source base models, with their respective license • 28 items • Updated May 10, 2024 • 28