WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing Paper • 2603.11593 • Published 1 day ago • 15
UME-R1 Collection UME-R1 is a framework designed to endow multimodal embedding models with the flexibility to switch between discriminative and generative embeddings • 4 items • Updated Nov 4, 2025 • 8
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published Apr 24, 2025 • 40 • 4
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning Paper • 2503.04812 • Published Mar 4, 2025 • 17 • 3
AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity Paper • 2410.02745 • Published Sep 20, 2024
MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation Paper • 2409.19937 • Published Sep 30, 2024
LLaVE Collection LLaVE is a series of large language and vision embedding models trained on a variety of multimodal embedding datasets • 4 items • Updated Mar 10, 2025 • 8