FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies Paper • 2605.27284 • Published about 1 month ago • 9
Article Train 400x faster Static Embedding Models with Sentence Transformers tomaarsen • Jan 15, 2025 • 233
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published Feb 2 • 118
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 155