MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization Paper • 2511.19878 • Published Nov 25 • 1
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow Paper • 2511.20462 • Published Nov 25 • 31
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models +1 Jun 24, 2024 • 205