MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization Paper • 2511.19878 • Published Nov 25 • 1
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow Paper • 2511.20462 • Published Nov 25 • 31
Stanford-ILIAD/prism-qwen25-extra-dinosiglip-224px-0_5b Image-Text-to-Text • Updated Dec 12, 2024 • 1.13k • 6