FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation • Paper • 2601.13976 • Published 21 days ago • 21
Florence 2 • Space • Running on Zero • Featured • 826 • Analyze images for captions, detection, OCR and segmentation