FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model • Paper • 2510.10921 • Published Oct 13, 2025