CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
Paper • 2601.10061
nlu
3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence
DocDancer: Towards Agentic Document-Grounded Information Seeking