Ruggero1912/Patch-ioner_talk2dino_capdec_groupnet_COCO_Captions Image-to-Text • Updated 22 days ago • 47
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5, 2024 • 61
CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval Paper • 2412.13834 • Published Dec 18, 2024
CountingDINO: A Training-free Pipeline for Class-Agnostic Counting using Unsupervised Backbones Paper • 2504.16570 • Published Apr 23, 2025
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework Paper • 2510.02898 • Published Oct 3, 2025 • 5 • 2
Patch-ioner Collection The official collection of all the Patch-ioner framework models • 9 items • Updated Oct 13, 2025 • 2
Trace Captioning Datasets Collection Trace Captioning datasets collection. The datasets were originally introduced in the paper "One Patch to Caption Them All". • 2 items • Updated Oct 10, 2025