---
license: apache-2.0
pipeline_tag: image-to-text
---

# Talking Points: Describing and Localizing Pixels

This repository contains the Point Descriptor and Point Localizer models presented in the paper [Talking Points: Describing and Localizing Pixels](https://huggingface.co/papers/2510.14583).

Talking Points introduces a novel framework for pixel-level grounding. It consists of two complementary components: a Point Descriptor that generates rich, contextual descriptions of individual keypoints, and a Point Localizer that regresses precise pixel coordinates from these descriptions. Unlike prior work, our approach produces free-form, coarse-to-fine descriptions that situate keypoints within their visual context.
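
To make the relationship between the two components concrete, here is a minimal sketch of how a descriptor and a localizer could be chained in a round trip: describe a keypoint, localize it again from the text, and measure the pixel distance. This is an illustration only, not the API of the released models. The names `round_trip_error`, `dummy_describe`, and `dummy_localize`, along with the two callable signatures, are hypothetical, and the dummy functions are simple stand-ins for the actual models.

```python
import math
from typing import Any, Callable, Tuple

Point = Tuple[float, float]

# Hypothetical signatures (assumptions, not the released models' API):
#   describer: (image, keypoint) -> free-form text description
#   localizer: (image, description) -> predicted (x, y) pixel coordinates
Describer = Callable[[Any, Point], str]
Localizer = Callable[[Any, str], Point]


def round_trip_error(
    image: Any, point: Point, describe: Describer, localize: Localizer
) -> Tuple[str, Point, float]:
    """Describe a keypoint, localize it from the text, return the pixel error."""
    description = describe(image, point)
    predicted = localize(image, description)
    error = math.dist(point, predicted)  # Euclidean distance in pixels
    return description, predicted, error


def dummy_describe(image: Any, point: Point) -> str:
    """Stand-in descriptor: encodes the coordinates directly in the text."""
    x, y = point
    return f"point at x={x:.0f}, y={y:.0f}"


def dummy_localize(image: Any, description: str) -> Point:
    """Stand-in localizer: parses the coordinates back out of the text."""
    parts = description.replace(",", "").split()
    x = float(parts[2].split("=")[1])
    y = float(parts[3].split("=")[1])
    return (x, y)


if __name__ == "__main__":
    desc, pred, err = round_trip_error(
        None, (120.0, 80.0), dummy_describe, dummy_localize
    )
    print(desc, pred, err)
```

In practice the two dummy functions would be replaced by wrappers around the Point Descriptor and Point Localizer. Passing them in as plain callables keeps the round-trip logic independent of how each model is loaded.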