---
title: CLIP Score
tags:
- evaluate
- metric
description: "CLIPScore is a reference-free evaluation metric for image captioning that measures the alignment between images and their corresponding text descriptions."
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Metric Card for CLIP Score

*This module calculates CLIPScore, a reference-free evaluation metric for image captioning.*
## Metric Description

CLIPScore is a reference-free evaluation metric for image captioning that measures the alignment between images and their corresponding text descriptions. It leverages the CLIP (Contrastive Language-Image Pretraining) model to compute a similarity score between the visual and textual modalities.
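To make the computation concrete, the sketch below shows the underlying idea (not necessarily this module's exact implementation): embed the image and the caption with a CLIP model and take the cosine similarity of the two embeddings. The `openai/clip-vit-base-patch32` checkpoint is used here only as an illustrative choice.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(caption: str, image: Image.Image) -> float:
    # Encode both modalities with CLIP into the shared embedding space.
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Normalize the projected embeddings, then take their cosine similarity.
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return (image_emb * text_emb).sum(dim=-1).item()
```

The reported `clip_score` is then the average of this per-pair similarity over all provided image-text pairs.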
## How to Use

To use the CLIPScore metric, provide a list of text predictions and a list of images. The metric computes the CLIPScore for each image-text pair and returns the average.
### Inputs

- **predictions** *(list of string)*: The text predictions to score. Each prediction should be a string.
- **references** *(list of PIL.Image.Image)*: The images to score against. Each reference should be a PIL image.
### Output Values

The CLIPScore metric outputs a dictionary with a single key-value pair:

- **clip_score** *(float)*: The average CLIPScore across all provided image-text pairs. The score ranges from -1 to 1, where higher scores indicate better alignment between the image and text.
### Examples

```python
from PIL import Image
import evaluate

metric = evaluate.load("sunhill/clip_score")

predictions = ["A cat sitting on a windowsill.", "A dog playing with a ball."]
references = [Image.open("cat.jpg"), Image.open("dog.jpg")]

results = metric.compute(predictions=predictions, references=references)
print(results)
# Output: {'clip_score': 0.85}
```
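If the module follows the standard `evaluate` interface, inputs can also be accumulated batch by batch with `add_batch` before calling `compute`. The sketch below assumes that interface and reuses the image files from the example above.

```python
from PIL import Image
import evaluate

metric = evaluate.load("sunhill/clip_score")

# Accumulate predictions and references in batches, then compute once at the end.
batches = [
    (["A cat sitting on a windowsill."], [Image.open("cat.jpg")]),
    (["A dog playing with a ball."], [Image.open("dog.jpg")]),
]
for texts, images in batches:
    metric.add_batch(predictions=texts, references=images)

results = metric.compute()
print(results)  # e.g. {'clip_score': 0.85}
```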
## Citation

```bibtex
@article{DBLP:journals/corr/abs-2104-08718,
  author     = {Jack Hessel and
                Ari Holtzman and
                Maxwell Forbes and
                Ronan Le Bras and
                Yejin Choi},
  title      = {CLIPScore: {A} Reference-free Evaluation Metric for Image Captioning},
  journal    = {CoRR},
  volume     = {abs/2104.08718},
  year       = {2021},
  url        = {https://arxiv.org/abs/2104.08718},
  eprinttype = {arXiv},
  eprint     = {2104.08718},
  timestamp  = {Sat, 29 Apr 2023 10:09:27 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2104-08718.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```

## Further References

- [clip-score](https://github.com/Taited/clip-score)