complete CLIP score calculation
README.md
CHANGED
@@ -12,37 +12,63 @@ pinned: false
 
 # Metric Card for CLIP Score
 
-***Module Card Instructions:*** *
+***Module Card Instructions:*** *This module calculates CLIPScore, a reference-free evaluation metric for image captioning.*
 
 ## Metric Description
-
+
+CLIPScore is a reference-free evaluation metric for image captioning that measures the alignment between images and their corresponding text descriptions. It leverages the CLIP (Contrastive Language-Image Pretraining) model to compute a similarity score between the visual and textual modalities.
 
 ## How to Use
 
-*Give general statement of how to use the metric*
-
+To use the CLIPScore metric, provide a list of text predictions and a list of images. The metric computes a CLIPScore for each image-text pair.
 
 ### Inputs
 
-*List all input arguments in the format below*
-
-- **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
-
+- **predictions** *(string)*: A list of text predictions to score. Each prediction should be a string.
+- **references** *(PIL.Image.Image)*: A list of images to score against. Each image should be a PIL image.
 
-
+### Output Values
 
-
+The CLIPScore metric outputs a dictionary with a single key-value pair:
 
-
-
-*Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
+- **clip_score** *(float)*: The average CLIPScore across all provided image-text pairs. The score ranges from -1 to 1, where higher scores indicate better alignment between the image and text.
 
 ### Examples
 
-*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
-
-
+```python
+from PIL import Image
+import evaluate
+
+metric = evaluate.load("sunhill/clip_score")
+predictions = ["A cat sitting on a windowsill.", "A dog playing with a ball."]
+references = [Image.open("cat.jpg"), Image.open("dog.jpg")]
+results = metric.compute(predictions=predictions, references=references)
+print(results)
+# Output: {'clip_score': 0.85}
+```
 
 ## Citation
-
+
+```bibtex
+@article{DBLP:journals/corr/abs-2104-08718,
+  author     = {Jack Hessel and
+                Ari Holtzman and
+                Maxwell Forbes and
+                Ronan Le Bras and
+                Yejin Choi},
+  title      = {CLIPScore: {A} Reference-free Evaluation Metric for Image Captioning},
+  journal    = {CoRR},
+  volume     = {abs/2104.08718},
+  year       = {2021},
+  url        = {https://arxiv.org/abs/2104.08718},
+  eprinttype = {arXiv},
+  eprint     = {2104.08718},
+  timestamp  = {Sat, 29 Apr 2023 10:09:27 +0200},
+  biburl     = {https://dblp.org/rec/journals/corr/abs-2104-08718.bib},
+  bibsource  = {dblp computer science bibliography, https://dblp.org}
+}
+```
 
 ## Further References
-
+
+- [clip-score](https://github.com/Taited/clip-score)
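For context on the "Output Values" range above: the cited paper (Hessel et al., 2021) defines CLIPScore as a rescaled, clipped cosine similarity between the CLIP text embedding c and image embedding v,

$$\mathrm{CLIP\text{-}S}(c, v) = w \cdot \max\bigl(\cos(c, v),\, 0\bigr), \qquad w = 2.5$$

whereas the `clip_score.py` implementation in this commit returns the raw mean cosine similarity of each pair, which is why the card documents a -1 to 1 range rather than 0 to 2.5.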
app.py
CHANGED
@@ -1,12 +1,15 @@
+import sys
+from pathlib import Path
+
 import evaluate
 import gradio as gr
+from evaluate import parse_readme
 
-
-metric = evaluate.load("clip_score.py")
+metric = evaluate.load("sunhill/clip_score")
 
 
 def compute_clip_score(image, text):
-    results = metric.compute(predictions=[text],
+    results = metric.compute(predictions=[text], references=[image])
     return results["clip_score"]
 
 
@@ -22,13 +25,14 @@ iface = gr.Interface(
     examples=[
         [
             "https://images.unsplash.com/photo-1720539222585-346e73f01536",
-            "A cat sitting on a couch
+            "A cat sitting on a couch",
         ],
         [
            "https://images.unsplash.com/photo-1694253987647-4eebcf679974",
-            "A scenic view of mountains during sunset
+            "A scenic view of mountains during sunset",
        ],
     ],
+    article=parse_readme(Path(sys.path[0]) / "README.md"),
 )
 
 iface.launch()
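The hunks above only touch parts of app.py; the Interface definition itself is outside the diff. Purely for orientation, a minimal hypothetical sketch of how the pieces could fit together; the `inputs` and `outputs` widgets are assumptions, everything else appears in the hunks:

```python
import sys
from pathlib import Path

import evaluate
import gradio as gr
from evaluate import parse_readme

metric = evaluate.load("sunhill/clip_score")


def compute_clip_score(image, text):
    # Wrap the single image/text pair in lists, since the metric expects batches.
    results = metric.compute(predictions=[text], references=[image])
    return results["clip_score"]


# Hypothetical wiring: only `fn`, `examples`, and `article` are visible in the diff.
iface = gr.Interface(
    fn=compute_clip_score,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Caption")],  # assumed widgets
    outputs=gr.Number(label="CLIP score"),                       # assumed widget
    examples=[
        [
            "https://images.unsplash.com/photo-1720539222585-346e73f01536",
            "A cat sitting on a couch",
        ],
        [
            "https://images.unsplash.com/photo-1694253987647-4eebcf679974",
            "A scenic view of mountains during sunset",
        ],
    ],
    article=parse_readme(Path(sys.path[0]) / "README.md"),
)

iface.launch()
```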
clip_score.py
CHANGED
@@ -63,7 +63,7 @@ class CLIPScore(evaluate.Metric):
             features=datasets.Features(
                 {
                     "predictions": datasets.Value("string"),
-                    "references": datasets.
+                    "references": datasets.Image(),
                 }
             ),
             # Homepage of the module for documentation
@@ -85,14 +85,12 @@ class CLIPScore(evaluate.Metric):
         refer = self.processor(
             text=None, images=references, return_tensors="pt", padding=True
         )
-        refer["pixel_values"] = refer["pixel_values"][0]
         pred = self.tokenizer(predictions, return_tensors="pt", padding=True)
-        for key in pred:
-            pred[key] = pred[key].squeeze()
 
         refer_features = self.model.get_image_features(**refer)
         pred_features = self.model.get_text_features(**pred)
 
         refer_features = refer_features / refer_features.norm(dim=1, keepdim=True)
         pred_features = pred_features / pred_features.norm(dim=1, keepdim=True)
-
+        clip_score = (refer_features * pred_features).sum().item()
+        return {"clip_score": clip_score / refer_features.shape[0]}
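The new return value in the second hunk is the batch-mean cosine similarity between the L2-normalized image and text embeddings: since both feature matrices are row-normalized, summing their element-wise product over the whole batch equals summing the per-pair dot products. A minimal sketch of that identity, using hypothetical 512-dimensional random tensors in place of the CLIP features:

```python
import torch

# Hypothetical stand-ins for the image/text feature matrices (2 pairs, dim 512)
refer_features = torch.randn(2, 512)  # image embeddings
pred_features = torch.randn(2, 512)   # text embeddings for the same 2 pairs

# L2-normalize each row, as in the hunk above
refer_features = refer_features / refer_features.norm(dim=1, keepdim=True)
pred_features = pred_features / pred_features.norm(dim=1, keepdim=True)

# Sum of the element-wise product over the whole batch
# = sum of per-pair dot products = sum of per-pair cosine similarities
clip_score = (refer_features * pred_features).sum().item() / refer_features.shape[0]

# Equivalent per-pair view
per_pair = torch.nn.functional.cosine_similarity(refer_features, pred_features, dim=1)
assert abs(clip_score - per_pair.mean().item()) < 1e-5
```

This is consistent with the expectations in tests.py below, where mismatched image-caption pairs score markedly lower than matched ones.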
tests.py
CHANGED
@@ -1,17 +1,69 @@
+import requests
+from PIL import Image
+
+import evaluate
+
+
+metric = evaluate.load("./clip_score.py")
+
+
+def download_image(image_path):
+    if image_path.startswith("http"):
+        image = Image.open(requests.get(image_path, stream=True).raw)
+    else:
+        image = Image.open(image_path)
+    return image
+
+
+def compute_clip_score(image, text):
+    if not isinstance(image, list):
+        references = [image]
+    else:
+        references = image
+    if not isinstance(text, list):
+        predictions = [text]
+    else:
+        predictions = text
+    results = metric.compute(predictions=predictions, references=references)
+    return results["clip_score"]
+
+
+predictions = ["A cat sitting on a couch", "A scenic view of mountains during sunset"]
+references = [
+    "https://images.unsplash.com/photo-1720539222585-346e73f01536",
+    "https://images.unsplash.com/photo-1694253987647-4eebcf679974",
+]
+references = [download_image(url) for url in references]
+
 test_cases = [
     {
-        "predictions":
-        "references":
-        "result": {"
+        "predictions": predictions,
+        "references": references,
+        "result": {"clip_score": 0.307},
     },
     {
-        "predictions": [
-        "references": [
-        "result": {"
+        "predictions": predictions[0],
+        "references": references[0],
+        "result": {"clip_score": 0.304},
     },
     {
-        "predictions": [1
-        "references": [1
-        "result": {"
-    }
-
+        "predictions": predictions[1],
+        "references": references[1],
+        "result": {"clip_score": 0.310},
+    },
+    {
+        "predictions": predictions[0],
+        "references": references[1],
+        "result": {"clip_score": 0.106},
+    },
+    {
+        "predictions": predictions[1],
+        "references": references[0],
+        "result": {"clip_score": 0.134},
+    },
+]
+
+for i, test_case in enumerate(test_cases):
+    result = compute_clip_score(test_case["references"], test_case["predictions"])
+    error = abs(result - test_case["result"]["clip_score"])
+    assert error < 0.1, f"Test case {i} failed"