sebastianhariman/image_captioning_model

sebastianhariman committed on Jan 11, 2025

Commit 13a34d9 · verified · 1 Parent(s): 24b1270

Update README.md

README.md +31 -0

@@ -40,3 +40,34 @@ The models were evaluated using popular metrics for image captioning: **BLEU (1-
 | **ViT + GPT2**     | **0.728**  | 0.545      | 0.385      | 0.265      | **0.502**  | 0.532       |
 ---

+## **Inference Example**
+Below is an example of how the models perform on a single image. The table shows the five reference captions and the caption predicted by each model.
+<table>
+  <tr>
+    <th>Image</th>
+    <th>Reference Caption</th>
+    <th>Predicted Caption</th>
+  </tr>
+  <tr>
+    <td>
+      <img src="examples/000000166391.jpg" alt="Traffic light" width="300">
+    </td>
+    <td>
+      <ol>
+        <li>Traffic is stopped at a red stop light.</li>
+        <li>Cars are stopped at a traffic light on a highway.</li>
+        <li>A number of red and green traffic lights on a wide highway.</li>
+        <li>A large and wide street covered in lots of traffic lights.</li>
+        <li>A traffic light and intersection with cars traveling in both directions on the street.</li>
+      </ol>
+    </td>
+    <td>
+      <b>ResNet50 + LSTM:</b> a traffic light with a street sign on it.<br>
+      <b>ViT + BERT:</b> a bunch of traffic lights hanging from a wire.<br>
+      <b>ViT + GPT2:</b> A green traffic light hanging over a street.
+    </td>
+  </tr>
+</table>
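
The README reports BLEU scores, so it may help to see how a single predicted caption could be compared against the five references in the table above. The following is a minimal sketch, not the author's evaluation code: `tokenize` and `bleu1` are hypothetical helpers that compute a sentence-level BLEU-1 (clipped unigram precision times a brevity penalty) using only the standard library. The tokenization (lowercasing, alphanumeric runs only) is an assumption, and single-sentence scores like these are not comparable to the corpus-level numbers in the results table.

```python
import math
import re
from collections import Counter

# Reference and predicted captions copied from the table above.
REFERENCES = [
    "Traffic is stopped at a red stop light.",
    "Cars are stopped at a traffic light on a highway.",
    "A number of red and green traffic lights on a wide highway.",
    "A large and wide street covered in lots of traffic lights.",
    "A traffic light and intersection with cars traveling in both "
    "directions on the street.",
]
PREDICTIONS = {
    "ResNet50 + LSTM": "a traffic light with a street sign on it.",
    "ViT + BERT": "a bunch of traffic lights hanging from a wire.",
    "ViT + GPT2": "A green traffic light hanging over a street.",
}


def tokenize(text):
    """Hypothetical tokenizer: lowercase, keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision x brevity penalty."""
    cand = tokenize(candidate)
    refs = [tokenize(r) for r in references]
    if not cand:
        return 0.0

    # For each token, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], n)

    # Clip each candidate token count by its maximum reference count.
    clipped = sum(min(n, max_ref[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty against the reference length closest to the
    # candidate length (ties go to the shorter reference).
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision


if __name__ == "__main__":
    for model, caption in PREDICTIONS.items():
        print(f"{model}: {bleu1(caption, REFERENCES):.3f}")
```

A full evaluation would normally use an established implementation over the whole test set and include the higher-order n-gram variants; this sketch only illustrates the unigram case on one image.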