Commit 22d77c4
Parent(s): cf0ecfa
Update README.md

README.md CHANGED
@@ -24,6 +24,37 @@ The model is trained with a strict filter of 0.4 similarity distance thresholds
For the [dataset](https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset)

## Results with the SoTA pre-trained image captioning model BLIP

Comparison with BLIP (125M pre-trained images) on the COCO Caption Karpathy test set; see [Table 7 of the BLIP paper](https://arxiv.org/pdf/2201.12086.pdf).
For the ViLBERT model (3.5M pre-trained images), please refer to the paper.

## Accuracy

| Model                  | B-1      | B-2      | B-3      | B-4      | M        | R        | C         | S        | BERTScore |
|------------------------|----------|----------|----------|----------|----------|----------|-----------|----------|-----------|
| BLIP Beam Search b=3   | .797     | .649     | **.514** | **.403** | **.311** | **.606** | **1.365** | **.243** | **.9484** |
| + BERT-CNN $th=0$      | .798     | .646     | .506     | .392     | .305     | .598     | 1.339     | .238     | .9473     |
| + BERT-CNN $th\geq0.2$ | .798     | .647     | .507     | .393     | .306     | .600     | 1.342     | .238     | .9473     |
| + BERT-CNN $th\geq0.3$ | .802     | .651     | .511     | .397     | .307     | .601     | 1.349     | .238     | .9479     |
| + BERT-CNN $th\geq0.4$ | **.806** | **.654** | .513     | .397     | .303     | .599     | 1.343     | .235     | .9476     |

B-1 to B-4: BLEU-1 to BLEU-4; M: METEOR; R: ROUGE-L; C: CIDEr; S: SPICE.

## Diversity

| Model                  | Uniq     | V        | mBLEU-1↓ | Div-1 | Div-2 | SBERT-sts |
|------------------------|----------|----------|----------|-------|-------|-----------|
| BLIP Beam Search b=3   | **8.60** | 1406     | .461     | .68   | .80   | .8058     |
| + BERT-CNN $th=0$      | 8.49     | **1532** | .457     | .68   | .80   | .8046     |
| + BERT-CNN $th\geq0.2$ | 8.48     | 1486     | .458     | .68   | .80   | .8052     |
| + BERT-CNN $th\geq0.3$ | 8.41     | 1448     | .458     | .68   | .80   | **.8060** |
| + BERT-CNN $th\geq0.4$ | 8.30     | 1448     | **.455** | .68   | .80   | .8053     |
| Human                  | 9.14     | 3425     | .375     | .74   | .84   | NA        |
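
The README does not define the diversity columns. As a rough guide, the sketch below computes them under common definitions from the caption-diversity literature, all of which are assumptions here rather than the authors' evaluation code: V as the number of distinct words across all generated captions, Uniq as the mean number of distinct words per caption, Div-n as distinct n-grams over total n-grams within each image's caption set (averaged over images; some papers divide by the word count instead), and mBLEU-1 as the BLEU-1 of each caption scored against the other captions of the same image (lower means less repetitive output). SBERT-sts is not covered because it needs a sentence-embedding model. The function names, the input format and the whitespace tokenizer are hypothetical, so this will not reproduce the reported numbers exactly.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: image id -> list of generated captions.
Captions = Dict[str, List[str]]


def tokenize(caption: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization (no punctuation handling).
    return caption.lower().split()


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    # All contiguous n-grams of a token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocab_size(captions: Captions) -> int:
    # V (assumed): distinct word types over every generated caption.
    return len({w for caps in captions.values() for c in caps for w in tokenize(c)})


def mean_unique_words(captions: Captions) -> float:
    # Uniq (assumed): average number of distinct words per caption.
    counts = [len(set(tokenize(c))) for caps in captions.values() for c in caps]
    return sum(counts) / len(counts) if counts else 0.0


def div_n(captions: Captions, n: int) -> float:
    # Div-n (assumed): distinct n-grams / total n-grams in each image's
    # caption set, averaged over images that yield at least one n-gram.
    scores = []
    for caps in captions.values():
        grams = [g for c in caps for g in ngrams(tokenize(c), n)]
        if grams:
            scores.append(len(set(grams)) / len(grams))
    return sum(scores) / len(scores) if scores else 0.0


def bleu_1(hyp: List[str], refs: List[List[str]]) -> float:
    # Sentence-level BLEU-1: clipped unigram precision times brevity penalty.
    if not hyp:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        max_ref |= Counter(ref)  # element-wise max of reference counts
    clipped = sum(min(cnt, max_ref[w]) for w, cnt in Counter(hyp).items())
    c = len(hyp)
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * clipped / c


def mbleu_1(captions: Captions) -> float:
    # mBLEU-1 (assumed): each caption scored against the other captions of
    # the same image, averaged over all captions that have at least one peer.
    scores = []
    for caps in captions.values():
        toks = [tokenize(c) for c in caps]
        for i, hyp in enumerate(toks):
            others = toks[:i] + toks[i + 1:]
            if others:
                scores.append(bleu_1(hyp, others))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    demo = {
        "img1": ["a dog runs on the grass", "a dog plays on the grass"],
        "img2": ["a man rides a bike"],
    }
    print(vocab_size(demo))                    # 10
    print(round(mean_unique_words(demo), 3))   # 5.333
    print(round(div_n(demo, 1), 3))            # 0.692
    print(round(div_n(demo, 2), 3))            # 0.85
    print(round(mbleu_1(demo), 3))             # 0.833
```

Images with a single caption contribute to V, Uniq and Div-n but are skipped by mBLEU-1, since there is no other caption to compare against. For numbers comparable to the table, the same tokenizer as the accuracy evaluation should be used.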

```
conda create -n BERT_visual python=3.6 anaconda
conda activate BERT_visual