AhmedSSabir/BERT-CNN-Visual-Semantic

AhmedSSabir committed on Mar 12, 2023

Commit 22d77c4 · 1 Parent(s): cf0ecfa

Update README.md

Files changed (1)

README.md +31 -0

@@ -24,6 +24,37 @@ The model is trained with a strict filter of 0.4 similarity distance thresholds
  For the [dataset](https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset)
 ```
 conda create -n BERT_visual python=3.6 anaconda
 conda activate BERT_visual

  For the [dataset](https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset)
+## Results with the SoTA pre-trained image captioning model BLIP
+Comparison with BLIP (pre-trained on 125M images); see [Table 7, COCO Caption Karpathy test set](https://arxiv.org/pdf/2201.12086.pdf).
+For the VilBERT model (pre-trained on 3.5M images), please refer to the paper.
+## Accuracy
+| Model                            | B-1     | B-2   |  B-3   | B-4   |    M   |  R     |  C    | S      |BERTscore |
+|----------------------------------|---------|-------|--------|-------|--------|--------|-------|--------|---------|
+| BLIP Beam Search b=3            | .797   | .649 | **.514**   | **.403**  | **.311**   |  **.606** |**1.365** |**.243**   | **.9484**  |
+| + BERT-CNN  $th=0$            |  .798  | .646 | .506 | .392  | .305 |  .598 |  1.339 | .238  | .9473 |
+| + BERT-CNN  $th\geq0.2$          |  .798  | .647 | .507  | .393 | .306  | .600  | 1.342 | .238  | .9473  |
+| + BERT-CNN  $th\geq0.3$          |  .802  | .651 | .511  | .397 | .307  |  .601 | 1.349 | .238  | .9479  |
+| + BERT-CNN $th\geq0.4$           |  **.806**  | **.654** | .513  | .397 | .303  |  .599 | 1.343 | .235  | .9476  |
+## Diversity
+| Model                            |  Uniq   | V     |  mBLEU-1↓ | Div-1  |Div-2 | SBERT-sts|
+|----------------------------------|---------|-------|----------|-------|-------|----------|
+| BLIP Beam Search b=3             | **8.60** | 1406 | .461     | .68   |  .80  | .8058 |
+| + BERT-CNN  $th=0$               | 8.49    | **1532**  | .457     | .68   |  .80  | .8046  |
+| + BERT-CNN  $th\geq0.2$          | 8.48    | 1486  | .458     | .68   |  .80  | .8052 |
+| + BERT-CNN  $th\geq0.3$          | 8.41    | 1448  | .458     | .68   |  .80  |  **.8060** |
+| + BERT-CNN $th\geq0.4$           | 8.30    | 1448  | **.455**     | .68   |  .80  | .8053 |
+| Human                            | 9.14    | 3425  | .375     | .74   |  .84   |   NA    |
 ```
 conda create -n BERT_visual python=3.6 anaconda
 conda activate BERT_visual