johko/capdec_015

johko committed on Jan 10, 2023

Commit

6865b8c

1 Parent(s): 319dcd9

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,12 +1,20 @@
 ---
 license: apache-2.0
 ---
 # CapDec - NoiseLevel: 0.015
-This is are model weights originally provided by the authors of the paper [Text-Only Training for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
-Their method aims to train CLIP with only text samples. Therefore they are injecting zero-mean Gaussian Noise with a standard-deviation(STD) of into the text embeddings before decoding.
 In their words:
 *Specifically, we assume that the visual embedding corresponding to a text embedding
@@ -20,3 +28,7 @@ The "Noise Level" of 0.015 is equivalent to the Noise Variance which is the squa
 The reported metrics are results of a model with a Noise Variance of 0.016, which the authors unfortunately do not provide in their repository.
 This model with a Noise Variance 0.015 is the closest available  pre-trained model to their best model.

 ---
 license: apache-2.0
+language:
+- en
+pipeline_tag: image-to-text
+datasets:
+- MS-COCO
+- Flickr30k
+tags:
+- Image Captioning
 ---
 # CapDec - NoiseLevel: 0.015
+These are model weights originally provided by the authors of the paper [Text-Only Training for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
+Their method aims to train an image captioning model on top of CLIP using only text samples. To do so, they inject zero-mean Gaussian noise into the text embeddings before decoding.
 In their words:
 *Specifically, we assume that the visual embedding corresponding to a text embedding
 The reported metrics are results of a model with a Noise Variance of 0.016, which the authors unfortunately do not provide in their repository.
 This model with a Noise Variance 0.015 is the closest available  pre-trained model to their best model.
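As a rough illustration of the noise injection described above, the following is a minimal sketch and not the authors' implementation. It assumes the Noise Level is a variance, so the standard deviation is its square root, and that noise is added independently to each embedding component. The function name `inject_noise` and the plain-list embedding are hypothetical; any normalization the authors apply before or after this step is not covered.

```python
import math
import random


def inject_noise(embedding, noise_variance=0.015, rng=None):
    """Return a copy of `embedding` with zero-mean Gaussian noise added.

    Assumption: `noise_variance` is the variance of the noise, so the
    standard deviation passed to the sampler is sqrt(noise_variance).
    """
    if rng is None:
        rng = random.Random()
    std = math.sqrt(noise_variance)
    # Each component receives an independent sample from N(0, std^2).
    return [x + rng.gauss(0.0, std) for x in embedding]


if __name__ == "__main__":
    # Hypothetical 4-dimensional "text embedding" used only for illustration.
    text_embedding = [0.1, -0.2, 0.3, 0.05]
    print(inject_noise(text_embedding, noise_variance=0.015, rng=random.Random(0)))
```

With a variance of 0.015 the standard deviation is roughly 0.122, so the perturbation per component is small relative to an unnormalized embedding but not negligible for a unit-norm one.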
+## Performance
+The authors don't explicitly report the performance for this NoiseLevel but it can be estimated from the following figure from the original paper: