With a few tricks, we have been able to fine-tune a competitive CLIP-italian model with only 1 million training samples.
In building this project, we kept the following in mind:
- **Novel Contributions**: we tried to bring something new to the table
- **Scientific Validity**: models can look very cool, but external validation is important to assess the real impact
- **Broader Outlook**: we always considered the possible uses of this model
We put our hearts and souls into this project during this week, and we hope you will like it :heart:
# Novel Contributions
The original CLIP model was trained on 400 million text-image pairs; this amount of data is not available for Italian, and the only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results, we followed three directions: 1) more data, 2) better augmentations, and 3) better training.
This text is written in Italian and it is of good quality. To prevent polluting the data with captions that are not meaningful, we ran POS tagging on the data and removed all the captions composed of 80% or more proper nouns (PROPN).
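The filtering step above can be sketched as follows. The helper names and the pre-tagged input format are illustrative assumptions, not the project's actual code; the README only states that POS tagging was used:

```python
# Sketch of the PROPN-ratio filter (assumed format: each caption comes with a
# list of Universal POS tags, e.g. from an Italian tagger such as spaCy's
# it_core_news_sm pipeline).

def propn_ratio(pos_tags):
    """Fraction of tokens tagged as proper nouns (PROPN)."""
    if not pos_tags:
        return 0.0
    return sum(tag == "PROPN" for tag in pos_tags) / len(pos_tags)

def keep_caption(pos_tags, threshold=0.8):
    """Keep a caption only if fewer than 80% of its tokens are PROPN."""
    return propn_ratio(pos_tags) < threshold

# Example: a caption that is almost entirely entity names gets dropped.
captions = [
    ("una spiaggia al tramonto con persone",
     ["DET", "NOUN", "ADP", "NOUN", "ADP", "NOUN"]),
    ("Mario Rossi Roma Italia Juventus",
     ["PROPN", "PROPN", "PROPN", "PROPN", "PROPN"]),
]
kept = [text for text, tags in captions if keep_caption(tags)]
```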

- MSCOCO-IT
- Conceptual Captions

## Better Augmentations
## Better Training

After several trials, we realized that the usual way of training this model was not good enough to get good results. We thus modified two different parts of the training pipeline: the optimizer and the training with frozen components.

### Optimizer
The standard AdamW optimizer didn't seem sufficient to train the model...
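For reference, this is the decoupled weight-decay update that standard AdamW performs, as a minimal pure-Python sketch for a single scalar parameter (the hyperparameter values are the common defaults, not necessarily the ones used in this project):

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    Unlike Adam with L2 regularization, the weight decay is decoupled:
    it is applied directly to the parameter rather than being folded
    into the gradient.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * param)
    return param, m, v

# One step from param=1.0 with gradient 0.5:
p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
```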
### Backbone Freezing
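One common form of the "training with frozen components" mentioned above is to update only the projection layers at first while the pretrained vision and text backbones stay frozen, unfreezing them in a later phase. A minimal sketch of the parameter partitioning; the parameter-name prefixes here are hypothetical, not the project's actual naming:

```python
def split_trainable(param_names,
                    frozen_prefixes=("vision_backbone.", "text_backbone.")):
    """Partition parameter names into frozen backbone params and trainable ones.

    In the first training phase, only the names in `trainable` (here, the
    projection heads) would receive gradient updates; the backbone params
    are unfrozen later.
    """
    frozen = [n for n in param_names if n.startswith(frozen_prefixes)]
    trainable = [n for n in param_names if not n.startswith(frozen_prefixes)]
    return frozen, trainable

params = [
    "vision_backbone.layer0.kernel",
    "text_backbone.embeddings.weight",
    "visual_projection.kernel",
    "text_projection.kernel",
]
frozen, trainable = split_trainable(params)
```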