muhtasham committed
Commit ddd8e60 · 1 Parent(s): b2c3f8c

Update README.md

Files changed (1)
  1. README.md +25 -4
README.md CHANGED
@@ -11,12 +11,12 @@ license: apache-2.0
  # Tiny BERT December 2022

  This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).
- In addition to being more up-to-date, it is more CPU friendly than its base version.

- We think it is fair to directly compare our model to the original tiny BERT because our model was trained with about the same level of compute as the original tiny BERT.
- Our model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.

- This model was created as part of the OLM project, which has the goal of continuously training and releasing models that are up-to-date and comparable in standard language model performance to their static counterparts.
  This is important because we want our models to know about events like COVID or
  a presidential election right after they happen.
 
@@ -25,6 +25,27 @@ a presidential election right after they happen.
  You can use the raw model for masked language modeling, but it's mostly intended to
  be fine-tuned on a downstream task, such as sequence classification, token classification or question answering.

  ## Dataset

  The model and tokenizer were trained with this [December 2022 cleaned Common Crawl dataset](https://huggingface.co/datasets/olm/olm-CC-MAIN-2022-49-sampling-ratio-olm-0.15114822547) plus this [December 2022 cleaned Wikipedia dataset](https://huggingface.co/datasets/olm/olm-wikipedia-20221220).
 
  # Tiny BERT December 2022

  This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).
+ In addition to being more up-to-date, it is more CPU friendly than its base version; however, this is its first version, and it is by no means perfect.

+ The model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.
+
+ This model was intended to be part of the OLM project, which has the goal of continuously training and releasing models that are up-to-date and comparable in standard language model performance to their static counterparts.
  This is important because we want our models to know about events like COVID or
  a presidential election right after they happen.
 
 
  You can use the raw model for masked language modeling, but it's mostly intended to
  be fine-tuned on a downstream task, such as sequence classification, token classification or question answering.

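As a quick illustration of the masked-language-modeling use mentioned above, here is a minimal sketch using the `transformers` fill-mask pipeline; the model ID below is a placeholder for this repository, not a confirmed identifier.

```python
# Minimal sketch: masked language modeling with the raw model via the fill-mask pipeline.
# NOTE: the model ID is a placeholder for this repository.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="muhtasham/olm-bert-tiny-december-2022")
print(unmasker("Paris is the [MASK] of France."))
```
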
+ ## Special note
+
+ Based on a quick GLUE fine-tuning and dev-set evaluation, the OLM tiny BERT appears to be underperforming the original:
+
+ Original
+ ```bash
+ {'cola_mcc': 0.0, 'sst2_acc': 0.7981651376146789, 'mrpc_acc': 0.6838235294117647, 'mrpc_f1': 0.8122270742358079, 'stsb_pear': 0.672082873279731, 'stsb_spear': 0.6933378278505834, 'qqp_acc': 0.7766420762598881, 'mnli_acc': 0.6542027508914926, 'mnli_acc_mm': 0.6670056956875509, 'qnli_acc': 0.774665934468241, 'rte_acc': 0.5776173285198556, 'wnli_acc': 0.49295774647887325}
+ ```
+
+ OLM
+ ```bash
+ {'cola_mcc': 0.0, 'sst2_acc': 0.7970183486238532, 'mrpc_acc': 0.6838235294117647, 'mrpc_f1': 0.8122270742358079, 'stsb_pear': -0.15978233085015087, 'stsb_spear': -0.13638650127051932, 'qqp_acc': 0.6292213609628794, 'mnli_acc': 0.5323484462557311, 'mnli_acc_mm': 0.5465825874694874, 'qnli_acc': 0.6199890170236134, 'rte_acc': 0.5595667870036101, 'wnli_acc': 0.5352112676056338}
+ ```
+
+ We probably messed something up with the hyperparameters and the tokenizer, unfortunately. Stay tuned for version 2 🚀🚀🚀
+
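For context on how numbers like these could be produced, below is a minimal sketch that fine-tunes the model on a single GLUE task (MRPC) and scores the dev set with `transformers`, `datasets`, and `evaluate`. The model ID, batch size, learning rate, and epoch count are assumptions; the actual script and hyperparameters behind the results above are not specified here.

```python
# Hypothetical sketch: fine-tune on GLUE MRPC and evaluate on the dev (validation) split.
# The model ID and all hyperparameters below are assumptions, not the settings used above.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "muhtasham/olm-bert-tiny-december-2022"  # placeholder for this repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

raw = load_dataset("glue", "mrpc")
encoded = raw.map(
    lambda batch: tokenizer(batch["sentence1"], batch["sentence2"],
                            truncation=True, max_length=128),
    batched=True,
)

metric = evaluate.load("glue", "mrpc")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mrpc-out", per_device_train_batch_size=32,
                           learning_rate=5e-5, num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports accuracy and F1 on the MRPC dev set
```
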
  ## Dataset

  The model and tokenizer were trained with this [December 2022 cleaned Common Crawl dataset](https://huggingface.co/datasets/olm/olm-CC-MAIN-2022-49-sampling-ratio-olm-0.15114822547) plus this [December 2022 cleaned Wikipedia dataset](https://huggingface.co/datasets/olm/olm-wikipedia-20221220).
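
As a rough sketch of how these two corpora could be loaded and mixed for pretraining, one could stream them with the `datasets` library; the `train` split and the `text` column name are assumptions.

```python
# Hypothetical sketch: stream and mix the two cleaned December 2022 corpora.
# The split name ("train") and column name ("text") are assumptions.
from datasets import load_dataset, interleave_datasets

cc = load_dataset(
    "olm/olm-CC-MAIN-2022-49-sampling-ratio-olm-0.15114822547",
    split="train", streaming=True,
)
wiki = load_dataset("olm/olm-wikipedia-20221220", split="train", streaming=True)

corpus = interleave_datasets([cc, wiki])
for example in corpus.take(3):
    print(example["text"][:200])
```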