Update README.md
README.md CHANGED
@@ -13,7 +13,8 @@ license: apache-2.0
 # Tiny BERT December 2022
 
 This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).
-In addition to being more up-to-date, it is more CPU friendly than its base version, but its first version and is not perfect by no means.
+In addition to being more up-to-date, it is more CPU-friendly than its base version, but it is a first version and is by no means perfect. It took a day and 8x A100s to train. 🤗
+
 
 
 The model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.
@@ -45,8 +46,8 @@ OLM
 65825874694874, 'qnli_acc': 0.6199890170236134, 'rte_acc': 0.5595667870036101, 'wnli_acc': 0.5352112676056338}
 ```
 
-Probably messed up with hyperparameters and tokenizer a bit, unfortunately. Stay tuned for version 2
-
+We probably messed up the hyperparameters and the tokenizer a bit, unfortunately. Stay tuned for version 2!
+But please try it out on your downstream tasks; it might be more performant, and it should be cheap to fine-tune due to its size. 🤗
 
 ## Dataset
 
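Since the updated card invites readers to try the model on downstream tasks, here is a minimal sketch of loading it with the `transformers` library. The repo id below is a placeholder assumption (this diff does not name the checkpoint's Hub id); everything else is standard `transformers` usage.

```python
# Minimal sketch: a quick fill-mask sanity check, then loading the checkpoint
# for downstream fine-tuning. MODEL_ID is a placeholder assumption -- replace
# it with this model's actual Hugging Face Hub id.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

MODEL_ID = "your-org/tiny-bert-december-2022"  # hypothetical id, not from this diff

# Quick masked-LM sanity check on CPU (the card highlights CPU friendliness).
fill = pipeline("fill-mask", model=MODEL_ID, device=-1)
for pred in fill("The capital of France is [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")

# Load for fine-tuning on a downstream classification task (e.g. a GLUE task).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
```

Assuming the model keeps the original tiny BERT configuration (2 layers, hidden size 128, 2 attention heads, per `bert_uncased_L-2_H-128_A-2`), fine-tuning should fit comfortably on a CPU or a single small GPU.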