---
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
language:
- en
---

# BERT Mini (uncased)

One of the compact BERT models from [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) that the Hugging Face team did not convert. The checkpoint was converted with the original [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py).

See the original Google repo: [google-research/bert](https://github.com/google-research/bert)

Note: it's not clear whether these checkpoints have undergone knowledge distillation.

## Model variants

| |H=128|H=256|H=512|H=768|
|---|:---:|:---:|:---:|:---:|
| **L=2** |[2/128 (BERT-Tiny)][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]|
| **L=4** |[4/128][4_128]|[**4/256 (BERT-Mini)**][4_256]|[4/512 (BERT-Small)][4_512]|[4/768][4_768]|
| **L=6** |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]|
| **L=8** |[8/128][8_128]|[8/256][8_256]|[8/512 (BERT-Medium)][8_512]|[8/768][8_768]|
| **L=10** |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]|
| **L=12** |[12/128][12_128]|[12/256][12_256]|[12/512][12_512]|[12/768 (BERT-Base, original)][12_768]|

[2_128]: https://huggingface.co/gaunernst/bert-tiny-uncased
[2_256]: https://huggingface.co/gaunernst/bert-L2-H256-uncased
[2_512]: https://huggingface.co/gaunernst/bert-L2-H512-uncased
[2_768]: https://huggingface.co/gaunernst/bert-L2-H768-uncased
[4_128]: https://huggingface.co/gaunernst/bert-L4-H128-uncased
[4_256]: https://huggingface.co/gaunernst/bert-mini-uncased
[4_512]: https://huggingface.co/gaunernst/bert-small-uncased
[4_768]: https://huggingface.co/gaunernst/bert-L4-H768-uncased
[6_128]: https://huggingface.co/gaunernst/bert-L6-H128-uncased
[6_256]: https://huggingface.co/gaunernst/bert-L6-H256-uncased
[6_512]: https://huggingface.co/gaunernst/bert-L6-H512-uncased
[6_768]: https://huggingface.co/gaunernst/bert-L6-H768-uncased
[8_128]: https://huggingface.co/gaunernst/bert-L8-H128-uncased
[8_256]: https://huggingface.co/gaunernst/bert-L8-H256-uncased
[8_512]: https://huggingface.co/gaunernst/bert-medium-uncased
[8_768]: https://huggingface.co/gaunernst/bert-L8-H768-uncased
[10_128]: https://huggingface.co/gaunernst/bert-L10-H128-uncased
[10_256]: https://huggingface.co/gaunernst/bert-L10-H256-uncased
[10_512]: https://huggingface.co/gaunernst/bert-L10-H512-uncased
[10_768]: https://huggingface.co/gaunernst/bert-L10-H768-uncased
[12_128]: https://huggingface.co/gaunernst/bert-L12-H128-uncased
[12_256]: https://huggingface.co/gaunernst/bert-L12-H256-uncased
[12_512]: https://huggingface.co/gaunernst/bert-L12-H512-uncased
[12_768]: https://huggingface.co/bert-base-uncased

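
The repository names linked from the table follow a regular pattern, with a handful of named exceptions. As a minimal sketch, the mapping from a (layers, hidden size) pair to a Hub repository ID could be written as below. The helper `repo_id` and the constants `NAMED`, `LAYERS` and `HIDDEN` are hypothetical names introduced here for illustration; only the repository IDs themselves come from the links above.

```python
# Checkpoints that have a dedicated name instead of the L{layers}-H{hidden} pattern
# (taken from the link list above).
NAMED = {
    (2, 128): "gaunernst/bert-tiny-uncased",
    (4, 256): "gaunernst/bert-mini-uncased",
    (4, 512): "gaunernst/bert-small-uncased",
    (8, 512): "gaunernst/bert-medium-uncased",
    (12, 768): "bert-base-uncased",
}

# Sizes covered by the table.
LAYERS = (2, 4, 6, 8, 10, 12)
HIDDEN = (128, 256, 512, 768)


def repo_id(layers: int, hidden: int) -> str:
    """Return the Hub repository ID for a given number of layers and hidden size."""
    if layers not in LAYERS or hidden not in HIDDEN:
        raise ValueError(f"no checkpoint listed for L={layers}, H={hidden}")
    # Fall back to the generic naming pattern when the size has no dedicated name.
    return NAMED.get((layers, hidden), f"gaunernst/bert-L{layers}-H{hidden}-uncased")
```

For example, `repo_id(4, 256)` gives the ID of this model, while `repo_id(6, 128)` falls back to the generic pattern.
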
## Usage

See other BERT model cards, e.g. [bert-base-uncased](https://huggingface.co/bert-base-uncased).

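
For convenience, here is a minimal sketch of feature extraction with this checkpoint. It assumes `transformers` and `torch` are installed and that the converted weights load through the standard `AutoTokenizer` / `AutoModel` classes, as they do for other BERT checkpoints. The function name `embed` and the constant `MODEL_ID` are illustrative, not part of any library.

```python
MODEL_ID = "gaunernst/bert-mini-uncased"


def embed(text: str, model_id: str = MODEL_ID):
    """Return the last hidden states for `text`, shape (1, seq_len, hidden_size)."""
    # Imported inside the function so the heavy dependencies are only
    # needed when the function is actually called.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Downloads the tokenizer and weights from the Hub on first use.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.last_hidden_state
```

Calling `embed("Hello, my dog is cute")` should return a tensor whose last dimension is 256, matching H=256 for BERT-Mini in the table above.
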
## Citation

```bibtex
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}
```