---
language:
- is
license: cc-by-4.0
datasets:
- igc
---
|
|
|
|
|
# Icelandic TEAMS-Small |
|
|
This model was pre-trained on the 2022 version of the [Icelandic Gigaword Corpus](http://igc.arnastofnun.is/) using the [TensorFlow Model Garden](https://github.com/tensorflow/models). Its pre-training configuration is identical to that of the original [TEAMS-Small model](https://github.com/tensorflow/models/blob/master/official/projects/teams/experiments/small/wiki_books_pretrain.yaml), except for the use of a Unigram tokenizer with a vocabulary size of 64,105. |
|
|
|
|
|
TEAMS is a variant of the ELECTRA architecture, described in [Training ELECTRA Augmented with Multi-word Selection](https://aclanthology.org/2021.findings-acl.219/). While architecturally equivalent to ELECTRA, TEAMS adds a multi-word selection task as an additional pre-training objective.
|
|
|
|
|
## Acknowledgments
|
|
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). |
|
|
|