---
language:
- is
license: cc-by-4.0
datasets:
- igc
---
# Icelandic TEAMS-Small
This model was pre-trained on the 2022 version of the Icelandic Gigaword Corpus (IGC) using the TensorFlow Model Garden. Its pre-training configuration matches that of the original TEAMS-Small model, except that it uses a Unigram tokenizer with a vocabulary size of 64,105.
TEAMS is a variant of the ELECTRA architecture, described in the paper *Training ELECTRA Augmented with Multi-word Selection*. While architecturally equivalent to ELECTRA, TEAMS adds a multi-word selection pre-training objective.
## Acknowledgments
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC).