---
language:
- is
license: cc-by-4.0
datasets:
- igc
---
# Icelandic TEAMS-Small
This model was pre-trained on the 2022 version of the Icelandic Gigaword Corpus (IGC) using the TensorFlow Model Garden. Its pre-training configuration matches that of the original TEAMS-Small model, except that it uses a Unigram tokenizer with a vocabulary size of 64,105.
TEAMS is a variant of the ELECTRA architecture, described in the paper *Training ELECTRA Augmented with Multi-word Selection*. While architecturally equivalent to ELECTRA, TEAMS adds a multi-word selection pre-training objective.
## Acknowledgments
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC).