---
language:
- is
license: cc-by-4.0
datasets:
- igc
---
|
|
|
|
|
# Icelandic TEAMS-Small |
|
|
This model was pre-trained on the 2022 version of the [Icelandic Gigaword Corpus](http://igc.arnastofnun.is/) using the [TensorFlow Model Garden](https://github.com/tensorflow/models). Its pre-training configuration is identical to that of the original [TEAMS-Small model](https://github.com/tensorflow/models/blob/master/official/projects/teams/experiments/small/wiki_books_pretrain.yaml), except for the use of a Unigram tokenizer with a vocabulary size of 64,105. |
|
|
|
|
|
TEAMS is a variant of the ELECTRA architecture, described in [Training ELECTRA Augmented with Multi-word Selection](https://aclanthology.org/2021.findings-acl.219/). While architecturally equivalent to ELECTRA, TEAMS adds a multi-word selection task as an additional pre-training objective.
|
|
|
|
|
## Acknowledgments
|
|
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). |
|
|
|