GPT-sl-base

GPT-sl-base is a Slovene GPT model based on the BigScience Workshop fork of Megatron. It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.

Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.

Training

The model was trained for about 20 epochs: 390k steps in total, during which it saw 102B tokens.
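The training figures above imply a rough per-step and per-epoch budget. The following is a minimal sketch of that arithmetic in Python. The variable names are illustrative, and the sequences-per-step figure assumes every training sequence is filled to the full 1024 tokens, which this card does not state.

```python
# Approximate figures quoted in this model card.
TOTAL_TOKENS = 102e9    # tokens seen during training
TOTAL_STEPS = 390_000   # training steps
EPOCHS = 20             # approximate number of passes over the corpus
MAX_SEQ_LEN = 1024      # maximum sequence length of the model

# Average number of tokens consumed per training step.
tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS

# Assumption (not stated in the card): every sequence is packed to MAX_SEQ_LEN.
sequences_per_step = tokens_per_step / MAX_SEQ_LEN

# Rough size of one epoch, in tokens and in steps.
tokens_per_epoch = TOTAL_TOKENS / EPOCHS
steps_per_epoch = TOTAL_STEPS / EPOCHS

print(f"tokens per step:    {tokens_per_step:,.0f}")
print(f"sequences per step: {sequences_per_step:.1f}")
print(f"tokens per epoch:   {tokens_per_epoch:,.0f}")
print(f"steps per epoch:    {steps_per_epoch:,.0f}")
```

Under these assumptions the numbers work out to roughly 260k tokens per step, about 5.1B tokens per epoch, and about 19.5k steps per epoch. Since the epoch, step, and token counts are all rounded, these should be read as estimates only.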

Step	Validation Perplexity
50000	26.801
100000	25.574
150000	24.773
200000	24.099
250000	23.336
300000	22.607
350000	22.329
390000	22.293
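
The perplexities in the table can be converted to a per-token loss, which is sometimes easier to compare across reports. The sketch below assumes the usual definition, perplexity = exp(mean cross-entropy in nats per token); the card does not say how perplexity was computed, so treat the conversion as an assumption. The helper name perplexity_to_loss is hypothetical.

```python
import math

# (step, validation perplexity) pairs copied from the table above.
VALIDATION_PERPLEXITY = [
    (50_000, 26.801),
    (100_000, 25.574),
    (150_000, 24.773),
    (200_000, 24.099),
    (250_000, 23.336),
    (300_000, 22.607),
    (350_000, 22.329),
    (390_000, 22.293),
]


def perplexity_to_loss(perplexity: float) -> float:
    """Return mean cross-entropy in nats per token.

    Assumes perplexity = exp(loss), the common convention.
    """
    return math.log(perplexity)


first_ppl = VALIDATION_PERPLEXITY[0][1]
last_ppl = VALIDATION_PERPLEXITY[-1][1]

# Relative reduction in perplexity between the first and last checkpoint.
relative_drop = (first_ppl - last_ppl) / first_ppl

for step, ppl in VALIDATION_PERPLEXITY:
    print(f"{step:>7}  ppl={ppl:.3f}  loss={perplexity_to_loss(ppl):.3f} nats/token")
print(f"relative perplexity reduction: {relative_drop:.1%}")
```

Read this way, most of the improvement happens before step 300000, and the last 40k steps change validation perplexity only slightly.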
