---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---
# GPT-sl-base

This model is a Slovene GPT model, based on the [BigScience Workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-DeepSpeed. GPT-sl-base was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
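
Assuming the checkpoint is available in (or converted to) the Hugging Face Transformers format, a minimal usage sketch looks like the following; the model path below is a placeholder, not the published repository id.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder path: replace with the actual Hugging Face repo id or a local
# directory holding the GPT-sl-base checkpoint.
model_path = "path/to/GPT-sl-base"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Sample a short continuation of a Slovene prompt.
prompt = "Ljubljana je"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```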
## Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768, uses 16 attention heads, and can process sequences of up to 1024 tokens.
The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60,000 tokens.
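
If the weights are exported to the standard GPT-2 layout in Transformers, the figures above correspond roughly to the configuration sketched below; this is an approximation, not the exact shipped config.

```python
from transformers import GPT2Config

# Rough GPT-2-style configuration mirroring the numbers above; details such
# as dropout, activation, or embedding tying may differ in the real export.
config = GPT2Config(
    vocab_size=60_000,  # 60k-token tokenizer vocabulary
    n_positions=1024,   # maximum sequence length
    n_embd=768,         # hidden dimension
    n_layer=12,         # transformer layers
    n_head=16,          # attention heads
)
```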
## Training

The model was trained for about 20 epochs, a total of 390k steps, during which it saw roughly 102B tokens.
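
Dividing the reported token count by the number of steps gives a rough idea of the effective batch size; this is an inference from the figures above, not a documented hyperparameter.

```python
tokens_total = 102e9   # tokens seen during training
steps = 390_000        # optimizer steps
seq_len = 1024         # maximum sequence length

tokens_per_step = tokens_total / steps          # ~261,500 tokens per step
sequences_per_step = tokens_per_step / seq_len  # ~255, i.e. an effective batch of roughly 256 sequences
print(f"{tokens_per_step:,.0f} tokens/step, ~{sequences_per_step:.0f} sequences/step")
```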
Validation perplexity at selected training steps:

| Step   | Validation Perplexity |
|:------:|:---------------------:|
| 50000  | 26.801                |
| 100000 | 25.574                |
| 150000 | 24.773                |
| 200000 | 24.099                |
| 250000 | 23.336                |
| 300000 | 22.607                |
| 350000 | 22.329                |
| 390000 | 22.293                |
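
Validation perplexity is the exponential of the mean token-level cross-entropy. A minimal sketch for reproducing such a measurement on a held-out Slovene snippet, again assuming the checkpoint loads through the Transformers causal-LM API (the model path is a placeholder):

```python
import math

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "path/to/GPT-sl-base"  # placeholder repo id or local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

text = "Slovenija je država v srednji Evropi."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels set to the inputs, the model returns the mean
    # next-token cross-entropy over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity = {math.exp(loss.item()):.2f}")
```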