---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---
# GPT-sl-base

This model is a Slovene GPT model, based on the [BigScience Workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-DeepSpeed. GPT-sl-base was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
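
Assuming the checkpoint is available in (or converted to) the Hugging Face Transformers format, a minimal usage sketch looks like the following; the model path below is a placeholder, not the published repository id.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder path: replace with the actual Hugging Face repo id or a local
# directory holding the GPT-sl-base checkpoint.
model_path = "path/to/GPT-sl-base"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Sample a short continuation of a Slovene prompt.
prompt = "Ljubljana je"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```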
## Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768, uses 16 attention heads, and can process sequences of up to 1024 tokens.
The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60,000 tokens.
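
If the weights are exported to the standard GPT-2 layout in Transformers, the figures above correspond roughly to the configuration sketched below; this is an approximation, not the exact shipped config.

```python
from transformers import GPT2Config

# Rough GPT-2-style configuration mirroring the numbers above; details such
# as dropout, activation, or embedding tying may differ in the real export.
config = GPT2Config(
    vocab_size=60_000,  # 60k-token tokenizer vocabulary
    n_positions=1024,   # maximum sequence length
    n_embd=768,         # hidden dimension
    n_layer=12,         # transformer layers
    n_head=16,          # attention heads
)
```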
## Training

The model was trained for about 20 epochs, a total of 390k steps, during which it saw roughly 102B tokens.
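
Dividing the reported token count by the number of steps gives a rough idea of the effective batch size; this is an inference from the figures above, not a documented hyperparameter.

```python
tokens_total = 102e9   # tokens seen during training
steps = 390_000        # optimizer steps
seq_len = 1024         # maximum sequence length

tokens_per_step = tokens_total / steps          # ~261,500 tokens per step
sequences_per_step = tokens_per_step / seq_len  # ~255, i.e. an effective batch of roughly 256 sequences
print(f"{tokens_per_step:,.0f} tokens/step, ~{sequences_per_step:.0f} sequences/step")
```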
Validation perplexity at selected training steps:

| Step   | Validation Perplexity |
|:------:|:---------------------:|
| 50000  | 26.801                |
| 100000 | 25.574                |
| 150000 | 24.773                |
| 200000 | 24.099                |
| 250000 | 23.336                |
| 300000 | 22.607                |
| 350000 | 22.329                |
| 390000 | 22.293                |
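
Validation perplexity is the exponential of the mean token-level cross-entropy. A minimal sketch for reproducing such a measurement on a held-out Slovene snippet, again assuming the checkpoint loads through the Transformers causal-LM API (the model path is a placeholder):

```python
import math

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "path/to/GPT-sl-base"  # placeholder repo id or local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

text = "Slovenija je država v srednji Evropi."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels set to the inputs, the model returns the mean
    # next-token cross-entropy over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity = {math.exp(loss.item()):.2f}")
```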