---
library_name: transformers
license: apache-2.0
datasets:
- Fece228/latin-literature-dataset-170M
language:
- la
---
Pretrained from scratch using the GPT-2 architecture on a dataset of Latin texts ([Corpus Corporum](https://huggingface.co/datasets/Fece228/latin-literature-dataset-170M)).
64-token context window; final loss of 4.5 after one epoch over 492 million tokens.
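For scale, a back-of-the-envelope sketch of what those figures imply. The sequence count assumes non-overlapping 64-token chunks, which the card does not actually state, and the perplexity figure assumes the reported loss is a per-token cross-entropy in nats:

```python
import math

TOKENS = 492_000_000   # tokens seen in one epoch (from the card)
CONTEXT = 64           # context window length (from the card)
LOSS = 4.5             # reported final training loss

# Number of non-overlapping 64-token sequences per epoch
# (an assumption; the card does not describe the chunking scheme).
sequences = TOKENS // CONTEXT
print(sequences)          # 7687500

# A cross-entropy loss of 4.5 nats/token corresponds to perplexity exp(4.5):
perplexity = math.exp(LOSS)
print(round(perplexity))  # 90
```

A perplexity around 90 is consistent with the card's note below that generations are often repetitive and incoherent.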
GPT-2-style tokenizer trained with a min_frequency of 2000.
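A tokenizer of this kind can be reproduced with the `tokenizers` library. The sketch below is illustrative only: it uses a toy Latin corpus and scaled-down `vocab_size`/`min_frequency` values so it runs in seconds, whereas the actual tokenizer was trained over the full corpus with min_frequency of 2000:

```python
from tokenizers import ByteLevelBPETokenizer

# Toy corpus standing in for the 492M-token Latin dataset.
corpus = [
    "gallia est omnis divisa in partes tres",
    "arma virumque cano troiae qui primus ab oris",
    "ceterum censeo carthaginem esse delendam",
]

# GPT-2-style byte-level BPE tokenizer.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    corpus,
    vocab_size=300,      # illustrative; GPT-2 itself uses 50257
    min_frequency=2,     # the card's tokenizer used 2000 on the full corpus
    special_tokens=["<|endoftext|>"],
)

encoding = tokenizer.encode("gallia est omnis divisa")
print(encoding.tokens)
# Byte-level BPE round-trips exactly, even for words unseen in training:
print(tokenizer.decode(encoding.ids))
```

A high min_frequency like 2000 keeps only merges that recur often across the corpus, which yields a smaller vocabulary and longer token sequences for rare words.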
Output tends to be repetitive and not very coherent, owing to the model's small size and the limited training data.