|
|
--- |
|
|
license: bsd-3-clause-clear |
|
|
language: |
|
|
- ne |
|
|
metrics: |
|
|
- perplexity |
|
|
library_name: transformers |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# NepaliGPT: Nepali Language Generative Pretrained Transformer Model |
|
|
This is an experiment in developing a text generation model for the Nepali language: a causal language model that predicts the next tokens given a Nepali-language context.
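A minimal usage sketch with the Transformers `pipeline` API. The checkpoint id below is a placeholder, not the published NepaliGPT id; substitute the actual checkpoint id or a local path.

```python
from transformers import pipeline

# Placeholder checkpoint: replace with the actual NepaliGPT checkpoint id
# or a local path. "sshleifer/tiny-gpt2" is a tiny public GPT-2 model used
# here only so the snippet runs end to end.
model_id = "sshleifer/tiny-gpt2"

generator = pipeline("text-generation", model=model_id)

# Generate a continuation for a Nepali prompt.
out = generator("नेपाल एउटा", max_new_tokens=20, do_sample=True)
print(out[0]["generated_text"])
```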
|
|
|
|
|
# Dataset Used |
|
|
A large corpus of 9.3 GB was collected from different sources on the internet. The sources include:

- Nepali books found online.

- Nepali news articles from Nepali news portals.

- Nepali text collected from different open-source Nepali NLP datasets.
|
|
|
|
|
# Hyperparameters Used |
|
|
- Learning rate: 2e-5

- Weight decay: 0.01

- Number of training epochs: 5

- bf16: True

- Base model architecture: GPT-2
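The values above map onto the standard `transformers.TrainingArguments` keywords; a hedged sketch of the corresponding configuration (argument names assumed from the Trainer API, everything else about the training setup is unspecified in this card):

```python
# Sketch of the reported fine-tuning configuration as keyword arguments
# for transformers.TrainingArguments; values are taken from this card.
training_config = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "num_train_epochs": 5,
    "bf16": True,  # bfloat16 mixed-precision training
}
```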
|
|
|
|
|
## Training Results |
|
|
|
|
|
The model achieves the following results on the evaluation set:
|
|
|
|
|
| Training Loss | Validation Loss | Perplexity |
|
|
|:-------------:|:---------------:|:----------:| |
|
|
| 3.3968 | 3.2705 | 26.3245 |
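The perplexity column follows directly from the validation loss: perplexity is the exponential of the cross-entropy loss, so the reported numbers are internally consistent.

```python
import math

# Perplexity is exp(cross-entropy loss); the validation loss of 3.2705
# reproduces the reported perplexity of ~26.3245.
val_loss = 3.2705
perplexity = math.exp(val_loss)  # ≈ 26.3245
```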