# nep-indgns-model
GPT-like decoder-only transformer model for Nepali, Newari, Tamang, and Maithili languages.
## Model Architecture
| Parameter | Value |
|---|---|
| Vocabulary Size | 50,000 |
| Context Length | 512 |
| Number of Layers | 8 |
| Number of Attention Heads | 12 |
| Hidden Size | 768 |
| Intermediate Size | 3072 |
| Activation Function | GELU |
| Positional Encoding | Learned |
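Assuming the table describes a standard GPT-2-style decoder (biases on all linear layers, the LM head tied to the token embedding, and an intermediate size of 4 × hidden size, all of which are assumptions consistent with the values above), the parameter count implied by this configuration can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Values taken from the architecture table above
    vocab_size: int = 50_000
    context_length: int = 512
    num_layers: int = 8
    num_heads: int = 12
    hidden_size: int = 768
    intermediate_size: int = 3072

def param_count(cfg: Config) -> int:
    """Parameter count for a GPT-2-style decoder (an assumption:
    biases on every linear layer, LM head tied to the token embedding)."""
    h, i = cfg.hidden_size, cfg.intermediate_size
    embeddings = cfg.vocab_size * h + cfg.context_length * h  # token + learned positions
    attention = (h * 3 * h + 3 * h) + (h * h + h)             # fused QKV + output projection
    mlp = (h * i + i) + (i * h + h)                           # up- and down-projection
    layer_norms = 2 * 2 * h                                   # two LayerNorms per block
    per_layer = attention + mlp + layer_norms
    final_ln = 2 * h                                          # final LayerNorm after the stack
    return embeddings + cfg.num_layers * per_layer + final_ln

print(param_count(Config()))  # 95497728, i.e. ~95.5M parameters
```

Under these assumptions the model has roughly 95.5M parameters, about 40% of which sit in the 50,000-entry token embedding.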
The model was trained for 160k steps with a batch size of 72, using the AdamW optimizer with a learning rate of 5e-5.
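These hyperparameters pin down the total training volume: a sketch of the arithmetic, assuming every sequence in a batch is packed to the full 512-token context (an assumption, since padding would reduce the effective count):

```python
steps, batch_size, context_length = 160_000, 72, 512

# Tokens processed over the full training run
tokens_seen = steps * batch_size * context_length
print(f"{tokens_seen:,} tokens")  # 5,898,240,000 → ~5.9B tokens
```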
## Datasets used