# nep-indgns-model

A GPT-like decoder-only transformer language model for Nepali, Newari, Tamang, and Maithili.
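
Assuming the model is published on the Hugging Face Hub under a repo id like `nep-indgns-model` (the exact namespace is not stated here, so treat the id as a placeholder), it could be loaded for text generation with the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nep-indgns-model"  # placeholder; substitute the actual Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("नेपाल", return_tensors="pt")  # Nepali prompt: "Nepal"
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```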

## Model Architecture

| Parameter | Value |
|---|---|
| Vocabulary Size | 50,000 |
| Context Length | 512 |
| Number of Layers | 8 |
| Number of Attention Heads | 12 |
| Hidden Size | 768 |
| Intermediate Size | 3,072 |
| Activation Function | GELU |
| Positional Encoding | Learned |
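
These hyperparameters map directly onto a standard GPT-2-style configuration. As a minimal sketch, assuming the Hugging Face `transformers` library (its `GPT2Config`/`GPT2LMHeadModel` classes use learned positional embeddings, matching the table; the use of these specific classes is an assumption), the architecture could be instantiated like this:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Configuration matching the table above (GPT2Config is an assumed stand-in;
# GPT-2 uses learned positional embeddings, as the table specifies).
config = GPT2Config(
    vocab_size=50_000,           # Vocabulary Size
    n_positions=512,             # Context Length
    n_layer=8,                   # Number of Layers
    n_head=12,                   # Number of Attention Heads
    n_embd=768,                  # Hidden Size
    n_inner=3072,                # Intermediate (feed-forward) Size
    activation_function="gelu",  # Activation Function
)

model = GPT2LMHeadModel(config)
print(f"Parameters: {model.num_parameters():,}")
```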

The model was trained for 160k steps with a batch size of 72, using the AdamW optimizer with a learning rate of 5e-5.
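
The card does not include training code; the following is a minimal sketch of one optimizer step mirroring the reported hyperparameters, continuing from the `config`/`model` sketch above. It assumes PyTorch's `torch.optim.AdamW`, and the random dummy batch stands in for real tokenized text:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Dummy batch: 72 sequences of 512 random token ids (illustrative only).
batch = torch.randint(0, config.vocab_size, (72, 512))
loss = model(input_ids=batch, labels=batch).loss  # causal LM loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```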

## Datasets used
