# TeLLaMA: Telugu GPT-Style Language Model
A 35.3M-parameter decoder-only Transformer language model trained from scratch on Telugu text from the CC-100 corpus.
## Model Details
| Property | Value |
|---|---|
| Parameters | 35,314,944 |
| Architecture | GPT-style decoder-only Transformer |
| Vocabulary | 32,000 BPE tokens (Telugu-optimized) |
| Context length | 256 tokens |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Dropout | 0.1 |
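
As a sanity check, these hyperparameters roughly reproduce the reported parameter count, assuming learned positional embeddings, untied input/output embeddings, and a standard GPT block with a 4x MLP expansion (all assumptions about the architecture, not statements from the repo):

```python
# Back-of-envelope parameter count from the table above; biases and
# LayerNorm gains account for the small gap from the exact 35,314,944.
V, T, d, L = 32_000, 256, 384, 6
tok_emb = V * d            # token embedding matrix
pos_emb = T * d            # learned position embeddings (assumed)
blocks  = L * 12 * d * d   # per layer: 4*d^2 attention + 8*d^2 MLP weights
lm_head = V * d            # untied output projection (assumed)
print(tok_emb + pos_emb + blocks + lm_head)  # 35,291,136, i.e. ~35.3M
```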
## Training
- Corpus: CC-100 Telugu shard, capped at 20M characters (90/10 train/val split)
- Optimizer: AdamW (lr=4e-4, weight decay=0.1)
- Schedule: 8 epochs total (3 initial, then 5 more resumed from the epoch-3 checkpoint; see the resume sketch after this list)
- Hardware: Apple Silicon (MPS backend)
- Tokens observed: ~57 million across 110,000 optimizer steps
- Final train CE: ~4.5 nats | Final val CE: ~6.6–6.8 nats
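
The resume in the schedule above relies on each checkpoint saving optimizer state alongside the weights. A minimal sketch of that step, assuming the conventional `model_state_dict` / `optimizer_state_dict` key names (an assumption; the README only says both states are saved), with a placeholder module standing in for the real GPT so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# Placeholder for the real GPT from models/gpt_model.py, so the sketch runs.
model = nn.Linear(384, 384)

# Optimizer settings from the list above.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)

# Resume: restore both model and optimizer state from the epoch-3 checkpoint.
# The key names here are assumptions to verify against the training script.
ckpt = torch.load("model_epoch_3.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])

# Train on Apple Silicon via the MPS backend when available.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
```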
## Files
- `model_epoch_1.pt` through `model_epoch_8.pt`: PyTorch checkpoints (model + optimizer state)
- `telugu_tokenizer.json`: trained BPE tokenizer (Hugging Face `tokenizers` format)
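
Before writing loading code, it can help to confirm what a checkpoint file actually contains; the printed key names depend on how the training script called `torch.save`:

```python
import torch

ckpt = torch.load("model_epoch_8.pt", map_location="cpu")
print(list(ckpt.keys()))  # expect entries for model and optimizer state
```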
## Usage
```python
import torch
from tokenizers import Tokenizer

# Load the trained BPE tokenizer (Hugging Face tokenizers format)
tokenizer = Tokenizer.from_file("telugu_tokenizer.json")

# Load the final checkpoint; it contains both model and optimizer state
checkpoint = torch.load("model_epoch_8.pt", map_location="cpu")

# Load the weights into your GPT model class from models/gpt_model.py
```
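
A fuller generation sketch, assuming the model class is named `GPT`, that its constructor takes the hyperparameters from the table above, and that the checkpoint stores weights under a `model_state_dict` key (all assumptions; check `models/gpt_model.py` and the training script for the actual names):

```python
import torch
from tokenizers import Tokenizer
from models.gpt_model import GPT  # class name assumed; see the repo

tokenizer = Tokenizer.from_file("telugu_tokenizer.json")
checkpoint = torch.load("model_epoch_8.pt", map_location="cpu")

# Constructor arguments mirror the Model Details table; the argument
# names and the checkpoint key are assumptions to verify.
model = GPT(vocab_size=32_000, block_size=256, n_embd=384,
            n_layer=6, n_head=6, dropout=0.1)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Greedy decoding, truncating to the 256-token context window.
ids = tokenizer.encode("తెలుగు").ids
x = torch.tensor([ids], dtype=torch.long)
with torch.no_grad():
    for _ in range(50):
        logits = model(x[:, -256:])          # (1, T, vocab) output assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        x = torch.cat([x, next_id], dim=1)
print(tokenizer.decode(x[0].tolist()))
```

Greedy argmax decoding is the simplest choice here; swapping in temperature or top-k sampling only changes the `next_id` line.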
## Limitations
- Laboratory-scale model (35M params); not competitive with billion-parameter multilingual systems
- Single-seed training run; variance across seeds unmeasured
- May reproduce CC-100 web boilerplate (news, entertainment, Wikipedia fragments)
- Not intended for production deployment without safety evaluation
## Links
- Code: github.com/marpally-anirudh/telugu-gpt
- Thesis: 91-page M.Tech dissertation included in the GitHub repository
## Citation
```bibtex
@mastersthesis{marpally2026tellama,
  title  = {TeLLaMA: A GPT-Style Language Model from Scratch for Telugu},
  author = {Marpally, Anirudh},
  year   = {2026},
  school = {Defence Institute of Advanced Technology (DIAT), Pune}
}
```