# TeLLaMA: Telugu GPT-Style Language Model
A 35.3M-parameter decoder-only Transformer language model trained from scratch on Telugu text from the CC-100 corpus.
## Model Details
| Property | Value |
|---|---|
| Parameters | 35,314,944 |
| Architecture | GPT-style decoder-only Transformer |
| Vocabulary | 32,000 BPE tokens (Telugu-optimized) |
| Context length | 256 tokens |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Dropout | 0.1 |
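
As a sanity check, these hyperparameters roughly reproduce the reported parameter count, assuming learned positional embeddings, untied input/output embeddings, and a standard GPT block with a 4x MLP expansion (all assumptions about the architecture, not statements from the repo):

```python
# Back-of-envelope parameter count from the table above; biases and
# LayerNorm gains account for the small gap from the exact 35,314,944.
V, T, d, L = 32_000, 256, 384, 6
tok_emb = V * d            # token embedding matrix
pos_emb = T * d            # learned position embeddings (assumed)
blocks  = L * 12 * d * d   # per layer: 4*d^2 attention + 8*d^2 MLP weights
lm_head = V * d            # untied output projection (assumed)
print(tok_emb + pos_emb + blocks + lm_head)  # 35,291,136, i.e. ~35.3M
```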
## Training
- Corpus: CC-100 Telugu shard, capped at 20M characters (90/10 train/val split)
- Optimizer: AdamW (lr=4e-4, weight decay=0.1)
- Schedule: 8 epochs total (3 initial, then 5 more resumed from the epoch-3 checkpoint; see the resume sketch after this list)
- Hardware: Apple Silicon (MPS backend)
- Tokens observed: ~57 million across 110,000 optimizer steps
- Final train CE: ~4.5 nats | Final val CE: ~6.6–6.8 nats
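
The resume in the schedule above relies on each checkpoint saving optimizer state alongside the weights. A minimal sketch of that step, assuming the conventional `model_state_dict` / `optimizer_state_dict` key names (an assumption; the README only says both states are saved), with a placeholder module standing in for the real GPT so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# Placeholder for the real GPT from models/gpt_model.py, so the sketch runs.
model = nn.Linear(384, 384)

# Optimizer settings from the list above.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)

# Resume: restore both model and optimizer state from the epoch-3 checkpoint.
# The key names here are assumptions to verify against the training script.
ckpt = torch.load("model_epoch_3.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])

# Train on Apple Silicon via the MPS backend when available.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
```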
## Files
- `model_epoch_1.pt` through `model_epoch_8.pt`: PyTorch checkpoints (model + optimizer state)
- `telugu_tokenizer.json`: trained BPE tokenizer (Hugging Face `tokenizers` format)
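
Before writing loading code, it can help to confirm what a checkpoint file actually contains; the printed key names depend on how the training script called `torch.save`:

```python
import torch

ckpt = torch.load("model_epoch_8.pt", map_location="cpu")
print(list(ckpt.keys()))  # expect entries for model and optimizer state
```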
## Usage
```python
import torch
from tokenizers import Tokenizer

# Load the trained BPE tokenizer (Hugging Face tokenizers format)
tokenizer = Tokenizer.from_file("telugu_tokenizer.json")

# Load the final checkpoint; it contains both model and optimizer state
checkpoint = torch.load("model_epoch_8.pt", map_location="cpu")

# Load the weights into your GPT model class from models/gpt_model.py
```
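
A fuller generation sketch, assuming the model class is named `GPT`, that its constructor takes the hyperparameters from the table above, and that the checkpoint stores weights under a `model_state_dict` key (all assumptions; check `models/gpt_model.py` and the training script for the actual names):

```python
import torch
from tokenizers import Tokenizer
from models.gpt_model import GPT  # class name assumed; see the repo

tokenizer = Tokenizer.from_file("telugu_tokenizer.json")
checkpoint = torch.load("model_epoch_8.pt", map_location="cpu")

# Constructor arguments mirror the Model Details table; the argument
# names and the checkpoint key are assumptions to verify.
model = GPT(vocab_size=32_000, block_size=256, n_embd=384,
            n_layer=6, n_head=6, dropout=0.1)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Greedy decoding, truncating to the 256-token context window.
ids = tokenizer.encode("తెలుగు").ids
x = torch.tensor([ids], dtype=torch.long)
with torch.no_grad():
    for _ in range(50):
        logits = model(x[:, -256:])          # (1, T, vocab) output assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        x = torch.cat([x, next_id], dim=1)
print(tokenizer.decode(x[0].tolist()))
```

Greedy argmax decoding is the simplest choice here; swapping in temperature or top-k sampling only changes the `next_id` line.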
## Limitations
- Laboratory-scale model (35M params); not competitive with billion-parameter multilingual systems
- Single-seed training run; variance across seeds unmeasured
- May reproduce CC-100 web boilerplate (news, entertainment, Wikipedia fragments)
- Not intended for production deployment without safety evaluation
## Links
- Code: github.com/marpally-anirudh/telugu-gpt
- Thesis: 91-page M.Tech dissertation included in the GitHub repository
## Citation
```bibtex
@mastersthesis{marpally2026tellama,
  title  = {TeLLaMA: A GPT-Style Language Model from Scratch for Telugu},
  author = {Marpally, Anirudh},
  year   = {2026},
  school = {Defence Institute of Advanced Technology (DIAT), Pune}
}
```