TeLLaMA – Telugu GPT-Style Language Model

A 35.3-million-parameter decoder-only Transformer language model trained from scratch on Telugu text from the CC-100 corpus.

Model Details

Property         Value
---------------  -------------------------------------
Parameters       35,314,944
Architecture     GPT-style decoder-only Transformer
Vocabulary       32,000 BPE tokens (Telugu-optimized)
Context length   256 tokens
Embedding dim    384
Layers           6
Attention heads  6
Dropout          0.1
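
The numbers above pin down the shape of the network. Below is a minimal PyTorch sketch using those hyperparameters; the block layout (pre-LayerNorm, learned positional embeddings, 4x MLP, untied LM head) is an assumption for illustration, not a verbatim copy of models/gpt_model.py, so the parameter count lands near but not exactly on 35,314,944.

import torch
import torch.nn as nn

# Hyperparameters from the table above.
VOCAB, CTX, DIM, LAYERS, HEADS, P_DROP = 32000, 256, 384, 6, 6, 0.1

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(DIM)
        self.attn = nn.MultiheadAttention(DIM, HEADS, dropout=P_DROP, batch_first=True)
        self.ln2 = nn.LayerNorm(DIM)
        self.mlp = nn.Sequential(
            nn.Linear(DIM, 4 * DIM), nn.GELU(),
            nn.Linear(4 * DIM, DIM), nn.Dropout(P_DROP),
        )

    def forward(self, x):
        # Upper-triangular boolean mask enforces causal (left-to-right) attention.
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class TeLLaMA(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, DIM)
        self.pos_emb = nn.Embedding(CTX, DIM)
        self.drop = nn.Dropout(P_DROP)
        self.blocks = nn.ModuleList([Block() for _ in range(LAYERS)])
        self.ln_f = nn.LayerNorm(DIM)
        self.head = nn.Linear(DIM, VOCAB, bias=False)

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        x = self.drop(self.tok_emb(ids) + self.pos_emb(pos))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # (batch, seq, vocab) next-token logits

# ~35.32M with this layout; bias placement accounts for the small gap vs. 35,314,944.
print(sum(p.numel() for p in TeLLaMA().parameters()))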

Training

  • Corpus: CC-100 Telugu shard, capped at 20M characters (90/10 train/val split)
  • Optimizer: AdamW (lr=4e-4, weight decay=0.1)
  • Schedule: 8 epochs total (3 fresh, then 5 more resumed from the epoch-3 checkpoint; see the sketch below)
  • Hardware: Apple Silicon (MPS backend)
  • Tokens observed: ~57 million across 110,000 optimizer steps
  • Final train CE: ~4.5 nats | Final val CE: ~6.6–6.8 nats (validation perplexity ≈ exp(6.6)–exp(6.8) ≈ 735–900)
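
Since the run was split across two sessions, resuming means restoring both model and optimizer state. A minimal sketch, reusing the TeLLaMA sketch from the Model Details section; the checkpoint keys "model_state_dict" and "optimizer_state_dict" are assumed names, not confirmed from the training script:

import torch

model = TeLLaMA()  # sketch class from the Model Details section
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)

ckpt = torch.load("model_epoch_3.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])          # assumed key name
optimizer.load_state_dict(ckpt["optimizer_state_dict"])  # assumed key name

# Training ran on Apple Silicon; use the MPS backend when present.
device = "mps" if torch.backends.mps.is_available() else "cpu"
model.to(device)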

Files

  • model_epoch_1.pt through model_epoch_8.pt – PyTorch checkpoints (model + optimizer state)
  • telugu_tokenizer.json – trained BPE tokenizer (Hugging Face tokenizers format)

Usage

import torch
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("telugu_tokenizer.json")

# Checkpoints bundle model and optimizer state; load onto CPU first.
checkpoint = torch.load("model_epoch_8.pt", map_location="cpu")

# Instantiate your GPT model class from models/gpt_model.py, then restore the
# weights (the exact state-dict key depends on how the checkpoint was saved):
# model.load_state_dict(checkpoint["model_state_dict"])
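
To sample text once the weights are restored, a short autoregressive loop suffices. This sketch assumes the model returns (batch, seq, vocab) next-token logits as in the Model Details sketch; the temperature value is an illustrative choice:

import torch
from tokenizers import Tokenizer

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8):
    model.eval()
    ids = torch.tensor([tokenizer.encode(prompt).ids])
    for _ in range(max_new_tokens):
        logits = model(ids[:, -256:])                      # respect the 256-token context
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0].tolist())

Cropping the input to the last 256 tokens matters because the learned positional embeddings only cover the training context length.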

Limitations

  • Laboratory-scale model (35M params); not competitive with billion-parameter multilingual systems
  • Single-seed training run; variance across seeds unmeasured
  • May reproduce CC-100 web boilerplate (news, entertainment, Wikipedia fragments)
  • Not intended for production deployment without safety evaluation

Citation

@mastersthesis{marpally2026tellama,
  title={TeLLaMA: A GPT-Style Language Model from Scratch for Telugu},
  author={Marpally, Anirudh},
  year={2026},
  school={Defence Institute of Advanced Technology (DIAT), Pune}
}