Pothana Base 300M

A 387M-parameter LLaMA-style language model trained from scratch on Telugu text.

Named after Bammera Pothana, the celebrated 15th-century Telugu poet who authored the Andhra Maha Bhagavatamu.

Developed by Dvitva AI.

Model Details

Model: pothana-base-300M
Architecture: LLaMA (RoPE + SwiGLU + RMSNorm + GQA)
Parameters: 387M (unique)
Hidden size: 1024
Layers: 30 unique (60 effective via weight sharing)
Attention heads: 16 Q / 4 KV (Grouped Query Attention)
Intermediate size: 2816
Context length: 2048
Vocab size: 48,000
Tokenizer: SentencePiece Unigram (48K)
Training: Single GPU, bf16 mixed precision
Developed by: Dvitva AI

Quick Start

Using pipeline

from transformers import pipeline

pipe = pipeline("text-generation", model="dvitvaai/pothana-base-300M", trust_remote_code=True)
result = pipe("తెలుగు భాష", max_new_tokens=50, do_sample=True, temperature=0.8)
print(result[0]["generated_text"])

Note: trust_remote_code=True is required because the model ships a custom tokenizer that cleans up SentencePiece word-boundary markers, producing readable output.

Manual loading

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("dvitvaai/pothana-base-300M")
tokenizer = AutoTokenizer.from_pretrained("dvitvaai/pothana-base-300M", trust_remote_code=True)

text = "తెలుగు భాష చాలా అందమైనది"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.8,
        top_k=50,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Tokenizer

This model uses a SentencePiece Unigram tokenizer with a 48K vocabulary, trained directly on Telugu text.

  • Handles raw Telugu text directly (no preprocessing needed)
  • Byte-fallback for out-of-vocabulary characters
  • Split digits for better number handling
  • NFKC normalization
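The last two options can be illustrated with plain Python. This is only a sketch of what the settings mean, not the tokenizer's actual code; `preprocess` and `split_digits` are hypothetical helper names:

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    # NFKC normalization folds compatibility forms, e.g. full-width
    # digits like "１２３" become ASCII "123" before tokenization.
    return unicodedata.normalize("NFKC", text)

def split_digits(text: str) -> list[str]:
    # "Split digits" means every digit becomes its own piece, so
    # numbers like 2048 don't require multi-digit vocabulary entries.
    return [piece for piece in re.split(r"(\d)", text) if piece]
```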

Architecture

Key features:

  • Grouped Query Attention (GQA): 16 query heads, 4 KV heads — 4x KV cache reduction
  • Block-wise Weight Sharing: 30 unique blocks, each used twice = 60 effective layers (MobileLLM-LS)
  • SwiGLU MLP with 2816 intermediate size
  • RoPE positional encoding (theta=10000.0)
  • RMSNorm (no bias in any linear layer)
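The numbers above can be sanity-checked with back-of-the-envelope arithmetic. A sketch, assuming tied input/output embeddings and counting each shared block once:

```python
# Config values from the model card.
hidden, layers, inter, vocab = 1024, 30, 2816, 48_000
q_heads, kv_heads = 16, 4
head_dim = hidden // q_heads  # 64

# Attention: with GQA, k/v projections are 4x narrower than q/o.
attn = hidden * hidden                          # q_proj
attn += 2 * hidden * (kv_heads * head_dim)      # k_proj + v_proj
attn += hidden * hidden                         # o_proj

mlp = 3 * hidden * inter    # SwiGLU: gate, up, down (no bias)
norms = 2 * hidden          # two RMSNorms per block

per_layer = attn + mlp + norms
total = layers * per_layer + vocab * hidden + hidden  # + final norm
print(f"{total / 1e6:.0f}M unique parameters")        # ≈ 387M

# GQA KV-cache saving relative to full multi-head attention:
kv_cache_reduction = q_heads // kv_heads  # 4x
```

The 60 effective layers from weight sharing add compute, not parameters, which is why the unique count stays at 387M.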

Training

  • Data: Telugu text corpus (Sangraha dataset)
  • Preprocessing: SentencePiece tokenization (raw text)
  • Optimizer: AdamW (lr=3e-4, weight_decay=0.1, beta1=0.9, beta2=0.95)
  • Schedule: WSD (Warmup-Stable-Decay)
  • Precision: bf16 mixed precision
  • Hardware: Single NVIDIA B200 GPU
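A WSD schedule holds the peak learning rate flat between a linear warmup and a final decay. A minimal sketch; the warmup and decay fractions here are illustrative, as the card does not state them:

```python
def wsd_lr(step: int, max_steps: int, base_lr: float = 3e-4,
           warmup_frac: float = 0.01, decay_frac: float = 0.1) -> float:
    """Warmup-Stable-Decay: linear warmup, flat plateau, linear decay to 0."""
    warmup = int(max_steps * warmup_frac)
    decay_start = int(max_steps * (1 - decay_frac))
    if step < warmup:                       # warmup phase
        return base_lr * step / max(warmup, 1)
    if step < decay_start:                  # stable phase
        return base_lr
    return base_lr * (max_steps - step) / max(max_steps - decay_start, 1)
```

Unlike cosine decay, the stable phase lets training be extended or checkpointed mid-run without committing to a total step count up front.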

Limitations

  • This is a base model (not instruction-tuned) — it performs text completion, not instruction following
  • Trained primarily on Telugu text; limited multilingual capability
  • Small model size (387M) limits reasoning and knowledge capacity

License

Apache 2.0

Citation

If you use this model, please cite:

@misc{pothana-base-300M,
  title={Pothana Base 300M: A Telugu Language Model},
  author={Dvitva AI},
  year={2025},
  url={https://huggingface.co/dvitvaai/pothana-base-300M}
}