Pothana Stage A+ — 230M Telugu LM with Roman Telugu (Tenglish) Capability
Stage A+ extends dvitvaai/pothana-base-v2-225M with code-mix and Roman Telugu (Tenglish) capabilities. The model can now read and write Telugu in three styles:
- Pure Telugu script — same as Base v2
- Code-mixed (Telugu + English script) — e.g., "నేను meeting కి వెళ్తున్నాను"
- Roman Telugu (Tenglish) — e.g., "naku rendu cinemalu chudaalani undi"
Designed for mobile deployment where Indian users mix scripts freely.
Status: pretrained base model with code-mix capability. Not yet instruction-tuned. Intended as a starting point for retrieval-augmented or instruction fine-tuning.
Quick start
pip install "transformers>=4.40,<4.56" "tokenizers<0.22" morfessor
⚠️ transformers 4.56+ is not supported yet. Between 4.55 and 4.56, Hugging Face changed the LlamaAttention API in a way that our custom PothanaAttention (Llama + QK-norm) subclass isn't compatible with: the model loads but produces character-level garbage. Separately, tokenizers 0.22+ has a WordLevel encoding regression. Pin both as shown until we ship a 4.56-compatible modeling_pothana.py.
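Because the failure mode is silent (the model loads but emits garbage), it can help to check the installed versions before loading. The following is a minimal sketch, not part of the shipped code: `parse_version`, `pins_ok`, and `installed_pins_ok` are hypothetical helper names, and the version parsing is deliberately simple (leading integers only), so pre-release builds of 4.56 are treated as unsupported.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.55.2' or '4.56.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pins_ok(transformers_version, tokenizers_version):
    """Check the pins from the install command:
    transformers>=4.40,<4.56 and tokenizers<0.22."""
    tf = parse_version(transformers_version)
    tk = parse_version(tokenizers_version)
    return (4, 40) <= tf < (4, 56) and tk < (0, 22)


def installed_pins_ok():
    """Check the installed packages; returns None if either one is missing."""
    try:
        return pins_ok(
            metadata.version("transformers"),
            metadata.version("tokenizers"),
        )
    except metadata.PackageNotFoundError:
        return None


print(installed_pins_ok())
```

If this prints `False`, reinstalling with the pinned command above should bring both packages back into the supported range.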
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="dvitvaai/pothana-stage-a-plus-225M",
    trust_remote_code=True,
)
# Mixed-script input — pipeline handles it directly.
print(pipe("నేను రేపు office ki వెళ్లాలి"))
print(pipe("naku rendu cinemalu chudaalani undi"))
Or with the lower-level API:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "dvitvaai/pothana-stage-a-plus-225M", trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dvitvaai/pothana-stage-a-plus-225M", trust_remote_code=True,
)
GEN = dict(
    max_new_tokens=80,
    do_sample=True, temperature=0.7, top_p=0.9, repetition_penalty=1.15,
)
# Telugu input — tokenizer runs morfessor v4 segmentation internally.
inputs = tokenizer("నేను రేపు ఆఫీసుకు వెళ్లాలి", return_tensors="pt")
out = model.generate(**inputs, **GEN)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# Roman Telugu input — passes through without segmentation.
inputs = tokenizer("naku rendu cinemalu chudaalani undi", return_tensors="pt")
out = model.generate(**inputs, **GEN)
print(tokenizer.decode(out[0], skip_special_tokens=True))
trust_remote_code=True is required for the custom PothanaForCausalLM (Llama + QK-norm) and for PothanaTokenizer, which runs morfessor v4 segmentation on Telugu input and strips the @@ continuation prefix at decode.
The morfessor package is required so the tokenizer can segment raw Telugu text the way training did. The morfessor model and supporting files ship with the repo and load automatically. A generation_config.json is also shipped with sane sampling defaults — the model loops badly under greedy decoding (see Limitations).
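To illustrate what stripping the @@ continuation prefix at decode could look like, here is a toy sketch. It assumes a token that starts with @@ continues the previous word and any other token starts a new word. The real PothanaTokenizer may differ in detail. `join_morphemes` is a hypothetical name and the segmentation in the example is hand-made, not actual morfessor output.

```python
def join_morphemes(tokens, marker="@@"):
    """Rebuild surface words from morpheme tokens.

    Assumption: a token starting with `marker` continues the previous word;
    any other token starts a new, space-separated word.
    """
    words = []
    for token in tokens:
        if token.startswith(marker) and words:
            # Continuation piece: glue it onto the word being built.
            words[-1] += token[len(marker):]
        elif token.startswith(marker):
            # Stray continuation at the start: keep the text, drop the marker.
            words.append(token[len(marker):])
        else:
            words.append(token)
    return " ".join(words)


# Hand-made segmentation, for illustration only.
print(join_morphemes(["ఆఫీసు", "@@కు", "వెళ్లాలి"]))
```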
What's new vs Base v2
| | Base v2 | Stage A+ |
|---|---|---|
| Vocab size | 47,831 | 52,831 (+5,000 Roman Telugu word tokens) |
| Parameters | 222M | 230M (+8M from new embedding rows) |
| Telugu capability | ✓ | ✓ |
| Code-mix (Te+En script) | weak | strong |
| Roman Telugu reading | weak | strong |
| Roman Telugu writing | weak | moderate |
| Retrieval-format <retrieved> recognition | ✓ | ✓ |
Training pipeline summary
Base v2 (val=3.16, 49h)
↓ Stage A: retrieval-aware continued pretrain (val=3.05, 6h)
↓ Resize vocab 47,831 → 52,831 (add top-5K Roman word tokens, smart-init)
↓ Stage A+: code-mix continued pretrain (200 steps, ~3.4 epochs on 31M-token codemix corpus)
↓ [THIS MODEL]
Stage A+ specifics
- Data: 12,258 Telugu chunks rewritten by Gemini 2.0 Flash into three formats:
  - codemix_te_en: natural code-mixing (Te-script + En-script)
  - codemix_roman: same code-mixing, all-Roman (phone-typed Tenglish)
  - telugu_roman: pure Telugu in Roman script
- Plus 10% original Telugu (anti-forgetting buffer)
- Total tokens: 31.2M (focused continued pretrain — small but enough)
- Tokenizer extension: top-5K most-frequent Roman Telugu words promoted from BPE-fallback to direct tokens (~33% compression on Roman content)
- Training: B200, effective batch 128 × seq 4096, LR 2e-5, WSD schedule, ~20 min wall time
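The step count, batch shape, and corpus size listed above are consistent with the "~3.4 epochs" figure in the pipeline summary. A quick check, assuming every sequence in the batch is fully packed to 4,096 tokens (the card does not state this explicitly):

```python
steps = 200
batch_sequences = 128        # effective batch size, in sequences
seq_len = 4096               # tokens per sequence (assumed fully packed)
corpus_tokens = 31.2e6       # total tokens in the codemix corpus

tokens_per_step = batch_sequences * seq_len
tokens_seen = steps * tokens_per_step
epochs = tokens_seen / corpus_tokens

print(f"{tokens_per_step:,} tokens/step")
print(f"{tokens_seen:,} tokens seen")
print(f"{epochs:.2f} epochs")
```

This gives roughly 105M tokens seen over 200 steps, or about 3.4 passes over the 31.2M-token corpus.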
Tokenizer
morfessor_bpe_telugu_v4-v6:
- 47,831 v4 tokens (morfessor Telugu morphemes + BPE merges for non-Telugu)
- +5,000 Roman Telugu word tokens (e.g., nunchi, prabhutvam, kosam, mukhyamantri)
- 9 retrieval special tokens (IDs 47822–47830, counted within the 47,831 v4 tokens; unused for now)
- Total: 52,831 tokens
The top-5K Roman tokens compress common words substantially: a Roman word like prabhutvam, previously 5 BPE subwords, is now 1 token.
Tokenizer fertility
- Telugu (segmented): same as v4
- English: 1.81 tokens/word (unchanged)
- Roman Telugu (Tenglish): ~2.5 tokens/word on common forms (vs ~5 with pure BPE before)
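Fertility here is the average number of tokens per word. The sketch below shows how that number moves when frequent words become direct vocabulary entries. It uses a hypothetical toy tokenizer (whole-word lookup, otherwise fixed two-character pieces) as a stand-in for BPE, so the values it prints are illustrative and do not reproduce the figures above.

```python
def fertility(text, tokenize):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(tokenize(w)) for w in words) / len(words)


def make_toy_tokenizer(direct_vocab, piece_len=2):
    """Hypothetical tokenizer: whole-word hit, else fixed-size character pieces."""
    def tokenize(word):
        if word in direct_vocab:
            return [word]
        return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    return tokenize


text = "prabhutvam kosam"
before = fertility(text, make_toy_tokenizer(set()))
after = fertility(text, make_toy_tokenizer({"prabhutvam", "kosam"}))
print(before, after)
```

To measure real fertility, the same `fertility` function can be called with the model's own tokenizer by passing a callable that returns its token list for a word.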
Architecture
| Property | Value |
|---|---|
| Parameters | 230M unique (378M on disk due to weight-sharing unroll) |
| Hidden size | 768 |
| Layers (unique) | 24 |
| Layers (effective with weight sharing) | 48 |
| Attention | GQA 16Q / 4KV, head_dim 48 |
| MLP | SwiGLU, intermediate 2048 |
| Norm | RMSNorm (eps=1e-6) |
| Position | RoPE, θ=500,000 |
| QK-norm | yes |
| Tied embeddings | no |
| Vocab | 52,831 |
| Max context | 4,096 |
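A few of these numbers can be cross-checked against each other. The sketch below assumes the output head is a plain vocab × hidden projection with no bias, which the card does not spell out. Under that assumption, the untied embeddings account for the roughly 8M parameters added by the vocabulary extension.

```python
hidden_size = 768
q_heads, kv_heads, head_dim = 16, 4, 48
old_vocab, new_vocab = 47_831, 52_831

# Query heads times head_dim should reproduce the hidden size.
assert q_heads * head_dim == hidden_size

kv_width = kv_heads * head_dim          # width of the K and V projections
queries_per_kv = q_heads // kv_heads    # query heads sharing each KV head

# Untied embeddings: both the input embedding and the output head
# gain one row of size hidden_size per new token.
new_rows = new_vocab - old_vocab
added_params = 2 * new_rows * hidden_size

print(kv_width, queries_per_kv, new_rows, f"{added_params:,}")
```

The result, 7.68M added parameters, matches the "+8M from new embedding rows" figure after rounding.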
What this model is good at
- Reading code-mixed Telugu — handles "నేను meeting కి వెళ్తున్నాను" naturally
- Reading Roman Telugu — handles "naku meeting undi" via direct tokens for common words
- Generating coherent Telugu prose — short-to-medium length news/literature-style output
- Generating natural code-mixed Telugu — mixes English nouns into Telugu sentences
Limitations
- Loops at low temperature: like most 225M base models, it gets stuck in repetition with greedy or low-temperature sampling. Use temperature=0.7 or higher and repetition_penalty=1.15 for cleaner output (shipped as defaults in generation_config.json).
- Roman Telugu input is partially <unk>-prone. Only the top-5K most-frequent Roman words are direct vocab entries; the HF WordLevel tokenizer used here has no BPE fallback, so less-common Roman forms (e.g. naku, cinemalu, chudaalani) encode as <unk> and lose their content. Telugu-script and code-mixed Te+En script inputs work cleanly. A future tokenizer rebuild with BPE fallback will fix this.
- Roman Telugu generation is weaker than reading: the model produces fragmented Roman output even though it reads cleanly. This should improve with Stage B SFT (planned).
- Retrieval grounding is NOT yet trained: the model accepts the <retrieved>...</retrieved> format from Stage A, but doesn't yet condition answers on retrieved content. This is intentional: grounded retrieval is taught at Stage B (SFT on synthetic traces).
- No instruction tuning: base model only. Zero-shot prompts get continuation-style outputs, not Q&A behavior.
- Factual coverage is limited to the Sangraha corpus (general Telugu web/news) plus the 8.8% English Wikipedia from Base v2.
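Since looping is the most visible failure mode, a cheap check on generated text can flag outputs worth regenerating. The following is a minimal sketch and not part of the model's tooling: `repeated_ngram_fraction` is a hypothetical helper, whitespace splitting is a crude stand-in for real tokens, and any cutoff applied to the score would have to be tuned by hand.

```python
from collections import Counter


def repeated_ngram_fraction(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


looping = "a b c a b c a b c".split()
varied = "a b c d e f".split()
print(repeated_ngram_fraction(looping), repeated_ngram_fraction(varied))
```

A score near 0 means little repetition; a score well above that suggests the output has fallen into a loop and may benefit from a higher temperature or a stronger repetition penalty.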
Intended use
Starting point for downstream work:
- Retrieval-augmented fine-tuning — the natural next step (Stage B)
- Telugu / Tenglish instruction tuning — possible with appropriate dataset
- Telugu text classification, NER, summarization — fine-tune with task data
- Research on small-scale Telugu language modeling
Evaluation
- Stage A val_loss: 3.05 (on retrieval-mixed corpus)
- Stage A+ best_val_loss: ~3.0 (codemix corpus, 3.4 epochs)
External benchmarks (IndicGLUE, TyDi-QA-Telugu) have not been run yet.
Citation
@misc{pothana-stage-a-plus-225M,
title = {Pothana Stage A+: A 230M Telugu LM with Roman Telugu and code-mix capability},
author = {Katrapati, Ganesh},
year = {2026},
howpublished = {\url{https://huggingface.co/dvitvaai/pothana-stage-a-plus-225M}},
}
Acknowledgments
- Base model: dvitvaai/pothana-base-v2-225M
- Codemix synthetic data: Gemini 2.0 Flash
- Telugu corpus: AI4Bharat Sangraha
License
Apache 2.0.