# Indian SLM — Hindi Foundational Model
A LLaMA-style decoder-only language model trained from scratch on Hindi text.
## Model Details
| Property | Value |
|---|---|
| Architecture | LLaMA-style (RMSNorm, RoPE, GQA, SwiGLU) |
| Parameters | ~31M |
| Layers | 6 |
| Hidden dim | 512 |
| Attention heads | 8 Q / 4 KV (GQA) |
| FFN hidden dim | 1024 |
| Vocab size | 32,000 |
| Max seq length | 512 |
| Training steps | 400 |
| Dataset | Hindi Wikipedia (wikimedia/wikipedia 20231101.hi) |
| Tokenizer | SentencePiece BPE (32k vocab) |
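The ~31M figure can be reproduced from the table. A back-of-the-envelope count (a sketch; it assumes bias-free projections and tied embedding/LM-head weights, as in standard LLaMA) lands at roughly 30.5M:

```python
# Rough parameter count from the table above (assumes bias-free
# projections and tied embedding / LM-head weights, as in LLaMA).
VOCAB, DIM, LAYERS, FFN, N_Q, N_KV = 32000, 512, 6, 1024, 8, 4
head_dim = DIM // N_Q  # 64

embed = VOCAB * DIM                               # shared with LM head (weight tying)
attn = 2 * DIM * DIM + 2 * DIM * N_KV * head_dim  # Q and O full-size; K and V use 4 KV heads
ffn = 3 * DIM * FFN                               # SwiGLU: gate, up, and down projections
norms = 2 * DIM                                   # two RMSNorm weight vectors per layer

total = embed + LAYERS * (attn + ffn + norms) + DIM  # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")  # -> 30.5M parameters
```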
## Architecture
- RMSNorm instead of LayerNorm
- RoPE (Rotary Position Embeddings) for position encoding
- GQA (Grouped Query Attention) — 8 Q heads, 4 KV heads
- SwiGLU feed-forward activation
- Weight tying between embedding and LM head
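A minimal NumPy sketch of two of these components, RMSNorm and the SwiGLU feed-forward (toy weights and a hypothetical helper, not the model's actual code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by root-mean-square only; no mean subtraction
    # and no bias term, unlike LayerNorm.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU(x @ W_gate) gates (x @ W_up),
    # then the result is projected back down to the model dim.
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU (swish) activation
    return (silu * (x @ w_up)) @ w_down

dim, ffn = 512, 1024  # hidden and FFN dims from the table above
rng = np.random.default_rng(0)
x = rng.standard_normal((1, dim))
h = rms_norm(x, np.ones(dim))
out = swiglu_ffn(h,
                 rng.standard_normal((dim, ffn)) * 0.02,
                 rng.standard_normal((dim, ffn)) * 0.02,
                 rng.standard_normal((ffn, dim)) * 0.02)
print(out.shape)  # (1, 512)
```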
## Evaluation (base model, no fine-tuning)
| Metric | Value |
|---|---|
| Overall Perplexity | 391 |
| Top-1 Accuracy | 20.75% |
| Top-5 Accuracy | 32.37% |
| Vocab Coverage | 100% |
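These are standard next-token metrics. A toy sketch of how they are computed (hypothetical log-probabilities and function name, not the actual evaluation script):

```python
import math

def next_token_metrics(logprobs, targets):
    # logprobs: one list of per-token log-probabilities per position;
    # targets: the true next-token id at each position.
    nll = top1 = top5 = 0.0
    for lp, t in zip(logprobs, targets):
        nll -= lp[t]                                    # negative log-likelihood
        ranked = sorted(range(len(lp)), key=lp.__getitem__, reverse=True)
        top1 += ranked[0] == t
        top5 += t in ranked[:5]
    n = len(targets)
    return math.exp(nll / n), top1 / n, top5 / n        # perplexity, top-1, top-5

# Toy check: a uniform distribution over 4 tokens has perplexity 4.
uniform = [math.log(0.25)] * 4
ppl, acc1, acc5 = next_token_metrics([uniform, uniform], [0, 1])
print(round(ppl, 2))  # 4.0
```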
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("way2hemanthkumar/indian-slm-hindi-30m")
model = AutoModelForCausalLM.from_pretrained("way2hemanthkumar/indian-slm-hindi-30m")

inputs = tokenizer("भारत एक", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0]))
```
## Limitations

This is a base model trained for validation purposes on a small dataset (400 steps). It is not instruction-tuned, and outputs may be incoherent. Fine-tuning is required for practical use.
## Training Details
- Optimizer: AdamW (β1=0.9, β2=0.95)
- LR schedule: linear warmup + cosine decay
- Gradient clipping: 1.0
- Gradient accumulation: 4 steps
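The warmup-plus-cosine schedule can be sketched as follows (the peak and minimum learning rates and the warmup length are illustrative assumptions; only the schedule shape comes from the setup above):

```python
import math

def lr_at(step, peak_lr=3e-4, min_lr=3e-5, warmup=40, total=400):
    # peak_lr, min_lr, and warmup are illustrative placeholders; only
    # the shape (linear warmup, then cosine decay) is from the README.
    if step < warmup:
        return peak_lr * (step + 1) / warmup           # linear warmup
    progress = (step - warmup) / (total - warmup)      # 0 -> 1 over the decay phase
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Peak at the end of warmup, min_lr at the final step:
print(lr_at(39), lr_at(400))
```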