---
language:
- dv
- en
tags:
- dhavana
- causal-lm
- pretrained
---
# Dhavana-Base-150M
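A decoder-only Transformer language model pretrained from scratch on a mixed corpus in which Dhivehi makes up 12% of the training data, alongside English, other languages, math, and code.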
| Field | Value |
|---|---|
| Parameters | 125,264,640 (~125M total; "150M" is the nominal size in the model name) |
| Non-embedding params | 100,688,640 |
| Architecture | 16 layers, d_model=768, GQA 12Q/4KV, SwiGLU, RMSNorm, RoPE |
| Context length | 2,048 tokens |
| Tokenizer | Serialtechlab/dhavana-tok-v0 |
| Final step | 12,000 |
| Total unique training tokens | 3,706,066,846 |
| Training data mix | English 65% / Multilingual 13% / Dhivehi 12% / Math 7% / Code 3% |
This is the base (pretrained) model and is not instruction-tuned. SFT (supervised fine-tuned) models will be released separately.
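
The parameter figures in the table can be cross-checked with a little arithmetic. The sketch below is not taken from the training code: the layer count, `d_model`, head counts, and both parameter totals come from the table, while `VOCAB_SIZE = 32_000`, `D_FF = 2_048`, bias-free linear layers, two RMSNorm weights per layer plus one final norm, and tied input/output embeddings are assumptions chosen because they reproduce the published numbers exactly.

```python
# Values stated in the model card table.
N_LAYERS = 16
D_MODEL = 768
N_Q_HEADS = 12
N_KV_HEADS = 4
TOTAL_PARAMS = 125_264_640
NON_EMBEDDING_PARAMS = 100_688_640

# Assumptions (NOT stated in the card; inferred so that the arithmetic matches).
VOCAB_SIZE = 32_000  # assumed vocabulary size
D_FF = 2_048         # assumed SwiGLU hidden width
# Also assumed: no bias terms, 2 RMSNorm weights per layer, 1 final RMSNorm,
# and input/output embeddings that share one weight matrix (tied).

head_dim = D_MODEL // N_Q_HEADS   # 768 / 12 = 64
kv_dim = N_KV_HEADS * head_dim    # 4 * 64 = 256 (GQA: fewer K/V heads than Q heads)

# Attention: Q and output projections are d_model x d_model,
# K and V projections are d_model x kv_dim.
attn_params = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * kv_dim

# SwiGLU MLP: gate, up, and down projections.
mlp_params = 3 * D_MODEL * D_FF

# Two RMSNorm weight vectors per layer.
norm_params = 2 * D_MODEL

per_layer = attn_params + mlp_params + norm_params
non_embedding = N_LAYERS * per_layer + D_MODEL  # + final RMSNorm
embedding = VOCAB_SIZE * D_MODEL

total_tied = non_embedding + embedding
total_untied = non_embedding + 2 * embedding

print(f"per layer:      {per_layer:,}")
print(f"non-embedding:  {non_embedding:,}")
print(f"embedding:      {embedding:,}")
print(f"total (tied):   {total_tied:,}")
print(f"total (untied): {total_untied:,}")
```

Under these assumptions the script reproduces both table entries (100,688,640 non-embedding and 125,264,640 total). With a separate, untied output projection the same configuration would come to 149,840,640 parameters, which may be where the "150M" in the model name comes from, although the card does not say so.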