---
language:
  - dv
  - en
tags:
  - dhavana
  - causal-lm
  - pretrained
---

# Dhavana-Base-150M

Decoder-only Transformer pretrained from scratch on English, multilingual, Dhivehi, math, and code data, with Dhivehi making up 12% of the training mix.

| Field | Value |
|---|---|
| Parameters | 125,264,640 (~150M) |
| Non-embedding params | 100,688,640 |
| Architecture | 16 layers, d_model=768, GQA (12 query heads / 4 KV heads), SwiGLU, RMSNorm, RoPE |
| Context length | 2,048 tokens |
| Tokenizer | `Serialtechlab/dhavana-tok-v0` |
| Final step | 12,000 |
| Total unique training tokens | 3,706,066,846 |
| Training data mix | English 65% / Multilingual 13% / Dhivehi 12% / Math 7% / Code 3% |
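
The parameter figures in the table can be sanity-checked with a little arithmetic. The sketch below is not the authors' counting script. It assumes a bias-free decoder with two RMSNorm weights per layer plus one final norm, and it assumes two values that the table does not state: a feed-forward width of `d_ff=2048` and a vocabulary of 32,000 tokens. Both were picked because they reproduce the listed counts exactly. `count_params` is a hypothetical helper name.

```python
def count_params(n_layers=16, d_model=768, n_q_heads=12, n_kv_heads=4,
                 d_ff=2048, vocab_size=32_000, tied_embeddings=True):
    """Return (non_embedding, embedding) parameter counts.

    d_ff and vocab_size are assumptions, not values from the model card.
    """
    head_dim = d_model // n_q_heads               # 768 / 12 = 64
    kv_dim = n_kv_heads * head_dim                # 4 * 64 = 256 (GQA shrinks K and V)
    # Q and output projections are d_model x d_model; K and V are d_model x kv_dim.
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    mlp = 3 * d_model * d_ff                      # SwiGLU: gate, up, and down matrices
    norms = 2 * d_model                           # two RMSNorm weight vectors per layer
    non_embedding = n_layers * (attn + mlp + norms) + d_model  # + final RMSNorm
    # One shared matrix if input and output embeddings are tied, two otherwise.
    embedding = vocab_size * d_model * (1 if tied_embeddings else 2)
    return non_embedding, embedding


non_emb, emb = count_params()
print(f"non-embedding: {non_emb:,}")          # 100,688,640
print(f"embedding:     {emb:,}")              # 24,576,000
print(f"total:         {non_emb + emb:,}")    # 125,264,640
```

Under these assumptions the tied-embedding total matches the listed 125,264,640. Calling `count_params(tied_embeddings=False)` gives a total of 149,840,640, which may be where the "~150M" label comes from, though the card does not say so.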

This is the base (pretrained) model, not an instruction-tuned one. Supervised fine-tuned (SFT) models will be released separately.
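
Because a base model continues text instead of following instructions, one common way to steer it is a few-shot prompt: a handful of worked input/output pairs followed by an unfinished one. The sketch below only builds the prompt string. `build_few_shot_prompt`, the labels, and the sample pairs are all illustrative assumptions, and no generation call is shown because the model's loading code is not given here.

```python
def build_few_shot_prompt(examples, query,
                          input_label="Question", output_label="Answer"):
    """Format (input, output) pairs plus a final open query as one prompt string."""
    blocks = [f"{input_label}: {src}\n{output_label}: {tgt}" for src, tgt in examples]
    # Leave the last output empty so the model's continuation fills it in.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Made-up examples, not taken from the training data.
examples = [("2 + 2", "4"), ("3 + 5", "8")]
prompt = build_few_shot_prompt(examples, "7 + 6")
print(prompt)
```

The prompt ends right after the final `Answer:` label, so whatever the model generates next is read as its answer. Generation is usually cut off at the first blank line.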