---
language:
  - dv
  - en
tags:
  - dhavana
  - causal-lm
  - pretrained
---

# Dhavana-Base-150M

A decoder-only Transformer language model pretrained from scratch, with strong Dhivehi support (Dhivehi makes up 12% of the training mix).

| Field | Value |
|---|---|
| Parameters | 125,264,640 (~125M) |
| Non-embedding parameters | 100,688,640 |
| Architecture | 16 layers, d_model=768, GQA (12 query heads / 4 KV heads), SwiGLU, RMSNorm, RoPE |
| Context length | 2,048 tokens |
| Tokenizer | `Serialtechlab/dhavana-tok-v0` |
| Final training step | 12,000 |
| Total unique training tokens | 3,706,066,846 (~3.7B) |
| Training data mix | English 65% / Multilingual 13% / Dhivehi 12% / Math 7% / Code 3% |
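As a sanity check, the parameter counts in the table can be reproduced from the listed architecture. The sketch below is an illustration, not the project's code: `DhavanaConfig` and the helper functions are hypothetical names. Two values are **not stated in this card** and are assumed here only because they reproduce both published totals exactly: a 32,000-token vocabulary whose input embedding is tied to the output projection, and a SwiGLU hidden size of 2,048 with no bias terms. The sketch also shows how GQA maps the 12 query heads onto 4 KV heads (assuming the common contiguous grouping) and what that does to the per-token KV cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DhavanaConfig:
    """Hypothetical config; field names are illustrative, not the project's."""
    # Stated in the model card table.
    n_layers: int = 16
    d_model: int = 768
    n_q_heads: int = 12
    n_kv_heads: int = 4
    # ASSUMED (not stated in the card): chosen because they reproduce
    # the published parameter counts exactly.
    vocab_size: int = 32_000
    ffn_hidden: int = 2_048
    tied_embeddings: bool = True

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_q_heads  # 768 / 12 = 64


def block_params(cfg: DhavanaConfig) -> int:
    """Weights in one decoder block, assuming no bias terms."""
    kv_dim = cfg.n_kv_heads * cfg.head_dim       # 4 * 64 = 256
    attn = 2 * cfg.d_model * cfg.d_model         # W_q and W_o
    attn += 2 * cfg.d_model * kv_dim             # W_k and W_v, shared within GQA groups
    mlp = 3 * cfg.d_model * cfg.ffn_hidden       # SwiGLU gate, up and down projections
    norms = 2 * cfg.d_model                      # pre-attention and pre-MLP RMSNorm gains
    return attn + mlp + norms


def non_embedding_params(cfg: DhavanaConfig) -> int:
    # All decoder blocks plus the final RMSNorm gain vector.
    return cfg.n_layers * block_params(cfg) + cfg.d_model


def total_params(cfg: DhavanaConfig) -> int:
    embed = cfg.vocab_size * cfg.d_model
    lm_head = 0 if cfg.tied_embeddings else embed  # tied: output reuses the embedding
    return non_embedding_params(cfg) + embed + lm_head


def kv_head_for(q_head: int, cfg: DhavanaConfig) -> int:
    """GQA (assumed contiguous grouping): each run of 3 query heads shares one K/V head."""
    group_size = cfg.n_q_heads // cfg.n_kv_heads  # 12 / 4 = 3
    return q_head // group_size


def kv_cache_values_per_token(cfg: DhavanaConfig, n_kv_heads: int) -> int:
    """Number of cached K and V values per token, summed over all layers."""
    return 2 * cfg.n_layers * n_kv_heads * cfg.head_dim


if __name__ == "__main__":
    cfg = DhavanaConfig()
    print(f"{non_embedding_params(cfg):,}")  # 100,688,640
    print(f"{total_params(cfg):,}")          # 125,264,640
    print([kv_head_for(h, cfg) for h in range(cfg.n_q_heads)])
    # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
    gqa = kv_cache_values_per_token(cfg, cfg.n_kv_heads)  # 8,192
    mha = kv_cache_values_per_token(cfg, cfg.n_q_heads)   # 24,576
    print(mha // gqa)                        # 3
```

Under these assumptions the two printed counts match the table. The 12Q/4KV split is also why `W_k` and `W_v` are each a third the size of `W_q`, and why the KV cache holds 3x fewer values per token than it would with 12 full KV heads.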

This is the base (pretrained) model; it is not instruction-tuned. Supervised fine-tuned (SFT) models will be released separately.