Domofon-v1-0.8b-base

A 0.9B-parameter bilingual (Russian/English) base language model pretrained from scratch on a 660B-token corpus.

Model Details

Architecture: Qwen3 (dense, decoder-only transformer)
Parameters: 0.9B (883M unique)
Hidden size: 1024
Layers: 40
Attention heads: 16 query heads, 8 key/value heads (grouped-query attention)
Head dimension: 64
FFN (intermediate) size: 4096
Vocabulary size: 248,072
Context length: 32,768 tokens
Precision: float16
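The table gives enough of the architecture to check the parameter count and estimate memory use. The sketch below does that arithmetic. It assumes the standard Qwen3 layout (bias-free attention projections, per-head `q_norm`/`k_norm`, a SwiGLU FFN, RMSNorm) and tied input/output embeddings. The card does not state any of these details.

```python
# Back-of-the-envelope sizes implied by the Model Details table.
# ASSUMPTIONS (not stated in the card): standard Qwen3 layout with
# bias-free q/k/v/o projections, per-head RMSNorm on queries and keys
# (q_norm/k_norm), a SwiGLU FFN (gate, up, down), two RMSNorms per layer,
# a final RMSNorm, and input/output embeddings tied (counted once).

VOCAB = 248_072
HIDDEN = 1024
LAYERS = 40
Q_HEADS = 16
KV_HEADS = 8
HEAD_DIM = 64
FFN = 4096
CONTEXT = 32_768
BYTES_FP16 = 2


def count_params() -> int:
    embed = VOCAB * HIDDEN  # token embedding, assumed shared with the LM head
    attn = (
        HIDDEN * Q_HEADS * HEAD_DIM  # q_proj
        + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # k_proj and v_proj
        + Q_HEADS * HEAD_DIM * HIDDEN  # o_proj
        + 2 * HEAD_DIM  # q_norm and k_norm weights
    )
    mlp = 3 * HIDDEN * FFN  # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN  # pre-attention and pre-FFN RMSNorm
    per_layer = attn + mlp + norms
    return embed + LAYERS * per_layer + HIDDEN  # + final RMSNorm


def kv_cache_bytes(seq_len: int) -> int:
    # Keys and values for every layer and KV head, stored in float16.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16 * seq_len


params = count_params()
print(f"parameters: {params:,}")
print(f"fp16 weights: {params * BYTES_FP16 / 2**30:.2f} GiB")
print(f"query heads per KV head: {Q_HEADS // KV_HEADS}")
print(f"KV cache at {CONTEXT:,} tokens: {kv_cache_bytes(CONTEXT) / 2**30:.2f} GiB")
```

Under those assumptions the script prints 883,259,392 parameters, which matches the 883M figure above. It also reports about 1.65 GiB of float16 weights, 2 query heads per KV head, and a 2.50 GiB KV cache for one full 32,768-token sequence. The cache figure excludes activations and framework overhead. An untied output embedding would add another 254,025,728 parameters (vocabulary × hidden size).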

Training

  • Pretrained from random initialization; no upstream weights were used
  • Training corpus: 660B tokens, roughly 50/50 English and Russian
  • Training framework: MaxText on Google Cloud TPU v5e-64
  • Base model only: no supervised fine-tuning (SFT), chat tuning, or RLHF
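The figures above translate into a rough data budget per language and per parameter. This sketch assumes an exact 50/50 split (the card says "~50/50") and the 883M unique-parameter count from Model Details.

```python
# Rough data-budget arithmetic from the Training section.
# ASSUMPTIONS: an exact 50/50 split (the card only says "~50/50") and
# the 883M unique-parameter count from the Model Details table.

TOTAL_TOKENS = 660e9
PARAMS = 883e6
EN_SHARE = 0.5  # approximate share of English tokens

en_tokens = TOTAL_TOKENS * EN_SHARE
ru_tokens = TOTAL_TOKENS - en_tokens
tokens_per_param = TOTAL_TOKENS / PARAMS

print(f"English: ~{en_tokens / 1e9:.0f}B tokens, Russian: ~{ru_tokens / 1e9:.0f}B tokens")
print(f"tokens per parameter: ~{tokens_per_param:.0f}")
```

This works out to about 330B tokens per language and roughly 747 training tokens per parameter. That is around 37 times the ~20 tokens per parameter commonly cited from the Chinchilla compute-optimal scaling study.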

Intended Use

This is a base pretrained model intended for research and as a foundation for downstream fine-tuning. It is not an instruction-following or chat model.

Usage

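```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("domofon/Domofon-v1-0.8b-base")
tokenizer = AutoTokenizer.from_pretrained("domofon/Domofon-v1-0.8b-base")

# Russian prompt meaning "Moscow is the capital"; the base model continues the text.
inputs = tokenizer("Москва — столица", return_tensors="pt")
# Generate up to 50 new tokens and decode the full sequence.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```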
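A base model continues text rather than answering requests. A common way to get task-like behavior without fine-tuning is a few-shot prompt that demonstrates the pattern and leaves the last answer blank. The sketch below builds such a prompt. The helper `build_few_shot_prompt` and the `Q:`/`A:` layout are hypothetical choices, not a format the card prescribes or the model was trained on.

```python
# Build a plain-text few-shot prompt for a base (non-chat) model.
# HYPOTHETICAL: the helper name and the "Q:/A:" layout are illustrative only.

def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")  # leave the final answer open for the model
    return "\n\n".join(blocks)


examples = [
    ("Столица Франции?", "Париж"),  # "Capital of France?" -> "Paris"
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "Столица России?")  # "Capital of Russia?"
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` and `model.generate(...)` as in the usage example above. A base model tends to keep writing new `Q:`/`A:` pairs after its answer, so it is usually worth cutting the output at the next blank line.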

Limitations

  • Base model only: it will not follow instructions or engage in dialogue without fine-tuning
  • The training data mix is roughly 50/50 English/Russian; performance on other languages has not been evaluated
  • No safety alignment has been applied