Bifrost 1.2B

A from-scratch 1.2B-parameter translation model for the Nordic languages ↔ English: Swedish (sv), Danish (da), Norwegian Bokmål (nb), Norwegian Nynorsk (nn), Finnish (fi), and Icelandic (is), plus cross-Nordic directions.

On FLORES-200 devtest it scores 57.4 chrF++ on the English→Nordic average, ahead of NLLB-200-3.3B (56.1) and TranslateGemma-12B (55.7), at roughly a third and a tenth of their parameter counts respectively.

This is the teacher model of the Bifrost Nordic-translation family from NodeNestor. For a ~3× smaller and faster distilled option, see Bifrost Flash 430M.

Results — FLORES-200 devtest, chrF++ (sacrebleu, `word_order=2`, n=500)

Headline — English→Nordic average:

| Model | Params | en→Nordic chrF++ |
|---|---|---|
| Nordic Translator (this model) | 1.2B | 57.4 |
| NLLB-200-3.3B | 3.3B | 56.1 |
| TranslateGemma-12B | 12B | 55.7 |

Group averages:

| Direction group | chrF++ |
|---|---|
| English → Nordic | 57.4 |
| Nordic → English | 63.6 |
| Nordic ↔ Nordic | 54.5 |
| Overall | 58.1 |

Per-direction (chrF++):

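| Dir | score | Dir | score |
|---|---|---|---|
| en→sv | 63.8 | sv→en | 67.4 |
| en→da | 65.4 | da→en | 69.3 |
| en→nb | 58.2 | nb→en | 64.9 |
| en→nn | 57.8 | nn→en | 68.9 |
| en→fi | 50.1 | fi→en | 55.3 |
| en→is | 49.2 | is→en | 55.8 |
| sv→da | 62.5 | da→sv | 62.7 |
| sv→fi | 51.0 | fi→sv | 50.5 |
| nb→nn | 53.0 | nn→nb | 55.5 |
| fi→da | 50.9 | is→sv | 50.1 |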
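
As a cross-check, the group averages reported earlier are consistent with plain unweighted means of the per-direction scores in this table. The averaging scheme is an assumption (the card does not state it), but the sketch below reproduces the published figures to one decimal place:

```python
# Per-direction chrF++ scores copied from the per-direction table.
EN_TO_NORDIC = {"en→sv": 63.8, "en→da": 65.4, "en→nb": 58.2,
                "en→nn": 57.8, "en→fi": 50.1, "en→is": 49.2}
NORDIC_TO_EN = {"sv→en": 67.4, "da→en": 69.3, "nb→en": 64.9,
                "nn→en": 68.9, "fi→en": 55.3, "is→en": 55.8}
NORDIC_NORDIC = {"sv→da": 62.5, "da→sv": 62.7, "sv→fi": 51.0, "fi→sv": 50.5,
                 "nb→nn": 53.0, "nn→nb": 55.5, "fi→da": 50.9, "is→sv": 50.1}


def mean(values):
    """Unweighted arithmetic mean (assumed averaging scheme)."""
    values = list(values)
    return sum(values) / len(values)


groups = {
    "English → Nordic": EN_TO_NORDIC,
    "Nordic → English": NORDIC_TO_EN,
    "Nordic ↔ Nordic": NORDIC_NORDIC,
}
averages = {name: round(mean(scores.values()), 1) for name, scores in groups.items()}

# "Overall" treated as the mean over all 20 listed directions.
all_scores = [s for scores in groups.values() for s in scores.values()]
overall = round(mean(all_scores), 1)

for name, value in averages.items():
    print(f"{name}: {value}")
print(f"Overall: {overall}")
```
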

Relative to the reference models, this model is strongest on the low-resource directions (Nynorsk, Icelandic). NLLB-200-3.3B still leads on several →English directions and on Finnish.

Usage

The model expects a control-token prompt and is decoded greedily:

[BOS] [<2{tgt_lang}>] {source_token_ids} [<eos_src>]   →   generate until [EOS]

The target-language control token placed right after [BOS] selects the output language — the source language is inferred. Control-token IDs (above the 65000 SentencePiece vocab):

| token | id | token | id |
|---|---|---|---|
| `<2en>` | 65000 | `<2nn>` | 65004 |
| `<2sv>` | 65001 | `<2fi>` | 65005 |
| `<2da>` | 65002 | `<2is>` | 65006 |
| `<2nb>` | 65003 | `<eos_src>` | 65007 |

[BOS]=1, [EOS]=2. Tokenizer: nordic_unigram_65k.model (SentencePiece, 65000 pieces + 8 control tokens = vocab 65008).
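
The prompt layout and ID table above can be wrapped in two small helpers. This is a sketch: the names `build_prompt` and `strip_control` are illustrative and not part of the shipped code, and the source ids in the usage line are placeholders rather than real tokenizer output.

```python
BOS, EOS, EOS_SRC = 1, 2, 65007
SP_VOCAB = 65000  # SentencePiece pieces occupy ids 0..64999
LANG = {"en": 65000, "sv": 65001, "da": 65002, "nb": 65003,
        "nn": 65004, "fi": 65005, "is": 65006}


def build_prompt(source_ids, tgt_lang):
    """Return [BOS] [<2tgt>] source_ids [<eos_src>] as a list of ints."""
    if tgt_lang not in LANG:
        raise ValueError(f"unsupported target language: {tgt_lang!r}")
    return [BOS, LANG[tgt_lang], *source_ids, EOS_SRC]


def strip_control(ids):
    """Keep only ids below 65000, as the card prescribes for decoding.

    Dropping BOS/EOS as well is an extra precaution added in this sketch.
    """
    return [t for t in ids if t < SP_VOCAB and t not in (BOS, EOS)]


# Placeholder source ids; in practice use sp.encode(text, out_type=int).
prompt = build_prompt([10, 11, 12], "sv")
print(prompt)
```
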

The weights ship as model.safetensors, with a self-contained pure-PyTorch implementation in modeling_nordic.py (no training-stack dependencies). Three ways to run it:

1. Standalone (pure torch, KV-cached):

```python
import torch, sentencepiece as spm
from modeling_nordic import NordicTranslator

# Load the SentencePiece tokenizer shipped with the model.
sp = spm.SentencePieceProcessor(); sp.load("nordic_unigram_65k.model")
# Target-language control-token ids (see the table above).
LANG = {"en":65000,"sv":65001,"da":65002,"nb":65003,"nn":65004,"fi":65005,"is":65006}

model = NordicTranslator.from_checkpoint("model.safetensors", device="cuda")
# Pass the raw source ids and the target-language token id.
ids = model.translate(sp.encode("Hello, how are you?", out_type=int), LANG["sv"])
print(sp.decode(ids))     # -> Hej, hur är det med er?
```

2. HuggingFace (trust_remote_code):

```python
from transformers import AutoModelForCausalLM
import torch, sentencepiece as spm
sp = spm.SentencePieceProcessor(); sp.load("nordic_unigram_65k.model")
# Older transformers releases may expect torch_dtype= instead of dtype=.
m = AutoModelForCausalLM.from_pretrained(".", trust_remote_code=True,
                                         dtype=torch.bfloat16).cuda().eval()
# Build the prompt by hand: [BOS] <2sv> source_ids <eos_src>.
ids = [1, 65001] + sp.encode("Hello, how are you?", out_type=int) + [65007]   # 65001=<2sv>
# Greedy decoding, stopping at [EOS]=2.
out = m.generate(torch.tensor([ids]).cuda(), max_new_tokens=128, do_sample=False, eos_token_id=2)
# Keep only the newly generated tokens and drop control-token ids.
print(sp.decode([t for t in out[0, len(ids):].tolist() if t < 65000]))
```

3. vLLM (custom architecture — register the included plugin): see vllm_nordic.py + vllm_pkg/ and example_vllm.py. Install the plugin (pip install -e vllm_pkg) inside a vLLM environment, then serve with --skip-tokenizer-init and feed control-token prompts.

The control-token prompt is [BOS] [<2{tgt}>] {source_ids} [<eos_src>] → generate until [EOS]; decode only ids < 65000. The FLORES numbers above were produced with the batched, KV-cached standalone path.
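
To make the "generate until [EOS]" step concrete, here is a minimal greedy loop. It is a sketch only: `step_fn` is a hypothetical stand-in for a model forward pass that returns next-token logits, and the loop recomputes from the full prefix on every step instead of using a KV cache as the shipped implementation does.

```python
from typing import Callable, List, Sequence

EOS = 2
SP_VOCAB = 65000  # ids at or above this are control tokens


def greedy_decode(step_fn: Callable[[List[int]], Sequence[float]],
                  prompt_ids: Sequence[int],
                  max_new_tokens: int = 128) -> List[int]:
    """Pick the argmax token each step until EOS or the length cap."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = step_fn(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS:
            break
        ids.append(next_id)
        generated.append(next_id)
    # Decode only ids below 65000, as described above.
    return [t for t in generated if t < SP_VOCAB]


def make_toy_step(prompt_len: int, script: Sequence[int], vocab_size: int = 16):
    """Hypothetical toy model: emits a fixed script, one token per step."""
    def step(ids: List[int]) -> List[float]:
        target = script[len(ids) - prompt_len]
        return [1.0 if i == target else 0.0 for i in range(vocab_size)]
    return step


toy_prompt = [1, 11, 4, 6, 12]  # toy ids, not real tokenizer output
toy_step = make_toy_step(len(toy_prompt), [5, 7, 9, EOS])
print(greedy_decode(toy_step, toy_prompt))
```
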

Model details

Architecture: a grouped-query-attention (GQA) decoder. 18 layers, hidden 2048, FFN 6144 (SwiGLU), 16 query heads / 4 KV heads, head dim 128, RoPE (θ=500000, partial 0.25), RMSNorm, parallel residual, fused QKV. ~1.2B params.
Context length: 4096 tokens (trained and evaluated at 4096; longer inputs truncate).
Precision: bf16.
Vocab: 65008 (nordic_unigram_65k SentencePiece + 8 control tokens).
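
The figures above are enough for a back-of-the-envelope size estimate. The sketch below makes several assumptions that the card does not confirm: no bias terms, norm weights ignored as negligible, "partial 0.25" read as rotary embeddings applied to a quarter of each head's dimensions, and embedding tying left as an open parameter.

```python
VOCAB, LAYERS, HIDDEN, FFN = 65008, 18, 2048, 6144
Q_HEADS, KV_HEADS, HEAD_DIM = 16, 4, 128
CONTEXT, BYTES_BF16 = 4096, 2


def layer_params() -> int:
    """Weights in one decoder layer (assumes no biases, norms ignored)."""
    qkv = HIDDEN * (Q_HEADS + 2 * KV_HEADS) * HEAD_DIM  # fused Q, K, V
    out = Q_HEADS * HEAD_DIM * HIDDEN                   # attention output
    swiglu = 3 * HIDDEN * FFN                           # gate, up, down
    return qkv + out + swiglu


def total_params(tied_embeddings: bool) -> int:
    """Whole-model estimate; tying is an assumption either way."""
    embed = VOCAB * HIDDEN
    head = 0 if tied_embeddings else VOCAB * HIDDEN
    return LAYERS * layer_params() + embed + head


def kv_cache_bytes(tokens: int) -> int:
    """bf16 KV cache for one sequence: K and V, per layer, per KV head."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_BF16 * tokens


rotary_dims = int(0.25 * HEAD_DIM)  # assumed meaning of "partial 0.25"

print(f"per layer:      {layer_params():,}")
print(f"tied total:     {total_params(True):,}")
print(f"untied total:   {total_params(False):,}")
print(f"KV cache @4096: {kv_cache_bytes(CONTEXT) / 2**20:.0f} MiB")
print(f"rotary dims:    {rotary_dims}")
```

Under these assumptions the estimate comes to about 1.00B parameters with tied embeddings and 1.13B with untied ones, in the neighbourhood of the stated ~1.2B; the exact figure depends on implementation details the card does not give. With 4 KV heads instead of 16, the per-sequence cache at full context is about 144 MiB, a quarter of what full multi-head attention would need at the same head dimension.
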

Training

From scratch. A 120B-token run: a ~19B-token trunk, then **+100B tokens of continued training** (clean data, cosine schedule with a monolingual floor + anneal). The released checkpoint is ~96B tokens into the long run (~115B cumulative); it sits on the cosine tail, so quality ≈ the 100B point.
Data: parallel + monolingual Nordic/English (Wikipedia parallel, DCLM en↔Nordic, Aya cross-lingual, FineWeb-Edu, Nemotron-CC), balanced en↔Nordic blend.
Objective: next-token cross-entropy on the target side.
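
The training code is not part of this card, so the following only illustrates one common way to restrict next-token cross-entropy to the target side: mask every label up to the `<eos_src>` boundary. The helper name `target_only_labels` and the use of -100 as the ignore value (the default `ignore_index` of PyTorch's cross-entropy loss) are assumptions of this sketch.

```python
EOS_SRC = 65007
IGNORE = -100  # assumed ignore value; PyTorch's default ignore_index


def target_only_labels(ids):
    """Next-token labels where only target-side predictions are scored.

    labels[pos] is the token the model should predict at position pos,
    i.e. ids[pos + 1]. Positions before <eos_src> predict source tokens
    and are masked out.
    """
    boundary = ids.index(EOS_SRC)
    labels = []
    for pos in range(len(ids) - 1):
        next_token = ids[pos + 1]
        labels.append(next_token if pos >= boundary else IGNORE)
    return labels


# [BOS] <2sv> src src <eos_src> tgt tgt [EOS], with placeholder token ids.
example = [1, 65001, 10, 11, 65007, 20, 21, 2]
labels = target_only_labels(example)
print(labels)
```
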

Limitations

Trained at 4096-token context; longer inputs are truncated.
Finnish and Icelandic (en→) are the weakest directions — lower-resource, morphologically hard.
Greedy decoding; no built-in length/formatting control beyond the prompt.
Not instruction-tuned — it is a dedicated translation model, not a chat model.
May produce occasional off-target output on the hardest low-resource pairs.

Acknowledgments

Tokenizer (nordic_unigram_65k) developed by a collaborator; included here with permission.

Citation

@misc{nodenestor_bifrost_1.2b_2026,
  title  = {Bifrost 1.2B},
  author = {Nilsson, Ludvig},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/NodeNestor/bifrost-1.2b}},
  note   = {NodeNestor}
}
