Bifrost Flash 430M

A fast, compact 430M-parameter translation model for the Nordic languages ↔ English (sv, da, nb, nn, fi, is ↔ en), distilled from Bifrost 1.2B via top-32 logit (KL) distillation. At roughly ⅓ the size of the teacher, it is the "flash" option when you want Nordic MT cheap and quick.

Part of the Bifrost Nordic-translation family from NodeNestor. Same tokenizer and prompt format as the teacher.

Results — FLORES-200 devtest, chrF++ (sacrebleu, n=200/direction)

Overall chrF++ = 54.5, closing ~60% of the gap to the 1.2B teacher (58.1) at ~⅓ the parameters; the remaining gap is 3.6 points.

Direction group	Flash 430M	Teacher 1.2B
English → Nordic	53.1	57.4
Nordic → English	60.9	63.6
Nordic ↔ Nordic	50.7	54.5
Overall	54.5	58.1

Per-direction (chrF++):

Direction	chrF++	Direction	chrF++
en→sv	61.5	sv→en	65.2
en→da	62.2	da→en	66.0
en→nb	55.6	nb→en	63.9
en→nn	55.0	nn→en	67.5
en→fi	42.7	fi→en	49.9
en→is	41.8	is→en	52.9

Strong into English (roughly 50–68) and across Scandinavian pairs. Weakest out of English into Finnish and Icelandic (the low-resource legs), where off-target output (text in the wrong language) is also elevated.
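The two English-centric group scores can be reproduced from the per-direction table. This is a minimal sketch, assuming each group score is the unweighted mean of its six directions; the variable and function names are illustrative. The Nordic ↔ Nordic directions are not listed individually, so that row is not recomputed.

```python
from statistics import mean

# Per-direction chrF++ copied from the table above.
EN_TO_NORDIC = {"sv": 61.5, "da": 62.2, "nb": 55.6, "nn": 55.0, "fi": 42.7, "is": 41.8}
NORDIC_TO_EN = {"sv": 65.2, "da": 66.0, "nb": 63.9, "nn": 67.5, "fi": 49.9, "is": 52.9}


def group_mean(scores):
    """Unweighted mean over directions, rounded to one decimal like the table."""
    return round(mean(scores.values()), 1)


en_to_nordic = group_mean(EN_TO_NORDIC)
nordic_to_en = group_mean(NORDIC_TO_EN)
# Lowest-scoring target language when translating out of English.
weakest = min(EN_TO_NORDIC, key=EN_TO_NORDIC.get)

print(en_to_nordic, nordic_to_en, weakest)
```

Under that assumption the means match the 53.1 and 60.9 reported in the group table, and the weakest out-of-English direction is en→is.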

Usage

The weights ship as model.safetensors with a self-contained pure-PyTorch implementation in modeling_flash.py (the model code needs nothing beyond torch; tokenization in the examples below uses sentencepiece). The prompt uses a control-token format: [BOS] [<2{tgt}>] {source_ids} [<eos_src>] → generate until [EOS]. When decoding, keep only ids < 65000, since ids from 65000 up are control tokens.

Standalone:

import sentencepiece as spm
from modeling_flash import NordicFlash

# Load the SentencePiece tokenizer that ships with the model.
sp = spm.SentencePieceProcessor(model_file="nordic_unigram_65k.model")
# Target-language control tokens (<2xx>).
LANG = {"en":65000,"sv":65001,"da":65002,"nb":65003,"nn":65004,"fi":65005,"is":65006}
m = NordicFlash.from_checkpoint("model.safetensors", device="cuda")
# translate() takes the raw source ids plus the target-language token id.
src_ids = sp.encode("Hello, how are you?", out_type=int)
print(sp.decode(m.translate(src_ids, LANG["sv"])))
# -> Hej, hur är du idag?

HuggingFace (trust_remote_code):

from transformers import AutoModelForCausalLM
import torch, sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="nordic_unigram_65k.model")
# trust_remote_code is needed because the architecture is custom code shipped with the repo.
m = AutoModelForCausalLM.from_pretrained(".", trust_remote_code=True, dtype=torch.bfloat16).cuda().eval()
# Prompt layout: [BOS] <2sv> source ids <eos_src>
ids = [1, 65001] + sp.encode("Hello, how are you?", out_type=int) + [65007]   # 65001=<2sv>
out = m.generate(torch.tensor([ids]).cuda(), max_new_tokens=128, do_sample=False, eos_token_id=2)
# Decode only the newly generated tokens, skipping control ids (>= 65000).
print(sp.decode([t for t in out[0, len(ids):].tolist() if t < 65000]))

Control-token ids: <2en>=65000, <2sv>=65001, <2da>=65002, <2nb>=65003, <2nn>=65004, <2fi>=65005, <2is>=65006, <eos_src>=65007; [BOS]=1, [EOS]=2. Run in bf16.
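The prompt layout and control-token ids above can be wrapped in two small helpers. This is a sketch, not part of modeling_flash.py: the names build_prompt and extract_translation are hypothetical, the token ids in the demo are made up, and stopping at the first [EOS] is an added assumption (the examples above only filter ids < 65000).

```python
BOS, EOS, EOS_SRC = 1, 2, 65007
LANG = {"en": 65000, "sv": 65001, "da": 65002, "nb": 65003,
        "nn": 65004, "fi": 65005, "is": 65006}
FIRST_CONTROL_ID = 65000  # ids at or above this are control tokens


def build_prompt(source_ids, tgt):
    """Return [BOS] <2tgt> source... <eos_src> as a flat list of ids."""
    if tgt not in LANG:
        raise ValueError(f"unsupported target language: {tgt!r}")
    return [BOS, LANG[tgt], *source_ids, EOS_SRC]


def extract_translation(output_ids, prompt_len):
    """Keep generated ids up to the first EOS, dropping control tokens."""
    kept = []
    for tok in output_ids[prompt_len:]:
        if tok == EOS:
            break  # assumption: anything after EOS is padding
        if tok < FIRST_CONTROL_ID:
            kept.append(tok)
    return kept


prompt = build_prompt([812, 44, 9031], "sv")             # made-up source ids
fake_output = prompt + [501, 77, 65001, 12, EOS, 0, 0]   # made-up generation
print(prompt)
print(extract_translation(fake_output, len(prompt)))
```

The list returned by extract_translation is what would be passed to sp.decode.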

Model details

Hybrid decoder, ~430M params. 18 layers in a [dynamic_conv, dynamic_conv, gqa]×6 pattern: data-dependent causal depthwise convolution (local mixing) interleaved with grouped-query attention every 3rd layer (global mixing).
DynaConv (dynamic_conv) layers: per-token softmax kernel (14 taps, 80 kernels × 16 channels), SiLU gate.
GQA layers: 16 query / 4 KV heads, head_dim 80, partial rotary (first 25%).
SwiGLU FFN (3584), RMSNorm, parallel residual, hidden 1280, tied embeddings.
Context 4096, bf16, vocab 65008 (nordic_unigram_65k SentencePiece).
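To illustrate the local-mixing layers, here is a pure-Python sketch of a causal depthwise convolution with a per-token softmax kernel. It is not the code in modeling_flash.py. The mapping of channels to kernels (contiguous groups), the tap ordering, and zero padding before the sequence start are all assumptions. The projection that produces the kernel logits and the SiLU gate are left out.

```python
import math

HIDDEN, TAPS, KERNELS, CHANNELS_PER_KERNEL = 1280, 14, 80, 16
assert KERNELS * CHANNELS_PER_KERNEL == HIDDEN  # 80 kernels x 16 channels span the hidden size

# Layer schedule from the description: two conv layers, then one attention layer, six times.
LAYERS = ["dynamic_conv", "dynamic_conv", "gqa"] * 6


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_causal_conv(x, kernel_logits, channels_per_kernel):
    """x: [T][C] activations; kernel_logits: [T][K][taps] per-token logits.

    y[t][c] = sum_j w[t][k][j] * x[t - j][c] with k = c // channels_per_kernel,
    w[t][k] = softmax(kernel_logits[t][k]); positions before the start count as zero.
    """
    out = []
    for t, row in enumerate(x):
        weights = [softmax(k) for k in kernel_logits[t]]  # one kernel per group, per token
        y_t = []
        for c in range(len(row)):
            w = weights[c // channels_per_kernel]
            # Tap j looks j steps back, so no future position is ever read.
            y_t.append(sum(w[j] * x[t - j][c] for j in range(len(w)) if t - j >= 0))
        out.append(y_t)
    return out


# Toy sizes: 3 tokens, 4 channels, 2 kernels (2 channels each), 2 taps.
x = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
uniform = [[[0.0, 0.0], [0.0, 0.0]] for _ in x]  # equal logits: each tap weighs 0.5
y = dynamic_causal_conv(x, uniform, channels_per_kernel=2)
print(len(LAYERS), LAYERS.count("gqa"))
print(y[1])
```

With uniform logits each output is the average of the current and previous token, which makes the causal window easy to check by hand. At the real sizes, the weight for each token would be an 80 × 14 table of taps.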

Training

Distilled from Bifrost 1.2B via full-probability top-32 logit KL.
Data (for the teacher): parallel + monolingual Nordic/English (Wikipedia parallel, DCLM en↔Nordic, Aya cross-lingual, FineWeb-Edu, Nemotron-CC).
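A top-k KL distillation loss for a single token position can be sketched as below. This is one plausible reading of "full-probability top-32", not the training code: it assumes teacher and student probabilities come from the full-vocabulary softmax and that the KL sum is restricted to the teacher's k most likely tokens without renormalisation. The function names are illustrative.

```python
import math


def log_softmax(logits):
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_z for v in logits]


def topk_kl(teacher_logits, student_logits, k=32):
    """KL(teacher || student) summed over the teacher's k most likely tokens.

    Assumption: probabilities are taken from the full softmax and are not
    renormalised over the top-k subset.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    top = sorted(range(len(t_logp)), key=t_logp.__getitem__, reverse=True)[:k]
    return sum(math.exp(t_logp[i]) * (t_logp[i] - s_logp[i]) for i in top)


teacher = [2.0, 1.0, 0.0, -1.0]                       # toy 4-token vocabulary
same = topk_kl(teacher, teacher, k=2)                 # identical distributions
uniform = topk_kl(teacher, [0.0, 0.0, 0.0, 0.0], k=2)  # uniform student
print(same, uniform)
```

Because the sum is truncated to k tokens, it is only an approximation of the full KL divergence and is not guaranteed to be non-negative. A variant that renormalises over the top-k would avoid that at the cost of ignoring tail mass.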

Limitations

Smaller/faster than the teacher → lower quality, especially en→Finnish / Icelandic (elevated off-target there).
4096-token context; greedy decoding; not instruction-tuned.
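For long inputs, the 4096-token limit can be handled by packing sentences into chunks before translation. This is a sketch under the assumption that the prompt and the generated output share the same 4096-token window, as in a decoder-only model. The helper names are hypothetical and the segments in the demo are dummy token lists.

```python
CONTEXT = 4096       # model context length
CONTROL_TOKENS = 3   # [BOS], <2tgt>, <eos_src>


def max_source_tokens(max_new_tokens, context=CONTEXT):
    """Largest source length that still leaves room for max_new_tokens of output."""
    return context - CONTROL_TOKENS - max_new_tokens


def pack_segments(segments, max_new_tokens):
    """Greedily group tokenised segments (e.g. sentences) into chunks that fit.

    A single segment longer than the budget is rejected rather than split.
    """
    budget = max_source_tokens(max_new_tokens)
    chunks, current = [], []
    for seg in segments:
        if len(seg) > budget:
            raise ValueError(f"segment of {len(seg)} tokens exceeds budget {budget}")
        if current and len(current) + len(seg) > budget:
            chunks.append(current)
            current = []
        current = current + list(seg)
    if current:
        chunks.append(current)
    return chunks


budget = max_source_tokens(max_new_tokens=2048)
segments = [[7] * 1200, [8] * 700, [9] * 900]  # dummy token ids
chunk_lengths = [len(c) for c in pack_segments(segments, max_new_tokens=2048)]
print(budget, chunk_lengths)
```

Reserving about half the window for output reflects that a translation is usually of similar length to its source. The right split depends on the language pair.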

Acknowledgments

Tokenizer (nordic_unigram_65k) developed by a collaborator; included here with permission.
Distilled from Bifrost 1.2B.

Citation

@misc{nodenestor_bifrost_flash_2026,
  title  = {Bifrost Flash 430M},
  author = {Nilsson, Ludvig},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/NodeNestor/bifrost-flash-430m}},
  note   = {NodeNestor; distilled from Bifrost 1.2B}
}
