🏔️ DevLake: NLLB-200 (1.3B) for Russian-Bashkir Translation

Current Model Architecture Focus

| Tier | Model | Focus |
|------|-------|-------|
| 🔴 Large (this model) | NLLB-200 1.3B (QLoRA) | Best quality (SOTA) |
| 🟡 Medium | M2M-100 (418M) | Balanced |
| 🟢 Small | MarianMT (77M) | Fastest / CPU |

Model Description

This is the High-Performance model submitted by Team DevLake for the LoResMT 2026 Shared Task. It achieved the highest score in our experiments (52.67 CHRF++), significantly outperforming standard baselines.

It is a fine-tuned version of NLLB-200-1.3B-Distilled, trained using QLoRA (4-bit quantization) on a rigorously filtered subset of the Russian-Bashkir parallel corpus.

  • Paper/Code: GitHub Repository
  • Developed by: DevLake Team
  • Language Pair: Russian (rus_Cyrl) → Bashkir (bak_Cyrl)

🏆 Performance

| Model | Size | CHRF++ | Note |
|-------|------|--------|------|
| DevLake NLLB | 1.3B | 52.67 | Best morphology & syntax |
| DevLake M2M | 418M | 48.80 | Good baseline |
| DevLake Marian | 77M | 43.15 | Fast, but prone to hallucination |

🚀 Usage

Due to the use of QLoRA, this model requires peft and bitsandbytes.

```bash
pip install torch transformers peft bitsandbytes accelerate
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model (NLLB-200 distilled 1.3B) in 4-bit
base_model_id = "facebook/nllb-200-distilled-1.3B"
model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# 2. Load the DevLake LoRA adapters
adapter_model_id = "Voldis/nllb-1.3b-rus-bak"
model = PeftModel.from_pretrained(model, adapter_model_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_model_id, src_lang="rus_Cyrl")

# 3. Inference
text = "Утром я выпил чашку кофе."  # "In the morning I drank a cup of coffee."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        # Force the decoder to start with the Bashkir language token
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("bak_Cyrl"),
        max_length=128,
    )

print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```

🛠️ Training Details

  • Filtering: We used a BERT-based semantic-similarity metric to keep only the top 486k sentence pairs (similarity ≥ 0.80).
  • Method: QLoRA (rank = 64, alpha = 64).
  • Hardware: Trained on a single NVIDIA RTX 3080.
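The similarity filtering above can be sketched as follows, assuming sentence embeddings from a BERT-style encoder have already been computed for both sides of each pair (the card does not name the exact embedding model, and `filter_parallel_pairs` is a hypothetical helper, not part of the released code):

```python
import numpy as np

def filter_parallel_pairs(src_emb: np.ndarray,
                          tgt_emb: np.ndarray,
                          threshold: float = 0.80) -> np.ndarray:
    """Return indices of sentence pairs with cosine similarity >= threshold.

    src_emb, tgt_emb: (n, d) arrays of precomputed sentence embeddings
    for the source and target side of each pair (hypothetical inputs).
    """
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = np.sum(src * tgt, axis=1)  # row-wise cosine similarity
    return np.nonzero(sims >= threshold)[0]

# Toy usage: 3 pairs with 4-dimensional embeddings
src = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0]])
tgt = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [1.0, 0.9, 0.0, 0.0]])
print(filter_parallel_pairs(src, tgt))  # keeps only the well-aligned pairs
```

In the full pipeline the same thresholding would be applied across the whole corpus, keeping the 486k pairs at or above 0.80.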

📚 Citation

```bibtex
@article{devlake2026loresmt,
  title={DevLake at LoResMT 2026: The Impact of Pre-training and Model Scale on Russian-Bashkir Low-Resource Translation},
  author={DevLake Team},
  year={2026}
}
```