# 🏔️ DevLake: NLLB-200 (1.3B) for Russian-Bashkir Translation
| Current Model | Architecture | Focus |
|---|---|---|
| 🔴 Large (This Model) | NLLB-1.3B (QLoRA) | Best Quality (SOTA) |
| 🟡 Medium Model | M2M-100 (418M) | Balanced |
| 🟢 Small Model | MarianMT (77M) | Fastest / CPU |
## Model Description
This is the High-Performance model submitted by Team DevLake for the LoResMT 2026 Shared Task. It achieved the highest score in our experiments (52.67 CHRF++), significantly outperforming standard baselines.
It is a fine-tuned version of NLLB-200-1.3B-Distilled, trained using QLoRA (4-bit quantization) on a rigorously filtered subset of the Russian-Bashkir parallel corpus.
- Paper/Code: GitHub Repository
- Developed by: DevLake Team
- Language Pair: Russian (`rus_Cyrl`) $\to$ Bashkir (`bak_Cyrl`)
## 🏆 Performance
| Model | Size | CHRF++ | Note |
|---|---|---|---|
| DevLake NLLB | 1.3B | 52.67 | Best Morphology & Syntax |
| DevLake M2M | 418M | 48.80 | Good baseline |
| DevLake Marian | 77M | 43.15 | Fast but hallucinates |
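The CHRF++ numbers above are character n-gram F-scores (with word n-grams mixed in), which reward getting Bashkir's rich morphology right even when whole-word matches fail. A minimal sketch of the character-level core of the metric, for illustration only -- shared-task scoring presumably uses sacreBLEU's full chrF++ implementation, which additionally includes word n-grams:

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of a string, with whitespace removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average char n-gram F-beta over n = 1..max_n.

    Illustration only -- omits the word n-gram component of chrF++.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
        else:
            scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(scores) / len(scores) if scores else 0.0

print(chrf("иртән мин кофе эстем", "иртән мин кофе эстем"))  # identical strings score 100.0
```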
## 🚀 Usage
Due to the use of QLoRA, this model requires `peft` and `bitsandbytes`.

```bash
pip install torch transformers peft bitsandbytes accelerate
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model (NLLB) in 4-bit
base_model_id = "facebook/nllb-200-1.3B"
model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# 2. Load the DevLake LoRA adapters
adapter_model_id = "Voldis/nllb-1.3b-rus-bak"
model = PeftModel.from_pretrained(model, adapter_model_id)
# Set the source language so the tokenizer prepends the right language tag
tokenizer = AutoTokenizer.from_pretrained(adapter_model_id, src_lang="rus_Cyrl")

# 3. Inference
text = "Утром я выпил чашку кофе."  # "In the morning I drank a cup of coffee."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        # Force generation to start with the Bashkir language tag
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("bak_Cyrl"),
        max_length=128,
    )

print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
## 🛠️ Training Details
- Filtering: We used a BERT-based metric to select only the top 486k sentence pairs (Semantic Similarity $\ge 0.80$).
- Method: QLoRA (Rank=64, Alpha=64).
- Hardware: Trained on a single NVIDIA RTX 3080.
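The similarity filtering step can be sketched as below. The `embed` callable stands in for a sentence encoder (the team's actual encoder and pipeline are in their repository; the toy embedder here is purely illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_pairs(pairs, embed, threshold=0.80):
    """Keep (src, tgt) pairs whose embedding cosine similarity >= threshold."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if cosine(embed(src), embed(tgt)) >= threshold
    ]

# Toy embedder for demonstration only
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
pairs = [("a", "b"), ("a", "c")]
print(filter_pairs(pairs, toy.get))  # keeps ("a", "b") only
```

In the real pipeline, `embed` would map each sentence to a multilingual embedding, and only pairs scoring at or above 0.80 survive, yielding the 486k-pair training set.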
## 📚 Citation
```bibtex
@article{devlake2026loresmt,
  title={DevLake at LoResMT 2026: The Impact of Pre-training and Model Scale on Russian-Bashkir Low-Resource Translation},
  author={DevLake Team},
  year={2026}
}
```