# 🏔️ DevLake: M2M-100 (418M) for Russian-Bashkir
| Current Model | Architecture | Focus |
|---|---|---|
| 🔴 Large Model | NLLB-1.3B (QLoRA) | Best Quality (SOTA) |
| 🟡 Medium (This Model) | M2M-100 (LoRA) | Balanced / Baseline |
| 🟢 Small Model | MarianMT (77M) | Fastest / CPU |
## Model Description
This is the Medium-sized model from the DevLake submission for LoResMT 2026. It serves as a robust baseline, balancing performance and resource usage. It was fine-tuned using LoRA on the facebook/m2m100_418M checkpoint.
- **Score:** 48.80 chrF++ (see the evaluation sketch below)
- **Code:** GitHub Repository
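The reported metric is chrF++ (character n-gram F-score with word bigrams). As a minimal illustration of how such a score is typically computed with the `sacrebleu` library, not the shared task's official evaluation script, and with placeholder file names:

```python
# Illustrative only: chrF++ via sacrebleu (word_order=2 gives the "++" variant).
# hypotheses.txt / references.txt are hypothetical file names, not the LoResMT data.
from sacrebleu.metrics import CHRF

with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

chrf = CHRF(word_order=2)  # word_order=2 => chrF++
print(chrf.corpus_score(hypotheses, [references]))  # e.g. "chrF2++ = 48.80"
```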
## 🚀 Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

base_model = "facebook/m2m100_418M"
adapter_model = "Voldis/m2m100-rus-bak"

# Load the base model and attach the LoRA adapter
model = AutoModelForSeq2SeqLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_model)
tokenizer = AutoTokenizer.from_pretrained(adapter_model)

# Set source language (Russian) and target language (Bashkir)
tokenizer.src_lang = "ru"
target_lang_id = tokenizer.get_lang_id("ba")

text = "Где находится библиотека?"  # "Where is the library?"
# Use model.device rather than hard-coding "cuda", since device_map="auto"
# may place the model on CPU when no GPU is available.
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=target_lang_id,
    )
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
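For deployment, the LoRA adapter can optionally be merged into the base weights so inference runs without the PEFT wrapper. This uses the standard `peft` utility `merge_and_unload()`; the output directory below is a hypothetical name:

```python
# Optional: fold the LoRA weights into the base model for adapter-free inference.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("m2m100-rus-bak-merged")  # hypothetical output path
tokenizer.save_pretrained("m2m100-rus-bak-merged")
```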
## 🛠️ Training Details
- **Hardware:** Trained on a single NVIDIA RTX 3080.
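The card does not list the LoRA hyperparameters. As a rough sketch of how such a fine-tune is typically set up with `peft`, where every value below is an illustrative assumption rather than this model's actual configuration:

```python
# Illustrative LoRA setup for m2m100_418M; r, lora_alpha, lora_dropout and
# target_modules are assumed placeholders, not the values used for this model.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    lora_dropout=0.05,                    # assumed dropout
    target_modules=["q_proj", "v_proj"],  # attention projections in M2M-100
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the 418M total
```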
## 📚 Citation
```bibtex
@article{devlake2026loresmt,
  title={DevLake at LoResMT 2026: The Impact of Pre-training and Model Scale on Russian-Bashkir Low-Resource Translation},
  author={DevLake Team},
  year={2026}
}
```