# 🏔️ DevLake: M2M-100 (418M) for Russian-Bashkir

## Current Model Architecture Focus

| Size | Model | Focus |
| --- | --- | --- |
| 🔴 Large | NLLB-1.3B (QLoRA) | Best Quality (SOTA) |
| 🟡 Medium (This Model) | M2M-100 (LoRA) | Balanced / Baseline |
| 🟢 Small | MarianMT (77M) | Fastest / CPU |

## Model Description

This is the medium-sized model from the DevLake submission for LoResMT 2026. It serves as a robust baseline, balancing performance and resource usage. It was fine-tuned using LoRA on the `facebook/m2m100_418M` checkpoint.
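The exact adapter configuration is not listed on this card. The sketch below shows a typical PEFT LoRA setup for `facebook/m2m100_418M`; the rank, alpha, dropout, and target modules are illustrative assumptions, not the values used to produce this checkpoint.

```python
# Illustrative LoRA setup with PEFT; all hyperparameters below are assumptions,
# not the configuration used to train Voldis/m2m100-rus-bak.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                                 # assumed adapter rank
    lora_alpha=32,                        # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in M2M-100
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```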

## 🚀 Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

base_model = "facebook/m2m100_418M"
adapter_model = "Voldis/m2m100-rus-bak"

# Load the base model and attach the LoRA adapter
model = AutoModelForSeq2SeqLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_model)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(adapter_model)

# Set the source language and resolve the target-language token id
tokenizer.src_lang = "ru"
target_lang_id = tokenizer.get_lang_id("ba")

text = "Где находится библиотека?"  # "Where is the library?"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Force the decoder to start with the Bashkir language token
with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=target_lang_id
    )

print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
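For translating several sentences at once, batched generation with padding is usually faster. This is a minimal sketch that reuses the `model`, `tokenizer`, and `target_lang_id` loaded above; the example sentences are arbitrary.

```python
# Batched Russian -> Bashkir translation (reuses objects from the snippet above).
sentences = [
    "Добрый день!",      # "Good afternoon!"
    "Как тебя зовут?",   # "What is your name?"
]

batch = tokenizer(sentences, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **batch,
        forced_bos_token_id=target_lang_id,
        max_new_tokens=64,
    )

for src, hyp in zip(sentences, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"{src} -> {hyp}")
```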

## 🛠️ Training Details

- Hardware: trained on a single NVIDIA RTX 3080 (a hedged training sketch follows below).
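The full training recipe is not given on this card. The sketch below shows one plausible way to fine-tune the LoRA-wrapped model with `Seq2SeqTrainer`; the hyperparameters, column names, preprocessing, and placeholder dataset are assumptions, not the actual recipe behind this checkpoint.

```python
# Hedged training sketch only: hyperparameters, column names, and the tiny
# in-memory dataset below are placeholders, not the real DevLake setup.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_418M")
tokenizer.src_lang, tokenizer.tgt_lang = "ru", "ba"

# Placeholder parallel corpus; replace with the actual Russian-Bashkir data.
train_dataset = Dataset.from_dict({
    "ru": ["пример предложения на русском"],  # "an example sentence in Russian"
    "ba": ["<corresponding Bashkir sentence>"],
})

def preprocess(batch):
    # Tokenize source and target; the "ru"/"ba" column names are assumptions.
    return tokenizer(batch["ru"], text_target=batch["ba"],
                     truncation=True, max_length=128)

args = Seq2SeqTrainingArguments(
    output_dir="m2m100-rus-bak-lora",
    per_device_train_batch_size=16,   # sized for a single RTX 3080 (assumed)
    gradient_accumulation_steps=2,
    learning_rate=2e-4,               # common LoRA learning rate (assumed)
    num_train_epochs=3,
    fp16=True,
    logging_steps=100,
)

trainer = Seq2SeqTrainer(
    model=model,  # the LoRA-wrapped model from the Model Description sketch
    args=args,
    train_dataset=train_dataset.map(preprocess, batched=True,
                                    remove_columns=["ru", "ba"]),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.model.save_pretrained("m2m100-rus-bak-lora")  # saves only the adapter weights
```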

## 📚 Citation

```bibtex
@article{devlake2026loresmt,
  title={DevLake at LoResMT 2026: The Impact of Pre-training and Model Scale on Russian-Bashkir Low-Resource Translation},
  author={DevLake Team},
  year={2026}
}
```