# AvarNLP — NLLB-600M Fine-tuned for Avar-Turkish Translation

To our knowledge, the first machine translation model for the Avar language (МагIарул мацI).
## Model Description
This model is a LoRA fine-tune of Meta's NLLB-200-distilled-600M, trained on an Avar-Turkish parallel corpus generated with evolutionary (genetic-algorithm) methods.
- Base model: facebook/nllb-200-distilled-600M
- Fine-tuning method: LoRA (Low-Rank Adaptation) via PEFT
- Training data: Evolved from ~1K seed pairs using genetic algorithms
- Directions: Avar → Turkish, Turkish → Avar
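The corpus-evolution step mentioned above can be sketched roughly as follows. This is a minimal illustrative sketch, not the project's actual pipeline: the seed pairs, substitution lexicon, and fitness function are invented placeholders (synthetic tokens stand in for real Avar and Turkish text), and a real system would score candidates with a trained model rather than a length heuristic.

```python
import random

random.seed(0)

# Placeholder seed pairs (source, target); the real project starts from
# ~1K human-curated Avar-Turkish pairs. Tokens here are synthetic.
SEED_PAIRS = [
    ("s1 s2 s3", "t1 t2 t3"),
    ("s4 s5", "t4 t5"),
]

# Hypothetical aligned substitutions: (src_word, tgt_word) -> (src_word, tgt_word).
LEXICON = {
    ("s3", "t3"): ("s6", "t6"),
    ("s5", "t5"): ("s7", "t7"),
}

def mutate(pair):
    """Produce a child pair by applying one aligned word substitution."""
    src, tgt = pair
    hits = [(old, new) for old, new in LEXICON.items()
            if old[0] in src.split() and old[1] in tgt.split()]
    if not hits:
        return pair
    (old_s, old_t), (new_s, new_t) = random.choice(hits)
    src = " ".join(new_s if w == old_s else w for w in src.split())
    tgt = " ".join(new_t if w == old_t else w for w in tgt.split())
    return (src, tgt)

def fitness(pair):
    """Toy adequacy proxy: prefer length-balanced pairs. A real pipeline
    would use a learned scorer (e.g. round-trip translation agreement)."""
    src, tgt = pair
    return 1.0 / (1.0 + abs(len(src.split()) - len(tgt.split())))

def evolve(population, generations=5, pop_size=8):
    """Mutate random parents, merge with parents, dedupe, keep the fittest."""
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in range(pop_size)]
        pool = list(dict.fromkeys(population + children))  # dedupe, keep order
        pool.sort(key=fitness, reverse=True)
        population = pool[:pop_size]
    return population

corpus = evolve(list(SEED_PAIRS))
print(f"{len(corpus)} pairs after evolution")
```

The evolved pairs then serve as additional LoRA training data; in practice a filtering step would discard low-fitness candidates before they reach the fine-tuning set.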
## About the Avar Language
| Property | Value |
| --- | --- |
| Name | Avar (МагIарул мацI) |
| Family | Northeast Caucasian (Nakh-Dagestanian) |
| Speakers | ~800,000 |
| Region | Dagestan (Russia), Turkey, Europe |
| UNESCO Status | Definitely Endangered |
| Digital Resources | Virtually none |
## Status
🚧 Under active development — model not yet trained. Check back soon or follow this repo for updates.
## Links
- Code: github.com/Burtinsaw/AvarNLP
- Dataset: Burtinsaw/avar-turkish-parallel
- Project: MagaruLaw.com
## Citation

```bibtex
@software{avarnlp2026,
  title   = {AvarNLP: Self-Evolving AI for the Endangered Avar Language},
  author  = {Arif Akgun},
  url     = {https://github.com/Burtinsaw/AvarNLP},
  year    = {2026},
  license = {MIT}
}
```