Maasai-English Translation Model

English↔Maasai translation model for low-resource machine translation and language-preservation workflows.

Model Details

  • Repository name: maasai-en-mt
  • Base model: Qwen/Qwen2.5-3B (confirm against the published checkpoint config or adapter metadata for the exact source model used in each release)
  • Fine-tuning recipe: QLoRA / LoRA adapters
  • Target directions: English→Maasai and Maasai→English
  • Paired public dataset: NorthernTribe-Research/maasai-translation-corpus

Training Data

The training recipe in this repo uses the public parallel corpus from data/final_v3:

  • 9,406 total pairs
  • 7,991 train / 707 valid / 708 test
  • 4,703 en→mas and 4,703 mas→en
  • 9,124 gold-tier and 282 silver-tier examples
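A split like the one above can be kept stable across releases with a deterministic, hash-based bucketing scheme. The sketch below is illustrative only: the ID format and the hashing scheme are assumptions, not the repo's actual splitter.

```python
import hashlib

def assign_split(pair_id: str, valid_pct: float = 7.5, test_pct: float = 7.5) -> str:
    """Deterministically bucket a pair into train/valid/test by hashing its ID.

    A stable hash means re-running the split never shuffles examples between
    buckets, which matters when the corpus grows between releases.
    """
    digest = hashlib.md5(pair_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10000 / 100.0  # value in [0.0, 100.0)
    if bucket < valid_pct:
        return "valid"
    if bucket < valid_pct + test_pct:
        return "test"
    return "train"

# Example: bucket 9,406 synthetic pair IDs (real IDs would come from the dataset).
ids = [f"pair-{i:05d}" for i in range(9406)]
splits = {"train": 0, "valid": 0, "test": 0}
for pid in ids:
    splits[assign_split(pid)] += 1
```

Because assignment depends only on the ID, adding new pairs later leaves every existing example in its original split.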

This release now includes a small open-source supplement layer from public-domain Hollis proverbs and the CC BY 4.0 ASJP Maasai wordlist, in addition to the existing Bible-aligned and curated cultural data.

The raw published dataset stores parallel pairs and metadata. The trainer constructs instruction prompts at runtime when needed, so the model can be trained from either prompt/completion records or plain translation pairs.
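Runtime prompt construction from a plain pair record can be sketched as follows. The field names (`direction`, `source_text`, `target_text`) and the prompt template are assumptions for illustration; check the published dataset schema and the trainer for the real ones.

```python
def build_prompt(record: dict) -> dict:
    """Turn a plain translation pair into an instruction-style training record.

    Assumed schema: "direction" is "en->mas" or "mas->en"; "source_text" and
    "target_text" hold the parallel sentences.
    """
    direction = record["direction"]
    src_lang, tgt_lang = (
        ("English", "Maasai") if direction == "en->mas" else ("Maasai", "English")
    )
    prompt = (
        f"Translate the following {src_lang} text to {tgt_lang}:\n"
        f"{record['source_text']}"
    )
    return {"prompt": prompt, "completion": record["target_text"]}

pair = {
    "direction": "en->mas",
    "source_text": "Welcome.",
    "target_text": "...",  # placeholder target, not a verified Maasai translation
}
example = build_prompt(pair)
```

Keeping the raw dataset as plain pairs and building prompts at train time lets the same corpus serve both prompt/completion and pair-based training recipes.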

Intended Use

  • Research and benchmarking for English↔Maasai translation
  • Language preservation and educational tooling
  • Culturally grounded translation assistance with human review

Not Intended For

  • Legal, medical, or safety-critical translation
  • Unreviewed authoritative translation in public-facing settings
  • Claims of dialect-complete or culturally exhaustive coverage

Limitations

  • Maasai remains a low-resource language, so quality will vary by domain.
  • The corpus is strongest in Bible-aligned and cultural content.
  • Orthographic and dialectal variation are not fully normalized.
  • Native Maa speaker review remains necessary for formal or sensitive use.

Hub Download Metrics

This repository publishes a lightweight meta.yaml file alongside the model card. The file is metadata only, not a loadable checkpoint: it gives scaffold releases and early repo states a stable Hub metadata artifact that can serve as a download-count anchor before full model weights are uploaded.

When adapter or merged model files are published, the real model artifacts remain the primary release assets. The metadata file is retained to keep the repo machine-readable and to avoid treating placeholder states as if they were runnable weights.
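The distinction between a placeholder state and a runnable release can be checked mechanically from the repo file listing. The filename patterns below are common Hub conventions (full checkpoints or PEFT adapters), not a list taken from this repo:

```python
def is_runnable_release(repo_files: list[str]) -> bool:
    """Heuristic: treat a release as runnable only if it ships real weight files.

    Metadata-only scaffolds (README, meta.yaml) and explicitly named
    placeholder files do not count as model weights.
    """
    weight_suffixes = (".safetensors", ".bin", ".gguf")
    weight_files = [
        f for f in repo_files
        if f.endswith(weight_suffixes) and "placeholder" not in f.lower()
    ]
    return len(weight_files) > 0

# A metadata-only scaffold state should be rejected:
scaffold = ["README.md", "meta.yaml"]
# A published adapter release should be accepted:
adapter = ["README.md", "meta.yaml", "adapter_config.json", "adapter_model.safetensors"]
```

A check like this can gate downstream tooling so that scaffold states are never loaded as if they were weights.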

Evaluation

This template intentionally avoids fixed metric claims. When a new checkpoint is published, add the measured BLEU, chrF++, and glossary-sensitive evaluation results for that exact run.
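BLEU and chrF++ are standard library metrics (e.g. via sacrebleu); the glossary-sensitive evaluation can be as simple as term recall. The sketch below is a toy stand-in, and the glossary entries shown are illustrative examples, not entries taken from maasai_glossary.json:

```python
def glossary_term_recall(hypothesis: str, reference: str, glossary: dict) -> float:
    """Fraction of glossary target terms appearing in the reference that the
    hypothesis also preserves. Returns 1.0 when the reference triggers no terms.
    """
    ref_terms = [t for t in glossary.values() if t.lower() in reference.lower()]
    if not ref_terms:
        return 1.0
    hits = sum(1 for t in ref_terms if t.lower() in hypothesis.lower())
    return hits / len(ref_terms)

# Hypothetical English->Maasai glossary entries (illustrative only):
glossary = {"warrior": "olmurrani", "cattle": "inkishu"}
score = glossary_term_recall(
    hypothesis="... olmurrani ...",
    reference="... olmurrani ... inkishu ...",
    glossary=glossary,
)
```

Reporting this alongside BLEU and chrF++ surfaces cases where a translation is fluent but drops culturally important vocabulary.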

Related Assets

  • Dataset: NorthernTribe-Research/maasai-translation-corpus
  • Space: NorthernTribe-Research/maasai-language-showcase
  • Glossary file used by the app: data/glossary/maasai_glossary.json

Citation

If you publish results based on this model, cite the model repo, the paired dataset, and NorthernTribe-Research.

Publication Status

This repository has been created ahead of the first full model upload. The current local outputs/maasai-en-mt-qlora directory contains only mock placeholder files used to test the deployment pipeline, so this publish intentionally excludes model weights.

Run GPU-backed training, replace the placeholder artifacts with a real adapter or merged checkpoint, and publish again to upload the actual model.


Model tree for NorthernTribe-Research/maasai-en-mt

  • Base model: Qwen/Qwen2.5-3B
  • This model: fine-tuned from the base model above