Swahili-English Neural Machine Translation Model (V2)

Model Description

swahili-model-v2 is a Neural Machine Translation (NMT) model designed to translate text from English to Swahili.

This model is a fine-tuned version of the Helsinki-NLP/opus-mt-en-sw Transformer, adapted specifically for high-accuracy translation tasks using a curated parallel corpus.

By leveraging Transfer Learning on a substantial dataset of 281,000 sentence pairs, this model achieves a BLEU score of 41.63 on the validation set.

It demonstrates professional-grade grammatical fluency and robust vocabulary alignment, significantly outperforming the V1 baseline trained from scratch.

Dataset Characteristics

The model was trained on a specific subset of the Swahili-English Parallel Corpus. Prior to training, a comprehensive Exploratory Data Analysis was conducted to ensure data quality and alignment.
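The alignment-checking part of such an EDA pass can be approximated with a simple length-ratio filter. A minimal sketch in pure Python; the ratio threshold of 3 is an illustrative assumption, not the value actually used during corpus curation:

```python
def alignment_report(pairs):
    """Basic EDA check on a parallel corpus: drop empty pairs and pairs
    whose source/target lengths differ wildly (likely misalignments)."""
    kept, dropped = [], 0
    for en, sw in pairs:
        en_len, sw_len = len(en.split()), len(sw.split())
        # Empty side, or one side more than 3x longer than the other
        if en_len == 0 or sw_len == 0 or max(en_len, sw_len) / min(en_len, sw_len) > 3:
            dropped += 1
        else:
            kept.append((en, sw))
    return kept, dropped
```

Running this over the raw pair list gives a quick count of suspect alignments before any training is done.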

Sentence Length Distribution

The dataset follows a long-tailed distribution typical of natural language corpora.

Most sentences are between 5 and 20 words long, which is optimal for training.

Figure 3: English Sentence Lengths


Figure 4: Swahili Sentence Lengths

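The length statistics behind Figures 3 and 4 can be reproduced numerically. A minimal sketch in pure Python, where `sentences` is assumed to be any list of strings drawn from one side of the corpus:

```python
def length_stats(sentences):
    """Summarise the token-length distribution of a list of sentences."""
    lengths = sorted(len(s.split()) for s in sentences)
    # Share of sentences in the 5-20 word band highlighted in the figures
    in_band = sum(1 for n in lengths if 5 <= n <= 20)
    return {
        "min": lengths[0],
        "median": lengths[len(lengths) // 2],
        "max": lengths[-1],
        "share_5_to_20": in_band / len(lengths),
    }
```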

Source-Target Alignment

There is a strong linear correlation between English and Swahili sentence lengths.

This indicates a high-quality parallel corpus with few alignment errors.

Figure 5: Length Correlation

The regression line shows a consistent mapping ratio between source and target lengths.

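The strength of the correlation shown in Figure 5 can be quantified with a Pearson coefficient over the paired lengths. A minimal sketch in pure Python; the two lists are assumed to hold the token counts of aligned English and Swahili sentences:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between source and target sentence lengths.
    Assumes both lists are non-constant and of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value close to 1 is what "strong linear correlation" means concretely: target length is nearly a fixed multiple of source length.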

Training Details

Dataset Configuration

  • Source: Swahili-English Parallel Corpus.
  • Size: 281,000 sentence pairs.
  • Split: 90% Training, 10% Validation.
  • Preprocessing: Tokenization using the Helsinki-NLP SentencePiece tokenizer with dynamic padding.
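The 90/10 split can be made deterministic with a fixed seed. A minimal sketch in pure Python; the seed of 42 is an assumption for illustration, not necessarily the one used for this model:

```python
import random

def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Deterministic shuffle-and-split of sentence pairs into train/validation."""
    rng = random.Random(seed)
    shuffled = pairs[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```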

Performance and Evaluation

The model was evaluated using the BLEU (Bilingual Evaluation Understudy) metric, which measures the similarity between the machine-generated translation and professional human reference translations.

Evaluation Results

  • BLEU Score: 41.63
  • Validation Loss: 0.8659

These metrics indicate high translation quality, with the model successfully capturing complex sentence structures rather than performing simple word-for-word substitution.
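BLEU itself can be sketched in a few lines for intuition. This is a simplified single-reference, unsmoothed sentence-level variant; the 41.63 reported above would have been computed with a standard corpus-level implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions (n=1..4), geometric mean,
    brevity penalty. Single reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(c_ngrams.values())
        if total == 0:
            return 0.0
        # Clip each n-gram count by its count in the reference
        clipped = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Penalise candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Because the score rewards matching 2-, 3- and 4-grams, not just words, a high BLEU reflects phrase- and structure-level agreement with the reference rather than word-for-word substitution.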

Training Results Table

The following table summarizes the model's performance metrics across all 5 training epochs.

Epoch   Training Loss   Validation Loss   BLEU Score
1       1.0863          1.0334            28.48
2       0.9630          0.9337            32.06
3       0.8826          0.8913            37.38
4       0.8266          0.8708            40.07
5       0.8036          0.8659            41.63
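As a sanity check, the table values can be scanned programmatically for the classic overfitting signature: validation loss rising while training loss falls. A minimal sketch using the numbers above:

```python
def shows_overfitting(train_losses, val_losses):
    """Flag any epoch step where training loss falls but validation loss rises."""
    steps = zip(zip(train_losses, val_losses),
                zip(train_losses[1:], val_losses[1:]))
    return any(t2 < t1 and v2 > v1 for (t1, v1), (t2, v2) in steps)

# Per-epoch losses from the table above
train = [1.0863, 0.9630, 0.8826, 0.8266, 0.8036]
val = [1.0334, 0.9337, 0.8913, 0.8708, 0.8659]
print(shows_overfitting(train, val))  # False: no divergence across the 5 epochs
```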

Training Metrics

The training process demonstrated stable convergence with no signs of overfitting. The graphs below illustrate the progression of the BLEU score and Training Loss over 5 epochs.

Figure 1: BLEU Score Progression

The model achieved a rapid increase in translation quality, stabilizing above 40 BLEU by the final epoch.


Figure 2: Loss Convergence

Validation loss consistently decreased, confirming that the model effectively generalized to unseen data.


Intended Uses and Limitations

Intended Uses

This model is suitable for general-purpose translation tasks, including:

  • Educational Tools: Assisting learners in understanding English-Swahili sentence structures.
  • Content Localization: Translating web content, documentation, or simple narratives into Swahili.
  • Communication Aids: Facilitating basic written communication across language barriers.
  • NLP Research: Serving as a baseline for low-resource language modeling experiments.

Limitations

  • Domain Specificity: The model may struggle with highly technical, medical, or legal jargon that was not present in the training corpus.
  • Context Length: As a sentence-level translator, it may lose context when translating very long paragraphs as a single block.
  • Dialect Variations: Swahili has multiple dialects; this model aligns primarily with standard Swahili (Kiswahili Sanifu) and may not accurately capture regional slang or informal variations (Sheng).
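One way to work around the context-length limitation is to split long paragraphs into sentences before translating. The sketch below uses a naive regex splitter for illustration only (it will mis-handle abbreviations like "e.g."); `translator` refers to the pipeline shown in the Usage section:

```python
import re

def split_sentences(paragraph):
    """Naive sentence splitter: break on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', paragraph.strip()) if s]

# Translate sentence by sentence instead of passing the whole block at once:
# translations = [translator(s)[0]['translation_text']
#                 for s in split_sentences(long_text)]
```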

Usage

You can use this model directly with the Hugging Face transformers library.

Python Example

from transformers import pipeline

# Load the translation pipeline
translator = pipeline("translation", model="codeshujaaa/swahili-model-v2")

# Define input text
text = "I am learning to speak Swahili today."

# Generate translation
translation = translator(text)
print(translation[0]['translation_text'])

Citation

If you use this model in your work, please cite the original architecture authors and this repository.

@misc{mwangi2025swahili,
  author = {Denis Mwangi},
  title = {Fine-Tuned Swahili-English Neural Machine Translation Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/codeshujaaa/swahili-model-v2}}
}