Swahili-English Neural Machine Translation Model (V2)

Model Description

swahili-model-v2 is a Neural Machine Translation (NMT) model designed to translate text from English to Swahili.

This model is a fine-tuned version of the Helsinki-NLP/opus-mt-en-sw Transformer, adapted specifically for high-accuracy translation tasks using a curated parallel corpus.

By leveraging Transfer Learning on a substantial dataset of 281,000 sentence pairs, this model achieves a BLEU score of 41.63 on the validation set.

It demonstrates professional-grade grammatical fluency and robust vocabulary alignment, significantly outperforming the V1 baseline trained from scratch.

Dataset Characteristics

The model was trained on a specific subset of the Swahili-English Parallel Corpus. Prior to training, a comprehensive Exploratory Data Analysis was conducted to ensure data quality and alignment.
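The alignment-checking part of such an EDA pass can be approximated with a simple length-ratio filter. A minimal sketch in pure Python; the ratio threshold of 3 is an illustrative assumption, not the value actually used during corpus curation:

```python
def alignment_report(pairs):
    """Basic EDA check on a parallel corpus: drop empty pairs and pairs
    whose source/target lengths differ wildly (likely misalignments)."""
    kept, dropped = [], 0
    for en, sw in pairs:
        en_len, sw_len = len(en.split()), len(sw.split())
        # Empty side, or one side more than 3x longer than the other
        if en_len == 0 or sw_len == 0 or max(en_len, sw_len) / min(en_len, sw_len) > 3:
            dropped += 1
        else:
            kept.append((en, sw))
    return kept, dropped
```

Running this over the raw pair list gives a quick count of suspect alignments before any training is done.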

Sentence Length Distribution

The dataset follows a long-tailed distribution typical of natural language corpora.

Most sentences are between 5 and 20 words long, which is optimal for training.

Figure 3: English Sentence Lengths


Figure 4: Swahili Sentence Lengths

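The length statistics behind Figures 3 and 4 can be reproduced numerically. A minimal sketch in pure Python, where `sentences` is assumed to be any list of strings drawn from one side of the corpus:

```python
def length_stats(sentences):
    """Summarise the token-length distribution of a list of sentences."""
    lengths = sorted(len(s.split()) for s in sentences)
    # Share of sentences in the 5-20 word band highlighted in the figures
    in_band = sum(1 for n in lengths if 5 <= n <= 20)
    return {
        "min": lengths[0],
        "median": lengths[len(lengths) // 2],
        "max": lengths[-1],
        "share_5_to_20": in_band / len(lengths),
    }
```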

Source-Target Alignment

There is a strong linear correlation between English and Swahili sentence lengths.

This indicates a high-quality parallel corpus with few alignment errors.

Figure 5: Length Correlation

The regression line shows a consistent mapping ratio between source and target lengths.

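The strength of the correlation shown in Figure 5 can be quantified with a Pearson coefficient over the paired lengths. A minimal sketch in pure Python; the two lists are assumed to hold the token counts of aligned English and Swahili sentences:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between source and target sentence lengths.
    Assumes both lists are non-constant and of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value close to 1 is what "strong linear correlation" means concretely: target length is nearly a fixed multiple of source length.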

Training Details

Dataset Configuration

  • Source: Swahili-English Parallel Corpus.
  • Size: 281,000 sentence pairs.
  • Split: 90% Training, 10% Validation.
  • Preprocessing: Tokenization using the Helsinki-NLP SentencePiece tokenizer with dynamic padding.
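The 90/10 split can be made deterministic with a fixed seed. A minimal sketch in pure Python; the seed of 42 is an assumption for illustration, not necessarily the one used for this model:

```python
import random

def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Deterministic shuffle-and-split of sentence pairs into train/validation."""
    rng = random.Random(seed)
    shuffled = pairs[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```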

Performance and Evaluation

The model was evaluated using the BLEU (Bilingual Evaluation Understudy) metric, which measures the similarity between the machine-generated translation and professional human reference translations.

Evaluation Results

  • BLEU Score: 41.63
  • Validation Loss: 0.8659

These metrics indicate high translation quality, with the model successfully capturing complex sentence structures rather than performing simple word-for-word substitution.
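BLEU itself can be sketched in a few lines for intuition. This is a simplified single-reference, unsmoothed sentence-level variant; the 41.63 reported above would have been computed with a standard corpus-level implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions (n=1..4), geometric mean,
    brevity penalty. Single reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(c_ngrams.values())
        if total == 0:
            return 0.0
        # Clip each n-gram count by its count in the reference
        clipped = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Penalise candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Because the score rewards matching 2-, 3- and 4-grams, not just words, a high BLEU reflects phrase- and structure-level agreement with the reference rather than word-for-word substitution.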

Training Results Table

The following table summarizes the model's performance metrics across all 5 training epochs.

Epoch   Training Loss   Validation Loss   BLEU Score
1       1.0863          1.0334            28.48
2       0.9630          0.9337            32.06
3       0.8826          0.8913            37.38
4       0.8266          0.8708            40.07
5       0.8036          0.8659            41.63
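As a sanity check, the table values can be scanned programmatically for the classic overfitting signature: validation loss rising while training loss falls. A minimal sketch using the numbers above:

```python
def shows_overfitting(train_losses, val_losses):
    """Flag any epoch step where training loss falls but validation loss rises."""
    steps = zip(zip(train_losses, val_losses),
                zip(train_losses[1:], val_losses[1:]))
    return any(t2 < t1 and v2 > v1 for (t1, v1), (t2, v2) in steps)

# Per-epoch losses from the table above
train = [1.0863, 0.9630, 0.8826, 0.8266, 0.8036]
val = [1.0334, 0.9337, 0.8913, 0.8708, 0.8659]
print(shows_overfitting(train, val))  # False: no divergence across the 5 epochs
```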

Training Metrics

The training process demonstrated stable convergence with no signs of overfitting. The graphs below illustrate the progression of the BLEU score and Training Loss over 5 epochs.

Figure 1: BLEU Score Progression

The model achieved a rapid increase in translation quality, stabilizing above 40 BLEU by the final epoch.


Figure 2: Loss Convergence

Validation loss consistently decreased, confirming that the model effectively generalized to unseen data.


Intended Uses and Limitations

Intended Uses

This model is suitable for general-purpose translation tasks, including:

  • Educational Tools: Assisting learners in understanding English-Swahili sentence structures.
  • Content Localization: Translating web content, documentation, or simple narratives into Swahili.
  • Communication Aids: Facilitating basic written communication across language barriers.
  • NLP Research: Serving as a baseline for low-resource language modeling experiments.

Limitations

  • Domain Specificity: The model may struggle with highly technical, medical, or legal jargon that was not present in the training corpus.
  • Context Length: As a sentence-level translator, it may lose context when translating very long paragraphs as a single block.
  • Dialect Variations: Swahili has multiple dialects; this model aligns primarily with standard Swahili (Kiswahili Sanifu) and may not accurately capture regional slang or informal variations (Sheng).
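One way to work around the context-length limitation is to split long paragraphs into sentences before translating. The sketch below uses a naive regex splitter for illustration only (it will mis-handle abbreviations like "e.g."); `translator` refers to the pipeline shown in the Usage section:

```python
import re

def split_sentences(paragraph):
    """Naive sentence splitter: break on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', paragraph.strip()) if s]

# Translate sentence by sentence instead of passing the whole block at once:
# translations = [translator(s)[0]['translation_text']
#                 for s in split_sentences(long_text)]
```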

Usage

You can use this model directly with the Hugging Face transformers library.

Python Example

from transformers import pipeline

# Load the translation pipeline
translator = pipeline("translation", model="codeshujaaa/swahili-model-v2")

# Define input text
text = "I am learning to speak Swahili today."

# Generate translation
translation = translator(text)
print(translation[0]['translation_text'])

Citation

If you use this model in your work, please cite the original architecture authors and this repository.

@misc{mwangi2025swahili,
  author = {Denis Mwangi},
  title = {Fine-Tuned Swahili-English Neural Machine Translation Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/codeshujaaa/swahili-model-v2}}
}