---
language:
- en
- sw
tags:
- translation
- en-sw
- helsinki-nlp
- fine-tuned
license: apache-2.0
metrics:
- bleu
base_model: Helsinki-NLP/opus-mt-en-sw
model-index:
- name: swahili-model-v2
  results:
  - task:
      type: translation
      name: Translation English-to-Swahili
    dataset:
      name: Swahili Parallel Corpus
      type: text
    metrics:
    - name: Bleu
      type: bleu
      value: 41.6314
---
|
|
|
|
|
# English-to-Swahili Neural Machine Translation Model (V2)
|
|
|
|
|
## Model Description |
|
|
|
|
|
**swahili-model-v2** is a Neural Machine Translation (NMT) model designed to translate text from English to Swahili. |
|
|
|
|
|
This model is a fine-tuned version of the **Helsinki-NLP/opus-mt-en-sw** Transformer, adapted specifically for high-accuracy translation tasks using a curated parallel corpus. |
|
|
|
|
|
By leveraging Transfer Learning on a substantial dataset of **281,000 sentence pairs**, this model achieves a **BLEU score of 41.63** on the validation set. |
|
|
|
|
|
It demonstrates strong grammatical fluency and robust vocabulary alignment, significantly outperforming the [V1 baseline trained from scratch](https://huggingface.co/codeshujaaa/swahili-model-V1).
|
|
|
|
|
## Dataset Characteristics |
|
|
|
|
|
The model was trained on a specific subset of the Swahili-English Parallel Corpus. |
|
|
Prior to training, a comprehensive Exploratory Data Analysis was conducted to ensure data quality and alignment. |
|
|
|
|
|
### Sentence Length Distribution |
|
|
The dataset follows a long-tailed distribution typical of natural language corpora. |
|
|
|
|
|
Most sentences are between 5 and 20 words long, an optimal range for training.
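A length profile like the ones shown in Figures 3 and 4 can be computed directly from the corpus. A minimal sketch using hypothetical sentences (`length_histogram` is an illustrative helper, not part of the training code):

```python
from collections import Counter

def length_histogram(sentences):
    """Bucket sentences by word count: {length: number of sentences}."""
    return Counter(len(s.split()) for s in sentences)

corpus = [
    "I am learning Swahili today now",      # hypothetical examples
    "Hello",
    "The quick brown fox jumps over the lazy dog",
]
hist = length_histogram(corpus)
# e.g. hist[6] == 1 for the six-word sentence
```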
|
|
|
|
|
**Figure 3: English Sentence Lengths** |
|
|
|
|
|
 |
|
|
|
|
|
**Figure 4: Swahili Sentence Lengths** |
|
|
|
|
|
 |
|
|
|
|
|
### Source-Target Alignment |
|
|
There is a strong linear correlation between English and Swahili sentence lengths. |
|
|
|
|
|
This indicates a high-quality parallel corpus with few alignment errors. |
|
|
|
|
|
**Figure 5: Length Correlation** |
|
|
|
|
|
> *The regression line shows a consistent mapping ratio between source and target lengths.* |
|
|
|
|
|
 |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Dataset Configuration |
|
|
* **Source:** Swahili-English Parallel Corpus. |
|
|
* **Size:** 281,000 sentence pairs. |
|
|
* **Split:** 90% Training, 10% Validation. |
|
|
* **Preprocessing:** Tokenization using the Helsinki-NLP SentencePiece tokenizer with dynamic padding. |
|
|
|
|
|
## Performance and Evaluation |
|
|
|
|
|
The model was evaluated using the **BLEU (Bilingual Evaluation Understudy)** metric, which measures the similarity between the machine-generated translation and professional human reference translations. |
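To make the metric concrete, here is a simplified single-sentence BLEU sketch: the geometric mean of clipped n-gram precisions, scaled by a brevity penalty. Real evaluations use a corpus-level implementation such as sacreBLEU; `simple_bleu` is an illustrative approximation, not the scorer used above:

```python
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = simple_bleu("ninajifunza kuzungumza kiswahili leo",
                    "ninajifunza kuzungumza kiswahili leo")
# identical sentences score 1.0
```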
|
|
|
|
|
### Evaluation Results |
|
|
* **BLEU Score:** 41.63 |
|
|
* **Validation Loss:** 0.8659 |
|
|
|
|
|
These metrics indicate high translation quality, with the model successfully capturing complex sentence structures rather than performing simple word-for-word substitution. |
|
|
|
|
|
### Training Results Table |
|
|
The following table summarizes the model's performance metrics across all 5 training epochs. |
|
|
|
|
|
| Epoch | Training Loss | Validation Loss | BLEU Score | |
|
|
| :--- | :--- | :--- | :--- | |
|
|
| 1.0 | 1.0863 | 1.0334 | 28.48 | |
|
|
| 2.0 | 0.9630 | 0.9337 | 32.06 | |
|
|
| 3.0 | 0.8826 | 0.8913 | 37.38 | |
|
|
| 4.0 | 0.8266 | 0.8708 | 40.07 | |
|
|
| **5.0** | **0.8036** | **0.8659** | **41.63** | |
|
|
|
|
|
### Training Metrics |
|
|
The training process demonstrated stable convergence with no signs of overfitting. The graphs below illustrate the progression of the BLEU score and Training Loss over 5 epochs. |
|
|
|
|
|
**Figure 1: BLEU Score Progression** |
|
|
> *The model achieved a rapid increase in translation quality, stabilizing above 40 BLEU by the final epoch.* |
|
|
|
|
|
 |
|
|
|
|
|
**Figure 2: Loss Convergence** |
|
|
> *Validation loss consistently decreased, confirming that the model effectively generalized to unseen data.* |
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
## Intended Uses and Limitations |
|
|
|
|
|
### Intended Uses |
|
|
This model is suitable for general-purpose translation tasks, including: |
|
|
* **Educational Tools:** Assisting learners in understanding English-Swahili sentence structures. |
|
|
* **Content Localization:** Translating web content, documentation, or simple narratives into Swahili. |
|
|
* **Communication Aids:** Facilitating basic written communication across language barriers. |
|
|
* **NLP Research:** Serving as a baseline for low-resource language modeling experiments. |
|
|
|
|
|
### Limitations |
|
|
* **Domain Specificity:** The model may struggle with highly technical, medical, or legal jargon that was not present in the training corpus. |
|
|
* **Context Length:** As a sentence-level translator, it may lose context when translating very long paragraphs as a single block. |
|
|
* **Dialect Variations:** Swahili has multiple dialects; this model aligns primarily with standard Swahili (Kiswahili Sanifu) and may not accurately capture regional slang or informal variations (Sheng). |
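One practical workaround for the context-length limitation is to split long paragraphs into sentences and translate them one at a time. A minimal sketch, assuming a naive punctuation-based splitter; `translate` is any `str -> str` callable, such as a wrapper around the translation pipeline:

```python
import re

def translate_paragraph(paragraph, translate):
    """Translate a paragraph sentence by sentence using a naive
    punctuation-based splitter, then reassemble the outputs."""
    sentences = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    return " ".join(translate(s) for s in sentences if s)

# With an identity stand-in, the paragraph is reassembled unchanged:
out = translate_paragraph("Hello there. How are you?", lambda s: s)
```

A dedicated sentence segmenter would handle abbreviations and quotations more robustly than this regex.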
|
|
## Usage |
|
|
|
|
|
You can use this model directly with the Hugging Face `transformers` library. |
|
|
|
|
|
### Python Example |
|
|
```python
from transformers import pipeline

# Load the translation pipeline
translator = pipeline("translation", model="codeshujaaa/swahili-model-v2")

# Translate an English sentence to Swahili
text = "I am learning to speak Swahili today."
translation = translator(text)
print(translation[0]['translation_text'])
```
|
|
|
|
|
## Citation

If you use this model in your work, please cite the original architecture authors and this repository:
|
|
```bibtex
@misc{mwangi2025swahili,
  author       = {Denis Mwangi},
  title        = {Fine-Tuned Swahili-English Neural Machine Translation Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/codeshujaaa/swahili-model-v2}}
}
|
|
``` |