# Swahili-English Transformer

This model is a custom Transformer trained from scratch to translate English text into Swahili. It is Version 1 (V1) of an ongoing project to develop efficient, low-resource translation models for African languages.

The model was trained with mixed precision to maximize training speed under strict compute constraints. It uses a standard Transformer encoder-decoder architecture with a custom BPE tokenizer.
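The training loop itself is not published; the sketch below shows what a mixed-precision training step typically looks like in PyTorch (`torch.amp`). The batch keys and loss handling are hypothetical, purely for illustration.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

def train_step(model, batch, optimizer, loss_fn):
    """One hypothetical mixed-precision step; batch keys are assumptions."""
    optimizer.zero_grad()
    # Run the forward pass in float16 where safe, float32 elsewhere
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(batch["src"], batch["tgt_in"])
        loss = loss_fn(logits.view(-1, logits.size(-1)), batch["tgt_out"].view(-1))
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```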
## Technical Specifications
- Embedding Size: 256
- Hidden Layers: 3 Encoder / 3 Decoder
- Attention Heads: 8
- Vocab Size: 32,000
- Max Sequence Length: 64
- Dropout: 0.3
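For orientation, these hyperparameters map roughly onto PyTorch's built-in `nn.Transformer` as sketched below. The actual architecture is defined in the repo's `model.py` and may differ in details; this is not that code.

```python
import torch.nn as nn

# Rough equivalent of the specs above; the real classes live in model.py.
backbone = nn.Transformer(
    d_model=256,            # embedding size
    nhead=8,                # attention heads
    num_encoder_layers=3,
    num_decoder_layers=3,
    dropout=0.3,
    batch_first=True,
)
src_embed = nn.Embedding(32000, 256)  # 32k BPE vocabulary
generator = nn.Linear(256, 32000)     # projection back to the vocabulary
```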
## Intended Use
- Primary Use: Research, educational demonstrations of NMT pipelines, and understanding low-resource training dynamics.
- Secondary Use: Translation of short, simple English sentences to Swahili.
- Limitations: Not recommended for complex documents or medical/legal text due to V1 repetition artifacts.
## Training Data

The model was trained on the Swahili Parallel Dataset, sourced via web scraping.
- Dataset Size: 280,000 sentence pairs
- Preprocessing: Filtered to a maximum length of 64 tokens and tokenized with a custom BPE tokenizer (a sketch of this step follows the list below).
- Split: 80% Train / 10% Validation / 10% Test
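The exact preprocessing script is not included in the card; below is a plausible sketch using the Hugging Face `tokenizers` library, with placeholder file names (`train.en`, `train.sw`) and special tokens standing in for whatever the real pipeline used.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Train a BPE tokenizer with the card's 32k vocabulary (file names are placeholders)
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
tokenizer.train(files=["train.en", "train.sw"], trainer=trainer)

MAX_LEN = 64

def keep_pair(src: str, tgt: str) -> bool:
    """Drop pairs where either side exceeds the 64-token limit."""
    return (len(tokenizer.encode(src).ids) <= MAX_LEN
            and len(tokenizer.encode(tgt).ids) <= MAX_LEN)
```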
## Performance Metrics
Evaluated on a held-out test set of 500 sentences.
| Metric | Score | Interpretation |
|---|---|---|
| Perplexity | 18.40 | Low for a from-scratch V1 model. It has learned the grammar and structure of Swahili. |
| METEOR | 0.3772 | Strong. Indicates good semantic understanding (meaning is preserved). |
| chrF | 36.05 | Good character-level overlap with reference text. |
| BLEU | 6.96 | Low due to repetition artifacts from greedy decoding (e.g., repeating "na na"). |
Analysis: The low perplexity and solid METEOR score suggest the model captures Swahili structure and meaning well. The low BLEU score is primarily a result of "repetition loops" during generation, which is being addressed in V2.
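The evaluation harness is not published. The sketch below shows how scores like these are commonly computed: `sacrebleu` for corpus BLEU and chrF, and exponentiating the mean token-level cross-entropy for perplexity. The sentences and loss value are placeholders, not the actual test data.

```python
import math

import sacrebleu

# Placeholder outputs; in practice these come from decoding the 500-sentence test set.
hypotheses = ["mtoto anacheza na mpira"]
references = [["mtoto anacheza mpira uwanjani"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")

# Perplexity = exp(mean token-level cross-entropy on the test set)
mean_nll = 2.912  # placeholder average loss; exp(2.912) ~= 18.4
print(f"Perplexity: {math.exp(mean_nll):.2f}")
```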
## Limitations
- Repetition: The current greedy decoder can fall into loops, e.g. repeating the same word (see the decoding sketch after this list).
- Short Length: Optimized for sentences under 64 tokens.
- Experimental: This is a V1 research preview; output fluency varies.
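The card does not specify V2's fix; a common mitigation for greedy-decoding loops is to block immediate token repeats (or, more generally, repeated n-grams, or to switch to beam search). A toy sketch of the simplest variant:

```python
import torch

def pick_next_token(step_logits: torch.Tensor, generated: list[int]) -> int:
    """Greedy choice over one decoding step's logits, masking the previous
    token so immediate repeats like "na na" cannot occur. Illustrative only."""
    logits = step_logits.clone()
    if generated:
        logits[generated[-1]] = float("-inf")  # block immediate repetition
    return int(torch.argmax(logits).item())
```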
## How to Use

Since this is a custom PyTorch model, the architecture (defined in the repo's model.py) must be imported before the weights can be loaded:
```python
from huggingface_hub import hf_hub_download
import importlib.util
import sys

import torch

# Download the architecture definition and the trained weights
script_path = hf_hub_download(repo_id="codeshujaaa/swahili-model-V1", filename="model.py")
weights_path = hf_hub_download(repo_id="codeshujaaa/swahili-model-V1", filename="best_model.pt")

# Import the downloaded model.py so its classes and helpers are available
spec = importlib.util.spec_from_file_location("model", script_path)
model_module = importlib.util.module_from_spec(spec)
sys.modules["model"] = model_module
spec.loader.exec_module(model_module)

# Instantiate the model and load the weights onto the available device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_module.load_model(weights_path, device=device)
print(f"Swahili Transformer V1 loaded on {device}!")
```