Swahili-English Transformer

This model is a custom Transformer trained from scratch to translate English text into Swahili. It is the Version 1 (V1) iteration of an ongoing project to develop efficient, low-resource translation models for African languages.

The model was trained with mixed precision to maximize training speed under strict compute constraints. It uses a standard encoder-decoder Transformer architecture with a custom BPE tokenizer.
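
The exact training loop is not published; as an illustration only, mixed-precision training in PyTorch typically wraps the forward pass in autocast and scales the loss with GradScaler. In the sketch below, model, train_loader, optimizer, and criterion are placeholders for the actual (unreleased) training objects, and criterion is assumed to be a token-level cross-entropy loss.

import torch
from torch.cuda.amp import autocast, GradScaler

def train_one_epoch(model, train_loader, optimizer, criterion, device):
    """Minimal mixed-precision epoch (illustrative; not the actual V1 script)."""
    scaler = GradScaler()
    model.train()
    for src, tgt in train_loader:
        src, tgt = src.to(device), tgt.to(device)
        optimizer.zero_grad()
        # Run the forward pass in float16 where it is numerically safe.
        with autocast():
            logits = model(src, tgt[:, :-1])                      # teacher forcing
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
        scaler.scale(loss).backward()    # scale the loss to avoid float16 underflow
        scaler.step(optimizer)           # unscale gradients, then take the step
        scaler.update()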

Technical Specifications

  • Embedding Size: 256
  • Hidden Layers: 3 Encoder / 3 Decoder
  • Attention Heads: 8
  • Vocab Size: 32,000
  • Max Sequence Length: 64
  • Dropout: 0.3
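
For orientation, the specifications above map onto a standard PyTorch nn.Transformer roughly as sketched below. The feed-forward width, positional-encoding scheme, and class layout are assumptions; the authoritative definition is the model.py shipped with this repository.

import torch
import torch.nn as nn

VOCAB_SIZE = 32_000
D_MODEL = 256
N_HEADS = 8
N_LAYERS = 3        # 3 encoder and 3 decoder layers
MAX_LEN = 64
DROPOUT = 0.3

class Seq2SeqTransformer(nn.Module):
    """Illustrative skeleton matching the listed specs; not the released model.py."""
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.tgt_embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos_embed = nn.Embedding(MAX_LEN, D_MODEL)    # learned positions (assumption)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=N_HEADS,
            num_encoder_layers=N_LAYERS, num_decoder_layers=N_LAYERS,
            dim_feedforward=4 * D_MODEL,                    # feed-forward width is assumed
            dropout=DROPOUT, batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src, tgt):
        src_x = self.src_embed(src) + self.pos_embed(torch.arange(src.size(1), device=src.device))
        tgt_x = self.tgt_embed(tgt) + self.pos_embed(torch.arange(tgt.size(1), device=tgt.device))
        # Causal mask so each target position only attends to earlier positions.
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        return self.out(self.transformer(src_x, tgt_x, tgt_mask=causal))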

Intended Use

  • Primary Use: Research, educational demonstrations of NMT pipelines, and understanding low-resource training dynamics.
  • Secondary Use: Translation of short, simple English sentences to Swahili.
  • Limitations: Not recommended for complex documents or medical/legal text due to V1 repetition artifacts.

Training Data

The model was trained on the Swahili Parallel Dataset, compiled via web scraping.

  • Dataset Size: 280,000 sentence pairs
  • Preprocessing: Filtered to a maximum length of 64 tokens and tokenized with a custom BPE tokenizer (see the sketch after this list).
  • Split: 80% Train / 10% Validation / 10% Test
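
The original preprocessing script is not published; the sketch below illustrates the described pipeline with the Hugging Face tokenizers library. File names, special tokens, and the exact filtering rule are assumptions.

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

MAX_LEN = 64
files = ["train.en", "train.sw"]          # hypothetical corpus file names

# Train a joint 32k-token BPE vocabulary over both languages.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=32_000,
                              special_tokens=["<pad>", "<unk>", "<sos>", "<eos>"])
tokenizer.train(files, trainer)

def keep(pair):
    """Drop sentence pairs where either side exceeds the 64-token limit."""
    en, sw = pair
    return (len(tokenizer.encode(en).ids) <= MAX_LEN
            and len(tokenizer.encode(sw).ids) <= MAX_LEN)

# Filter, then split 80/10/10 into train / validation / test.
pairs = [("an example sentence", "mfano wa sentensi")]   # placeholder parallel data
pairs = [p for p in pairs if keep(p)]
n = len(pairs)
train_set = pairs[:int(0.8 * n)]
val_set = pairs[int(0.8 * n):int(0.9 * n)]
test_set = pairs[int(0.9 * n):]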

Performance Metrics

Evaluated on a held-out test set of 500 sentences.

| Metric     | Score  | Interpretation                                                                    |
|------------|--------|-----------------------------------------------------------------------------------|
| Perplexity | 18.40  | Low; the model has learned the grammar and structure of Swahili well.              |
| METEOR     | 0.3772 | Strong; meaning is largely preserved (good semantic overlap with the reference).   |
| chrF       | 36.05  | Good character-level overlap with the reference text.                              |
| BLEU       | 6.96   | Low, due to repetition artifacts from greedy decoding (e.g., repeating "na na").   |

Analysis: The low perplexity and solid METEOR score indicate that the model has learned the language well. The low BLEU score is primarily a result of "repetition loops" during generation, which are being addressed in V2.
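
For reference, BLEU and chrF in the table can be computed with sacrebleu, and perplexity is the exponential of the mean token-level cross-entropy on the test set. The lists and the cross-entropy value below are placeholders, not the actual test data.

import math
import sacrebleu

# Placeholder outputs; the reported scores use the 500-sentence held-out test set.
hypotheses = ["huu ni mfano wa tafsiri"]
references = ["huu ni mfano wa tafsiri ya kumbukumbu"]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])   # corpus-level BLEU
chrf = sacrebleu.corpus_chrf(hypotheses, [references])   # character n-gram F-score
print(f"BLEU {bleu.score:.2f} | chrF {chrf.score:.2f}")

# Perplexity = exp(mean token-level cross-entropy); exp(2.91) is roughly 18.40.
mean_test_nll = 2.91                                      # placeholder value
print(f"Perplexity {math.exp(mean_test_nll):.2f}")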

Limitations

  1. Repetition: The current greedy decoder can fall into loops (e.g., repeating the same word); a simple decoding-time mitigation is sketched after this list.
  2. Short Length: Optimized for sentences under 64 tokens.
  3. Experimental: This is a V1 research preview; output fluency varies.
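
As noted in the first limitation, one simple decoding-time mitigation (not part of the released V1 decoder) is to forbid emitting the same token twice in a row during greedy search. The sketch assumes a forward(src, tgt) -> (batch, tgt_len, vocab) interface; the real model.py may expose a different one.

import torch

@torch.no_grad()
def greedy_decode_no_repeat(model, src_ids, sos_id, eos_id, max_len=64):
    """Greedy decoding that blocks an immediate token repeat (e.g. "na na").
    Illustrative only; V1 ships a plain greedy decoder."""
    ys = torch.tensor([[sos_id]], device=src_ids.device)
    for _ in range(max_len - 1):
        logits = model(src_ids, ys)               # assumed (1, tgt_len, vocab) output
        step = logits[:, -1, :]
        step[0, ys[0, -1]] = float("-inf")        # forbid repeating the previous token
        next_id = step.argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return ys.squeeze(0)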

How to Use

Since this is a custom PyTorch model, the architecture definition (model.py) must be imported before the weights can be loaded. The snippet below downloads both files from the Hub and does exactly that.

from huggingface_hub import hf_hub_download
import torch
import importlib.util
import sys

# Download the model definition script and the trained weights from the Hub
script_path = hf_hub_download(repo_id="codeshujaaa/swahili-model-V1", filename="model.py")
weights_path = hf_hub_download(repo_id="codeshujaaa/swahili-model-V1", filename="best_model.pt")

# Dynamically import the downloaded model.py as a module named "model"
spec = importlib.util.spec_from_file_location("model", script_path)
model_module = importlib.util.module_from_spec(spec)
sys.modules["model"] = model_module
spec.loader.exec_module(model_module)

# Load the checkpoint onto the available device
from model import load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = load_model(weights_path, device=device)

print(f"Swahili Transformer V1 loaded on {device}!")