
localenlp-en-hau

Fine-tuned MarianMT model for English-to-Hausa translation.

Model Card for LOCALENLP/eng-hau

This is a machine translation model for English → Hausa, developed by the LOCALENLP organization.
It is based on the pretrained Helsinki-NLP/opus-mt-en-mul MarianMT model and fine-tuned on a custom parallel corpus of ~15k sentence pairs.


Model Details

Model Description

  • Developed by: Mgolo
  • Shared by: Mgolo
  • Model type: Seq2Seq Transformer (MarianMT)
  • Languages: English → Hausa
  • License: MIT
  • Finetuned from model: Helsinki-NLP/opus-mt-en-mul

Uses

Direct Use

  • Translate English text into Hausa for research, education, and communication.
  • Useful for low-resource NLP tasks, digital content creation, and cultural preservation.

Downstream Use

  • Can be integrated into translation apps, chatbots, and education platforms.
  • Serves as a base for further fine-tuning on domain-specific Hausa corpora.

Out-of-Scope Use

  • Not suitable for legal or medical translations (e.g., contracts, prescriptions, medical records).
  • Like any automated system, it may mistranslate; human review of the output is recommended.

Bias, Risks, and Limitations

  • Training data is from a custom collection of parallel sentences (~15k pairs).
  • Some informal or culturally nuanced expressions may not be accurately translated.
  • Hausa spelling and grammar variation (Latin script) may lead to inconsistencies.
  • Model may underperform on domain-specific or long, complex texts.

Recommendations

  • Use human post-editing for high-stakes use cases.
  • Evaluate performance on your target domain before deployment.
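To evaluate performance on a target domain, a quick sanity check against held-out reference translations is enough to start. The sketch below implements a simplified chrF-style character n-gram F-score; it is a toy stand-in for proper evaluation tools such as sacreBLEU, and the function names (`char_ngrams`, `chrf`) are illustrative, not part of this repository:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average character n-gram precision and recall,
    combined with an F-beta score (beta=2 weights recall higher)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # strings shorter than n contribute nothing at this order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Identical strings score 1.0; strings with no shared character n-grams score 0.0.
print(chrf("Ina kwana", "Ina kwana"))  # 1.0
print(chrf("abc", "xyz"))              # 0.0
```

Averaging the score over a few hundred in-domain sentence pairs gives a rough baseline before committing to deployment.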

How to Get Started with the Model

from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/eng_hau"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The multilingual base checkpoint expects a target-language token
# (">>hau<<") prepended to the source text.
text = "Good evening, how was your day?"
inputs = tokenizer(">>hau<< " + text, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Hausa:", translation)
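Because the base checkpoint (opus-mt-en-mul) is multilingual, every source sentence must carry the `>>hau<<` target-language token. A minimal helper for preparing batches (the name `prepare_batch` is illustrative, not part of this repository):

```python
def prepare_batch(sentences, lang_token=">>hau<<"):
    """Prepend the Marian target-language token to each source sentence."""
    return [f"{lang_token} {s.strip()}" for s in sentences]

batch = prepare_batch(["Good evening.", "How was your day?"])
print(batch)  # ['>>hau<< Good evening.', '>>hau<< How was your day?']

# The prepared batch then goes through the tokenizer as usual, e.g.:
# inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
```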
Model size: 77M parameters (Safetensors, F32)