
🌍 Wolof ↔ French Translator (NLLB-based) 🇸🇳🇫🇷


This is a Wolof ↔ French translation model based on Meta AI's NLLB (No Language Left Behind) architecture, fine-tuned to translate in both directions.

🧠 Developed by GalsenAI, a Senegal-based open initiative promoting artificial intelligence for African languages.


🧠 About the Model

  • Base architecture: facebook/nllb-200-distilled-600M
  • Supported languages: Wolof (wo) ↔ French (fr)
  • Purpose: To enable reliable translation for real-world applications like education, healthcare, and public services.
  • BLEU score: 13 (on a custom Wolof-French evaluation set)

🚀 How to Use

📦 Install dependencies

pip install transformers torch

🔍 Example usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = "galsenai/wolofToFrenchTranslator_nllb"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

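def predict(text, lang):
    """Translate `text` from `lang` ("wo" or "fr") into the other language."""
    # Select the translation direction via a text prefix.
    if lang.lower() == "wo":
        prefix = "translate Wolof to French: "
    elif lang.lower() == "fr":
        prefix = "translate French to Wolof: "
    else:
        raise ValueError(f"Invalid language code: {lang!r} (expected 'wo' or 'fr')")
    inputs = tokenizer(prefix + text, return_tensors="pt").to(device)
    # Allow up to 128 new tokens so longer sentences are not cut off.
    translated_tokens = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]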

# Example
result = predict("Naka nga def?", lang="wo")
print(result)  # "Comment ça va?"
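
The example above selects the direction with a T5-style text prefix. Stock NLLB-200 checkpoints are normally steered with language tokens instead: the tokenizer's `src_lang` sets the source-language token, and `forced_bos_token_id` forces the target-language token as the first decoded token, using FLORES-200 codes (`wol_Latn` for Wolof, `fra_Latn` for French). This card does not state which scheme the checkpoint was fine-tuned with, so treat the sketch below as an alternative to compare against `predict`, not as the documented usage. The names `NLLB_CODES`, `nllb_direction`, and `predict_nllb` are hypothetical, and the function takes a batch of sentences.

```python
# NLLB_CODES is a hypothetical helper mapping this card's short codes to FLORES-200 codes.
NLLB_CODES = {"wo": "wol_Latn", "fr": "fra_Latn"}


def nllb_direction(lang):
    """Return (source_code, target_code) for a source language "wo" or "fr"."""
    lang = lang.lower()
    if lang not in NLLB_CODES:
        raise ValueError(f"Invalid language code: {lang!r} (expected 'wo' or 'fr')")
    target = "fr" if lang == "wo" else "wo"
    return NLLB_CODES[lang], NLLB_CODES[target]


def predict_nllb(texts, lang, tokenizer, model, device="cpu", max_new_tokens=128):
    """Translate a list of sentences using NLLB language tokens instead of a text prefix.

    Assumes `tokenizer` is an NLLB tokenizer, which exposes a settable `src_lang`.
    """
    src_code, tgt_code = nllb_direction(lang)
    # Setting src_lang makes the NLLB tokenizer add the source-language token.
    tokenizer.src_lang = src_code
    # Pad so sentences of different lengths can be batched together.
    inputs = tokenizer(texts, return_tensors="pt", padding=True).to(device)
    generated = model.generate(
        **inputs,
        # Force the first decoded token to be the target-language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A possible call, reusing `tokenizer`, `model`, and `device` from the example above, is `predict_nllb(["Naka nga def?"], "wo", tokenizer, model, device)`. Comparing its output with `predict` on a few sentences is a quick way to see which input format the fine-tuned checkpoint responds to.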

📊 Performance

Translation direction    BLEU score
Wolof → French           13
French → Wolof           13

Note: BLEU measures n-gram overlap with reference translations, so it is only an approximate indicator of translation quality. Further training is expected to improve these results.
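
To make the reported figure concrete: BLEU takes the clipped n-gram precisions p1 to p4 over the whole test set, combines them by geometric mean, and multiplies by a brevity penalty BP = min(1, exp(1 - r/c)), where c is the total hypothesis length and r the total reference length, giving a score from 0 to 100. Published scores are usually computed with sacreBLEU (`sacrebleu.corpus_bleu(hypotheses, [references]).score`), which applies its own tokenization; the card does not say which tool produced the 13 above. The sketch below is a simplified corpus BLEU (whitespace tokens, one reference per sentence, no smoothing), so its numbers are not directly comparable to the reported score. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale: one reference per hypothesis,
    whitespace tokenization, uniform n-gram weights, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, a zero precision at any order makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

For example, a hypothesis identical to its reference scores 100, while `corpus_bleu(["le chat est sur"], ["le chat est sur le tapis"])` keeps perfect n-gram precision but is reduced to about 60.65 by the brevity penalty.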


📚 Training Data

The model was fine-tuned using a mix of:

  • Manually aligned Wolof–French parallel corpora
  • Public resources (Common Voice, Wikipedia, administrative documents, etc.)
  • Custom datasets collected via LinguaSprint Africa, a crowdsourcing platform for African languages.

🀝 Contributing

This model is maintained by GalsenAI.

If you’d like to:

  • Help improve this model
  • Contribute more Wolof/French data
  • Build NLP tools for African languages

👉 Join us at github.com/GalsenAI or reach out to the team!


📜 License

MIT License: free to use for research, education, and social applications.
📣 Attribution requested: GalsenAI (2025)

Model weights: Safetensors format, 0.6B parameters, F32 tensors.