Model Overview

Turaco-mt-fr-gh is a specialized neural machine translation model fine-tuned for high-quality translation from French to Ghomálá.

This model is part of the Turaco family, an initiative focused on advancing translation capabilities for low-resource and underrepresented African languages. While large-scale multilingual models provide strong general foundations, they often lack depth and fluency when applied to specific low-resource languages. This project addresses that gap through targeted fine-tuning on curated parallel data.

Built on top of NLLB-200, Turaco-mt-fr-gh leverages multilingual transfer learning to produce more accurate, fluent, and context-aware translations into Ghomálá.
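
Usage Example

A minimal inference sketch with the transformers library is shown below. The source-language code fra_Latn follows NLLB conventions; the target-language token for Ghomálá (written here as bbj_Latn, after the ISO 639-3 code) is an assumption, since the exact token depends on how the fine-tune registered the language.

```python
# Minimal inference sketch. Assumptions: the model loads with the standard
# NLLB/transformers API, and "bbj_Latn" is a placeholder for whatever
# target-language token the fine-tune actually registered for Ghomálá.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "fotiecodes/Turaco-mt-fr-gh"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="fra_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Bonjour, comment allez-vous ?"
inputs = tokenizer(text, return_tensors="pt")

# NLLB-style models select the output language by forcing its language
# token as the first generated token.
target_lang_id = tokenizer.convert_tokens_to_ids("bbj_Latn")  # placeholder code
outputs = model.generate(**inputs, forced_bos_token_id=target_lang_id, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```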

Model Details

  • Developed by: fotiecodes
  • Model type: Sequence-to-Sequence Transformer (Multilingual NMT)
  • License: Apache-2.0
  • Base model: facebook/nllb-200
  • Task: Machine Translation (French → Ghomálá)
  • Language(s): French (fr), Ghomálá (gh)
  • Parameters: ~0.6B (FP32, Safetensors)

Intended Use

This model is designed for:

  • Translating French text into Ghomálá
  • Supporting localization for Cameroonian and regional applications
  • Experimentation with low-resource language translation
  • Research on multilingual transfer learning and adaptation

Training Data

The model was fine-tuned on a parallel dataset of French–Ghomálá sentence pairs.

Key characteristics:

  • High-quality aligned sentence pairs
  • Focus on conversational and general-purpose language
  • Cleaned and normalized text to reduce noise
  • Balanced examples to improve consistency in output

Given the low-resource nature of Ghomálá, dataset quality and consistency were prioritized over sheer size.
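
The cleaning and normalization steps are not documented in detail; the sketch below shows typical steps (Unicode NFC normalization, whitespace cleanup, deduplication) that such a pipeline might include, and should be read as indicative rather than the exact procedure used here.

```python
# Indicative corpus-cleaning sketch; the actual pipeline used for this
# model is not documented, so treat these steps as typical, not authoritative.
import unicodedata

def normalize_pair(src: str, tgt: str) -> tuple[str, str]:
    """Apply NFC Unicode normalization and collapse stray whitespace."""
    src = " ".join(unicodedata.normalize("NFC", src).split())
    tgt = " ".join(unicodedata.normalize("NFC", tgt).split())
    return src, tgt

def clean_corpus(pairs):
    """Normalize, drop empty sides, and deduplicate sentence pairs."""
    seen, cleaned = set(), []
    for src, tgt in pairs:
        src, tgt = normalize_pair(src, tgt)
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```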

Training Procedure

The model was fine-tuned using supervised learning on parallel translation data.

Key aspects:

  • Initialized from NLLB-200
  • Standard sequence-to-sequence training on source–target pairs
  • Tokenization handled by the pretrained NLLB tokenizer
  • Optimization focused on adapting the model to:
    • Ghomálá vocabulary and structure
    • French → Ghomálá alignment
    • Improved fluency and coherence

The training process leverages NLLB’s multilingual representations, allowing the model to generalize better despite limited data.
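
The card does not publish hyperparameters. The sketch below shows one conventional way such a fine-tune is set up with transformers' Seq2SeqTrainer; the base checkpoint name, language codes, dataset fields, and all hyperparameter values are illustrative assumptions, not the configuration actually used.

```python
# Illustrative fine-tuning setup; the base checkpoint, language codes,
# dataset fields, and every hyperparameter below are assumptions, not the
# published configuration for this model.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base_id = "facebook/nllb-200-distilled-600M"  # assumed: the ~0.6B distilled variant
# "bbj_Latn" is a placeholder: a real run must register (or repurpose) a
# language token for Ghomálá in the NLLB tokenizer.
tokenizer = AutoTokenizer.from_pretrained(base_id, src_lang="fra_Latn", tgt_lang="bbj_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(base_id)

def preprocess(batch):
    # Assumes parallel examples with hypothetical "fr" and "gh" text fields.
    return tokenizer(batch["fr"], text_target=batch["gh"], truncation=True, max_length=128)

args = Seq2SeqTrainingArguments(
    output_dir="turaco-mt-fr-gh",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    predict_with_generate=True,
)

# With a tokenized dataset in hand (e.g. train_ds = raw_ds.map(preprocess, batched=True)):
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=args,
#     train_dataset=train_ds,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```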

Evaluation

Evaluation was primarily qualitative, focusing on:

  • Fluency in Ghomálá
  • Semantic correctness of translations
  • Consistency in maintaining the target language

Preliminary quantitative results:

  • French → Ghomálá: BLEU = 4.9 | chrF2 = 19.1
  • Ghomálá → French: BLEU = 10.8 | chrF2 = 30.5

Note:

The model performs noticeably better when translating from Ghomálá to French than in the reverse direction. Overall scores (BLEU and chrF2) nonetheless indicate that translation quality is still limited, especially for French → Ghomálá. These results suggest the model is better at understanding Ghomálá than generating it; additional training data or further fine-tuning would be needed for production-level quality.
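
These corpus-level metrics can be reproduced with sacrebleu, whose chrF implementation defaults to β = 2 (i.e., chrF2). A minimal scoring sketch, assuming system outputs and references are parallel lists of strings:

```python
# Corpus-level BLEU and chrF2 with sacrebleu (pip install sacrebleu);
# the sentences below are stand-ins for real system outputs and references.
import sacrebleu

hypotheses = ["Bonjour tout le monde."]  # model outputs, one string per segment
references = [["Bonjour à tous."]]       # one list of reference strings per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)  # defaults to chrF2 (beta = 2)
print(f"BLEU = {bleu.score:.1f} | chrF2 = {chrf.score:.1f}")
```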

Limitations

  • Performance depends heavily on dataset size and diversity
  • May struggle with:
    • Technical or domain-specific vocabulary
    • Rare linguistic constructions
  • Not optimized for reverse translation (Ghomálá → French)
  • As with most neural MT systems, outputs may occasionally:
    • Be inconsistent
    • Contain minor hallucinations or approximations

Future Work

  • Expand the French–Ghomálá dataset with more diverse domains
  • Explore parameter-efficient fine-tuning (LoRA, adapters; see the sketch after this list)
  • Benchmark against other multilingual MT systems
  • Incorporate human evaluation from native speakers
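
For the parameter-efficient direction above, a minimal LoRA sketch with the peft library could look like the following; the rank, scaling, and target modules are illustrative choices for NLLB's M2M100-style attention blocks, not settings used for this model.

```python
# Minimal LoRA sketch using peft (pip install peft); rank, alpha, and
# target modules are illustrative, not settings used to train this model.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in M2M100-style blocks
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters remain trainable
```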

Ethical Considerations

This model contributes to improving representation of under-resourced African languages in AI.

Care should be taken to:

  • Respect linguistic and cultural nuances of Ghomálá
  • Validate outputs in sensitive or critical contexts
  • Involve native speakers in evaluation and feedback loops
  • Avoid over-reliance in high-stakes applications without verification

Citation

If you use this model, please cite:

@misc{turaco_mt_fr_gh,
  author = {fotiecodes},
  title = {Turaco-mt-fr-gh},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/fotiecodes/Turaco-mt-fr-gh}
}