Model Overview

Turaco-mt-fr-gh is a specialized neural machine translation model fine-tuned for high-quality translation from French to Ghomálá.

This model is part of the Turaco family, an initiative focused on advancing translation capabilities for low-resource and underrepresented African languages. While large-scale multilingual models provide strong general foundations, they often lack depth and fluency when applied to specific low-resource languages. This project addresses that gap through targeted fine-tuning on curated parallel data.

Built on top of NLLB-200, Turaco-mt-fr-gh leverages multilingual transfer learning to produce more accurate, fluent, and context-aware translations into Ghomálá.
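
Usage Example

A minimal inference sketch with the transformers library is shown below. The source-language code fra_Latn follows NLLB conventions; the target-language token for Ghomálá (written here as bbj_Latn, after the ISO 639-3 code) is an assumption, since the exact token depends on how the fine-tune registered the language.

```python
# Minimal inference sketch. Assumptions: the model loads with the standard
# NLLB/transformers API, and "bbj_Latn" is a placeholder for whatever
# target-language token the fine-tune actually registered for Ghomálá.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "fotiecodes/Turaco-mt-fr-gh"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="fra_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Bonjour, comment allez-vous ?"
inputs = tokenizer(text, return_tensors="pt")

# NLLB-style models select the output language by forcing its language
# token as the first generated token.
target_lang_id = tokenizer.convert_tokens_to_ids("bbj_Latn")  # placeholder code
outputs = model.generate(**inputs, forced_bos_token_id=target_lang_id, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```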

Model Details

  • Developed by: fotiecodes
  • Model type: Sequence-to-Sequence Transformer (Multilingual NMT)
  • License: Apache-2.0
  • Base model: facebook/nllb-200
  • Task: Machine Translation (French → Ghomálá)
  • Language(s): French (fr), Ghomálá (gh)
  • Parameters: ~0.6B (FP32, Safetensors)

Intended Use

This model is designed for:

  • Translating French text into Ghomálá
  • Supporting localization for Cameroonian and regional applications
  • Experimentation with low-resource language translation
  • Research on multilingual transfer learning and adaptation

Training Data

The model was fine-tuned on a parallel dataset of French–Ghomálá sentence pairs.

Key characteristics:

  • High-quality aligned sentence pairs
  • Focus on conversational and general-purpose language
  • Cleaned and normalized text to reduce noise
  • Balanced examples to improve consistency in output

Given the low-resource nature of Ghomálá, dataset quality and consistency were prioritized over sheer size.
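
The cleaning and normalization steps are not documented in detail; the sketch below shows typical steps (Unicode NFC normalization, whitespace cleanup, deduplication) that such a pipeline might include, and should be read as indicative rather than the exact procedure used here.

```python
# Indicative corpus-cleaning sketch; the actual pipeline used for this
# model is not documented, so treat these steps as typical, not authoritative.
import unicodedata

def normalize_pair(src: str, tgt: str) -> tuple[str, str]:
    """Apply NFC Unicode normalization and collapse stray whitespace."""
    src = " ".join(unicodedata.normalize("NFC", src).split())
    tgt = " ".join(unicodedata.normalize("NFC", tgt).split())
    return src, tgt

def clean_corpus(pairs):
    """Normalize, drop empty sides, and deduplicate sentence pairs."""
    seen, cleaned = set(), []
    for src, tgt in pairs:
        src, tgt = normalize_pair(src, tgt)
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```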

Training Procedure

The model was fine-tuned using supervised learning on parallel translation data.

Key aspects:

  • Initialized from NLLB-200
  • Standard sequence-to-sequence training on source–target pairs
  • Tokenization handled by the pretrained NLLB tokenizer
  • Optimization focused on adapting the model to:
    • Ghomálá vocabulary and structure
    • French → Ghomálá alignment
    • Improved fluency and coherence

The training process leverages NLLB’s multilingual representations, allowing the model to generalize better despite limited data.
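
The card does not publish hyperparameters. The sketch below shows one conventional way such a fine-tune is set up with transformers' Seq2SeqTrainer; the base checkpoint name, language codes, dataset fields, and all hyperparameter values are illustrative assumptions, not the configuration actually used.

```python
# Illustrative fine-tuning setup; the base checkpoint, language codes,
# dataset fields, and every hyperparameter below are assumptions, not the
# published configuration for this model.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base_id = "facebook/nllb-200-distilled-600M"  # assumed: the ~0.6B distilled variant
# "bbj_Latn" is a placeholder: a real run must register (or repurpose) a
# language token for Ghomálá in the NLLB tokenizer.
tokenizer = AutoTokenizer.from_pretrained(base_id, src_lang="fra_Latn", tgt_lang="bbj_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(base_id)

def preprocess(batch):
    # Assumes parallel examples with hypothetical "fr" and "gh" text fields.
    return tokenizer(batch["fr"], text_target=batch["gh"], truncation=True, max_length=128)

args = Seq2SeqTrainingArguments(
    output_dir="turaco-mt-fr-gh",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    predict_with_generate=True,
)

# With a tokenized dataset in hand (e.g. train_ds = raw_ds.map(preprocess, batched=True)):
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=args,
#     train_dataset=train_ds,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```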

Evaluation

Evaluation was primarily qualitative, focusing on:

  • Fluency in Ghomálá
  • Semantic correctness of translations
  • Consistency in maintaining the target language

Preliminary quantitative results:

  • French → Ghomálá: BLEU = 4.9 | chrF2 = 19.1
  • Ghomálá → French: BLEU = 10.8 | chrF2 = 30.5

Note:

The model performs noticeably better when translating from Ghomálá to French than in the reverse direction. Overall scores (BLEU and chrF2) nonetheless indicate that translation quality is still limited, especially for French → Ghomálá. These results suggest the model is better at understanding Ghomálá than generating it; additional training data or further fine-tuning would be needed for production-level quality.
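
These corpus-level metrics can be reproduced with sacrebleu, whose chrF implementation defaults to β = 2 (i.e., chrF2). A minimal scoring sketch, assuming system outputs and references are parallel lists of strings:

```python
# Corpus-level BLEU and chrF2 with sacrebleu (pip install sacrebleu);
# the sentences below are stand-ins for real system outputs and references.
import sacrebleu

hypotheses = ["Bonjour tout le monde."]  # model outputs, one string per segment
references = [["Bonjour à tous."]]       # one list of reference strings per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)  # defaults to chrF2 (beta = 2)
print(f"BLEU = {bleu.score:.1f} | chrF2 = {chrf.score:.1f}")
```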

Limitations

  • Performance depends heavily on dataset size and diversity
  • May struggle with:
    • Technical or domain-specific vocabulary
    • Rare linguistic constructions
  • Not optimized for reverse translation (Ghomálá → French)
  • As with most neural MT systems, outputs may occasionally:
    • Be inconsistent
    • Contain minor hallucinations or approximations

Future Work

  • Expand the French–Ghomálá dataset with more diverse domains
  • Explore parameter-efficient fine-tuning (LoRA, adapters; see the sketch after this list)
  • Benchmark against other multilingual MT systems
  • Incorporate human evaluation from native speakers
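
For the parameter-efficient direction above, a minimal LoRA sketch with the peft library could look like the following; the rank, scaling, and target modules are illustrative choices for NLLB's M2M100-style attention blocks, not settings used for this model.

```python
# Minimal LoRA sketch using peft (pip install peft); rank, alpha, and
# target modules are illustrative, not settings used to train this model.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in M2M100-style blocks
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters remain trainable
```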

Ethical Considerations

This model contributes to improving representation of under-resourced African languages in AI.

Care should be taken to:

  • Respect linguistic and cultural nuances of Ghomálá
  • Validate outputs in sensitive or critical contexts
  • Involve native speakers in evaluation and feedback loops
  • Avoid over-reliance in high-stakes applications without verification

Citation

If you use this model, please cite:

@misc{turaco_mt_fr_gh,
  author = {fotiecodes},
  title = {Turaco-mt-fr-gh},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/fotiecodes/Turaco-mt-fr-gh}
}