Africa v2 Translation Model
Africa v2 is a fine-tuned translation model covering 29 African languages plus English, based on Qwen3-4B-Instruct-2507 and trained on an enhanced dataset.
Model Description
Africa v2 is an improved version of the Africa v1 translation model. Key improvements include:
- System prompts in training data to enforce direct translation behavior
- Regenerated training dataset with better formatting
- Available in MLX 4-bit format and LoRA adapters
Note: Training was interrupted at 1,000 iterations due to a GPU out-of-memory (OOM) error. The released weights represent a partial training checkpoint.
Supported Languages (29)
African Languages:
- Afrikaans (af), Akan (ak), Amharic (am), Bambara (bm), Ewe (ee)
- Fula (ff), Hausa (ha), Igbo (ig), Kinyarwanda (rw), Kirundi (rn)
- Kongo (kg), Lingala (ln), Luganda (lg), Ndebele (nd), Northern Sotho (nso)
- Chichewa/Nyanja (ny), Oromo (om), Shona (sn), Somali (so), Swahili (sw)
- Tigrinya (ti), Tsonga (ts), Tswana (tn), Twi (tw), Venda (ve)
- Wolof (wo), Xhosa (xh), Yoruba (yo), Zulu (zu)
Plus English (en) for bidirectional translation.
Training Details
Base Model
- Model: Qwen3-4B-Instruct-2507 (MLX 4-bit quantized)
- Parameters: 4 billion
- Architecture: Transformer-based language model
Fine-tuning
- Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 8
- LoRA Alpha: 20
- Target Layers: 16 layers
- Training Iterations: 1,000 (interrupted, intended 10,000)
- Learning Rate: 5e-5
- Batch Size: 1
- Final Train Loss: 2.116
- Final Val Loss: 2.512
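For reference, a run with these hyperparameters can be launched with mlx-lm's LoRA tooling roughly as follows. This is a sketch, not the exact command used for this model: flag names can differ between mlx-lm versions, the data and adapter paths are placeholders, and LoRA rank/alpha are typically set through a separate config file rather than command-line flags.

# Sketch: LoRA fine-tuning with mlx-lm (verify flags against your mlx-lm version)
python -m mlx_lm.lora \
  --model Qwen/Qwen3-4B-Instruct-2507 \
  --train \
  --data data/translation_pairs \
  --batch-size 1 \
  --lora-layers 16 \
  --iters 10000 \
  --learning-rate 5e-5 \
  --adapter-path lora
# LoRA rank (8) and alpha (20) are usually supplied via a YAML config passed with --config.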
Training Data
- Total Translation Pairs: 283,986
- Format: Enhanced with system prompts
- System Message: "You are a translation assistant. Output only the translation without explanation."
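To illustrate the format described above, each translation pair is wrapped in a chat-style record that carries the system message enforcing direct translation. The field names and JSONL layout below are an assumption used only to convey the structure, not the project's confirmed schema.

import json

# Illustrative only: one training record in chat format with the system prompt.
# Field names and JSONL layout are assumptions, not the confirmed training schema.
record = {
    "messages": [
        {"role": "system",
         "content": "You are a translation assistant. Output only the translation without explanation."},
        {"role": "user",
         "content": "Translate from English to Swahili:\n\nHello, how are you?"},
        {"role": "assistant",
         "content": "Hujambo, habari gani?"},  # illustrative target text
    ]
}

# Records like this would be written one per line to a JSONL training file.
print(json.dumps(record, ensure_ascii=False))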
Available Checkpoints
- Checkpoint at 1,000 iterations (latest)
Improvements Over v1
- System Prompts: Training data includes system messages to suppress thinking mode
- Better Formatting: Consistent prompt format for all language pairs
- Direct Translation: Model trained to output translation directly without explanation
Evaluation Results
Status: Model evaluation is pending. Use the v1 evaluation as a baseline for comparison.
Expected improvements over v1:
- Reduced repetition loops
- Less hallucination due to system prompts
- More consistent output format
Usage
MLX with mlx-lm
# Install MLX
pip install mlx-lm
# Download model
huggingface-cli download aoiandroid/africa-v2-translation-model --local-dir africa-v2-mlx --include "mlx-4bit/*"
# Run inference with system prompt
python -m mlx_lm.generate \
--model africa-v2-mlx/mlx-4bit \
--prompt "<|im_start|>system
You are a translation assistant. Output only the translation without explanation.<|im_end|>
<|im_start|>user
Translate from English to Swahili:
Hello, how are you?<|im_end|>
<|im_start|>assistant" \
--max-tokens 256 \
--temp 0.1
LoRA Adapters
from mlx_lm import load, generate
# Load model with LoRA adapters
model, tokenizer = load("aoiandroid/africa-v2-translation-model", adapter_path="lora")
# Prepare prompt with system message
messages = [
{"role": "system", "content": "You are a translation assistant. Output only the translation without explanation."},
{"role": "user", "content": "Translate from English to Swahili:\n\nHello, how are you?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Generate translation
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, temp=0.1)
print(response)
GGUF Conversion
Note: GGUF export is not available for this model because:
- mlx_lm.fuse --export-gguf only supports model_type in ["llama", "mixtral", "mistral"]
- Qwen3 has model_type: "qwen3", which is not yet supported
To convert to GGUF:
- Export the MLX model to HuggingFace format (if supported)
- Use llama.cpp's convert_hf_to_gguf.py script
- Or wait for mlx_lm to add Qwen3 support
Alternatively, use v1's GGUF model as a fallback.
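If you attempt the llama.cpp route listed above, the conversion step would look roughly like this. This is a sketch that assumes the model has first been exported to a standard HuggingFace-format directory (hf_export/ is a placeholder); check your llama.cpp checkout's documentation for the exact options it supports.

# Sketch: convert a HuggingFace-format export to GGUF with llama.cpp
# (hf_export/ is a placeholder for the exported model directory)
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py hf_export/ \
  --outfile africa-v2-q8_0.gguf \
  --outtype q8_0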
Limitations and Biases
- Partial Training: Only 1,000 iterations completed (10% of planned training)
- Needs Evaluation: Translation quality not yet formally evaluated
- Low-Resource Languages: Limited training data for some African languages
- Experimental Model: Intended for research and experimentation
Intended Use
- Research: Studying impact of system prompts on translation quality
- Experimentation: Testing improved training data formatting
- Comparison: Baseline for comparing with fully trained models
Recommended: Complete training to 10,000+ iterations before production use.
Future Work
- Complete training to 10,000+ iterations
- Increase LoRA rank to 16 or 32
- Formal evaluation with BLEU, chrF, and TER metrics (see the sketch after this list)
- Compare performance against v1
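For the planned BLEU/chrF/TER evaluation, sacreBLEU provides corpus-level implementations of all three metrics. The snippet below is a sketch with placeholder sentences; the actual test set, language pairs, and tokenization settings would need to be chosen as part of the formal evaluation.

# Sketch of the planned metric computation using sacreBLEU (pip install sacrebleu).
# The hypothesis/reference sentences here are placeholders, not real model output.
import sacrebleu

hypotheses = ["Habari, habari gani?"]       # model outputs (placeholder)
references = [["Hujambo, habari gani?"]]    # one list per reference set, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)

print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}  TER: {ter.score:.2f}")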
Citation
@software{africa_v2_translation_model,
title = {Africa v2 Translation Model},
author = {TranslateBlue Project},
year = {2026},
url = {https://huggingface.co/aoiandroid/africa-v2-translation-model}
}
License
Apache 2.0
Model Card Authors
TranslateBlue Project
Model Card Contact
For questions or issues, please open an issue in the model repository.