Africa v2 Translation Model
Africa v2 is a fine-tuned translation model covering 29 African languages plus English, based on Qwen3-4B-Instruct-2507 and trained on an enhanced dataset.
Model Description
Africa v2 is an improved version of the Africa v1 translation model. Key improvements include:
- System prompts in training data to enforce direct translation behavior
- Regenerated training dataset with better formatting
- Available in MLX 4-bit format and LoRA adapters
Note: Training was interrupted at 1,000 iterations due to a GPU out-of-memory (OOM) error. The released weights represent a partial training checkpoint.
Supported Languages (29)
African Languages:
- Afrikaans (af), Akan (ak), Amharic (am), Bambara (bm), Ewe (ee)
- Fula (ff), Hausa (ha), Igbo (ig), Kinyarwanda (rw), Kirundi (rn)
- Kongo (kg), Lingala (ln), Luganda (lg), Ndebele (nd), Northern Sotho (nso)
- Chichewa/Nyanja (ny), Oromo (om), Shona (sn), Somali (so), Swahili (sw)
- Tigrinya (ti), Tsonga (ts), Tswana (tn), Twi (tw), Venda (ve)
- Wolof (wo), Xhosa (xh), Yoruba (yo), Zulu (zu)
Plus English (en) for bidirectional translation.
Training Details
Base Model
- Model: Qwen3-4B-Instruct-2507 (MLX 4-bit quantized)
- Parameters: 4 billion
- Architecture: Transformer-based language model
Fine-tuning
- Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 8
- LoRA Alpha: 20
- Target Layers: 16 layers
- Training Iterations: 1,000 (interrupted, intended 10,000)
- Learning Rate: 5e-5
- Batch Size: 1
- Final Train Loss: 2.116
- Final Val Loss: 2.512
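For reference, a run with these hyperparameters can be launched with mlx-lm's LoRA tooling roughly as follows. This is a sketch, not the exact command used for this model: flag names can differ between mlx-lm versions, the data and adapter paths are placeholders, and LoRA rank/alpha are typically set through a separate config file rather than command-line flags.

# Sketch: LoRA fine-tuning with mlx-lm (verify flags against your mlx-lm version)
python -m mlx_lm.lora \
  --model Qwen/Qwen3-4B-Instruct-2507 \
  --train \
  --data data/translation_pairs \
  --batch-size 1 \
  --lora-layers 16 \
  --iters 10000 \
  --learning-rate 5e-5 \
  --adapter-path lora
# LoRA rank (8) and alpha (20) are usually supplied via a YAML config passed with --config.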
Training Data
- Total Translation Pairs: 283,986
- Format: Enhanced with system prompts
- System Message: "You are a translation assistant. Output only the translation without explanation."
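To illustrate the format described above, each translation pair is wrapped in a chat-style record that carries the system message enforcing direct translation. The field names and JSONL layout below are an assumption used only to convey the structure, not the project's confirmed schema.

import json

# Illustrative only: one training record in chat format with the system prompt.
# Field names and JSONL layout are assumptions, not the confirmed training schema.
record = {
    "messages": [
        {"role": "system",
         "content": "You are a translation assistant. Output only the translation without explanation."},
        {"role": "user",
         "content": "Translate from English to Swahili:\n\nHello, how are you?"},
        {"role": "assistant",
         "content": "Hujambo, habari gani?"},  # illustrative target text
    ]
}

# Records like this would be written one per line to a JSONL training file.
print(json.dumps(record, ensure_ascii=False))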
Available Checkpoints
- Checkpoint at 1,000 iterations (latest)
Improvements Over v1
- System Prompts: Training data includes system messages to suppress thinking mode
- Better Formatting: Consistent prompt format for all language pairs
- Direct Translation: Model trained to output translation directly without explanation
Evaluation Results
Status: Model evaluation is pending. Use the v1 evaluation as a baseline for comparison.
Expected improvements over v1:
- Reduced repetition loops
- Less hallucination due to system prompts
- More consistent output format
Usage
MLX with mlx-lm
# Install MLX
pip install mlx-lm
# Download model
huggingface-cli download aoiandroid/africa-v2-translation-model --local-dir africa-v2-mlx --include "mlx-4bit/*"
# Run inference with system prompt
python -m mlx_lm.generate \
--model africa-v2-mlx/mlx-4bit \
--prompt "<|im_start|>system
You are a translation assistant. Output only the translation without explanation.<|im_end|>
<|im_start|>user
Translate from English to Swahili:
Hello, how are you?<|im_end|>
<|im_start|>assistant" \
--max-tokens 256 \
--temp 0.1
LoRA Adapters
from mlx_lm import load, generate
# Load model with LoRA adapters
model, tokenizer = load("aoiandroid/africa-v2-translation-model", adapter_path="lora")
# Prepare prompt with system message
messages = [
{"role": "system", "content": "You are a translation assistant. Output only the translation without explanation."},
{"role": "user", "content": "Translate from English to Swahili:\n\nHello, how are you?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Generate translation
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, temp=0.1)
print(response)
GGUF Conversion
Note: GGUF export is not available for this model because:
- mlx_lm.fuse --export-gguf only supports model_type in ["llama", "mixtral", "mistral"]
- Qwen3 has model_type: "qwen3", which is not yet supported
To convert to GGUF:
- Export the MLX model to HuggingFace format (if supported)
- Use llama.cpp's convert_hf_to_gguf.py script
- Or wait for mlx_lm to add Qwen3 support
Alternatively, use v1's GGUF model as a fallback.
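If you attempt the llama.cpp route listed above, the conversion step would look roughly like this. This is a sketch that assumes the model has first been exported to a standard HuggingFace-format directory (hf_export/ is a placeholder); check your llama.cpp checkout's documentation for the exact options it supports.

# Sketch: convert a HuggingFace-format export to GGUF with llama.cpp
# (hf_export/ is a placeholder for the exported model directory)
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py hf_export/ \
  --outfile africa-v2-q8_0.gguf \
  --outtype q8_0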
Limitations and Biases
- Partial Training: Only 1,000 iterations completed (10% of planned training)
- Needs Evaluation: Translation quality not yet formally evaluated
- Low-Resource Languages: Limited training data for some African languages
- Experimental Model: Intended for research and experimentation
Intended Use
- Research: Studying impact of system prompts on translation quality
- Experimentation: Testing improved training data formatting
- Comparison: Baseline for comparing with fully trained models
Recommended: Complete training to 10,000+ iterations before production use.
Future Work
- Complete training to 10,000+ iterations
- Increase LoRA rank to 16 or 32
- Formal evaluation with BLEU, chrF, and TER metrics (see the sketch after this list)
- Compare performance against v1
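For the planned BLEU/chrF/TER evaluation, sacreBLEU provides corpus-level implementations of all three metrics. The snippet below is a sketch with placeholder sentences; the actual test set, language pairs, and tokenization settings would need to be chosen as part of the formal evaluation.

# Sketch of the planned metric computation using sacreBLEU (pip install sacrebleu).
# The hypothesis/reference sentences here are placeholders, not real model output.
import sacrebleu

hypotheses = ["Habari, habari gani?"]       # model outputs (placeholder)
references = [["Hujambo, habari gani?"]]    # one list per reference set, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)

print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}  TER: {ter.score:.2f}")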
Citation
@software{africa_v2_translation_model,
title = {Africa v2 Translation Model},
author = {TranslateBlue Project},
year = {2026},
url = {https://huggingface.co/aoiandroid/africa-v2-translation-model}
}
License
Apache 2.0
Model Card Authors
TranslateBlue Project
Model Card Contact
For questions or issues, please open an issue in the model repository.