Aura-MT-1B

A 1B-parameter multilingual machine translation model, fine-tuned from WakandaAI/Aura-1B on 25 languages: English plus 24 others (African languages, Arabic, French, and Portuguese). It supports translation in both directions between English and each of the 24 other languages.

Architecture

| Property | Value |
|---|---|
| Base model | WakandaAI/Aura-1B |
| Parameters | 1.01B (1,013,280,000) |
| Layers | 36 |
| Hidden dim | 1280 |
| Attention heads | 20 (KV heads: 4, GQA) |
| FFN intermediate | 5120 |
| Context length | 1024 tokens |
| Vocab size | 64,000 |
| RoPE theta | 500,000 |
| Architecture | Llama-3 style (RMSNorm, RoPE, GQA, SwiGLU) |
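
As a sanity check, the reported parameter count can be reproduced from the figures above. The sketch below rests on three assumptions that the card does not state: a head dimension of hidden / heads = 64, no bias terms, and untied input and output embeddings. Under those assumptions the total matches the reported 1,013,280,000 exactly.

```python
def count_params(vocab=64_000, hidden=1280, layers=36, heads=20,
                 kv_heads=4, ffn=5120, tied_embeddings=False):
    """Parameter count for a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden // heads                 # 64 (assumed: hidden / heads)
    kv_dim = kv_heads * head_dim               # 256, shared K/V width under GQA
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
    mlp = 3 * hidden * ffn                     # SwiGLU: gate, up, down projections
    norms = 2 * hidden                         # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden                     # input embedding table
    head = 0 if tied_embeddings else vocab * hidden   # output projection
    final_norm = hidden
    return embed + layers * per_layer + final_norm + head


print(f"{count_params():,}")  # 1,013,280,000
```

With tied embeddings the same function gives 931,360,000, which suggests the output projection is stored separately from the input embedding.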

Training

| Setting | Value |
|---|---|
| Method | Full SFT (all parameters) |
| Dataset | 4.1M parallel sentence pairs across 25 languages |
| Sources | NLLB, WMT22, LAFAND-MT, translated web data |
| Optimizer | AdamW (lr=2e-5, cosine decay, 200 warmup steps) |
| Precision | bfloat16 |
| Hardware | -- |
| Batch size | 1024 tokens/GPU, packed sequences |
| Training steps | 31,000 |
| DDP backend | Gloo over InfiniBand |
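
To make the optimizer settings concrete, here is a minimal sketch of the learning-rate schedule they imply. The card gives only the peak rate, the warmup length, the decay shape, and the step count; the linear warmup and the decay floor of zero (`min_lr`) are assumptions, and the function name is illustrative rather than taken from the training code.

```python
import math

PEAK_LR = 2e-5
WARMUP_STEPS = 200
TOTAL_STEPS = 31_000


def lr_at(step, min_lr=0.0):
    """Learning rate at a 0-indexed optimizer step."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from near zero up to the peak rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from the peak down to min_lr over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1 + math.cos(math.pi * progress))


for step in (0, 199, 15_600, 31_000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate reaches 2e-5 at the end of warmup, passes through half the peak at step 15,600, and ends at the floor.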

Supported Languages

| Code | Language | Code | Language |
|---|---|---|---|
| afr_Latn | Afrikaans | plt_Latn | Malagasy |
| amh_Ethi | Amharic | por_Latn | Portuguese |
| arb_Arab | Arabic | sna_Latn | Shona |
| bem_Latn | Bemba | som_Latn | Somali |
| eng_Latn | English | sot_Latn | Sesotho |
| fon_Latn | Fon | swh_Latn | Swahili |
| fra_Latn | French | tir_Ethi | Tigrinya |
| hau_Latn | Hausa | tsn_Latn | Setswana |
| ibo_Latn | Igbo | wol_Latn | Wolof |
| kin_Latn | Kinyarwanda | xho_Latn | Xhosa |
| lin_Latn | Lingala | yor_Latn | Yoruba |
| lug_Latn | Luganda | zul_Latn | Zulu |
| nya_Latn | Chichewa | | |
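
Because a mistyped code is an easy error to make, it can help to validate a direction before calling the model. The sketch below copies the codes from the list above; `is_supported_pair` is a hypothetical helper, not part of the repository. It treats only English-to-X and X-to-English as supported, since those are the directions the card describes; whether other pairs work is not stated.

```python
SUPPORTED_LANGS = {
    "afr_Latn": "Afrikaans", "amh_Ethi": "Amharic", "arb_Arab": "Arabic",
    "bem_Latn": "Bemba", "eng_Latn": "English", "fon_Latn": "Fon",
    "fra_Latn": "French", "hau_Latn": "Hausa", "ibo_Latn": "Igbo",
    "kin_Latn": "Kinyarwanda", "lin_Latn": "Lingala", "lug_Latn": "Luganda",
    "nya_Latn": "Chichewa", "plt_Latn": "Malagasy", "por_Latn": "Portuguese",
    "sna_Latn": "Shona", "som_Latn": "Somali", "sot_Latn": "Sesotho",
    "swh_Latn": "Swahili", "tir_Ethi": "Tigrinya", "tsn_Latn": "Setswana",
    "wol_Latn": "Wolof", "xho_Latn": "Xhosa", "yor_Latn": "Yoruba",
    "zul_Latn": "Zulu",
}


def is_supported_pair(src, tgt):
    """True if both codes are listed and exactly one side is English."""
    if src not in SUPPORTED_LANGS or tgt not in SUPPORTED_LANGS:
        return False
    # Exactly one of the two sides must be eng_Latn (assumed restriction).
    return (src == "eng_Latn") != (tgt == "eng_Latn")


print(is_supported_pair("eng_Latn", "hau_Latn"))  # True
print(is_supported_pair("fra_Latn", "swh_Latn"))  # False
```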

Quick Start

git clone https://huggingface.co/WakandaAI/Aura-MT-1B
cd Aura-MT-1B
pip install torch tokenizers safetensors

Single translation

python generate.py --text "The president announced new economic policies." \
    --src eng_Latn --tgt hau_Latn

Interactive mode

python generate.py --interactive --src eng_Latn --tgt yor_Latn
[eng_Latn->yor_Latn] >>> Good morning, how are you doing today?
  Akoko ti o dara, bawo ni o ṣe n ṣiṣẹ loni?

[eng_Latn->yor_Latn] >>> /set src=arb_Arab tgt=eng_Latn
  Direction: arb_Arab -> eng_Latn

[arb_Arab->eng_Latn] >>> صباح الخير، كيف حالك؟
  Good morning, how are you?

Batch translation

python generate.py --input sentences.txt --src eng_Latn --tgt swh_Latn --output translations.txt

Python API

from inference import load_model, translate

model, tokenizer, config = load_model(".")

# English -> Swahili
result = translate(model, tokenizer,
    "The president announced new economic policies.",
    src_lang="eng_Latn", tgt_lang="swh_Latn")
print(result)

# French -> English
result = translate(model, tokenizer,
    "Bonjour, comment allez-vous?",
    src_lang="fra_Latn", tgt_lang="eng_Latn")
print(result)

# With sampling instead of beam search
result = translate(model, tokenizer,
    "Hello world",
    src_lang="eng_Latn", tgt_lang="yor_Latn",
    num_beams=1, temperature=0.7, top_p=0.9)
print(result)

Prompt Format

Internally, the model uses instruction-style prompts with a language token prefix:

<s><|tgt_lang|>Translate the following English text into Swahili.
English: Hello, how are you?
Swahili:

The translate() function handles prompt construction automatically. Six prompt templates are available (selectable via template_idx).
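
For readers who want to see how such a prompt is assembled, here is a minimal sketch that reproduces the template shown above. `build_prompt` is a hypothetical helper, not the function used in inference.py, and rendering the `<|tgt_lang|>` placeholder as the target code wrapped in `<|...|>` is an assumption; the authoritative mapping lives in special_tokens.json.

```python
def build_prompt(text, src_name, tgt_name, tgt_code, bos="<s>"):
    """Assemble an instruction-style translation prompt as a plain string."""
    # Assumed rendering of the <|tgt_lang|> placeholder, e.g. <|swh_Latn|>.
    lang_token = f"<|{tgt_code}|>"
    return (
        f"{bos}{lang_token}"
        f"Translate the following {src_name} text into {tgt_name}.\n"
        f"{src_name}: {text}\n"
        f"{tgt_name}:"
    )


print(build_prompt("Hello, how are you?", "English", "Swahili", "swh_Latn"))
```

The prompt ends right after the target-language label, so the model's continuation is the translation itself. In normal use, calling translate() is preferable, since it also selects among the available templates.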

Decoding

| Parameter | Default | Description |
|---|---|---|
| num_beams | 4 | Beam search width (1 = greedy/sampling) |
| max_new_tokens | 128 | Maximum output length |
| length_penalty | 1.0 | Beam search length penalty |
| no_repeat_ngram_size | 3 | Ban repeated n-grams (0 = off) |
| temperature | 0.0 | Sampling temperature (>0 with num_beams=1) |
| top_p | 0.9 | Nucleus sampling threshold |
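
The way these parameters interact can be sketched as follows. This is an illustration of the general technique, not the code in inference.py: both function names are hypothetical, and the rule that beam search takes precedence whenever num_beams > 1 is an assumption drawn from the parameter descriptions.

```python
def decoding_mode(num_beams=4, temperature=0.0):
    """Pick a decoding strategy from the two controlling parameters."""
    if num_beams > 1:
        return "beam"  # assumed to take precedence over temperature
    return "sample" if temperature > 0 else "greedy"


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p.

    Returns a dict of token index -> renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, mass = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    return {idx: p / mass for idx, p in kept}


print(decoding_mode())                        # beam
print(decoding_mode(num_beams=1, temperature=0.7))  # sample
print(sorted(top_p_filter([0.5, 0.3, 0.15, 0.05])))  # [0, 1, 2]
```

In the last call the three most likely tokens together cover 0.95 of the probability mass, so the fourth is dropped before sampling.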

Files

| File | Purpose |
|---|---|
| model.safetensors | Model weights (preferred format) |
| model.pt | Same weights as a torch checkpoint (fallback) |
| config.json | Architecture config |
| tokenizer.json | ByteLevel BPE tokenizer (64,000 vocab) |
| tokenizer_config.json | HuggingFace tokenizer metadata |
| special_tokens.json | Language token ID mapping |
| inference.py | load_model() + translate() library |
| generate.py | CLI wrapper (single/batch/interactive) |
| llama3.py | Transformer model definition |
| model_factory.py | Model config builder |
| kvcache.py | KV cache for inference |
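
The preferred/fallback relationship between the two weight files can be expressed as a simple lookup. The sketch below is illustrative only: `pick_weights` is a hypothetical helper, and load_model() may resolve the file differently.

```python
from pathlib import Path


def pick_weights(model_dir):
    """Return the weights file to load, preferring safetensors over .pt."""
    model_dir = Path(model_dir)
    for name in ("model.safetensors", "model.pt"):
        candidate = model_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no weights file found in {model_dir}")
```

Calling `pick_weights(".")` from inside the cloned repository would return the path to model.safetensors when it is present, and to model.pt otherwise.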

Limitations

  • Reverse direction (X -> English) is weaker than forward (English -> X) due to training data distribution
  • Low-resource languages (Bemba, Luganda, Wolof, Fon) produce lower-quality translations
  • The model may hallucinate named entities (e.g., inserting "Buhari" in political contexts), reflecting bias in the training data
  • Maximum sequence length is 1024 tokens; longer inputs will be truncated
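
One way to work around the truncation limit is to split long inputs at sentence boundaries and translate each chunk separately. The sketch below is a rough illustration under several assumptions: `split_for_context` is a hypothetical helper, the regular expression is a crude sentence splitter, and the caller supplies both the token counter and the budget. A budget well below 1024 is advisable if the prompt template and the generated tokens share the same context window, which the card does not confirm.

```python
import re


def split_for_context(text, count_tokens, budget):
    """Group sentences into chunks whose token count stays within budget."""
    # Split after Latin, Ethiopic, and Arabic sentence-final punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?።؟])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > budget:
            chunks.append(current)
            current = sentence
        else:
            # A single over-long sentence stays as its own chunk and
            # would still be truncated by the model.
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Whitespace word count stands in for the real tokenizer here.
words = lambda s: len(s.split())
print(split_for_context("One two three. Four five six. Seven eight.", words, 6))
```

In practice `count_tokens` would wrap the model's own tokenizer, since subword counts are usually higher than word counts, especially for the lower-resource languages.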

Citation

If you use this model, please cite WakandaAI. Details TBA.

License

Apache 2.0
