Aura-MT-1B

A 1B-parameter multilingual machine translation model, fine-tuned from WakandaAI/Aura-1B. It covers 25 languages: English plus 24 others (African languages, Arabic, French, and Portuguese). Translation is supported in both directions between English and each of the 24 other languages.

Architecture


Base model	WakandaAI/Aura-1B
Parameters	1.01B (1,013,280,000)
Layers	36
Hidden dim	1280
Attention heads	20 (KV heads: 4, GQA)
FFN intermediate	5120
Context length	1024
Vocab size	64,000
RoPE theta	500,000
Architecture	Llama-3 style (RMSNorm, RoPE, GQA, SwiGLU)
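
The parameter count in the table can be reproduced from the other rows. The sketch below assumes bias-free linear layers, two RMSNorm weight vectors per layer plus a final one, and separate (untied) input embedding and output head matrices. None of these are stated in this card, but under those assumptions the arithmetic lands exactly on the listed figure.

```python
# Architecture values copied from the table above.
n_layers = 36
hidden = 1280
n_heads = 20
n_kv_heads = 4
ffn = 5120
vocab = 64_000

head_dim = hidden // n_heads      # 64
kv_dim = n_kv_heads * head_dim    # 256 (GQA: fewer K/V heads than Q heads)

# Attention: Q and output projections are hidden x hidden,
# K and V projections are hidden x kv_dim. Assumes no bias terms.
attn = 2 * hidden * hidden + 2 * hidden * kv_dim

# SwiGLU feed-forward: gate, up, and down projections.
mlp = 3 * hidden * ffn

# Two RMSNorm weight vectors per layer (assumed).
norms = 2 * hidden

per_layer = attn + mlp + norms

# Assumption: untied input embedding and output head, plus a final RMSNorm.
total = n_layers * per_layer + 2 * vocab * hidden + hidden

print(f"{total:,}")  # 1,013,280,000
```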

Training


Method	Full SFT (all parameters)
Dataset	4.1M parallel sentence pairs across 25 languages
Sources	NLLB, WMT22, LAFAND-MT, translated web data
Optimizer	AdamW (lr=2e-5, cosine decay, 200 warmup steps)
Precision	bfloat16
Hardware	--
Batch size	1024 tokens/GPU, packed sequences
Training steps	31,000
DDP backend	Gloo over InfiniBand
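
For reference, the learning-rate schedule implied by the optimizer row can be sketched as follows. The peak rate, warmup length, and step count come from the table. The linear warmup shape and the decay floor (`min_lr`, defaulting to zero) are assumptions, since the card only says "cosine decay, 200 warmup steps".

```python
import math

PEAK_LR = 2e-5        # from the Optimizer row
WARMUP_STEPS = 200    # from the Optimizer row
TOTAL_STEPS = 31_000  # from the Training steps row


def lr_at(step: int, min_lr: float = 0.0) -> float:
    """Learning rate at a given step.

    Assumed shape: linear warmup to PEAK_LR, then cosine decay to min_lr.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (PEAK_LR - min_lr) * cosine


# Print the rate at the start, end of warmup, midpoint, and final step.
for s in (0, 200, 15_600, 31_000):
    print(s, lr_at(s))
```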

Supported Languages

Code	Language	Code	Language
`afr_Latn`	Afrikaans	`plt_Latn`	Malagasy
`amh_Ethi`	Amharic	`por_Latn`	Portuguese
`arb_Arab`	Arabic	`sna_Latn`	Shona
`bem_Latn`	Bemba	`som_Latn`	Somali
`eng_Latn`	English	`sot_Latn`	Sesotho
`fon_Latn`	Fon	`swh_Latn`	Swahili
`fra_Latn`	French	`tir_Ethi`	Tigrinya
`hau_Latn`	Hausa	`tsn_Latn`	Setswana
`ibo_Latn`	Igbo	`wol_Latn`	Wolof
`kin_Latn`	Kinyarwanda	`xho_Latn`	Xhosa
`lin_Latn`	Lingala	`yor_Latn`	Yoruba
`lug_Latn`	Luganda	`zul_Latn`	Zulu
`nya_Latn`	Chichewa
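
When scripting against the model, it can help to validate language codes before calling it. The following is a minimal sketch built from the table above. The helper names `check_codes` and `is_english_centric` are hypothetical and are not part of the repository.

```python
# Code -> language name, copied from the Supported Languages table.
LANGUAGES = {
    "afr_Latn": "Afrikaans", "amh_Ethi": "Amharic", "arb_Arab": "Arabic",
    "bem_Latn": "Bemba", "eng_Latn": "English", "fon_Latn": "Fon",
    "fra_Latn": "French", "hau_Latn": "Hausa", "ibo_Latn": "Igbo",
    "kin_Latn": "Kinyarwanda", "lin_Latn": "Lingala", "lug_Latn": "Luganda",
    "nya_Latn": "Chichewa", "plt_Latn": "Malagasy", "por_Latn": "Portuguese",
    "sna_Latn": "Shona", "som_Latn": "Somali", "sot_Latn": "Sesotho",
    "swh_Latn": "Swahili", "tir_Ethi": "Tigrinya", "tsn_Latn": "Setswana",
    "wol_Latn": "Wolof", "xho_Latn": "Xhosa", "yor_Latn": "Yoruba",
    "zul_Latn": "Zulu",
}


def check_codes(src: str, tgt: str) -> None:
    """Raise ValueError if either code is not in the table (hypothetical helper)."""
    for code in (src, tgt):
        if code not in LANGUAGES:
            raise ValueError(f"unsupported language code: {code}")


def is_english_centric(src: str, tgt: str) -> bool:
    """True when exactly one side is English, the direction this card describes."""
    check_codes(src, tgt)
    return (src == "eng_Latn") != (tgt == "eng_Latn")


print(len(LANGUAGES))
print(is_english_centric("eng_Latn", "hau_Latn"))
print(is_english_centric("fra_Latn", "swh_Latn"))
```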

Quick Start

git clone https://huggingface.co/WakandaAI/Aura-MT-1B
cd Aura-MT-1B
pip install torch tokenizers safetensors

Single translation

python generate.py --text "The president announced new economic policies." \
    --src eng_Latn --tgt hau_Latn

Interactive mode

python generate.py --interactive --src eng_Latn --tgt yor_Latn

[eng_Latn->yor_Latn] >>> Good morning, how are you doing today?
  Akoko ti o dara, bawo ni o ṣe n ṣiṣẹ loni?

[eng_Latn->yor_Latn] >>> /set src=arb_Arab tgt=eng_Latn
  Direction: arb_Arab -> eng_Latn

[arb_Arab->eng_Latn] >>> صباح الخير، كيف حالك؟
  Good morning, how are you?

Batch translation

python generate.py --input sentences.txt --src eng_Latn --tgt swh_Latn --output translations.txt

Python API

from inference import load_model, translate

model, tokenizer, config = load_model(".")

# English -> Swahili
result = translate(model, tokenizer,
    "The president announced new economic policies.",
    src_lang="eng_Latn", tgt_lang="swh_Latn")
print(result)

# French -> English
result = translate(model, tokenizer,
    "Bonjour, comment allez-vous?",
    src_lang="fra_Latn", tgt_lang="eng_Latn")
print(result)

# With sampling instead of beam search
result = translate(model, tokenizer,
    "Hello world",
    src_lang="eng_Latn", tgt_lang="yor_Latn",
    num_beams=1, temperature=0.7, top_p=0.9)
print(result)

Prompt Format

Internally, the model uses instruction-style prompts with a language token prefix:

<s><|tgt_lang|>Translate the following English text into Swahili.
English: Hello, how are you?
Swahili:

The translate() function handles prompt construction automatically. Six prompt templates are available (selectable via template_idx).
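
To make the format concrete, here is a rough sketch of how a prompt in the shape shown above could be assembled by hand. It is not the code in `inference.py`. The function name `build_prompt` is hypothetical, and the literal form of the language token (`<|swh_Latn|>` for the `<|tgt_lang|>` placeholder) is an assumption. It also covers only the one template shown, not all six.

```python
# Small subset of the code -> name table, enough for the example.
LANG_NAMES = {
    "eng_Latn": "English",
    "swh_Latn": "Swahili",
    "fra_Latn": "French",
}


def build_prompt(text: str, src_lang: str, tgt_lang: str) -> str:
    """Assemble a prompt in the shape shown above (hypothetical helper)."""
    src_name = LANG_NAMES[src_lang]
    tgt_name = LANG_NAMES[tgt_lang]
    # Assumption: the <|tgt_lang|> placeholder is filled with the target code.
    lang_token = f"<|{tgt_lang}|>"
    return (
        f"<s>{lang_token}Translate the following {src_name} text into {tgt_name}.\n"
        f"{src_name}: {text}\n"
        f"{tgt_name}:"
    )


print(build_prompt("Hello, how are you?", "eng_Latn", "swh_Latn"))
```

In normal use there should be no need for this, since `translate()` builds the prompt itself.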

Decoding

Parameter	Default	Description
`num_beams`	4	Beam search width (1 = greedy/sampling)
`max_new_tokens`	128	Maximum output length
`length_penalty`	1.0	Beam search length penalty
`no_repeat_ngram_size`	3	Ban repeated n-grams (0 = off)
`temperature`	0.0	Sampling temperature (set >0 together with `num_beams=1` to sample)
`top_p`	0.9	Nucleus sampling threshold
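
Two of these parameters are easier to understand with a small example. The sketch below shows the usual meaning of `no_repeat_ngram_size` and `top_p` on plain Python lists. It illustrates the general techniques and is not the repository's decoding code. Both function names are hypothetical.

```python
def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Tokens that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n:
        return set()
    # The last n-1 generated tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def top_p_keep(probs: list[float], top_p: float) -> list[int]:
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


# With n=3, the sequence ends in (5, 6), which was followed by 7 earlier,
# so 7 is banned as the next token.
print(banned_next_tokens([5, 6, 7, 5, 6], 3))       # {7}

# With top_p=0.9, the three most likely tokens (mass 0.95) are kept.
print(top_p_keep([0.5, 0.3, 0.15, 0.05], 0.9))      # [0, 1, 2]
```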

Files

File	Purpose
`model.safetensors`	Model weights (preferred format)
`model.pt`	Same weights as a torch checkpoint (fallback)
`config.json`	Architecture config
`tokenizer.json`	ByteLevel BPE tokenizer (64,000 vocab)
`tokenizer_config.json`	HuggingFace tokenizer metadata
`special_tokens.json`	Language token ID mapping
`inference.py`	`load_model()` + `translate()` library
`generate.py`	CLI wrapper (single/batch/interactive)
`llama3.py`	Transformer model definition
`model_factory.py`	Model config builder
`kvcache.py`	KV cache for inference

Limitations

Reverse direction (X -> English) is weaker than forward (English -> X) due to training data distribution
Low-resource languages (Bemba, Luganda, Wolof, Fon) produce lower-quality translations
The model may hallucinate named entities (e.g., inserting "Buhari" in political contexts) reflecting training data bias
Maximum sequence length is 1024 tokens; longer inputs will be truncated

Citation

If you use this model, please cite WakandaAI. Details TBA.

License

Apache 2.0
