Simitech AML AfriNLLB Translator
Fine-tuned from facebook/nllb-200-distilled-600M on East African AML transaction narratives.
Specialized for translating Luganda (lug_Latn) and Swahili (swh_Latn) mobile money
transaction descriptions to English for downstream AML classification.
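The translator sits in front of an English-language classifier. Below is a minimal sketch of that routing step. It assumes each narrative already carries an ISO 639-1 language tag ("lg", "sw", "en") from an upstream detector. The tag mapping, the English pass-through, and the translate_one callable are illustrative assumptions, not part of this model's published interface.

```python
from typing import Callable, List, Sequence

# Hypothetical mapping from ISO 639-1 tags to NLLB-200 language codes.
# Only Luganda and Swahili go to the translator; English passes through.
ISO_TO_NLLB = {"lg": "lug_Latn", "sw": "swh_Latn", "en": "eng_Latn"}

# Hypothetical callable: (nllb_src_code, text) -> English translation.
TranslateOne = Callable[[str, str], str]


def prepare_for_classifier(
    texts: Sequence[str],
    iso_langs: Sequence[str],
    translate_one: TranslateOne,
) -> List[str]:
    """Return English text for every narrative, translating only when needed."""
    if len(texts) != len(iso_langs):
        raise ValueError("texts and iso_langs must have the same length")

    english: List[str] = []
    for text, iso in zip(texts, iso_langs):
        code = ISO_TO_NLLB.get(iso)
        if code is None:
            raise ValueError(f"no translation route for language tag {iso!r}")
        if code == "eng_Latn":
            english.append(text)  # already English: skip the translator
        else:
            english.append(translate_one(code, text))
    return english
```

In practice, translate_one would wrap the generate call shown under Usage. Skipping English inputs avoids a needless and potentially lossy round trip through the model.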
Why a specialized translator?
General-purpose NLLB models miss domain-specific AML vocabulary:
- Mobile money agent terminology (float, airtime, USSD codes)
- Ugandan colloquialisms used in social-engineering scams
- Financial-crime typology phrases specific to the East African Community (EAC) corridor
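One way to check whether a model actually handles this vocabulary is to measure how often a domain term that appears in a reference English translation also appears in the model's output. The sketch below is a simple, hypothetical term-recall check. Its three-term glossary is an illustrative assumption drawn from the first category above, not the vocabulary used to train or evaluate this model.

```python
import re
from typing import Dict, Iterable, Sequence

# Illustrative glossary only (assumption); a real evaluation would use a curated list.
DOMAIN_TERMS = ("float", "airtime", "ussd")


def term_recall(
    references: Sequence[str],
    hypotheses: Sequence[str],
    terms: Iterable[str] = DOMAIN_TERMS,
) -> Dict[str, float]:
    """Per term: share of references containing it whose hypothesis also does.

    Terms that never appear in any reference are omitted from the result.
    Matching is case-insensitive and on whole words.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned")

    recall: Dict[str, float] = {}
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        expected = found = 0
        for ref, hyp in zip(references, hypotheses):
            if pattern.search(ref):
                expected += 1
                if pattern.search(hyp):
                    found += 1
        if expected:
            recall[term] = found / expected
    return recall
```

Exact-match recall is strict. A valid paraphrase such as "credit" for "airtime" counts as a miss, so scores are best read alongside a manual review of the flagged outputs.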
Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "darthvader256/simitech-aml-afrinllb-translator"

# Set the source language when loading the tokenizer
# (Luganda here; use "swh_Latn" for Swahili).
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="lug_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("nkusaba ssente z'omusawo omukisa", return_tensors="pt")

# Force English as the first generated token. convert_tokens_to_ids works across
# transformers versions; lang_code_to_id is deprecated and removed in recent releases.
output = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=128,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# → "I am asking for doctor money, please"
```
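The NLLB tokenizer applies one src_lang to every text in a call, so a queue that mixes Luganda and Swahili narratives has to be split by language before batching. Below is a minimal sketch under that assumption. The helpers translate_mixed and make_hf_batch_translator are hypothetical (not part of this repository), and the batch size and padding/truncation settings are illustrative choices.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: (nllb_src_code, texts) -> English translations, same order.
BatchTranslator = Callable[[str, List[str]], List[str]]

SUPPORTED_SRC_LANGS = {"lug_Latn", "swh_Latn"}


def translate_mixed(
    texts: Sequence[str],
    src_langs: Sequence[str],
    translate_batch: BatchTranslator,
    batch_size: int = 16,
) -> List[str]:
    """Translate mixed-language texts in per-language batches, keeping input order."""
    if len(texts) != len(src_langs):
        raise ValueError("texts and src_langs must have the same length")

    # Group input positions by source language.
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, lang in enumerate(src_langs):
        if lang not in SUPPORTED_SRC_LANGS:
            raise ValueError(f"unsupported source language: {lang}")
        positions[lang].append(i)

    results: List[str] = [""] * len(texts)
    for lang, idxs in positions.items():
        for start in range(0, len(idxs), batch_size):
            chunk = idxs[start:start + batch_size]
            outputs = translate_batch(lang, [texts[i] for i in chunk])
            if len(outputs) != len(chunk):
                raise RuntimeError("translator returned a different number of outputs")
            # Write each translation back to its original position.
            for i, out in zip(chunk, outputs):
                results[i] = out
    return results


def make_hf_batch_translator(model_id: str, max_new_tokens: int = 128) -> BatchTranslator:
    """Wrap the Hugging Face model as a BatchTranslator (loads weights immediately)."""
    # Imported here so translate_mixed can be used and tested without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    eng_id = tokenizer.convert_tokens_to_ids("eng_Latn")

    def translate_batch(src_lang: str, batch: List[str]) -> List[str]:
        tokenizer.src_lang = src_lang  # updates the language token added to inputs
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        output = model.generate(
            **inputs, forced_bos_token_id=eng_id, max_new_tokens=max_new_tokens
        )
        return tokenizer.batch_decode(output, skip_special_tokens=True)

    return translate_batch
```

For example, build translator = make_hf_batch_translator("darthvader256/simitech-aml-afrinllb-translator"), then call translate_mixed(narratives, langs, translator). Grouping by position keeps the write-back step simple. Sorting each language group by length before chunking would further reduce padding.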
Source
decision-plane/app/training/nlp_finetune.py: AfriNLLBTranslator class
Base model: facebook/nllb-200-distilled-600M