Simitech AML NLP Scorer
Task: Binary fraud intent classification on East African mobile money transaction narratives.
Languages: Luganda (lug_Latn), Swahili (swh_Latn), English (eng_Latn)
Base model: xlm-roberta-base, fine-tuned on labeled AML transaction narratives collected from Ugandan mobile money operators under BoU/FIA AML Act 2013 compliance.
What it detects
Social engineering hooks in Luganda and Swahili that models trained mostly on Western-language data tend to miss:
| Language | Example hook | Translation |
|---|---|---|
| Luganda | nkusaba ssente z'omusawo omukisa | "I beg you for doctor money, luck" |
| Swahili | tuma pesa haraka dharura | "Send money fast, emergency" |
| Luganda | oya wadawa prize ya airtime | "You won an airtime prize" |
Performance
| Metric | Value |
|---|---|
| Precision | ≥ 0.91 |
| Recall | ≥ 0.87 |
| F1 | ≥ 0.89 |
| SE hook recall | ≥ 0.85 |
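As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so the stated floors can be checked against each other. The sketch below assumes only the standard F1 definition; the function and variable names are illustrative and not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Floors reported in the Performance table.
precision_floor = 0.91
recall_floor = 0.87

# F1 increases with both precision and recall, so evaluating it at the
# two floors gives the lowest F1 compatible with them.
f1_at_floors = f1_score(precision_floor, recall_floor)
print(f"F1 at the stated floors: {f1_at_floors:.4f}")
```

Evaluated at the two floors, F1 comes out just under 0.89, which agrees with the reported F1 floor once rounded to two decimals.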
Usage
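from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="darthvader256/simitech-aml-nlp-scorer",
)

# Score a single transaction narrative (Luganda example from the table above)
result = classifier("nkusaba ssente z'omusawo omukisa")
# → [{'label': 'FRAUDULENT', 'score': 0.94}]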
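Downstream code will usually turn the label and score into a decision. The following is a minimal sketch of one way to do that, not the project's actual service logic: the `flag_for_review` helper, the 0.80 threshold, and the `LEGITIMATE` label name are all assumptions made for illustration. Only the `FRAUDULENT` label and the first sample entry come from the example output above; the other sample entries are made up.

```python
from typing import Dict, List

FRAUD_LABEL = "FRAUDULENT"  # label name taken from the example output above
REVIEW_THRESHOLD = 0.80     # hypothetical cut-off; tune on validation data


def flag_for_review(
    predictions: List[Dict[str, object]],
    threshold: float = REVIEW_THRESHOLD,
) -> List[bool]:
    """Return True for each prediction labeled fraudulent at or above the threshold."""
    return [
        p["label"] == FRAUD_LABEL and float(p["score"]) >= threshold
        for p in predictions
    ]


# Dicts shaped like the output of a text-classification pipeline.
sample = [
    {"label": "FRAUDULENT", "score": 0.94},  # from the usage example
    {"label": "FRAUDULENT", "score": 0.55},  # made-up low-confidence case
    {"label": "LEGITIMATE", "score": 0.97},  # hypothetical negative label name
]
flags = flag_for_review(sample)
print(flags)
```

Raising the threshold trades recall for precision, so any cut-off should be chosen against the metrics in the Performance table rather than copied from this sketch.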
Architecture
- Encoder: xlm-roberta-base
- Classifier head: linear(768 → 2) with dropout(0.1)
- Translation pipeline: facebook/nllb-200-distilled-600M (Luganda/Swahili → English)
- Training data: ClickHouse transaction narratives + synthetic social engineering samples
- Regulatory scope: FATF, BoU AML Act 2013, FIA Act 2013 (Uganda), Kenya POCAMLA, Rwanda Law 44/2017
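To illustrate how the two outputs of the linear(768 → 2) head relate to the label and score returned in the usage example, here is a small dependency-free sketch. It assumes the usual softmax over the two logits, which is what a `text-classification` pipeline applies by default for a single-label, two-class model. The `ID2LABEL` mapping and the logit values are hypothetical and do not come from the model's configuration.

```python
import math
from typing import List, Tuple

# Hypothetical index-to-label mapping; the real one lives in the model config.
ID2LABEL = {0: "LEGITIMATE", 1: "FRAUDULENT"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float]:
    """Map the head's two logits to (label, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits standing in for the head's output on one narrative.
label, score = decode([-1.2, 1.6])
print(label, round(score, 2))
```

With two classes the softmax depends only on the difference between the logits, so the score is equivalent to a sigmoid of that difference.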
Codebase
Source: decision-plane/app/training/nlp_finetune.py
Service: decision-plane/app/services/hf_nlp_service.py