Simitech AML NLP Scorer

Task: Binary fraud intent classification on East African mobile money transaction narratives.

Languages: Luganda (lug_Latn), Swahili (swh_Latn), English (eng_Latn)

Base model: xlm-roberta-base, fine-tuned on labeled AML transaction narratives collected from Ugandan mobile money operators in compliance with the BoU/FIA AML Act 2013.

What it detects

Social engineering hooks in Luganda and Swahili that models trained mainly on Western-language data tend to miss:

Language	Example hook	Translation
Luganda	`nkusaba ssente z'omusawo omukisa`	"I beg you for doctor money, luck"
Swahili	`tuma pesa haraka dharura`	"Send money fast, emergency"
Luganda	`oya wadawa prize ya airtime`	"You won an airtime prize"

Performance

Metric	Value
Precision	≥ 0.91
Recall	≥ 0.87
F1	≥ 0.89
SE hook recall	≥ 0.85
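
The three headline metrics are related in the usual way: F1 is the harmonic mean of precision and recall, so at the table's floor values 2 × 0.91 × 0.87 / (0.91 + 0.87) ≈ 0.89, which matches the F1 row. The following is a minimal sketch of that arithmetic from confusion counts. The function name `prf1` and the counts are hypothetical and are not the model's evaluation data.

```python
from typing import Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall, f1 = prf1(tp=87, fp=9, fn=13)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```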

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="darthvader256/simitech-aml-nlp-scorer",
)

result = classifier("nkusaba ssente z'omusawo omukisa")
# → [{'label': 'FRAUDULENT', 'score': 0.94}]
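
In a screening workflow the score would typically be compared against a threshold before a narrative is flagged for review. The following is a minimal sketch, assuming predictions in the output shape shown above. The helper `flag_narratives`, the 0.80 threshold and the second sample narrative are hypothetical and are not part of the model or of the transformers API.

```python
from typing import Dict, List


def flag_narratives(
    narratives: List[str],
    predictions: List[Dict[str, object]],
    threshold: float = 0.80,  # hypothetical cut-off; tune on validation data
) -> List[str]:
    """Return the narratives labelled FRAUDULENT with score >= threshold."""
    flagged = []
    for text, pred in zip(narratives, predictions):
        if pred["label"] == "FRAUDULENT" and float(pred["score"]) >= threshold:
            flagged.append(text)
    return flagged


# Stand-in predictions in the pipeline's output shape (not real model output).
texts = ["tuma pesa haraka dharura", "asante kwa malipo"]
preds = [
    {"label": "FRAUDULENT", "score": 0.94},
    {"label": "FRAUDULENT", "score": 0.55},
]
print(flag_narratives(texts, preds))
```

With the pipeline from the snippet above, `classifier(texts)` would supply `preds` as a list with one dict per input narrative.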

Architecture

Encoder: xlm-roberta-base
Classifier head: linear(768 → 2) with dropout(0.1)
Translation pipeline: facebook/nllb-200-distilled-600M (Luganda/Swahili → English)
Training data: transaction narratives from ClickHouse + synthetic social engineering samples
Regulatory scope: FATF, BoU AML Act 2013, FIA Act 2013 (Uganda), Kenya POCAMLA, Rwanda Law 44/2017
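
The linear(768 → 2) head produces two logits per narrative, and the text-classification pipeline by default turns those into a label and score with a softmax for single-label models like this one. The following is a minimal sketch of that last step in plain Python. The label order, the name `LEGITIMATE` and the logit values are assumptions; only `FRAUDULENT` appears in this card.

```python
import math
from typing import List, Tuple

# Hypothetical label order; only "FRAUDULENT" is confirmed by the model card.
LABELS = ("LEGITIMATE", "FRAUDULENT")


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: List[float]) -> Tuple[str, float]:
    """Return the highest-probability label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Made-up logits standing in for the head's output on one narrative.
label, score = top_label([-1.2, 1.6])
print(label, round(score, 2))
```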

Codebase

Source: decision-plane/app/training/nlp_finetune.py
Service: decision-plane/app/services/hf_nlp_service.py
