Simitech AML NLP Scorer

Task: Binary fraud intent classification on East African mobile money transaction narratives.

Languages: Luganda (lug_Latn), Swahili (swh_Latn), English (eng_Latn)

Base model: xlm-roberta-base, fine-tuned on labeled AML transaction narratives collected from Ugandan mobile money operators under BoU/FIA AML Act 2013 compliance.

What it detects

Social engineering hooks in Luganda and Swahili that models trained mainly on Western-language data tend to miss:

| Language | Example hook | Translation |
| --- | --- | --- |
| Luganda | nkusaba ssente z'omusawo omukisa | "I beg you for doctor money, luck" |
| Swahili | tuma pesa haraka dharura | "Send money fast, emergency" |
| Luganda | oya wadawa prize ya airtime | "You won an airtime prize" |

Performance

| Metric | Value |
| --- | --- |
| Precision | ≥ 0.91 |
| Recall | ≥ 0.87 |
| F1 | ≥ 0.89 |
| SE hook recall | ≥ 0.85 |
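
As a quick consistency check on the figures above: F1 is the harmonic mean of precision and recall, so the F1 floor can be recomputed from the other two floors. The sketch below is illustrative only; the function name is not part of the model's codebase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Floors reported for precision and recall in this card.
f1_floor = f1_score(0.91, 0.87)
print(round(f1_floor, 2))
```

Rounded to two decimal places, the result agrees with the reported F1 floor of 0.89.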

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="darthvader256/simitech-aml-nlp-scorer",
)

result = classifier("nkusaba ssente z'omusawo omukisa")
# → [{'label': 'FRAUDULENT', 'score': 0.94}]
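
# A downstream service will usually turn the score into a decision. The lines
# below are a minimal sketch of that step, working only on the list-of-dicts
# shape shown above. REVIEW_THRESHOLD and needs_review are hypothetical names,
# and the 0.80 cut-off is an assumption, not a value from this model card.

```python
from typing import Dict, List

REVIEW_THRESHOLD = 0.80  # hypothetical cut-off; tune on validation data


def needs_review(predictions: List[Dict], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True if the top prediction is FRAUDULENT at or above the threshold."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"] == "FRAUDULENT" and top["score"] >= threshold


# Same shape as the pipeline output in the usage example.
sample = [{"label": "FRAUDULENT", "score": 0.94}]
flagged = needs_review(sample)
```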

Architecture

  • Encoder: xlm-roberta-base
  • Classifier head: linear(768 → 2) with dropout(0.1)
  • Translation pipeline: facebook/nllb-200-distilled-600M (Luganda/Swahili → English)
  • Training data: ClickHouse transaction narratives + synthetic social engineering samples
  • Regulatory scope: FATF, BoU AML Act 2013, FIA Act 2013 (Uganda), Kenya POCAMLA, Rwanda Law 44/2017
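
To illustrate how a two-logit head such as linear(768 → 2) leads to the label and score seen in the usage output, here is a minimal pure-Python sketch of the softmax step that a single-label text-classification setup typically applies. Only the label "FRAUDULENT" comes from this card; the name "LEGITIMATE", the label order, and the logit values are assumptions made for illustration.

```python
import math
from typing import Dict, List

# "FRAUDULENT" appears in the usage example; "LEGITIMATE" and the
# index order are assumed, since the card does not list the label map.
LABELS = ["LEGITIMATE", "FRAUDULENT"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: List[float]) -> Dict:
    """Map the head's two logits to a {'label', 'score'} dict."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[idx], "score": probs[idx]}


pred = top_prediction([-1.2, 1.6])  # made-up logits, not real model output
```

With these made-up logits the second class wins with a probability a little above 0.94, which matches the shape (though not necessarily the values) of the output in the usage section.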

Codebase

Source: decision-plane/app/training/nlp_finetune.py
Service: decision-plane/app/services/hf_nlp_service.py
