darthvader256 committed on
Commit e84af39 · verified · 1 Parent(s): d363453

chore: update model card

Files changed (1)
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
---
language:
- lug
- sw
- en
license: other
tags:
- translation
- anti-money-laundering
- luganda
- swahili
- nllb
- east-africa
base_model: facebook/nllb-200-distilled-600M
datasets:
- darthvader256/Simivalleyaml
---

# Simitech AML AfriNLLB Translator

Fine-tuned from `facebook/nllb-200-distilled-600M` on East African anti-money-laundering (AML) transaction narratives.
Specialized for translating Luganda (`lug_Latn`) and Swahili (`swh_Latn`) mobile money
transaction descriptions into English for downstream AML classification.

## Why a specialized translator?

General NLLB models miss domain-specific AML vocabulary:
- Mobile money agent terminology (float, airtime, USSD codes)
- Ugandan colloquialisms used in social engineering scams
- Financial crime typology phrases specific to the EAC corridor

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "darthvader256/simitech-aml-afrinllb-translator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Tell the tokenizer that the input text is Luganda.
tokenizer.src_lang = "lug_Latn"
inputs = tokenizer("nkusaba ssente z'omusawo omukisa", return_tensors="pt")
output = model.generate(
    **inputs,
    # Force English as the target language. convert_tokens_to_ids is used here
    # because the lang_code_to_id mapping is deprecated in recent transformers releases.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=128,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# → "I am asking for doctor money, please"
```
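
For more than one narrative at a time, the same calls can be wrapped in a small batching helper. The following is a minimal sketch, not part of the released code: the names `NLLB_CODES`, `nllb_code`, `chunked`, and `translate_to_english`, the default batch size of 8, and the mapping from the card's language tags to NLLB codes are all illustrative assumptions. The tokenizer and model are passed in, so they can be loaded once as in the snippet above.

```python
# Assumed mapping from this card's language tags to NLLB language codes.
NLLB_CODES = {"lug": "lug_Latn", "sw": "swh_Latn", "en": "eng_Latn"}


def nllb_code(lang_tag):
    """Return the NLLB code for a language tag (hypothetical helper)."""
    if lang_tag not in NLLB_CODES:
        raise ValueError(f"unsupported language tag: {lang_tag!r}")
    return NLLB_CODES[lang_tag]


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_to_english(texts, lang_tag, tokenizer, model,
                         batch_size=8, max_new_tokens=128):
    """Translate `texts` written in `lang_tag` to English, batch by batch."""
    tokenizer.src_lang = nllb_code(lang_tag)
    target_id = tokenizer.convert_tokens_to_ids(nllb_code("en"))
    translations = []
    for batch in chunked(list(texts), batch_size):
        # Pad so narratives of different lengths fit in one tensor.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        output = model.generate(
            **inputs,
            forced_bos_token_id=target_id,
            max_new_tokens=max_new_tokens,
        )
        translations.extend(
            tokenizer.batch_decode(output, skip_special_tokens=True)
        )
    return translations
```

With the `tokenizer` and `model` objects from the snippet above, a call would look like `translate_to_english(narratives, "sw", tokenizer, model)`, where `narratives` is a list of Swahili strings.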

## Source

`AfriNLLBTranslator` class in `decision-plane/app/training/nlp_finetune.py`