Rogendo committed on
Commit 5ebdf14 · verified · 1 Parent(s): 1690859

Create README.md

  1. README.md +506 -0
README.md ADDED
@@ -0,0 +1,506 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - sw
5
+ tags:
6
+ - multi-task-learning
7
+ - text-classification
8
+ - fraud-detection
9
+ - sentiment-analysis
10
+ - call-quality
11
+ - question-answering
12
+ - jenga-ai
13
+ - nlp-for-africa
14
+ - security
15
+ - attention-fusion
16
+ base_model: distilbert-base-uncased
17
+ license: apache-2.0
18
+ pipeline_tag: text-classification
19
+ datasets:
20
+ - custom
21
+ model-index:
22
+ - name: JengaAI-multi-task-nlp
23
+ results:
24
+ - task:
25
+ type: text-classification
26
+ name: Fraud Detection
27
+ metrics:
28
+ - type: f1
29
+ value: 1
30
+ name: F1
31
+ - type: accuracy
32
+ value: 1
33
+ name: Accuracy
34
+ - task:
35
+ type: text-classification
36
+ name: Sentiment Analysis
37
+ metrics:
38
+ - type: f1
39
+ value: 0.167
40
+ name: F1
41
+ - type: accuracy
42
+ value: 0.333
43
+ name: Accuracy
44
+ - task:
45
+ type: text-classification
46
+ name: Call Quality - Listening
47
+ metrics:
48
+ - type: f1
49
+ value: 0.922
50
+ name: F1
51
+ - task:
52
+ type: text-classification
53
+ name: Call Quality - Resolution
54
+ metrics:
55
+ - type: f1
56
+ value: 0.908
57
+ name: F1
58
+ widget:
59
+ - text: >-
60
+ Suspicious M-Pesa transaction detected from unknown account requesting
61
+ urgent transfer
62
+ example_title: Fraud Detection
63
+ - text: >-
64
+ The customer service was excellent, my billing issue was resolved on the
65
+ first call
66
+ example_title: Positive Sentiment
67
+ - text: Hello, welcome to Safaricom customer care. How can I assist you today?
68
+ example_title: Call Quality Scoring
69
+ library_name: transformers
70
+ ---
71
+
72
+ # JengaAI Multi-Task NLP (3-Task Attention Fusion)
73
+
74
+ A **multi-task NLP model** built with the [JengaAI framework](https://github.com/Rogendo/JengaAI) that performs **fraud detection**, **sentiment analysis**, and **call quality scoring** simultaneously through a shared encoder with attention-based task fusion. Designed for Kenyan national security and telecommunications applications.
75
+
76
+ ## Model Capabilities
77
+
78
+ This model handles **3 tasks** with **8 prediction heads** producing **22 total output dimensions** in a single forward pass:
79
+
80
+ | Task | Type | Heads | Outputs | Best F1 |
81
+ |:-----|:-----|:------|:--------|:--------|
82
+ | **Fraud Detection** | Binary classification | 1 (fraud) | 2 classes: normal / fraud | **1.000** |
83
+ | **Sentiment Analysis** | 3-class classification | 1 (sentiment) | 3 classes: negative / neutral / positive | 0.167 |
84
+ | **Call Quality Scoring** | Multi-label QA | 6 heads, 17 sub-metrics | Binary per sub-metric | **0.647 - 0.967** |
85
+
86
+ ### Call Quality Sub-Metrics (17 Binary Outputs)
87
+
88
+ The call quality task evaluates customer service transcripts across 6 quality dimensions:
89
+
90
+ | Head | Sub-Metrics | F1 |
91
+ |:-----|:-----------|:---|
92
+ | **Opening** | greeting | 0.967 |
93
+ | **Listening** | acknowledgment, empathy, clarification, active_listening, patience | 0.922 |
94
+ | **Proactiveness** | initiative, follow_up, suggestions | 0.802 |
95
+ | **Resolution** | identified_issue, provided_solution, confirmed_resolution, set_expectations, offered_alternatives | 0.908 |
96
+ | **Hold** | asked_permission, explained_reason | 0.647 |
97
+ | **Closing** | proper_farewell | 0.881 |
98
+
99
+ ## Architecture
100
+
101
+ ```
102
+ Input Text
103
+ |
104
+ v
105
+ [DistilBERT Encoder] ---- 6 layers, 768 hidden, 12 attention heads
106
+ |
107
+ v
108
+ [Attention Fusion] ------- task-conditioned attention with residual connections
109
+ |
110
+ +-- [Task 0: Fraud Head] ----------- Linear(768, 2) --> softmax
111
+ +-- [Task 1: Sentiment Head] ------- Linear(768, 3) --> softmax
112
+ +-- [Task 2: QA Scoring 6 Heads] --- 6x Linear(768, 1..5) --> sigmoid
113
+ ```
114
+
115
+ **Key design choices:**
116
+
117
+ - **Shared encoder**: All 3 tasks share a single DistilBERT encoder, enabling knowledge transfer between fraud patterns, sentiment signals, and call quality indicators
118
+ - **Attention fusion**: A learned attention mechanism modulates the shared representation per task, allowing each task to attend to different parts of the encoder output while still benefiting from shared features
119
+ - **Residual connections**: Fusion output is added to the original representation (gate_init_value=0.5), ensuring stable training and allowing each task to fall back on the base representation
120
+ - **Multi-head QA**: Call quality uses 6 independent classification heads with different output sizes (1-5 binary outputs each), weighted by importance during training (resolution: 2.0x, listening: 1.5x, hold: 0.5x)
121
+
122
+ ## Usage
123
+
124
+ ### With JengaAI Framework (Recommended)
125
+
126
+ ```bash
127
+ pip install torch transformers pydantic pyyaml huggingface_hub
128
+ ```
129
+
130
+ ```python
131
+ from huggingface_hub import snapshot_download
132
+ from jenga_ai.inference import InferencePipeline
133
+
134
+ # Download model
135
+ model_path = snapshot_download(
136
+ "Rogendo/JengaAI-multi-task-nlp",
137
+ ignore_patterns=["checkpoints/*", "logs/*"],
138
+ )
139
+
140
+ # Load pipeline
141
+ pipeline = InferencePipeline.from_checkpoint(
142
+ model_dir=model_path,
143
+ config_path=f"{model_path}/experiment_config.yaml",
144
+ device="auto",
145
+ )
146
+
147
+ # Run all 3 tasks at once
148
+ result = pipeline.predict("Suspicious M-Pesa transaction from unknown account")
149
+ print(result.to_json())
150
+
151
+ # Or run a single task
152
+ fraud_result = pipeline.predict(
153
+ "WARNING: Your Safaricom account has been compromised. Send 5000 KES to unlock.",
154
+ task_name="fraud_detection",
155
+ )
156
+ fraud = fraud_result.task_results["fraud_detection"].heads["fraud"]
157
+ print(f"Fraud: {fraud.prediction} (confidence: {fraud.confidence:.1%})")
158
+ # Fraud: 1 (confidence: 96.9%)
159
+ ```
160
+
161
+ ### Batch Inference
162
+
163
+ ```python
164
+ texts = [
165
+ "Suspicious M-Pesa notification asking me to send money.",
166
+ "Normal airtime top-up of 100 KES via M-Pesa.",
167
+ "WARNING: Your account has been compromised.",
168
+ ]
169
+
170
+ results = pipeline.predict_batch(texts, task_name="fraud_detection", batch_size=32)
171
+
172
+ for text, result in zip(texts, results):
173
+ fraud = result.task_results["fraud_detection"].heads["fraud"]
174
+ label = "FRAUD" if fraud.prediction == 1 else "LEGIT"
175
+ print(f"[{label} {fraud.confidence:.1%}] {text}")
176
+ ```
177
+
178
+ ### CLI
179
+
180
+ ```bash
181
+ # Single text
182
+ python -m jenga_ai predict \
183
+ --config experiment_config.yaml \
184
+ --model-dir ./model \
185
+ --text "Suspicious M-Pesa transaction from unknown account" \
186
+ --format report
187
+
188
+ # Batch from file
189
+ python -m jenga_ai predict \
190
+ --config experiment_config.yaml \
191
+ --model-dir ./model \
192
+ --input-file transcripts.jsonl \
193
+ --output predictions.json \
194
+ --batch-size 16
195
+ ```
196
+
197
+ ### Call Quality Scoring Example
198
+
199
+ ```python
200
+ result = pipeline.predict(
201
+ "Hello, welcome to Safaricom customer care. I understand you're having "
202
+ "a billing issue. Let me look into that for you right away. I've found "
203
+ "the discrepancy and corrected your balance. Is there anything else?",
204
+ task_name="call_quality",
205
+ )
206
+
207
+ for head_name, head in result.task_results["call_quality"].heads.items():
208
+ print(f"{head_name:16s} {head.prediction} (conf: {head.confidence:.2f})")
209
+ ```
210
+
211
+ Output:
212
+ ```
213
+ opening {'greeting': True} (conf: 0.82)
214
+ listening {'acknowledgment': True, 'empathy': True, ...} (conf: 0.75)
215
+ proactiveness {'initiative': True, 'follow_up': True, 'suggestions': False} (conf: 0.58)
216
+ resolution {'identified_issue': True, 'provided_solution': True, ...} (conf: 0.69)
217
+ hold {'asked_permission': False, 'explained_reason': False} (conf: 0.02)
218
+ closing {'proper_farewell': True} (conf: 0.52)
219
+ ```
220
+
221
+ ### Low-Level Usage (Without JengaAI Framework)
222
+
223
+ If you only need the raw model weights and want to integrate into your own pipeline:
224
+
225
+ ```python
226
+ import torch
227
+ import json
228
+ from transformers import AutoTokenizer, AutoModel, AutoConfig
229
+
230
+ # Load components
231
+ tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
232
+ encoder_config = AutoConfig.from_pretrained("./model/encoder_config")
233
+
234
+ with open("./model/metadata.json") as f:
235
+ metadata = json.load(f)
236
+
237
+ # Load full state dict
238
+ state_dict = torch.load("./model/model.pt", map_location="cpu", weights_only=True)
239
+
240
+ # Extract encoder weights (keys starting with "encoder.")
241
+ encoder_state = {k.removeprefix("encoder."): v for k, v in state_dict.items() if k.startswith("encoder.")}
242
+ encoder = AutoModel.from_config(encoder_config)
243
+ encoder.load_state_dict(encoder_state)
244
+ encoder.eval()
245
+
246
+ # Run encoder
247
+ inputs = tokenizer("Suspicious transaction", return_tensors="pt", padding="max_length",
248
+ truncation=True, max_length=256)
249
+ with torch.no_grad():
250
+ outputs = encoder(**inputs)
251
+ cls_embedding = outputs.last_hidden_state[:, 0] # [1, 768]
252
+
253
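+ # Extract fraud head weights (task 0, head "fraud"); applying them to the raw [CLS] embedding skips the attention fusion layer, so scores may differ from the pipeline's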
254
+ fraud_weight = state_dict["tasks.0.heads.fraud.1.weight"] # [2, 768]
255
+ fraud_bias = state_dict["tasks.0.heads.fraud.1.bias"] # [2]
256
+
257
+ logits = cls_embedding @ fraud_weight.T + fraud_bias
258
+ probs = torch.softmax(logits, dim=-1)
259
+ print(f"Fraud probability: {probs[0, 1].item():.4f}")
260
+ ```
261
+
262
+ ## Intended Use
263
+
264
+ ### Primary Use Cases
265
+
266
+ - **M-Pesa Fraud Detection**: Classify M-Pesa transaction descriptions as fraudulent or legitimate. Designed for Safaricom and Kenyan mobile money contexts.
267
+ - **Customer Sentiment Monitoring**: Analyze customer feedback and communications for sentiment polarity (negative / neutral / positive).
268
+ - **Call Center Quality Assurance**: Score customer service call transcripts across 17 quality sub-metrics in 6 categories, reducing the workload of manual QA audits.
269
+ - **Multi-Signal Analysis**: Run all 3 tasks simultaneously on the same text to get a comprehensive analysis (is this a fraud attempt? what's the sentiment? how good was the agent's response?).
270
+
271
+ ### Intended Users
272
+
273
+ - Kenyan telecommunications companies (Safaricom, Airtel Kenya)
274
+ - Financial institutions monitoring mobile money transactions
275
+ - Call center operations teams performing quality audits
276
+ - Security analysts processing incident reports
277
+ - NLP researchers working on African language and context models
278
+
279
+ ### Downstream Use
280
+
281
+ The model can be integrated into:
282
+ - Real-time fraud alerting systems
283
+ - Call center dashboards with automated QA scoring
284
+ - Customer feedback analysis pipelines
285
+ - Security operations center (SOC) threat triage workflows
286
+ - Mobile money transaction monitoring platforms
287
+
288
+ ## Out-of-Scope Use
289
+
290
+ - **Not for automated decision-making without human oversight.** This model should support human analysts, not replace them. High-stakes fraud decisions require human review.
291
+ - **Not for non-Kenyan contexts without retraining.** Entity names, transaction patterns, and call center norms are Kenyan-specific.
292
+ - **Not for languages other than English.** The training data contains occasional Swahili terms and Kenyan entity names (M-Pesa, Safaricom, KRA), but the text is primarily English and the model has not been evaluated on Swahili input.
293
+ - **Not for legal evidence.** Model outputs are analytical signals, not forensic evidence.
294
+ - **Not for surveillance of individuals.** The model analyzes text content, not identity.
295
+
296
+ ## Bias, Risks, and Limitations
297
+
298
+ ### Known Biases
299
+
300
+ - **Training data imbalance**: Fraud detection was trained on only 20 samples (16 train / 4 eval). The model achieves 1.0 F1 on eval, but with only 4 eval samples this more likely reflects the tiny eval set and overfitting than real-world performance. Real-world fraud patterns are far more diverse.
301
+ - **Sentiment data**: Only 15 samples, with accuracy stuck at 33.3% (random baseline for 3 classes). The sentiment head needs significantly more training data to be production-useful.
302
+ - **Call quality data**: 4,996 synthetic transcripts. While metrics are strong (0.65-0.97 F1), the synthetic nature means real-world transcripts with noise, code-switching (Swahili-English), and non-standard grammar may perform differently.
303
+ - **Geographic bias**: All training data reflects Kenyan contexts. The model may not generalize to other East African countries without adaptation.
304
+
305
+ ### Risks
306
+
307
+ - **False positives in fraud detection**: Legitimate transactions flagged as fraud can block real users. Always use this model with human review for enforcement actions.
308
+ - **False negatives in fraud detection**: Sophisticated fraud patterns not in the training data will be missed. This model is one signal among many, not a standalone detector.
309
+ - **Over-reliance on QA scores**: Call quality scores should augment, not replace, human QA reviewers. Edge cases (cultural nuances, sarcasm, escalation scenarios) may be scored incorrectly.
310
+
311
+ ### Recommendations
312
+
313
+ - Use fraud detection as a **triage signal** (flag for review), not an automatic block
314
+ - Retrain with production-scale data before deploying to production
315
+ - Monitor prediction confidence — route low-confidence predictions to human review using the built-in HITL routing (`enable_hitl=True`)
316
+ - Enable PII redaction (`enable_pii=True`) when processing real customer data
317
+ - Enable audit logging (`enable_audit=True`) for compliance and accountability
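The confidence-based routing recommended above can be pictured with a small sketch. It assumes routing is driven by the normalized entropy of a head's class probabilities; the function names and the 0.5 threshold are hypothetical and are not taken from the JengaAI API.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))

def needs_human_review(probs, threshold=0.5):
    """Hypothetical rule: route to a reviewer when uncertainty is high."""
    return normalized_entropy(probs) > threshold

# A confident fraud prediction versus a near coin flip.
for probs in ([0.031, 0.969], [0.45, 0.55]):
    print(probs, round(normalized_entropy(probs), 3), needs_human_review(probs))
```

Normalizing by log(K) keeps one threshold meaningful for both the 2-class fraud head and the 3-class sentiment head.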
318
+
319
+ ## Training Details
320
+
321
+ ### Training Data
322
+
323
+ | Dataset | Task | Samples | Source |
324
+ |:--------|:-----|:--------|:-------|
325
+ | `sample_classification.jsonl` | Fraud Detection | 20 | Synthetic M-Pesa transaction descriptions |
326
+ | `sample_sentiment.jsonl` | Sentiment Analysis | 15 | Synthetic customer feedback |
327
+ | `synthetic_qa_metrics_data_v01x.json` | Call Quality | 4,996 | Synthetic call center transcripts with 17 binary QA labels |
328
+
329
+ **Train/eval split**: 80/20 random split (seed=42)
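As an illustration of what a seeded 80/20 split yields for these dataset sizes, the sketch below shuffles indices with a fixed seed and cuts off the eval portion. The helper `split_indices` is hypothetical; the framework's own split routine is not shown in this card and may order samples differently.

```python
import random

def split_indices(n_samples, eval_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle indices with a fixed seed, then cut."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_eval = round(n_samples * eval_fraction)
    return indices[n_eval:], indices[:n_eval]

for name, size in [("fraud", 20), ("sentiment", 15), ("call_quality", 4996)]:
    train_idx, eval_idx = split_indices(size)
    print(f"{name:13s} train={len(train_idx):5d} eval={len(eval_idx):4d}")
```

For the fraud set this gives 16 train and 4 eval samples, matching the counts quoted under Known Biases; the sentiment set ends up with only 3 eval samples.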
330
+
331
+ All datasets are synthetic, generated to reflect linguistic patterns in Kenyan telecommunications and financial services contexts. They contain English text with occasional Swahili terms and Kenyan-specific entities (M-Pesa, Safaricom, KRA, Kenyan phone numbers).
332
+
333
+ ### Training Procedure
334
+
335
+ #### Preprocessing
336
+
337
+ - Tokenizer: `distilbert-base-uncased` WordPiece tokenizer
338
+ - Max sequence length: 256 tokens
339
+ - Padding: `max_length` (padded to 256)
340
+ - Truncation: enabled
341
+
342
+ #### Architecture
343
+
344
+ - **Encoder**: DistilBERT (6 layers, 768 hidden, 12 heads) — 66.4M parameters
345
+ - **Fusion**: Attention fusion with residual connections — 1.2M parameters
346
+ - **Task heads**: 8 linear heads across 3 tasks — 17K parameters
347
+ - **Total**: 67.6M parameters (258MB on disk)
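The head and size figures above can be cross-checked with a few lines of arithmetic. The sketch assumes each head is a single `Linear(768, n)` layer with bias, as in the architecture diagram, and that the 258 MB figure is FP32 weights measured in MiB; both are assumptions, not statements from the framework.

```python
HIDDEN = 768  # DistilBERT hidden size

# Output dimensions per head, taken from the tables in this card.
HEAD_OUTPUTS = {
    "fraud": 2, "sentiment": 3,
    "opening": 1, "listening": 5, "proactiveness": 3,
    "resolution": 5, "hold": 2, "closing": 1,
}

def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

head_params = sum(linear_params(HIDDEN, n) for n in HEAD_OUTPUTS.values())
total_params = 66.4e6 + 1.2e6 + head_params  # encoder + fusion + heads
size_mib = total_params * 4 / 2**20          # FP32 = 4 bytes per parameter

print(len(HEAD_OUTPUTS), sum(HEAD_OUTPUTS.values()), head_params)
print(f"{total_params / 1e6:.1f}M parameters, about {size_mib:.0f} MiB in FP32")
```

Under these assumptions the 8 heads expose 22 outputs and hold 16,918 parameters, which rounds to the 17K quoted above.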
348
+
349
+ #### Training Hyperparameters
350
+
351
+ | Parameter | Value |
352
+ |:----------|:------|
353
+ | Learning rate | 2e-5 |
354
+ | Batch size | 16 |
355
+ | Epochs | 12 (best checkpoint at epoch 3) |
356
+ | Weight decay | 0.01 |
357
+ | Warmup steps | 20 |
358
+ | Max gradient norm | 1.0 |
359
+ | Optimizer | AdamW |
360
+ | Precision | FP32 |
361
+ | Task sampling | Proportional (temperature=2.0) |
362
+ | Early stopping patience | 5 epochs |
363
+ | Best model metric | eval_loss |
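To see what temperature-scaled task sampling does with datasets this unbalanced, the sketch below uses the common formulation where a task's probability is proportional to its size raised to `1 / temperature`. Whether JengaAI uses exactly this formula is an assumption, and the `sentiment` task key is a placeholder.

```python
def sampling_probs(sizes, temperature=2.0):
    """p_i proportional to n_i ** (1 / temperature)."""
    scaled = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}

# Dataset sizes from the Training Data table.
sizes = {"fraud_detection": 20, "sentiment": 15, "call_quality": 4996}

for t in (1.0, 2.0):
    probs = sampling_probs(sizes, temperature=t)
    print(t, {task: round(p, 4) for task, p in probs.items()})
```

A temperature of 1.0 is plain proportional sampling; raising it to 2.0 gives the two small tasks a larger share of batches, although call quality still dominates.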
364
+
365
+ #### Task Loss Weights
366
+
367
+ | Head | Weight | Rationale |
368
+ |:-----|:-------|:----------|
369
+ | fraud | 1.0 | Standard |
370
+ | sentiment | 1.0 | Standard |
371
+ | opening | 1.0 | Standard |
372
+ | listening | 1.5 | Important quality dimension |
373
+ | proactiveness | 1.0 | Standard |
374
+ | resolution | 2.0 | Most critical quality dimension |
375
+ | hold | 0.5 | Less frequent in transcripts |
376
+ | closing | 1.0 | Standard |
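The sketch below shows one plausible way these weights enter the call quality objective: each head's binary cross-entropy is averaged over its sub-metrics, multiplied by the head weight, and summed. The reduction (mean per head, sum across heads) is an assumption, and the function names are illustrative only.

```python
import math

# Call quality head weights from the table above.
HEAD_WEIGHTS = {
    "opening": 1.0, "listening": 1.5, "proactiveness": 1.0,
    "resolution": 2.0, "hold": 0.5, "closing": 1.0,
}

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def weighted_qa_loss(logits, targets):
    """Weighted sum over heads of the mean BCE across each head's outputs."""
    total = 0.0
    for head, head_logits in logits.items():
        losses = [bce_with_logits(x, y) for x, y in zip(head_logits, targets[head])]
        total += HEAD_WEIGHTS[head] * sum(losses) / len(losses)
    return total

# Two heads with identical logits and labels: resolution counts 4x as much as hold.
logits = {"resolution": [1.2, -0.3], "hold": [1.2, -0.3]}
targets = {"resolution": [1.0, 0.0], "hold": [1.0, 0.0]}
print(round(weighted_qa_loss(logits, targets), 4))
```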
377
+
378
+ #### Training Loss Progression
379
+
380
+ | Epoch | Train Loss | Eval Loss | Status |
381
+ |:------|:-----------|:----------|:-------|
382
+ | 3 | 1.878 | **1.948** | Best checkpoint |
383
+ | 7 | 1.471 | 2.057 | Overfitting begins |
384
+ | 8 | 1.403 | 2.068 | Continued overfitting |
385
+
386
+ The best checkpoint was selected at epoch 3 based on eval_loss. Training continued to epoch 12 but eval loss increased after epoch 3, indicating overfitting, which is expected given the small fraud and sentiment datasets.
387
+
388
+ ### Speeds, Sizes, Times
389
+
390
+ | Metric | Value |
391
+ |:-------|:------|
392
+ | Model size (disk) | 258 MB |
393
+ | Parameters | 67.6M |
394
+ | Inference latency (single task, CPU) | ~590 ms |
395
+ | Inference latency (all 3 tasks, CPU) | ~1,960 ms |
396
+ | Batch latency per sample (32 texts, single task, CPU) | ~647 ms/sample |
397
+ | Training time | ~5 minutes (CPU, 12 epochs) |
398
+
399
+ ## Evaluation
400
+
401
+ ### Metrics
402
+
403
+ All metrics are computed on the 20% held-out eval split.
404
+
405
+ **Fraud Detection** (binary classification):
406
+
407
+ | Metric | Value |
408
+ |:-------|:------|
409
+ | Accuracy | 1.000 |
410
+ | Precision | 1.000 |
411
+ | Recall | 1.000 |
412
+ | F1 | 1.000 |
413
+
414
+ **Sentiment Analysis** (3-class classification):
415
+
416
+ | Metric | Value |
417
+ |:-------|:------|
418
+ | Accuracy | 0.333 |
419
+ | Precision | 0.111 |
420
+ | Recall | 0.333 |
421
+ | F1 | 0.167 |
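These four numbers are what macro-averaged metrics produce when a model predicts the same class for every sample of a balanced 3-class eval set. The sketch below reproduces them under that assumed scenario (3 eval samples, one per class, macro averaging); the card does not confirm the actual predictions or the averaging mode.

```python
def macro_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

labels = ["negative", "neutral", "positive"]
y_true = ["negative", "neutral", "positive"]  # assumed: one eval sample per class
y_pred = ["neutral", "neutral", "neutral"]    # assumed: constant prediction

print([round(m, 3) for m in macro_metrics(y_true, y_pred, labels)])
```

If this scenario holds, the sentiment head has collapsed to a single class, which is a stronger failure signal than accuracy merely hovering near chance.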
422
+
423
+ **Call Quality** (multi-label binary per head):
424
+
425
+ | Head | Precision | Recall | F1 |
426
+ |:-----|:----------|:-------|:---|
427
+ | Opening | 0.967 | 0.967 | **0.967** |
428
+ | Listening | 0.893 | 0.953 | **0.922** |
429
+ | Proactiveness | 0.746 | 0.868 | **0.802** |
430
+ | Resolution | 0.918 | 0.898 | **0.908** |
431
+ | Hold | 0.856 | 0.519 | **0.647** |
432
+ | Closing | 0.881 | 0.881 | **0.881** |
433
+
434
+ ### Results Summary
435
+
436
+ - **Fraud detection** achieves perfect metrics on the eval set, but this is a very small eval set (4 samples). Production deployment requires evaluation on a larger, more diverse dataset.
437
+ - **Sentiment analysis** performs at random baseline (33.3% accuracy for 3 classes), indicating the 15-sample dataset is insufficient. This head needs retraining with production data.
438
+ - **Call quality** shows strong performance across most heads (0.80-0.97 F1), with the "hold" category being the weakest (0.647 F1) due to fewer hold-related examples in the training data.
439
+
440
+ ## Model Examination
441
+
442
+ ### Attention Fusion
443
+
444
+ The attention fusion mechanism learns task-specific attention patterns over the shared encoder output. This allows:
445
+ - The fraud head to attend to transaction-related tokens (amounts, account references)
446
+ - The sentiment head to attend to opinion-bearing words
447
+ - The QA heads to attend to conversational flow patterns
448
+
449
+ The fusion uses a gated residual connection (initialized at 0.5), meaning each task's representation is a learned blend of the task-specific attended output and the original encoder output.
450
+
451
+ ### Security Features
452
+
453
+ When used with the JengaAI inference framework, the model supports:
454
+
455
+ - **PII Redaction**: Masks Kenyan-specific PII (phone numbers, national IDs, KRA PINs, M-Pesa transaction IDs) before inference
456
+ - **Explainability**: Token-level importance scores via attention analysis or gradient methods
457
+ - **Human-in-the-Loop**: Automatic routing of low-confidence predictions to human reviewers based on entropy-based uncertainty estimation
458
+ - **Audit Trail**: Tamper-evident logging of every inference call with SHA-256 hash chains
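The hash-chain idea behind the audit trail can be sketched in a few lines: each entry stores the SHA-256 of the previous entry's hash concatenated with its own serialized record, so editing any past record invalidates every later hash. The record fields and function names here are hypothetical and do not reflect the JengaAI log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def entry_hash(prev_hash, record):
    """SHA-256 over the previous hash plus the canonical JSON record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_entry(log, record):
    """Append a record linked to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev_hash": prev_hash,
                "hash": entry_hash(prev_hash, record)})

def verify_chain(log):
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"task": "fraud_detection", "prediction": 1})
append_entry(log, {"task": "call_quality", "prediction": 0})
print(verify_chain(log))           # chain is intact
log[0]["record"]["prediction"] = 0
print(verify_chain(log))           # tampering is detected
```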
459
+
460
+ ## Technical Specifications
461
+
462
+ ### Model Architecture and Objective
463
+
464
+ - **Architecture**: DistilBERT encoder + attention fusion + multi-task heads
465
+ - **Encoder**: 6 transformer layers, 768 hidden size, 12 attention heads, 30,522 vocab
466
+ - **Fusion**: Single-head attention with residual gating
467
+ - **Objectives**: CrossEntropy (fraud, sentiment) + BCEWithLogits (call quality)
468
+
469
+ ### Compute Infrastructure
470
+
471
+ #### Hardware
472
+
473
+ - Training: CPU (Intel/AMD, standard workstation)
474
+ - Inference: CPU or CUDA GPU
475
+
476
+ #### Software
477
+
478
+ - PyTorch 2.x
479
+ - Transformers 5.x
480
+ - JengaAI Framework V2
481
+ - Python 3.11+
482
+
483
+ ## Environmental Impact
484
+
485
+ - **Hardware Type**: CPU (standard workstation)
486
+ - **Training Time**: ~5 minutes
487
+ - **Carbon Emitted**: Negligible (short training run on CPU)
488
+
489
+ ## Citation
490
+
491
+ ```bibtex
492
+ @software{jengaai2026,
493
+ title = {JengaAI: Low-Code Multi-Task NLP for African Security Applications},
494
+ author = {Rogendo},
495
+ year = {2026},
496
+ url = {https://huggingface.co/Rogendo/JengaAI-multi-task-nlp},
497
+ }
498
+ ```
499
+
500
+ ## Model Card Authors
501
+
502
+ Rogendo
503
+
504
+ ## Model Card Contact
505
+
506
+ For questions, issues, or contributions: [GitHub Issues](https://github.com/Rogendo/JengaAI/issues)