# Attention Fusion Multi-Task DistilBERT

A single DistilBERT-base-cased backbone trained simultaneously on three NLP tasks for child helpline conversation analysis, using per-task learned attention-pooling heads.
| Task | Type | Heads |
|---|---|---|
| Named Entity Recognition (NER) | Token-level | 10 entity labels |
| Case Classification (CLS) | Sentence-level | 4 heads (main category, sub-category, intervention, priority) |
| Quality Assurance Scoring (QA) | Sentence-level | 6 binary heads (17 sub-metrics) |
## Validation Metrics (best checkpoint)
| Metric | Value |
|---|---|
| NER macro F1 | 0.5343 |
| CLS average accuracy (4 heads) | 0.6183 |
| QA average micro-F1 (6 heads) | 0.8386 |
| Composite average | 0.6638 |
## Usage

### Install dependencies

```bash
pip install torch transformers huggingface_hub
```

### Download and run inference
```python
from huggingface_hub import snapshot_download
import sys
import torch

# Download all files from the Hub
model_dir = snapshot_download(repo_id="rogendo/attention-fusion-distilbert")

# inference.py ships with the repository, so make it importable
sys.path.append(model_dir)
from inference import AttentionFusionInference

device = "cuda" if torch.cuda.is_available() else "cpu"
inf = AttentionFusionInference(model_dir=model_dir, device=device)

texts = [
    "Hello, I'm calling from Nairobi. My daughter Sarah, aged 12, was assaulted by her teacher.",
]

# Named Entity Recognition
ner_results = inf.predict_ner(texts)
# → [[("Sarah", "NAME"), ("12", "AGE"), ("Nairobi", "LOCATION"), ...]]

# Case Classification
cls_results = inf.predict_classification(texts)
# → [{"main_category": "VANE", "sub_category": "Physical Abuse",
#     "intervention": "Counselling", "priority": "1"}]

# Quality Assurance Scoring
qa_results = inf.predict_qa(texts)
# → [{"opening": [1], "listening": [1, 0, 1, 1, 0], ...}]
```
## Architecture

```
Input Text → Tokenizer → DistilBERT-base-cased (shared backbone)
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
    NER Head              CLS Head               QA Head
 Linear(768→768)       TaskAttnPooling       TaskAttnPooling
  + GELU + Drop           + Dropout             + Dropout
 Linear(768→10)         4 classifiers         6 binary heads
   [per token]        (main/sub/interv/       (open/listen/
  CrossEntropy            priority)           proact/resolv/
  ignore=-100          CrossEntropy×4          hold/close)
                         ignore=-1           BCEWithLogits×6
```
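The per-head loss setup shown in the diagram can be sketched as follows (the dummy tensors and shapes are illustrative, not the repository's training code):

```python
import torch
import torch.nn as nn

# Loss functions matching the diagram: padded/missing labels use the
# ignore indices shown above (-100 for NER tokens, -1 for classification).
ner_loss = nn.CrossEntropyLoss(ignore_index=-100)  # per token, 10 classes
cls_loss = nn.CrossEntropyLoss(ignore_index=-1)    # applied to each of the 4 heads
qa_loss = nn.BCEWithLogitsLoss()                   # applied to each of the 6 binary heads

# NER example: flatten (batch, seq_len, 10) logits against (batch, seq_len) labels
logits = torch.randn(2, 8, 10)
labels = torch.randint(0, 10, (2, 8))
labels[:, -2:] = -100                              # padding positions are ignored
loss = ner_loss(logits.view(-1, 10), labels.view(-1))
```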
`TaskAttentionPooling` replaces the fixed `[CLS]` token with a learned, per-task attention-weighted sum over all token positions, so each task can focus on the tokens most relevant to its objective.
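The exact implementation ships with the repository; a minimal sketch of the idea (module and parameter names here are illustrative):

```python
import torch
import torch.nn as nn

class TaskAttentionPooling(nn.Module):
    """One learned pooling per task: score every token, mask padding,
    and return the attention-weighted sum of the hidden states."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # per-token relevance score

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.scorer(hidden_states).squeeze(-1)         # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=-1)                 # sums to 1 per example
        return torch.einsum("bs,bsh->bh", weights, hidden_states)  # (batch, hidden)

pool = TaskAttentionPooling(768)
h = torch.randn(2, 16, 768)
mask = torch.ones(2, 16, dtype=torch.long)
pooled = pool(h, mask)  # shape: (2, 768)
```

Because each task owns its own `scorer`, the NER-irrelevant tokens can still dominate, say, the QA head's pooling, which is the point of per-task attention fusion.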
## Training Strategy: Task Alternation

Each epoch, all batches from all three DataLoaders are collected, tagged with their task name, and randomly shuffled into a single sequence. The model iterates through this shuffled list, giving the shared backbone a continuous gradient signal from all three tasks and mitigating catastrophic forgetting.
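The shuffling step above can be sketched as follows (the helper name and the training-step call are hypothetical, not the repository's API):

```python
import random

def alternating_batches(loaders):
    """Collect every batch from every task's DataLoader, tag each with its
    task name, and shuffle them into one interleaved training sequence."""
    tagged = [(task, batch) for task, loader in loaders.items() for batch in loader]
    random.shuffle(tagged)
    return tagged

# Illustrative per-epoch loop (model.training_step is a hypothetical helper):
# for task, batch in alternating_batches({"ner": ner_dl, "cls": cls_dl, "qa": qa_dl}):
#     loss = model.training_step(task, batch)
```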
## NER Labels (10 classes)

`O`, `NAME`, `LOCATION`, `VICTIM`, `AGE`, `GENDER`, `INCIDENT_TYPE`, `PERPETRATOR`, `PHONE_NUMBER`, `LANDMARK`
## Classification Labels

**Main categories (8):**
Advice and Counselling, Child Maintenance & Custody, Disability, GBV, Information, Nutrition, Unknown, VANE

**Interventions (16):**
Awareness/Information Provided, Counseling, Counseling, Referral, Counselling, Counselling Referral, Counselling Referral Signposting, Counselling, Awareness/Information Provided, Counselling, Awareness/Information Provided, Signposting, Counselling, Referral, Counselling, Referral, Signposting, Counselling, Signposting, Information Provided, Counselling, Referral, Referral, Counselling, Referral, Signposting, Signposting
**Priority:** 1 (critical) · 2 (urgent) · 3 (routine)
QA Scoring Heads
| opening | 1 | Call opening phrase used |
| listening | 5 | Did not interrupt caller, Showed empathy, Paraphrased the issue, Used please/thank you, Did not hesitate |
| proactiveness | 3 | Offered to solve extra issues, Confirmed satisfaction, Followed up on updates |
| resolution | 5 | Gave accurate information, Correct language use, Consulted when unsure, Followed correct steps, Explained solution clearly |
| hold | 2 | Explained before placing on hold, Thanked caller for holding |
| closing | 1 | Proper closing phrase used |
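The six heads together emit 17 binary sub-metrics. One way to roll them into a single call score is a simple hit rate; note this aggregation rule is an assumption for illustration, not part of the model:

```python
# Sub-metric counts per head, taken from the table above.
QA_SUBMETRICS = {"opening": 1, "listening": 5, "proactiveness": 3,
                 "resolution": 5, "hold": 2, "closing": 1}

def qa_score(preds: dict) -> float:
    """Fraction of the 17 sub-metrics scored 1 (hypothetical aggregation)."""
    total = sum(QA_SUBMETRICS.values())        # 17 sub-metrics overall
    hits = sum(sum(v) for v in preds.values())
    return hits / total
```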
## Citation

If you use this model, please cite the training repository:

```bibtex
@misc{attention-fusion-distilbert,
  title = {Attention Fusion Multi-Task DistilBERT for Child Helpline Analysis},
  year  = {2025},
  url   = {https://huggingface.co/rogendo/attention-fusion-distilbert}
}
```