Attention Fusion Multi-Task DistilBERT

A single DistilBERT-base-cased backbone trained simultaneously on three NLP tasks for child helpline conversation analysis, using per-task learned attention pooling heads.

| Task | Type | Heads |
|------|------|-------|
| Named Entity Recognition (NER) | Token-level | 10 entity labels |
| Case Classification (CLS) | Sentence-level | 4 heads (main category, sub-category, intervention, priority) |
| Quality Assurance Scoring (QA) | Sentence-level | 6 binary heads (17 sub-metrics) |

Validation Metrics (best checkpoint)

| Metric | Value |
|--------|-------|
| NER macro F1 | 0.5343 |
| CLS average accuracy (4 heads) | 0.6183 |
| QA average micro-F1 (6 heads) | 0.8386 |
| Composite average | 0.6638 |

Usage

Install dependencies

pip install torch transformers huggingface_hub

Download and run inference

from huggingface_hub import snapshot_download
import json, torch

# Download all files from the Hub (including inference.py)
model_dir = snapshot_download(repo_id="rogendo/attention-fusion-distilbert")

# inference.py ships in the downloaded snapshot; make it importable
import sys
sys.path.append(model_dir)
from inference import AttentionFusionInference

inf = AttentionFusionInference(model_dir=model_dir, device="cuda")  # or "cpu"

texts = [
    "Hello, I'm calling from Nairobi. My daughter Sarah, aged 12, was assaulted by her teacher.",
]

# Named Entity Recognition
ner_results = inf.predict_ner(texts)
# → [[("Sarah", "NAME"), ("12", "AGE"), ("Nairobi", "LOCATION"), ...]]

# Case Classification
cls_results = inf.predict_classification(texts)
# → [{"main_category": "VANE", "sub_category": "Physical Abuse",
#     "intervention": "Counselling", "priority": "1"}]

# Quality Assurance Scoring
qa_results = inf.predict_qa(texts)
# → [{"opening": [1], "listening": [1,0,1,1,0], ...}]

Architecture

Input Text → Tokenizer → DistilBERT-base-cased (shared backbone)
                                    │
              ┌─────────────────────┼──────────────────────┐
              │                     │                      │
           NER Head             CLS Head               QA Head
       Linear(768→768)      TaskAttnPooling        TaskAttnPooling
        + GELU + Drop           + Dropout              + Dropout
       Linear(768→10)       4 classifiers          6 binary heads
        [per token]         (main/sub/interv/       (open/listen/
       CrossEntropy          priority)               proact/resolv/
       ignore=-100          CrossEntropy×4           hold/close)
                            ignore=-1               BCEWithLogits×6

TaskAttentionPooling replaces the fixed [CLS] token with a learned per-task attention-weighted sum over all token positions, so each task can focus on the tokens most relevant to its objective.
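One way such a pooling head can be realized is sketched below, assuming each task owns a single scoring layer followed by a padding-masked softmax (class and argument names are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

class TaskAttentionPooling(nn.Module):
    """Per-task learned pooling: score every token, softmax over the
    sequence with padding masked out, and return the attention-weighted
    sum of token embeddings in place of the fixed [CLS] vector."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # one attention score per token

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.score(hidden_states).squeeze(-1)          # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=-1)                 # sums to 1 per example
        return torch.einsum("bs,bsh->bh", weights, hidden_states)  # (batch, hidden)
```

Because the scoring layer is instantiated once per task, the CLS and QA heads each learn their own attention distribution over the shared backbone's token embeddings.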

Training Strategy β€” Task Alternation

Each epoch, all batches from all three DataLoaders are collected, tagged with their task name, and randomly shuffled into a single sequence. The model then iterates through this shuffled list, giving the shared backbone a continuous gradient signal from all three tasks and preventing catastrophic forgetting.
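The shuffling step above can be sketched as follows (`build_epoch_schedule` and the loader names are illustrative assumptions; the repo's actual training loop may differ):

```python
import random

def build_epoch_schedule(task_loaders):
    """Flatten every (task, batch) pair from all DataLoaders into one
    randomly shuffled training sequence for the epoch."""
    schedule = [(task, batch)
                for task, loader in task_loaders.items()
                for batch in loader]
    random.shuffle(schedule)
    return schedule

# Per-step dispatch: each batch updates only its own task's loss, but
# every step sends gradients through the shared backbone, e.g.:
# for task, batch in build_epoch_schedule({"ner": ner_dl, "cls": cls_dl, "qa": qa_dl}):
#     loss = model(batch, task=task)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```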

NER Labels (10 classes)

O, NAME, LOCATION, VICTIM, AGE, GENDER, INCIDENT_TYPE, PERPETRATOR, PHONE_NUMBER, LANDMARK

Classification Labels

Main categories (8): Advice and Counselling, Child Maintenance & Custody, Disability, GBV, Information, Nutrition, Unknown, VANE

Interventions (16): multi-label combinations of Awareness/Information Provided, Counselling, Information Provided, Referral, and Signposting (e.g. Counselling; Counselling Referral; Counselling, Referral, Signposting)

Priority: 1 (critical) · 2 (urgent) · 3 (routine)

QA Scoring Heads

| Head | Sub-metrics | Criteria |
|------|-------------|----------|
| opening | 1 | Call opening phrase used |
| listening | 5 | Did not interrupt caller, Showed empathy, Paraphrased the issue, Used please/thank you, Did not hesitate |
| proactiveness | 3 | Offered to solve extra issues, Confirmed satisfaction, Followed up on updates |
| resolution | 5 | Gave accurate information, Correct language use, Consulted when unsure, Followed correct steps, Explained solution clearly |
| hold | 2 | Explained before placing on hold, Thanked caller for holding |
| closing | 1 | Proper closing phrase used |
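Since each head is trained with BCEWithLogitsLoss, turning raw head outputs into the 0/1 sub-metric lists shown in the usage example can be sketched as below (`qa_logits_to_scores` is an illustrative helper, not part of the repo; head names and counts are taken from the table above):

```python
import torch

# Head names and sub-metric counts from the QA table (1+5+3+5+2+1 = 17)
QA_HEADS = {"opening": 1, "listening": 5, "proactiveness": 3,
            "resolution": 5, "hold": 2, "closing": 1}

def qa_logits_to_scores(head_logits, threshold=0.5):
    """Sigmoid + threshold each BCEWithLogits head into 0/1 sub-metric scores."""
    return {head: (torch.sigmoid(logits) >= threshold).int().tolist()
            for head, logits in head_logits.items()}
```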

Citation

If you use this model, please cite the training repository:

@misc{attention-fusion-distilbert,
  title  = {Attention Fusion Multi-Task DistilBERT for Child Helpline Analysis},
  year   = {2025},
  url    = {https://huggingface.co/rogendo/attention-fusion-distilbert}
}