🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier

We present DualMedBERT, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.

The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.

Experiments on a 27-class disease classification task using patient-reported health data demonstrate that DualMedBERT achieves ~98.5% of BERT-base performance while reducing inference latency by ~1.8×. Additionally, the proposed calibration module achieves an AUROC of ~0.90, significantly improving confidence estimation without affecting classification accuracy.

These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.


πŸ₯ Use Case / Applications

DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.

πŸ” Potential Applications

  • Clinical decision support (assistive, not diagnostic)
    Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.

  • Telemedicine and triage systems
    Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.

  • Health forums and patient platforms
    Automatically categorizing user-reported conditions for better organization and information retrieval.

  • Public health monitoring
    Aggregating and analyzing trends in reported symptoms across populations.


⚠️ Important Note

This model is intended for research and assistive purposes only and should not be used for medical diagnosis or treatment decisions without professional oversight.


πŸ’‘ Why this matters

Patient-reported health data differs from clinical text:

  • Informal language
  • Symptom descriptions instead of diagnoses
  • Ambiguity and overlap across conditions

DualMedBERT addresses this by combining:

  • General language understanding (BERT)
  • Biomedical knowledge (PubMedBERT)
  • Efficient deployment (DistilBERT + LoRA)
  • Reliability estimation (XGBoost calibration)

🧩 Model Architecture

Student Model

  • Backbone: distilbert-base-uncased
  • LoRA:
    • Rank: r = 8
    • Alpha: α = 32
    • Applied to transformer layers 2–5
  • Additional: transformer layer 1 partially unfrozen
  • Pooling: CLS + attention pooling
  • Head: Dense classifier (27 classes)
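
A minimal sketch of this student setup, using Hugging Face Transformers + PEFT, is shown below. The targeted attention projections (`q_lin`, `v_lin`), the exact scope of the layer-1 unfreezing, and the classification wrapper are assumptions; the card only specifies the rank, alpha, target layers, and a 27-class head, and the custom CLS + attention pooling head is not reproduced here.

```python
# Hedged sketch of the LoRA-adapted DistilBERT student (module names assumed).
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=27
)

lora_cfg = LoraConfig(
    r=8,                                # LoRA rank
    lora_alpha=32,                      # LoRA scaling alpha
    lora_dropout=0.05,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections (assumed choice)
    layers_to_transform=[2, 3, 4, 5],   # restrict LoRA to transformer layers 2-5
    task_type="SEQ_CLS",
)
student = get_peft_model(base, lora_cfg)

# Partially unfreeze transformer layer 1 alongside the LoRA parameters
# (the exact subset of layer-1 weights that is unfrozen is assumed).
for name, param in student.named_parameters():
    if ".transformer.layer.1.attention." in name:
        param.requires_grad = True

student.print_trainable_parameters()
```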


Teachers

| Teacher    | Role                           |
|------------|--------------------------------|
| BERT-base  | General language understanding |
| PubMedBERT | Biomedical domain knowledge    |

🧠 Training Method

Dual-Teacher Knowledge Distillation

Loss: $$L = \alpha \cdot L_{KD} + (1 - \alpha) \cdot L_{Focal}$$

Where:

  • KD uses two teachers
  • Weights determined via entropy-based confidence
  • Temperature: T = 4.0
  • α (KD balance): 0.6
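
A minimal sketch of this objective follows. The entropy-to-weight mapping and the focal-loss γ are assumptions; the card only states that the teacher weights are entropy-derived and that T = 4.0 and α = 0.6.

```python
# Hedged sketch of the entropy-weighted dual-teacher KD + focal loss.
import torch
import torch.nn.functional as F

T, ALPHA, GAMMA = 4.0, 0.6, 2.0  # temperature, KD balance, focal gamma (gamma assumed)

def entropy(p, eps=1e-8):
    return -(p * (p + eps).log()).sum(dim=-1)

def focal_loss(logits, targets, gamma=GAMMA):
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

def dual_teacher_kd_loss(student_logits, bert_logits, pubmed_logits, targets):
    p_bert = F.softmax(bert_logits / T, dim=-1)
    p_pub = F.softmax(pubmed_logits / T, dim=-1)

    # Per-sample teacher weights: the lower-entropy (more confident) teacher gets more weight.
    conf = torch.stack([1.0 / (1.0 + entropy(p_bert)),
                        1.0 / (1.0 + entropy(p_pub))], dim=-1)
    w = conf / conf.sum(dim=-1, keepdim=True)  # (batch, 2), rows sum to 1

    # Blend the teacher distributions, then distill via KL divergence.
    teacher_mix = w[:, 0:1] * p_bert + w[:, 1:2] * p_pub
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_p_student, teacher_mix, reduction="batchmean") * T * T

    return ALPHA * kd + (1.0 - ALPHA) * focal_loss(student_logits, targets)
```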

πŸ“Š Confidence Calibration (XGBoost)

A post-hoc calibrator predicts whether each prediction from the student is correct.

Features (31 total):

  • 27 softmax probabilities
  • max probability
  • entropy
  • top-2 gap
  • top-3 sum
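
Below is a hedged sketch of assembling these 31 features and fitting the calibrator; the XGBoost hyperparameters are placeholders (not the card's settings), and `val_probs` / `val_labels` stand in for held-out softmax outputs and gold labels.

```python
# Sketch of the post-hoc reliability calibrator (hyperparameters are placeholders).
import numpy as np
from xgboost import XGBClassifier

def calibration_features(probs):
    """probs: (n_samples, 27) softmax outputs -> (n_samples, 31) feature matrix."""
    sorted_p = np.sort(probs, axis=1)[:, ::-1]              # probabilities, descending
    max_p = sorted_p[:, 0]                                   # max probability
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)      # predictive entropy
    top2_gap = sorted_p[:, 0] - sorted_p[:, 1]               # top-2 gap
    top3_sum = sorted_p[:, :3].sum(axis=1)                   # top-3 sum
    return np.column_stack([probs, max_p, ent, top2_gap, top3_sum])  # 27 + 4 = 31

# Fit on a held-out split: target = 1 if the student's argmax matched the gold label.
# X = calibration_features(val_probs)
# y = (val_probs.argmax(axis=1) == val_labels).astype(int)
# calibrator = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
# calibrator.fit(X, y)
# reliability = calibrator.predict_proba(calibration_features(test_probs))[:, 1]
```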

πŸ“ˆ Results

| Model       | Macro F1 | Accuracy | Latency   |
|-------------|----------|----------|-----------|
| BERT-base   | 0.8333   | 0.835    | ~16–18 ms |
| PubMedBERT  | 0.8553   | 0.855    | ~16–18 ms |
| DualMedBERT | 0.8207   | 0.8226   | ~10 ms    |

πŸ” Calibration

  • AUROC: 0.898–0.903
  • Reliability detection: ~83%

βš™οΈ Training Details

  • Optimizer: AdamW
  • Learning rate: 2e-4 (student)
  • Weight decay: 0.1
  • Epochs: 12
  • KD temperature: 4.0
  • LoRA dropout: 0.05
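
A minimal sketch of wiring these hyperparameters into the optimizer is shown below, reusing the `student` model from the architecture sketch above; the card does not specify a learning-rate schedule or batch size, so those are omitted.

```python
# Hedged sketch of the optimizer setup (schedule and batching unspecified, so omitted).
from torch.optim import AdamW

optimizer = AdamW(
    (p for p in student.parameters() if p.requires_grad),  # LoRA + unfrozen layer-1 params
    lr=2e-4,
    weight_decay=0.1,
)
# Each of the 12 epochs would compute dual_teacher_kd_loss(...) from the sketch above,
# then call loss.backward(); optimizer.step(); optimizer.zero_grad().
```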

⚠️ Important Notes

  • Slight (~1–2%) drop vs BERT-base
  • Adaptive teacher weights showed limited variation (~0.45 / 0.55)
  • Model prioritizes speed + reliability over peak accuracy

πŸ“‚ Dataset

UCI Drug Review Dataset (Gräßer et al., 2018)


πŸ“š Citation

If you use this model, please cite:

  • Hinton et al., 2015 — Knowledge Distillation
  • Hu et al., 2022 — LoRA
  • Sanh et al., 2019 — DistilBERT
  • Devlin et al., 2018 — BERT
  • Gu et al., 2021 — PubMedBERT
  • Lin et al., 2017 — Focal Loss
  • Chen & Guestrin, 2016 — XGBoost
  • Gräßer et al., 2018 — Dataset

🏁 Summary

DualMedBERT demonstrates that:

A carefully designed distillation pipeline can retain ~98.5% of BERT-base performance while achieving a ~1.8× speedup, improved reliability via calibration, and robust disease classification on patient-reported health conditions.

