# DualMedBERT: Dual-Teacher Distilled Biomedical Classifier
We present DualMedBERT, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.
The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.
Experiments on a 27-class disease classification task using patient-reported health data demonstrate that DualMedBERT achieves ~98.5% of BERT-base performance while reducing inference latency by ~1.8×. Additionally, the proposed calibration module achieves an AUROC of ~0.90, significantly improving confidence estimation without affecting classification accuracy.
These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.
## Use Cases / Applications
DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.
### Potential Applications
- **Clinical decision support (assistive, not diagnostic):** classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.
- **Telemedicine and triage systems:** rapidly analyzing patient descriptions to prioritize cases or suggest next steps.
- **Health forums and patient platforms:** automatically categorizing user-reported conditions for better organization and information retrieval.
- **Public health monitoring:** aggregating and analyzing trends in reported symptoms across populations.
### ⚠️ Important Note
This model is intended for research and assistive purposes only and should not be used for medical diagnosis or treatment decisions without professional oversight.
## Why this matters
Patient-reported health data differs from clinical text:
- Informal language
- Symptom descriptions instead of diagnoses
- Ambiguity and overlap across conditions
DualMedBERT addresses this by combining:
- General language understanding (BERT)
- Biomedical knowledge (PubMedBERT)
- Efficient deployment (DistilBERT + LoRA)
- Reliability estimation (XGBoost calibration)
## Model Architecture

### Student Model

- **Backbone:** `distilbert-base-uncased`
- **LoRA:**
  - Rank: r = 8
  - Alpha: α = 32
  - Applied to layers 2–5
- **Additional:** layer 1 partially unfrozen
- **Pooling:** CLS + attention pooling
- **Head:** dense classifier (27 classes)
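As a rough illustration of the LoRA adaptation described above, each adapted linear layer computes the frozen base projection plus a scaled low-rank correction. The following dependency-free sketch is illustrative only; the function name, matrix layout, and shapes are assumptions, not the released code:

```python
def lora_forward(x, W, A, B, r=8, alpha=32):
    """LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight (d_out x d_in); only the low-rank
    factors A (r x d_in) and B (d_out x r) are trained. B is initialized
    to zero, so the adapted layer initially matches the frozen one.
    """
    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

    base = matvec(W, x)                  # frozen projection
    low_rank = matvec(B, matvec(A, x))   # trainable low-rank update
    scale = alpha / r                    # scaling from the card: 32 / 8
    return [b + scale * u for b, u in zip(base, low_rank)]
```

Because only A and B are trained, the number of trainable parameters per layer drops from d_out × d_in to r × (d_in + d_out), which is what makes the student cheap to fine-tune.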
### Teachers
| Teacher | Role |
|---|---|
| BERT-base | General language understanding |
| PubMedBERT | Biomedical domain knowledge |
## Training Method

### Dual-Teacher Knowledge Distillation

Loss:

$$ L = \alpha \cdot L_{\mathrm{KD}} + (1 - \alpha) \cdot L_{\mathrm{Focal}} $$
Where:
- KD uses two teachers
- Weights determined via entropy-based confidence
- Temperature: T = 4.0
- α (KD balance): 0.6
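A minimal sketch of how this combined loss could be computed for a single example. The exact confidence-weighting scheme (here, a softmax over negated teacher entropies) and all function names are illustrative assumptions, not the released implementation:

```python
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_teacher_kd_loss(student_logits, t1_logits, t2_logits, target,
                         alpha=0.6, T=4.0, gamma=2.0):
    """L = alpha * L_KD + (1 - alpha) * L_Focal for one example."""
    # Entropy-based confidence: the lower-entropy (more confident)
    # teacher receives more weight.
    p1, p2 = softmax(t1_logits, T), softmax(t2_logits, T)
    h1, h2 = entropy(p1), entropy(p2)
    w1 = math.exp(-h1) / (math.exp(-h1) + math.exp(-h2))
    w2 = 1.0 - w1

    # Temperature-scaled KD term against both teachers.
    ps = softmax(student_logits, T)
    l_kd = (T ** 2) * (w1 * kl(p1, ps) + w2 * kl(p2, ps))

    # Focal loss on the hard label, down-weighting easy examples.
    p_true = softmax(student_logits)[target]
    l_focal = -((1 - p_true) ** gamma) * math.log(p_true)

    return alpha * l_kd + (1 - alpha) * l_focal
```

The T² factor is the standard correction that keeps the KD gradient magnitude comparable across temperatures; the focal term (γ = 2) reduces the contribution of well-classified examples, which is what lets the student cope with class imbalance.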
## Confidence Calibration (XGBoost)

A post-hoc XGBoost calibrator predicts whether each classification is correct, using features derived from the student's softmax output.
Features (31 total):
- 27 softmax probabilities
- max probability
- entropy
- top-2 gap
- top-3 sum
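The 31-dimensional calibrator input can be sketched as follows; the function name and feature ordering are assumptions, and the released calibrator may arrange them differently:

```python
import math

def calibration_features(probs):
    """Build the 31-dim calibrator input from a 27-way softmax vector:
    the raw probabilities plus four summary statistics."""
    top = sorted(probs, reverse=True)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return list(probs) + [
        top[0],           # max probability
        entropy,          # predictive entropy
        top[0] - top[1],  # top-2 gap
        sum(top[:3]),     # top-3 sum
    ]
```

These features capture how peaked the softmax distribution is, which is exactly the signal a correctness classifier needs: confident-and-correct predictions tend to have a high max probability, low entropy, and a large top-2 gap.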
## Results
| Model | Macro F1 | Accuracy | Latency |
|---|---|---|---|
| BERT-base | 0.8333 | 0.835 | ~16–18 ms |
| PubMedBERT | 0.8553 | 0.855 | ~16–18 ms |
| DualMedBERT | 0.8207 | 0.8226 | ~10 ms |
### Calibration

- AUROC: 0.898–0.903
- Reliability detection rate: ~83%
## Training Details
- Optimizer: AdamW
- Learning rate: 2e-4 (student)
- Weight decay: 0.1
- Epochs: 12
- KD temperature: 4.0
- LoRA dropout: 0.05
## ⚠️ Important Notes

- Slight (~1–2%) performance drop vs. BERT-base
- Adaptive teacher weights showed limited variation (~0.45 / 0.55)
- The model prioritizes speed and reliability over peak accuracy
## Dataset

UCI Drug Review Dataset (Gräßer et al., 2018)
## Citation

If you use this model, please cite:

- Hinton et al., 2015 – Knowledge Distillation
- Hu et al., 2022 – LoRA
- Sanh et al., 2019 – DistilBERT
- Devlin et al., 2018 – BERT
- Gu et al., 2021 – PubMedBERT
- Lin et al., 2017 – Focal Loss
- Chen & Guestrin, 2016 – XGBoost
- Gräßer et al., 2018 – Dataset
## Summary

DualMedBERT demonstrates that a carefully designed distillation pipeline can retain ~98.5% of BERT-base performance while achieving a ~1.8× speedup, improved reliability via calibration, and robust disease classification on patient-reported health conditions.