Souvik Sinha committed: Update README.md
# 🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier

We present **DualMedBERT**, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.
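
The LoRA-based adaptation can be illustrated with a minimal NumPy sketch. The dimensions, rank `r`, and scaling `alpha` below are illustrative assumptions, not DualMedBERT's actual configuration: a frozen weight matrix `W` receives a trainable low-rank update `(alpha / r) * B @ A`, so only the small `A` and `B` matrices are trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 768, 768, 8, 16   # illustrative, not DualMedBERT's actual config

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero-init: no change at start)

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A, never materialized explicitly
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.standard_normal((2, d_in))
y = lora_forward(x)
trainable = A.size + B.size   # 12,288 params vs 589,824 for full fine-tuning of W
```

Because `B` starts at zero, the adapted layer is initially identical to the frozen one; training then updates only `A` and `B`, which is what keeps the student cheap to fine-tune.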

The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.
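
This README does not spell out the exact entropy-weighting rule, so the sketch below (NumPy only, random logits) makes one plausible assumption: each teacher's per-example weight decreases with its predictive entropy, so the more confident teacher dominates the distillation target, which is then blended with focal loss on the hard labels.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def focal_loss(p_student, y, gamma=2.0):
    # Down-weights easy examples: (1 - p_true)^gamma * -log(p_true)
    p_true = p_student[np.arange(len(y)), y]
    return -((1 - p_true) ** gamma) * np.log(p_true + 1e-12)

def dual_teacher_loss(s_logits, t1_logits, t2_logits, y, T=2.0, alpha=0.5, gamma=2.0):
    ps = softmax(s_logits, T)
    p1, p2 = softmax(t1_logits, T), softmax(t2_logits, T)
    # Entropy weighting (assumed scheme): lower-entropy teacher gets a larger weight
    h1, h2 = entropy(p1), entropy(p2)
    w1 = np.exp(-h1) / (np.exp(-h1) + np.exp(-h2))
    target = w1[:, None] * p1 + (1 - w1)[:, None] * p2
    # Per-example KL(target || student) as the distillation term
    kd = (target * (np.log(target + 1e-12) - np.log(ps + 1e-12))).sum(axis=-1)
    return (alpha * focal_loss(softmax(s_logits), y, gamma) + (1 - alpha) * kd).mean()

rng = np.random.default_rng(0)
loss = dual_teacher_loss(rng.standard_normal((4, 27)),
                         rng.standard_normal((4, 27)),
                         rng.standard_normal((4, 27)),
                         y=np.array([0, 1, 2, 3]))
```

The 27-way logits mirror the 27-class task described below; the temperature, `alpha`, and `gamma` values are placeholders, not the tuned hyperparameters.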

Experiments on a 27-class disease classification task using patient-reported health data demonstrate that DualMedBERT achieves ~98.5% of BERT-base performance while reducing inference latency by ~1.8×. Additionally, the proposed calibration module achieves an AUROC of ~0.90, significantly improving confidence estimation without affecting classification accuracy.
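
The softmax-derived features used by the calibration module are not enumerated here; a common choice, assumed in this sketch, is maximum class probability, top-1/top-2 margin, and predictive entropy. On a held-out split, these features paired with correctness labels would train the XGBoost reliability classifier.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reliability_features(logits):
    """Softmax-derived features for a post-hoc calibrator (assumed feature set)."""
    p = softmax(logits)
    top2 = np.sort(p, axis=-1)[:, -2:]           # two largest probabilities
    max_prob = top2[:, 1]
    margin = top2[:, 1] - top2[:, 0]             # top-1 minus top-2
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)  # predictive entropy
    return np.stack([max_prob, margin, ent], axis=1)

rng = np.random.default_rng(0)
X = reliability_features(rng.standard_normal((8, 27)))
# On a held-out split, these features plus y = (prediction == label) would
# train the reliability model, e.g. xgboost.XGBClassifier().fit(X, y)
```

Because the calibrator only reads the softmax outputs, it can be bolted onto the frozen classifier post hoc, which is why it improves confidence estimation without affecting classification accuracy.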

These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.

---

## 🏥 Use Case / Applications

DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.

### 🔍 Potential Applications

- **Clinical decision support (assistive, not diagnostic)**
  Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.
- **Telemedicine and triage systems**
  Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.
- **Health forums and patient platforms**
  Automatically categorizing user-reported conditions for better organization and information retrieval.
- **Public health monitoring**
  Aggregating and analyzing trends in reported symptoms across populations.

---

### ⚠️ Important Note

This model is intended for **research and assistive purposes only** and should **not be used for medical diagnosis or treatment decisions without professional oversight**.

---

### 💡 Why this matters

Patient-reported health data differs from clinical text:

- Informal language
- Symptom descriptions instead of diagnoses
- Ambiguity and overlap across conditions

DualMedBERT addresses this by combining:

- General language understanding (BERT)
- Biomedical knowledge (PubMedBERT)
- Efficient deployment (DistilBERT + LoRA)
- Reliability estimation (XGBoost calibration)

---

DualMedBERT demonstrates that:

> A carefully designed distillation pipeline can retain **~98.5% of BERT performance** while achieving **~1.8× speedup**, improved reliability via calibration, and robust disease classification on patient-reported health conditions.

---