Souvik Sinha committed · verified
Commit 2e124dc · 1 Parent(s): 845637a

Update README.md

Files changed (1):
  1. README.md +46 -8
README.md CHANGED
@@ -22,17 +22,55 @@ metrics:
 
 # 🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier
 
- DualMedBERT is a fast and reliable biomedical text classifier trained using **dual-teacher knowledge distillation** from BERT-base and PubMedBERT into a lightweight DistilBERT model enhanced with LoRA.
 
 ---
 
- # 🚀 Key Highlights
 
- * **~1.8× faster** than BERT-base
- * 🧠 Retains **~98.5% of BERT performance**
- * 🎯 Combines general + biomedical knowledge via dual-teacher KD
- * 📊 Confidence calibration with XGBoost (AUROC 0.89)
- * 🔬 Designed for **27-class disease classification**
 
 ---
 
@@ -156,6 +194,6 @@ If you use this model, please cite:
 
 DualMedBERT demonstrates that:
 
- > A distilled model can retain **~98.5% performance of BERT** while achieving **~1.8× speedup** and improved reliability via calibration.
 
 ---
 
 
 # 🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier
 
+
+ We present **DualMedBERT**, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.
+
+ The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.
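The training objective described above can be sketched in miniature. This is an illustrative, dependency-free scalar version, assuming an inverse-entropy weighting of the two teachers' KL terms and a standard focal-loss hard-label term; the function names, weighting formula, and hyperparameters are assumptions for exposition, not the repository's actual training code (which would operate on batched tensors).

```python
import math

def softmax(logits, t=1.0):
    # Temperature-scaled softmax over a list of logits.
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    # Shannon entropy of a probability distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    # KL(p || q): divergence of the student q from teacher p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def focal_loss(student_probs, target, gamma=2.0):
    # Focal loss down-weights easy examples: (1 - p_t)^gamma * -log(p_t).
    pt = student_probs[target]
    return -((1 - pt) ** gamma) * math.log(pt)

def dual_teacher_loss(student_logits, bert_logits, pubmed_logits,
                      target, alpha=0.5, temp=2.0, gamma=2.0):
    """Focal loss + entropy-weighted distillation from two teachers.

    A more confident teacher (lower entropy) receives a larger share of
    the distillation signal; `alpha` balances the hard-label and
    distillation terms. All values here are illustrative assumptions.
    """
    s = softmax(student_logits, temp)
    t1 = softmax(bert_logits, temp)
    t2 = softmax(pubmed_logits, temp)
    # Inverse-entropy weights: the more confident teacher dominates.
    h1, h2 = entropy(t1), entropy(t2)
    w1 = (1.0 / (h1 + 1e-8)) / (1.0 / (h1 + 1e-8) + 1.0 / (h2 + 1e-8))
    w2 = 1.0 - w1
    kd = w1 * kl(t1, s) + w2 * kl(t2, s)
    hard = focal_loss(softmax(student_logits), target, gamma)
    return alpha * hard + (1 - alpha) * kd
```

In an actual training pipeline these operations would run over batched tensors (e.g. in PyTorch), typically with a T² scaling of the distillation term; the scalar version above only illustrates how the two teacher signals could be weighted.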
+
+ Experiments on a 27-class disease classification task using patient-reported health data demonstrate that DualMedBERT retains ~98.5% of BERT-base performance while delivering a ~1.8× inference speedup. Additionally, the proposed calibration module achieves an AUROC of ~0.90, significantly improving confidence estimation without affecting classification accuracy.
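The calibration module consumes softmax-derived features. A plausible feature set is sketched below; the exact features and XGBoost configuration used by DualMedBERT may differ, so treat the names and values here as assumptions.

```python
import math

def confidence_features(probs):
    """Softmax-derived features for a post-hoc reliability model.

    Illustrative features (assumed, not necessarily the model's own):
    top-class probability, top-2 margin, entropy, normalized entropy.
    """
    ranked = sorted(probs, reverse=True)
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    max_ent = math.log(len(probs))  # entropy of the uniform distribution
    return [
        ranked[0],              # confidence of the predicted class
        ranked[0] - ranked[1],  # margin over the runner-up class
        ent,                    # predictive entropy
        ent / max_ent,          # entropy normalized to [0, 1]
    ]

# These features would then train a binary "was the prediction correct?"
# classifier on a validation set, e.g. (assuming the `xgboost` package):
#   import xgboost as xgb
#   clf = xgb.XGBClassifier(n_estimators=200, max_depth=4)
#   clf.fit([confidence_features(p) for p in val_probs], val_correct)
# At inference time, clf.predict_proba(...) yields a reliability score
# reported alongside the class prediction.
```

Because the reliability model is post-hoc, it leaves the classifier's argmax predictions untouched, which is consistent with the claim that calibration does not affect classification accuracy.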
+
+ These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.
+
+ ---
+
+ ## 🏥 Use Case / Applications
+
+ DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.
+
+ ### 🔍 Potential Applications
+
+ - **Clinical decision support (assistive, not diagnostic)**
+   Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.
+
+ - **Telemedicine and triage systems**
+   Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.
+
+ - **Health forums and patient platforms**
+   Automatically categorizing user-reported conditions for better organization and information retrieval.
+
+ - **Public health monitoring**
+   Aggregating and analyzing trends in reported symptoms across populations.
 
 ---
 
+ ### ⚠️ Important Note
+
+ This model is intended for **research and assistive purposes only** and should **not be used for medical diagnosis or treatment decisions without professional oversight**.
+
+ ---
+
+ ### 💡 Why this matters
+
+ Patient-reported health data differs from clinical text:
+ - Informal language
+ - Symptom descriptions instead of diagnoses
+ - Ambiguity and overlap across conditions
+
+ DualMedBERT addresses this by combining:
+ - General language understanding (BERT)
+ - Biomedical knowledge (PubMedBERT)
+ - Efficient deployment (DistilBERT + LoRA)
+ - Reliability estimation (XGBoost calibration)
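The LoRA-based adaptation in the list above can be illustrated with a minimal, dependency-free sketch of a low-rank adapter on a linear layer. In practice this would be applied to the student's attention projections via a library such as Hugging Face PEFT; the class and matrix shapes below are illustrative only.

```python
def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update.

    Effective weight: W + (alpha / r) * B @ A, where A is (r x in_dim)
    and B is (out_dim x r). Only A and B are trained, so the adapter
    adds r * (in_dim + out_dim) parameters instead of in_dim * out_dim.
    """
    def __init__(self, W, A, B, alpha=16):
        self.W, self.A, self.B = W, A, B
        self.scale = alpha / len(A)  # len(A) == rank r

    def __call__(self, x):
        base = matvec(self.W, x)                  # frozen pretrained path
        low = matvec(self.B, matvec(self.A, x))   # rank-r adapter path
        return [b + self.scale * l for b, l in zip(base, low)]
```

With B initialized to zero, the layer reproduces the frozen model exactly at the start of training, which is the standard LoRA initialization and keeps early training stable.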
 
 ---
 
 DualMedBERT demonstrates that:
 
+ > A carefully designed distillation pipeline can retain **~98.5% of BERT performance** while achieving **~1.8× speedup**, improved reliability via calibration, and robust disease classification on patient-reported health conditions.
 
 ---