Souvik Sinha committed: Update README.md
# 🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier

We present **DualMedBERT**, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.
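
The LoRA-based adaptation can be illustrated with a minimal NumPy sketch. The dimensions, rank `r`, and scaling `alpha` below are illustrative assumptions, not DualMedBERT's actual configuration: a frozen weight matrix `W` receives a trainable low-rank update `(alpha / r) * B @ A`, so only the small `A` and `B` matrices are trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 768, 768, 8, 16   # illustrative, not DualMedBERT's actual config

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero-init: no change at start)

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A, never materialized explicitly
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.standard_normal((2, d_in))
y = lora_forward(x)
trainable = A.size + B.size   # 12,288 params vs 589,824 for full fine-tuning of W
```

Because `B` starts at zero, the adapted layer is initially identical to the frozen one; training then updates only `A` and `B`, which is what keeps the student cheap to fine-tune.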

The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.
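
This README does not spell out the exact entropy-weighting rule, so the sketch below (NumPy only, random logits) makes one plausible assumption: each teacher's per-example weight decreases with its predictive entropy, so the more confident teacher dominates the distillation target, which is then blended with focal loss on the hard labels.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def focal_loss(p_student, y, gamma=2.0):
    # Down-weights easy examples: (1 - p_true)^gamma * -log(p_true)
    p_true = p_student[np.arange(len(y)), y]
    return -((1 - p_true) ** gamma) * np.log(p_true + 1e-12)

def dual_teacher_loss(s_logits, t1_logits, t2_logits, y, T=2.0, alpha=0.5, gamma=2.0):
    ps = softmax(s_logits, T)
    p1, p2 = softmax(t1_logits, T), softmax(t2_logits, T)
    # Entropy weighting (assumed scheme): lower-entropy teacher gets a larger weight
    h1, h2 = entropy(p1), entropy(p2)
    w1 = np.exp(-h1) / (np.exp(-h1) + np.exp(-h2))
    target = w1[:, None] * p1 + (1 - w1)[:, None] * p2
    # Per-example KL(target || student) as the distillation term
    kd = (target * (np.log(target + 1e-12) - np.log(ps + 1e-12))).sum(axis=-1)
    return (alpha * focal_loss(softmax(s_logits), y, gamma) + (1 - alpha) * kd).mean()

rng = np.random.default_rng(0)
loss = dual_teacher_loss(rng.standard_normal((4, 27)),
                         rng.standard_normal((4, 27)),
                         rng.standard_normal((4, 27)),
                         y=np.array([0, 1, 2, 3]))
```

The 27-way logits mirror the 27-class task described below; the temperature, `alpha`, and `gamma` values are placeholders, not the tuned hyperparameters.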

Experiments on a 27-class disease classification task using patient-reported health data demonstrate that DualMedBERT achieves ~98.5% of BERT-base performance while reducing inference latency by ~1.8×. Additionally, the proposed calibration module achieves an AUROC of ~0.90, significantly improving confidence estimation without affecting classification accuracy.
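
The softmax-derived features used by the calibration module are not enumerated here; a common choice, assumed in this sketch, is maximum class probability, top-1/top-2 margin, and predictive entropy. On a held-out split, these features paired with correctness labels would train the XGBoost reliability classifier.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reliability_features(logits):
    """Softmax-derived features for a post-hoc calibrator (assumed feature set)."""
    p = softmax(logits)
    top2 = np.sort(p, axis=-1)[:, -2:]           # two largest probabilities
    max_prob = top2[:, 1]
    margin = top2[:, 1] - top2[:, 0]             # top-1 minus top-2
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)  # predictive entropy
    return np.stack([max_prob, margin, ent], axis=1)

rng = np.random.default_rng(0)
X = reliability_features(rng.standard_normal((8, 27)))
# On a held-out split, these features plus y = (prediction == label) would
# train the reliability model, e.g. xgboost.XGBClassifier().fit(X, y)
```

Because the calibrator only reads the softmax outputs, it can be bolted onto the frozen classifier post hoc, which is why it improves confidence estimation without affecting classification accuracy.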

These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.

---

## 🏥 Use Case / Applications

DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.

### 🔍 Potential Applications

- **Clinical decision support (assistive, not diagnostic)**
  Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.
- **Telemedicine and triage systems**
  Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.
- **Health forums and patient platforms**
  Automatically categorizing user-reported conditions for better organization and information retrieval.
- **Public health monitoring**
  Aggregating and analyzing trends in reported symptoms across populations.

---

### ⚠️ Important Note

This model is intended for **research and assistive purposes only** and should **not be used for medical diagnosis or treatment decisions without professional oversight**.

---

### 💡 Why this matters

Patient-reported health data differs from clinical text:

- Informal language
- Symptom descriptions instead of diagnoses
- Ambiguity and overlap across conditions

DualMedBERT addresses this by combining:

- General language understanding (BERT)
- Biomedical knowledge (PubMedBERT)
- Efficient deployment (DistilBERT + LoRA)
- Reliability estimation (XGBoost calibration)

---

DualMedBERT demonstrates that:

> A carefully designed distillation pipeline can retain **~98.5% of BERT performance** while achieving **~1.8× speedup**, improved reliability via calibration, and robust disease classification on patient-reported health conditions.

---