🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier

We present DualMedBERT, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.

The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.

Experiments on a 27-class disease classification task using patient-reported health data demonstrate that DualMedBERT achieves ~98.5% of BERT-base performance while reducing inference latency by ~1.8×. Additionally, the proposed calibration module achieves an AUROC of ~0.90, significantly improving confidence estimation without affecting classification accuracy.

These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.


πŸ₯ Use Case / Applications

DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.

πŸ” Potential Applications

  • Clinical decision support (assistive, not diagnostic)
    Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.

  • Telemedicine and triage systems
    Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.

  • Health forums and patient platforms
    Automatically categorizing user-reported conditions for better organization and information retrieval.

  • Public health monitoring
    Aggregating and analyzing trends in reported symptoms across populations.


⚠️ Important Note

This model is intended for research and assistive purposes only and should not be used for medical diagnosis or treatment decisions without professional oversight.


πŸ’‘ Why this matters

Patient-reported health data differs from clinical text:

  • Informal language
  • Symptom descriptions instead of diagnoses
  • Ambiguity and overlap across conditions

DualMedBERT addresses this by combining:

  • General language understanding (BERT)
  • Biomedical knowledge (PubMedBERT)
  • Efficient deployment (DistilBERT + LoRA)
  • Reliability estimation (XGBoost calibration)

🧩 Model Architecture

Student Model

  • Backbone: distilbert-base-uncased
  • LoRA:
    • Rank: r = 8
    • Alpha: α = 32
    • Applied to transformer layers 2–5
  • Additional: transformer layer 1 partially unfrozen
  • Pooling: CLS + attention pooling
  • Head: Dense classifier (27 classes)
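
A minimal sketch of this student setup, using Hugging Face Transformers + PEFT, is shown below. The targeted attention projections (`q_lin`, `v_lin`), the exact scope of the layer-1 unfreezing, and the classification wrapper are assumptions; the card only specifies the rank, alpha, target layers, and a 27-class head, and the custom CLS + attention pooling head is not reproduced here.

```python
# Hedged sketch of the LoRA-adapted DistilBERT student (module names assumed).
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=27
)

lora_cfg = LoraConfig(
    r=8,                                # LoRA rank
    lora_alpha=32,                      # LoRA scaling alpha
    lora_dropout=0.05,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections (assumed choice)
    layers_to_transform=[2, 3, 4, 5],   # restrict LoRA to transformer layers 2-5
    task_type="SEQ_CLS",
)
student = get_peft_model(base, lora_cfg)

# Partially unfreeze transformer layer 1 alongside the LoRA parameters
# (the exact subset of layer-1 weights that is unfrozen is assumed).
for name, param in student.named_parameters():
    if ".transformer.layer.1.attention." in name:
        param.requires_grad = True

student.print_trainable_parameters()
```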


Teachers

| Teacher    | Role                           |
|------------|--------------------------------|
| BERT-base  | General language understanding |
| PubMedBERT | Biomedical domain knowledge    |

🧠 Training Method

Dual-Teacher Knowledge Distillation

Loss: $$L = \alpha \cdot L_{KD} + (1 - \alpha) \cdot L_{Focal}$$

Where:

  • KD uses two teachers
  • Weights determined via entropy-based confidence
  • Temperature: T = 4.0
  • α (KD balance): 0.6
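
A minimal sketch of this objective follows. The entropy-to-weight mapping and the focal-loss γ are assumptions; the card only states that the teacher weights are entropy-derived and that T = 4.0 and α = 0.6.

```python
# Hedged sketch of the entropy-weighted dual-teacher KD + focal loss.
import torch
import torch.nn.functional as F

T, ALPHA, GAMMA = 4.0, 0.6, 2.0  # temperature, KD balance, focal gamma (gamma assumed)

def entropy(p, eps=1e-8):
    return -(p * (p + eps).log()).sum(dim=-1)

def focal_loss(logits, targets, gamma=GAMMA):
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

def dual_teacher_kd_loss(student_logits, bert_logits, pubmed_logits, targets):
    p_bert = F.softmax(bert_logits / T, dim=-1)
    p_pub = F.softmax(pubmed_logits / T, dim=-1)

    # Per-sample teacher weights: the lower-entropy (more confident) teacher gets more weight.
    conf = torch.stack([1.0 / (1.0 + entropy(p_bert)),
                        1.0 / (1.0 + entropy(p_pub))], dim=-1)
    w = conf / conf.sum(dim=-1, keepdim=True)  # (batch, 2), rows sum to 1

    # Blend the teacher distributions, then distill via KL divergence.
    teacher_mix = w[:, 0:1] * p_bert + w[:, 1:2] * p_pub
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_p_student, teacher_mix, reduction="batchmean") * T * T

    return ALPHA * kd + (1.0 - ALPHA) * focal_loss(student_logits, targets)
```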

πŸ“Š Confidence Calibration (XGBoost)

A post-hoc calibrator predicts whether each prediction from the student is correct.

Features (31 total):

  • 27 softmax probabilities
  • max probability
  • entropy
  • top-2 gap
  • top-3 sum
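
Below is a hedged sketch of assembling these 31 features and fitting the calibrator; the XGBoost hyperparameters are placeholders (not the card's settings), and `val_probs` / `val_labels` stand in for held-out softmax outputs and gold labels.

```python
# Sketch of the post-hoc reliability calibrator (hyperparameters are placeholders).
import numpy as np
from xgboost import XGBClassifier

def calibration_features(probs):
    """probs: (n_samples, 27) softmax outputs -> (n_samples, 31) feature matrix."""
    sorted_p = np.sort(probs, axis=1)[:, ::-1]              # probabilities, descending
    max_p = sorted_p[:, 0]                                   # max probability
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)      # predictive entropy
    top2_gap = sorted_p[:, 0] - sorted_p[:, 1]               # top-2 gap
    top3_sum = sorted_p[:, :3].sum(axis=1)                   # top-3 sum
    return np.column_stack([probs, max_p, ent, top2_gap, top3_sum])  # 27 + 4 = 31

# Fit on a held-out split: target = 1 if the student's argmax matched the gold label.
# X = calibration_features(val_probs)
# y = (val_probs.argmax(axis=1) == val_labels).astype(int)
# calibrator = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
# calibrator.fit(X, y)
# reliability = calibrator.predict_proba(calibration_features(test_probs))[:, 1]
```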

πŸ“ˆ Results

| Model       | Macro F1 | Accuracy | Latency   |
|-------------|----------|----------|-----------|
| BERT-base   | 0.8333   | 0.835    | ~16–18 ms |
| PubMedBERT  | 0.8553   | 0.855    | ~16–18 ms |
| DualMedBERT | 0.8207   | 0.8226   | ~10 ms    |

πŸ” Calibration

  • AUROC: 0.898–0.903
  • Reliability detection: ~83%

βš™οΈ Training Details

  • Optimizer: AdamW
  • Learning rate: 2e-4 (student)
  • Weight decay: 0.1
  • Epochs: 12
  • KD temperature: 4.0
  • LoRA dropout: 0.05
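
A minimal sketch of wiring these hyperparameters into the optimizer is shown below, reusing the `student` model from the architecture sketch above; the card does not specify a learning-rate schedule or batch size, so those are omitted.

```python
# Hedged sketch of the optimizer setup (schedule and batching unspecified, so omitted).
from torch.optim import AdamW

optimizer = AdamW(
    (p for p in student.parameters() if p.requires_grad),  # LoRA + unfrozen layer-1 params
    lr=2e-4,
    weight_decay=0.1,
)
# Each of the 12 epochs would compute dual_teacher_kd_loss(...) from the sketch above,
# then call loss.backward(); optimizer.step(); optimizer.zero_grad().
```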

⚠️ Important Notes

  • Slight (~1–2%) drop vs BERT-base
  • Adaptive teacher weights showed limited variation (~0.45 / 0.55)
  • Model prioritizes speed + reliability over peak accuracy

πŸ“‚ Dataset

UCI Drug Review Dataset (Gräßer et al., 2018)


πŸ“š Citation

If you use this model, please cite:

  • Hinton et al., 2015 — Knowledge Distillation
  • Hu et al., 2022 — LoRA
  • Sanh et al., 2019 — DistilBERT
  • Devlin et al., 2018 — BERT
  • Gu et al., 2021 — PubMedBERT
  • Lin et al., 2017 — Focal Loss
  • Chen & Guestrin, 2016 — XGBoost
  • Gräßer et al., 2018 — Dataset

🏁 Summary

DualMedBERT demonstrates that:

A carefully designed distillation pipeline can retain ~98.5% of BERT-base performance while achieving a ~1.8× speedup, improved reliability via calibration, and robust disease classification on patient-reported health conditions.

