Drug Review Condition Classifier (DistilBERT)

This is a multi-class text classification model that predicts a medical condition from patient drug reviews in the Drugs.com dataset.


📌 Model Overview

  • Base model: distilbert-base-uncased
  • Task: Text Classification
  • Number of labels: approximately 770 medical conditions
  • Max sequence length: 256
  • Training epochs: 3
  • Optimizer: AdamW
  • Weight decay: 0.01

📊 Evaluation Results

Validation

  • Accuracy: ~0.74
  • Macro F1: ~0.15
  • Loss: ~1.15

Test

  • Accuracy: ~0.74
  • Macro F1: ~0.15
  • Loss: ~1.13

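Macro F1 is low because the class distribution is strongly imbalanced and many condition labels are rare. Macro F1 averages the per-class F1 scores with equal weight, so every rare class the model rarely predicts correctly pulls the average down. Accuracy is dominated by the frequent condition classes, where the model performs well.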
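To make the gap between accuracy and macro F1 concrete, here is a minimal sketch in plain Python. The labels and counts are made-up toy data, not taken from the Drugs.com dataset, and the "classifier" simply predicts the most frequent class every time.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (hypothetical): one frequent condition and four rare ones.
y_true = ["birth control"] * 16 + ["acne", "gout", "asthma", "migraine"]
# A degenerate classifier that always predicts the frequent class.
y_pred = ["birth control"] * 20

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")
```

In this toy case the four rare classes each score an F1 of 0, so the macro average stays low even though most predictions are correct.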


🧠 Training Details

  • Hugging Face Trainer API
  • Dynamic padding with DataCollatorWithPadding
  • Automatic acceleration via Accelerate
  • Train-only label space (no label leakage)
  • Evaluation on held-out validation and test splits
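One way to read "train-only label space" is that the label-to-id mapping is built from the training split alone and then reused for validation and test. The sketch below illustrates that idea in plain Python. The function names are hypothetical, and dropping evaluation rows with unseen conditions is an assumption; this card does not say how such rows were handled.

```python
def build_label_space(train_labels):
    """Map each condition seen in the training split to an integer id."""
    conditions = sorted(set(train_labels))
    label2id = {name: idx for idx, name in enumerate(conditions)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def encode_split(rows, label2id):
    """Encode (text, condition) rows, keeping only conditions known from training.

    Assumption: rows with a condition outside the training label space are dropped.
    """
    encoded = []
    for text, condition in rows:
        if condition in label2id:
            encoded.append((text, label2id[condition]))
    return encoded


# Hypothetical toy splits.
train_rows = [("helped my skin", "Acne"), ("fewer headaches", "Migraine"), ("cleared up", "Acne")]
val_rows = [("headache gone", "Migraine"), ("joint pain eased", "Gout")]

label2id, id2label = build_label_space(condition for _, condition in train_rows)
val_encoded = encode_split(val_rows, label2id)
```

Because the mapping never looks at validation or test labels, the classifier head cannot gain output classes that exist only in the evaluation data.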

🚀 Example Usage

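from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="Talip7/distilbert-drug-cls"
)

# The pipeline returns the top predicted label and its score.
result = classifier(
    "This medication significantly reduced my migraine but caused nausea and dizziness."
)
print(result)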
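With this many labels, it can help to look at several candidates rather than only the top one. The text-classification pipeline accepts a `top_k` argument and returns dictionaries with `label` and `score` keys. The helper below is a hypothetical sketch for filtering such output; the example predictions are made up so the snippet runs without downloading the model, and the real label names and scores will differ.

```python
def confident_predictions(predictions, min_score=0.5):
    """Return (label, score) pairs at or above min_score, highest score first."""
    kept = [(p["label"], p["score"]) for p in predictions if p["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up stand-in for `classifier(text, top_k=3)`.
example_output = [
    {"label": "Migraine", "score": 0.81},
    {"label": "Nausea/Vomiting", "score": 0.07},
    {"label": "Vertigo", "score": 0.03},
]

top = confident_predictions(example_output, min_score=0.05)
print(top)
```

The `min_score` threshold is an illustrative choice, not a value recommended by this card.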

⚠️ Limitations

  • Highly imbalanced class distribution
  • User-generated reviews may contain noise
  • Not intended for medical advice or diagnosis


📚 Dataset

Source: Drugs.com reviews dataset

Preprocessing:

  • Lowercasing
  • HTML cleanup
  • Minimum review length filtering
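The preprocessing steps listed above could look roughly like the following sketch, which uses only the Python standard library. The card does not give the exact cleanup rules or the length threshold, so the regular expressions, the word-based length check, and `MIN_WORDS = 5` are all assumptions for illustration.

```python
import html
import re

MIN_WORDS = 5  # hypothetical threshold; the actual value is not stated

TAG_RE = re.compile(r"<[^>]+>")   # matches HTML tags such as <b> or </br>
WS_RE = re.compile(r"\s+")        # matches runs of whitespace


def clean_review(text):
    """Decode HTML entities, strip tags, collapse whitespace, and lowercase."""
    text = html.unescape(text)            # e.g. "&#039;" becomes "'"
    text = TAG_RE.sub(" ", text)
    text = WS_RE.sub(" ", text).strip()
    return text.lower()


def preprocess(reviews, min_words=MIN_WORDS):
    """Clean each review and drop those with fewer than min_words words."""
    cleaned = (clean_review(review) for review in reviews)
    return [review for review in cleaned if len(review.split()) >= min_words]


sample = ["Bad.", "It&#039;s <b>GREAT</b>  for my migraine"]
processed = preprocess(sample)
print(processed)
```

Filtering happens after cleaning so that markup and entities do not count toward a review's length.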


πŸ‘¨β€πŸ’» Author

Trained and published as part of hands-on NLP / LLM learning with Hugging Face.
