medical_condition_classification
This model is a fine-tuned version of distilbert-base-uncased on a Drugs.com dataset. It achieves the following results on the test set:
- Loss: 0.8930
- Accuracy: 0.7951
Model description
The goal of the model is to predict the medical condition from a review of the drug. There are 751 classes.
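A minimal inference sketch, assuming the checkpoint is published on the Hub under the same ID as the dataset named in this card and loads with the standard transformers text-classification pipeline. The helper names `load_classifier` and `top_conditions` are illustrative, and the label strings returned depend on the checkpoint's `id2label` mapping, which this card does not document.

```python
from typing import Callable, Dict, List

# Assumed Hub ID for the checkpoint; adjust if the model lives elsewhere.
MODEL_ID = "samsaara/medical_condition_classification"


def load_classifier():
    """Build a text-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # lazy import: only needed for real inference

    return pipeline("text-classification", model=MODEL_ID)


def top_conditions(review: str, classifier: Callable, k: int = 3) -> List[Dict]:
    """Return the k highest-scoring predictions for one review, best first."""
    # top_k requests k labels; truncation clips reviews that exceed
    # the model's maximum input length.
    predictions = classifier(review, top_k=k, truncation=True)
    return sorted(predictions, key=lambda p: p["score"], reverse=True)[:k]
```

Calling `top_conditions(review_text, load_classifier())` would then yield a list of `{"label": ..., "score": ...}` dictionaries. With 751 classes, looking at the top few predictions is likely more informative than the single best label.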
Intended uses & limitations
More information needed
Training and evaluation data
The training, validation, and test data are available on 🤗 Datasets as samsaara/medical_condition_classification, and the training process itself is documented in the modeling.ipynb notebook.
The dataset ships with train and test splits. The train split is further divided into train and validation splits in a 0.8/0.2 ratio. The final results reported are on the test split.
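The 0.8/0.2 division of the train split can be sketched with a seeded shuffle of row indices. This is an illustration only: the actual procedure lives in modeling.ipynb, the function name `split_indices` is hypothetical, and reusing seed 42 from the training hyperparameters is an assumption. With 🤗 Datasets, `Dataset.train_test_split(test_size=0.2, seed=...)` performs a comparable split.

```python
import random
from typing import List, Tuple


def split_indices(
    n_rows: int, val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out val_fraction of them for validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = int(round(n_rows * val_fraction))
    # First n_val shuffled indices become validation, the rest stay in train.
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))  # 800 200
```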
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
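As a rough sketch, the hyperparameters listed above map onto `transformers.TrainingArguments` keyword arguments as follows. The mapping assumes a single training device (so the batch sizes are per-device values) and that "Native AMP" corresponds to `fp16=True`. The `output_dir` default and the function name `build_training_args` are hypothetical.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
TRAINING_KWARGS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 24,  # assumes a single device
    "per_device_eval_batch_size": 24,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
    "fp16": True,  # assumed equivalent of "Native AMP"; needs a CUDA GPU
}


def build_training_args(output_dir: str = "medical_condition_classification"):
    """Create TrainingArguments from the kwargs above (output_dir is hypothetical)."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Evaluation and logging intervals are not listed among the hyperparameters, so they are left at their defaults here.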
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.8625 | 0.4329 | 2000 | 1.7199 | 0.6397 |
| 1.459 | 0.8658 | 4000 | 1.3696 | 0.6890 |
| 1.1737 | 1.2987 | 6000 | 1.2131 | 0.7172 |
| 1.042 | 1.7316 | 8000 | 1.1014 | 0.7329 |
| 0.8431 | 2.1645 | 10000 | 1.0322 | 0.7510 |
| 0.8012 | 2.5974 | 12000 | 0.9889 | 0.7587 |
| 0.7312 | 3.0303 | 14000 | 0.9497 | 0.7727 |
| 0.6561 | 3.4632 | 16000 | 0.9338 | 0.7805 |
| 0.6132 | 3.8961 | 18000 | 0.9073 | 0.7875 |
| 0.5195 | 4.3290 | 20000 | 0.9011 | 0.7929 |
| 0.5015 | 4.7619 | 22000 | 0.8930 | 0.7951 |
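The step and epoch columns make it possible to estimate the schedule's scale. The sketch below derives steps per epoch from the logged rows and evaluates a linear decay at a given step. It assumes no warmup, no gradient accumulation, and a single device, none of which the card states explicitly.

```python
# (step, epoch) pairs copied from the first and last rows of the table.
LOGGED = [(2000, 0.4329), (22000, 4.7619)]
NUM_EPOCHS = 5
BATCH_SIZE = 24
BASE_LR = 3e-5

# step / epoch gives optimizer steps per epoch, rounded to a whole step.
steps_per_epoch = round(LOGGED[0][0] / LOGGED[0][1])
total_steps = steps_per_epoch * NUM_EPOCHS
# Upper bound on training rows, since the last batch of an epoch may be partial.
max_train_examples = steps_per_epoch * BATCH_SIZE


def linear_lr(step: int, total: int = total_steps, base: float = BASE_LR) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base * max(0.0, 1 - step / total)


print(steps_per_epoch, total_steps)  # 4620 23100
print(max_train_examples)  # 110880
```

Under these assumptions, the last logged step (22000) falls about 1100 steps before the end of training, where the learning rate has decayed to a small fraction of its initial value.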
Framework versions
- Transformers 4.45.2
- Pytorch 2.4.1
- Datasets 3.0.1
- Tokenizers 0.20.1