ComNumPNdistilBERT
This model is a fine-tuned version of distilbert-base-uncased on a subset of the ComNum dataset (8,000 training samples and 2,000 validation samples). We replaced every digit in each numeral with a special token that encodes the digit's position. For example, '26753' becomes '3π 5π 7π 6π 2π'. The numerals in the test set have an order of magnitude of 6, while those in the training and evaluation sets have orders of magnitude from 0 to 5. Because the fine-tuned model has never seen the special characters for digits at order of magnitude 6 (e.g., '1π', '6π' or '9π'), it does not generalize well to the test set.
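The positional encoding described above can be sketched as follows. This is only an illustration that reads the '26753' example literally: digits are emitted least-significant first, and each digit is followed by a marker for its position. The function names and the `markers` parameter are hypothetical, and the actual marker character used for each position is not recoverable from this card, so 'π' is used for every position exactly as the example shows. The sketch also assumes that "order of magnitude 6" means a 7-digit numeral.

```python
def encode_numeral(numeral: str, markers: list[str]) -> str:
    """Tag each digit with the marker for its position, least-significant digit first.

    `markers[i]` is the (hypothetical) marker for the digit at position i,
    where position 0 is the units digit.
    """
    if not numeral.isdigit():
        raise ValueError("numeral must contain only digits")
    if len(numeral) > len(markers):
        raise ValueError("no marker defined for the highest position")
    reversed_digits = numeral[::-1]  # units digit comes first
    return " ".join(d + markers[pos] for pos, d in enumerate(reversed_digits))


def positions_used(numeral: str) -> set[int]:
    """Return the digit positions (0 = units) that a numeral occupies."""
    return set(range(len(numeral)))


# Reproduces the example from this card, with 'π' standing in for every marker.
example = encode_numeral("26753", ["π"] * 5)

# Assumption: orders of magnitude 0 to 5 correspond to numerals of 1 to 6 digits.
train_positions = set().union(*(positions_used("9" * n) for n in range(1, 7)))

# Assumption: an order of magnitude of 6 corresponds to a 7-digit numeral.
unseen = positions_used("1234567") - train_positions
```

Under these assumptions, `unseen` contains only position 6, which is the position whose marker never appears in the training or evaluation data.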
It achieves the following results on the evaluation set:
- Loss: 0.0458
- Accuracy: 0.9875
It achieves the following results on the test set:
- Loss: 3.0438
- Accuracy: 0.5909
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
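As a rough consistency check, the step counts in the training results can be derived from the hyperparameters above. The sketch below assumes a single device, no gradient accumulation, and no warmup, none of which is stated in this card; `linear_lr` is a hypothetical helper that mimics a linear decay to zero, not the scheduler implementation itself.

```python
import math

TRAIN_SAMPLES = 8000      # training subset size stated in this card
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 3
BASE_LR = 5e-05

# One optimizer step per batch (assumes no gradient accumulation, one device).
steps_per_epoch = math.ceil(TRAIN_SAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, base_lr: float = BASE_LR, total: int = total_steps) -> float:
    """Learning rate after `step` updates under linear decay to zero (no warmup assumed)."""
    return base_lr * max(0.0, (total - step) / total)


print(steps_per_epoch, total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

With these assumptions, each epoch takes 1000 steps and training ends at step 3000, which agrees with the Step column in the training results.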
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.2951 | 1.0 | 1000 | 0.1900 | 0.886 |
| 0.09 | 2.0 | 2000 | 0.0701 | 0.9835 |
| 0.0382 | 3.0 | 3000 | 0.0458 | 0.9875 |
Framework versions
- Transformers 4.36.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
Model tree for abbassix/ComNumPNdistilBERTv1-big
Base model
distilbert/distilbert-base-uncased