# 7b0df2f38673520a354713b44b1a702e
This model is a fine-tuned version of openai-community/gpt2-medium on the stsb subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set:
- Loss: 0.4623
- Data size: 1.0
- Epoch runtime: 30.1491 seconds
- MSE: 0.4625
- MAE: 0.5324
- R²: 0.7931
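The reported loss and MSE are nearly identical, which is consistent with a mean-squared-error training objective on this regression task. A minimal sketch of how the three metrics relate, using illustrative predictions (hypothetical values, not this model's outputs):

```python
# Illustrative similarity scores on the 0-5 STS-B scale.
# These are hypothetical values, NOT outputs of this model.
labels = [0.0, 1.5, 2.0, 3.5, 5.0]
preds = [0.4, 1.2, 2.6, 3.1, 4.5]

n = len(labels)
mse = sum((p - y) ** 2 for p, y in zip(preds, labels)) / n
mae = sum(abs(p - y) for p, y in zip(preds, labels)) / n

# R^2 = 1 - (residual sum of squares / total sum of squares):
# the fraction of label variance explained by the predictions.
mean_y = sum(labels) / n
ss_tot = sum((y - mean_y) ** 2 for y in labels)
ss_res = sum((p - y) ** 2 for p, y in zip(preds, labels))
r2 = 1 - ss_res / ss_tot

print(mse, mae, r2)
```

With MSE as the loss, the "Loss" and "Mse" columns in the results table below differ only by evaluation rounding.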
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
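The per-device batch size of 8 across 4 GPUs gives the effective batch size of 32 listed above, and with STS-B's 5,749 training pairs this matches the 179 optimizer steps per epoch visible in the results table, assuming incomplete final batches are dropped. A small sketch of that arithmetic:

```python
train_batch_size = 8   # per device, from the hyperparameters above
num_devices = 4
total_train_batch_size = train_batch_size * num_devices  # effective batch size

# GLUE STS-B has 5,749 training pairs. 179 steps/epoch matches the
# results table if the partial final batch is dropped each epoch.
num_train_examples = 5749
steps_per_epoch = num_train_examples // total_train_batch_size

print(total_train_batch_size, steps_per_epoch)
```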
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | MSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 176.4084 | 0 | 2.8419 | 176.4120 | 11.7811 | -77.9155 |
| No log | 1 | 179 | 59.0898 | 0.0078 | 3.3330 | 59.0915 | 6.0292 | -25.4338 |
| No log | 2 | 358 | 17.1510 | 0.0156 | 3.7229 | 17.1525 | 2.9875 | -6.6729 |
| No log | 3 | 537 | 6.1486 | 0.0312 | 4.4895 | 6.1500 | 1.6667 | -1.7511 |
| No log | 4 | 716 | 2.9114 | 0.0625 | 5.7511 | 2.9121 | 1.3815 | -0.3027 |
| No log | 5 | 895 | 1.0947 | 0.125 | 7.5318 | 1.0953 | 0.8702 | 0.5100 |
| 0.424 | 6 | 1074 | 0.6584 | 0.25 | 10.4486 | 0.6587 | 0.6350 | 0.7053 |
| 0.7802 | 7 | 1253 | 0.7598 | 0.5 | 16.9764 | 0.7602 | 0.6892 | 0.6599 |
| 0.4785 | 8 | 1432 | 0.6298 | 1.0 | 29.0787 | 0.6301 | 0.6198 | 0.7181 |
| 0.3272 | 9 | 1611 | 0.5007 | 1.0 | 30.4873 | 0.5010 | 0.5430 | 0.7759 |
| 0.2185 | 10 | 1790 | 0.4898 | 1.0 | 29.5336 | 0.4900 | 0.5389 | 0.7808 |
| 0.1637 | 11 | 1969 | 0.4849 | 1.0 | 29.2989 | 0.4851 | 0.5340 | 0.7830 |
| 0.1285 | 12 | 2148 | 0.4819 | 1.0 | 30.1512 | 0.4821 | 0.5446 | 0.7843 |
| 0.1096 | 13 | 2327 | 0.4552 | 1.0 | 29.3088 | 0.4555 | 0.5195 | 0.7963 |
| 0.0972 | 14 | 2506 | 0.4886 | 1.0 | 29.3155 | 0.4888 | 0.5518 | 0.7813 |
| 0.0785 | 15 | 2685 | 0.4661 | 1.0 | 29.3653 | 0.4663 | 0.5368 | 0.7914 |
| 0.0768 | 16 | 2864 | 0.4665 | 1.0 | 28.8188 | 0.4668 | 0.5311 | 0.7912 |
| 0.069 | 17 | 3043 | 0.4351 | 1.0 | 29.3634 | 0.4353 | 0.5070 | 0.8053 |
| 0.0685 | 18 | 3222 | 0.4343 | 1.0 | 29.1231 | 0.4344 | 0.5135 | 0.8057 |
| 0.0499 | 19 | 3401 | 0.4519 | 1.0 | 29.6246 | 0.4520 | 0.5318 | 0.7978 |
| 0.0484 | 20 | 3580 | 0.4680 | 1.0 | 29.7614 | 0.4682 | 0.5376 | 0.7905 |
| 0.0514 | 21 | 3759 | 0.4577 | 1.0 | 29.4914 | 0.4578 | 0.5301 | 0.7952 |
| 0.0457 | 22 | 3938 | 0.4623 | 1.0 | 30.1491 | 0.4625 | 0.5324 | 0.7931 |
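The summary metrics at the top of this card correspond to the final epoch (22), while the lowest validation loss and highest R² in the table occur at epoch 18. A small sketch of selecting the best checkpoint from these rows (values copied from the table above; the selection code itself is illustrative):

```python
# (epoch, validation_loss, r2) for the full-data epochs in the table above.
results = [
    (8, 0.6298, 0.7181), (9, 0.5007, 0.7759), (10, 0.4898, 0.7808),
    (11, 0.4849, 0.7830), (12, 0.4819, 0.7843), (13, 0.4552, 0.7963),
    (14, 0.4886, 0.7813), (15, 0.4661, 0.7914), (16, 0.4665, 0.7912),
    (17, 0.4351, 0.8053), (18, 0.4343, 0.8057), (19, 0.4519, 0.7978),
    (20, 0.4680, 0.7905), (21, 0.4577, 0.7952), (22, 0.4623, 0.7931),
]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss, best_r2 = min(results, key=lambda row: row[1])
print(best_epoch, best_loss, best_r2)
```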
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1
### Model tree

contemmcm/7b0df2f38673520a354713b44b1a702e

- Base model: openai-community/gpt2-medium