# train_wsc_456_1768397601
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:
- Loss: 0.3353
- Num Input Tokens Seen: 434784
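
The framework versions below list PEFT, so this checkpoint is presumably a parameter-efficient adapter rather than a full set of model weights. A minimal inference sketch under that assumption (the example prompt is illustrative; the exact prompt format used for training is not documented in this card):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, "rbelanec/train_wsc_456_1768397601")
model.eval()

# Illustrative Winograd-style prompt.
inputs = tokenizer(
    "The trophy didn't fit in the suitcase because it was too big. What was too big?",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```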
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
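
As a rough reconstruction, the hyperparameters above map onto `transformers.TrainingArguments` as sketched below; `output_dir` and the surrounding trainer wiring are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wsc_456_1768397601",  # assumed; not stated in the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,          # betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```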
### Training results

The best validation loss (0.3353, reached at step 500, epoch ~2.01) matches the evaluation loss reported above.

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5806 | 0.5020 | 125 | 0.3903 | 21856 |
| 0.3905 | 1.0040 | 250 | 0.3492 | 43616 |
| 0.3177 | 1.5060 | 375 | 0.3473 | 65728 |
| 0.3597 | 2.0080 | 500 | 0.3353 | 87264 |
| 0.3412 | 2.5100 | 625 | 0.3670 | 108800 |
| 0.3437 | 3.0120 | 750 | 0.3367 | 130752 |
| 0.3563 | 3.5141 | 875 | 0.3360 | 153248 |
| 0.3521 | 4.0161 | 1000 | 0.3649 | 174560 |
| 0.3514 | 4.5181 | 1125 | 0.3627 | 196896 |
| 0.3449 | 5.0201 | 1250 | 0.3473 | 218384 |
| 0.3636 | 5.5221 | 1375 | 0.3436 | 239584 |
| 0.3541 | 6.0241 | 1500 | 0.3560 | 261760 |
| 0.3572 | 6.5261 | 1625 | 0.3578 | 283744 |
| 0.4247 | 7.0281 | 1750 | 0.3530 | 305552 |
| 0.3511 | 7.5301 | 1875 | 0.3504 | 326736 |
| 0.3561 | 8.0321 | 2000 | 0.3551 | 349328 |
| 0.3485 | 8.5341 | 2125 | 0.3589 | 371040 |
| 0.3365 | 9.0361 | 2250 | 0.3598 | 392704 |
| 0.3674 | 9.5382 | 2375 | 0.3632 | 414608 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4