# train_wic_42_1767887009
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:
- Loss: 0.3170
- Num Input Tokens Seen: 4067384
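A minimal sketch of loading this adapter with PEFT on top of the base model. The adapter id comes from this card; the prompt template is an illustrative guess, since the exact WiC prompt format used in training is not documented here:

```python
# Minimal sketch: load the base model and apply the PEFT adapter from this repo.
# Assumes access to the gated meta-llama base model; adjust dtype/device_map
# for your hardware. The WiC-style prompt below is only an illustration.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_42_1767887009"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# WiC task: does the target word carry the same sense in both sentences?
prompt = (
    'Do the two sentences use the word "bank" in the same sense?\n'
    "1. She sat on the bank of the river.\n"
    "2. He deposited money at the bank.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```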
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
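The training script itself is not included in this card, so the following is only a hedged sketch of how the values above map onto `transformers.TrainingArguments`; `output_dir` and anything else not listed is a placeholder:

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Fields not covered by the list above (e.g. output_dir) are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_42_1767887009",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```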
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.6066 | 0.5002 | 1222 | 0.3170 | 203312 |
| 0.096 | 1.0004 | 2444 | 0.3510 | 406960 |
| 0.6999 | 1.5006 | 3666 | 0.3758 | 610320 |
| 0.0235 | 2.0008 | 4888 | 0.3447 | 814208 |
| 0.2523 | 2.5010 | 6110 | 0.4392 | 1017424 |
| 0.4419 | 3.0012 | 7332 | 0.3659 | 1221232 |
| 0.2423 | 3.5014 | 8554 | 0.3769 | 1424736 |
| 0.1568 | 4.0016 | 9776 | 0.3933 | 1628304 |
| 0.411 | 4.5018 | 10998 | 0.3961 | 1831936 |
| 0.6195 | 5.0020 | 12220 | 0.3968 | 2035280 |
| 0.27 | 5.5023 | 13442 | 0.3906 | 2239168 |
| 0.1055 | 6.0025 | 14664 | 0.4025 | 2442032 |
| 0.2196 | 6.5027 | 15886 | 0.4531 | 2645696 |
| 0.4288 | 7.0029 | 17108 | 0.4529 | 2848960 |
| 0.0026 | 7.5031 | 18330 | 0.4631 | 3052368 |
| 0.388 | 8.0033 | 19552 | 0.4850 | 3255736 |
| 0.2477 | 8.5035 | 20774 | 0.4756 | 3459704 |
| 0.1481 | 9.0037 | 21996 | 0.4933 | 3662296 |
| 0.3577 | 9.5039 | 23218 | 0.4952 | 3866024 |
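Validation loss is lowest at the first evaluation (0.3170 at step 1222) and trends upward afterwards; this matches the headline evaluation loss above, suggesting the reported checkpoint is the one with the best validation loss rather than the final one.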
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4