train_stsb_1752763924
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:
- Loss: 1.9587
- Num Input Tokens Seen: 4852608
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
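To make the learning-rate settings above concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule in plain Python. The card does not say which scheduler implementation was used, so the shape is an assumption based on the common meaning of `cosine` with a warmup ratio. `TOTAL_STEPS` is also an assumption: a rough estimate of about 647 steps per epoch over 10 epochs, not a reported number. `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 6470     # assumption: roughly 647 steps per epoch x 10 epochs


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 324, 647, 3240, 6156, TOTAL_STEPS):
        print(s, lr_at(s))
```

Under these assumptions the rate peaks at `learning_rate` once warmup ends and falls to zero at the final step.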
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 7.6904 | 0.5008 | 324 | 7.5695 | 241664 |
| 5.955 | 1.0015 | 648 | 6.0297 | 485616 |
| 4.6224 | 1.5023 | 972 | 4.6702 | 727280 |
| 3.407 | 2.0031 | 1296 | 3.9041 | 971536 |
| 3.3945 | 2.5039 | 1620 | 3.5486 | 1214864 |
| 3.0494 | 3.0046 | 1944 | 3.3125 | 1456656 |
| 3.1003 | 3.5054 | 2268 | 3.1060 | 1701712 |
| 2.7815 | 4.0062 | 2592 | 2.9036 | 1942960 |
| 2.7445 | 4.5070 | 2916 | 2.6954 | 2189232 |
| 2.3198 | 5.0077 | 3240 | 2.5111 | 2429824 |
| 2.4696 | 5.5085 | 3564 | 2.3520 | 2673664 |
| 2.2091 | 6.0093 | 3888 | 2.2263 | 2917488 |
| 1.8969 | 6.5100 | 4212 | 2.1332 | 3159216 |
| 1.7298 | 7.0108 | 4536 | 2.0647 | 3403040 |
| 2.0465 | 7.5116 | 4860 | 2.0179 | 3648160 |
| 1.6594 | 8.0124 | 5184 | 1.9898 | 3890608 |
| 2.1859 | 8.5131 | 5508 | 1.9703 | 4134704 |
| 1.9089 | 9.0139 | 5832 | 1.9624 | 4375824 |
| 1.9414 | 9.5147 | 6156 | 1.9587 | 4620240 |
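The table supports a couple of quick sanity checks, sketched below with values copied from its first and last rows. The helper names are hypothetical. The tokens-per-step figure is only an average per logged step: the card does not state gradient accumulation or device count, so it should not be read as tokens per example.

```python
# (epoch, step, input_tokens_seen) copied from the first and last table rows
FIRST_ROW = (0.5008, 324, 241664)
LAST_ROW = (9.5147, 6156, 4620240)


def steps_per_epoch(row):
    """Estimate optimizer steps per epoch from one logged row."""
    epoch, step, _ = row
    return step / epoch


def tokens_per_step(row):
    """Average input tokens consumed per logged step up to this row."""
    _, step, tokens = row
    return tokens / step


if __name__ == "__main__":
    for name, row in (("first", FIRST_ROW), ("last", LAST_ROW)):
        print(name, steps_per_epoch(row), tokens_per_step(row))
```

Both rows give nearly the same steps-per-epoch estimate, which suggests the evaluation interval was a fixed number of steps.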
Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.1+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
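Since PEFT appears in the framework versions, the repository probably holds an adapter rather than full model weights. The card does not confirm this, so the sketch below is an assumption, as is the repository id `rbelanec/train_stsb_1752763924`, taken from the model page. `load_model` is a hypothetical helper. The base model is gated and large, so running it requires access approval and adequate hardware.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in this card
ADAPTER_ID = "rbelanec/train_stsb_1752763924"          # assumed to contain a PEFT adapter


def load_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (sketch, not verified against the repo)."""
    # Imported here so the constants can be inspected without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored under adapter_id.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model()` returns the tokenizer and the adapter-wrapped model. The prompt format used for STS-B during fine-tuning is not documented in this card, so inputs would have to be constructed by trial or from the training code.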
Model tree for rbelanec/train_stsb_1752763924
Base model
meta-llama/Meta-Llama-3-8B-Instruct