# train_mrpc_42_1774791060

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the MRPC (Microsoft Research Paraphrase Corpus) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1332
- Num Input Tokens Seen: 1780000
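
Since this is a PEFT adapter rather than a full model checkpoint, it is loaded on top of the base model. A minimal loading sketch, assuming the adapter is published as `rbelanec/train_mrpc_42_1774791060` and used with a causal-LM head (the card does not document the exact inference setup or prompt format):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach the fine-tuned PEFT adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "rbelanec/train_mrpc_42_1774791060")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Illustrative only: the prompt template used during training is not
# documented on this card.
inputs = tokenizer("Do these two sentences express the same meaning? ...",
                   return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```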
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a configuration sketch follows the list:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
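
As a reproduction aid, the settings above translate into a `transformers.TrainingArguments` configuration as sketched below. This is a sketch under assumptions: the PEFT adapter type, dataset preprocessing, and `output_dir` are not stated on this card, so only the optimizer and schedule settings are taken from the list.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="train_mrpc_42_1774791060",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```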

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2006 | 0.2518 | 104 | 0.2291 | 89600 |
| 0.1908 | 0.5036 | 208 | 0.1950 | 178688 |
| 0.1721 | 0.7554 | 312 | 0.1814 | 267968 |
| 0.1644 | 1.0073 | 416 | 0.1772 | 357488 |
| 0.1866 | 1.2591 | 520 | 0.1557 | 446896 |
| 0.1538 | 1.5109 | 624 | 0.1509 | 536176 |
| 0.1846 | 1.7627 | 728 | 0.1462 | 626992 |
| 0.181 | 2.0145 | 832 | 0.1442 | 716344 |
| 0.0969 | 2.2663 | 936 | 0.1435 | 806712 |
| 0.1447 | 2.5182 | 1040 | 0.1429 | 895736 |
| 0.0919 | 2.7700 | 1144 | 0.1439 | 985592 |
| 0.1187 | 3.0218 | 1248 | 0.1343 | 1074624 |
| 0.1757 | 3.2736 | 1352 | 0.1521 | 1164544 |
| 0.1714 | 3.5254 | 1456 | 0.1346 | 1253248 |
| 0.1512 | 3.7772 | 1560 | 0.1374 | 1344000 |
| 0.0868 | 4.0291 | 1664 | 0.1423 | 1432880 |
| 0.1456 | 4.2809 | 1768 | 0.1339 | 1522544 |
| 0.1223 | 4.5327 | 1872 | 0.1340 | 1611760 |
| 0.1404 | 4.7845 | 1976 | 0.1332 | 1702832 |

### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
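
Results are most reproducible when the runtime matches these pins. A small sanity check, assuming the pip package names match the library names above:

```python
import datasets, peft, tokenizers, torch, transformers

# Compare the installed versions against the pins listed on this card.
for mod, want in [(peft, "0.17.1"), (transformers, "4.51.3"),
                  (datasets, "4.0.0"), (tokenizers, "0.21.4")]:
    assert mod.__version__ == want, f"{mod.__name__} {mod.__version__} != {want}"
print("torch:", torch.__version__)  # card lists 2.10.0+cu128
```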