# train_record_1745950250

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the record dataset. It achieves the following results on the evaluation set:
- Loss: 0.4438
- Num Input Tokens Seen: 54198768
## Model description
More information needed
## Intended uses & limitations
More information needed
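Since PEFT is listed under the framework versions below, this checkpoint is presumably a PEFT adapter applied on top of the base model rather than a full set of weights. A minimal, untested loading sketch (the `load_finetuned` helper is hypothetical; the repo ids are taken from this card, and the gated base weights require access approval):

```python
def load_finetuned(base="meta-llama/Meta-Llama-3-8B-Instruct",
                   adapter="rbelanec/train_record_1745950250"):
    """Load the base model, then attach the adapter from this repo."""
    # Imports kept inside the function so the sketch is cheap to import.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
    model = PeftModel.from_pretrained(model, adapter)  # apply LoRA/PEFT weights
    return tokenizer, model
```

Calling `load_finetuned()` downloads both the base model and the adapter, so it needs substantial disk space and (for practical inference) a GPU.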
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 40000
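The reported total_train_batch_size follows from the per-device batch size times the gradient accumulation steps, and the `cosine` scheduler decays the learning rate from 5e-05 toward zero over the 40,000 steps. A small sketch of both (assuming a single device and zero warmup steps, neither of which this card states):

```python
import math

# Effective batch size from the hyperparameters above (assumption:
# training on a single device, so no extra data-parallel factor).
train_batch_size = 2
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 4  # matches the value reported above

# Half-cosine decay, as the default `cosine` schedule with no warmup:
# lr(step) = base_lr * 0.5 * (1 + cos(pi * step / training_steps))
def cosine_lr(step, base_lr=5e-5, training_steps=40_000):
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / training_steps))
```

With these values, the learning rate starts at 5e-05, is about half that at step 20,000, and reaches roughly zero at step 40,000.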
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.0376 | 0.0064 | 200 | 1.5287 | 272992 |
| 1.1312 | 0.0128 | 400 | 1.1516 | 541536 |
| 1.0905 | 0.0192 | 600 | 0.9675 | 813648 |
| 0.6506 | 0.0256 | 800 | 0.8528 | 1084496 |
| 0.5621 | 0.0320 | 1000 | 0.7838 | 1355472 |
| 0.7799 | 0.0384 | 1200 | 0.7391 | 1624048 |
| 0.7229 | 0.0448 | 1400 | 0.7115 | 1893968 |
| 0.6776 | 0.0512 | 1600 | 0.6931 | 2163024 |
| 0.5712 | 0.0576 | 1800 | 0.6745 | 2436032 |
| 0.6222 | 0.0640 | 2000 | 0.6617 | 2706960 |
| 1.1155 | 0.0704 | 2200 | 0.6501 | 2976144 |
| 0.6136 | 0.0768 | 2400 | 0.6399 | 3248384 |
| 0.6635 | 0.0832 | 2600 | 0.6310 | 3519088 |
| 0.6066 | 0.0896 | 2800 | 0.6240 | 3790208 |
| 0.5585 | 0.0960 | 3000 | 0.6168 | 4059472 |
| 0.5138 | 0.1024 | 3200 | 0.6087 | 4331088 |
| 0.539 | 0.1088 | 3400 | 0.6047 | 4601728 |
| 0.4948 | 0.1152 | 3600 | 0.5978 | 4877104 |
| 0.5902 | 0.1216 | 3800 | 0.5923 | 5150656 |
| 0.512 | 0.1280 | 4000 | 0.5888 | 5422944 |
| 0.6774 | 0.1344 | 4200 | 0.5858 | 5692368 |
| 0.6641 | 0.1408 | 4400 | 0.5806 | 5965440 |
| 0.3172 | 0.1472 | 4600 | 0.5787 | 6237632 |
| 0.4636 | 0.1536 | 4800 | 0.5753 | 6506256 |
| 0.6575 | 0.1600 | 5000 | 0.5704 | 6779376 |
| 0.5575 | 0.1664 | 5200 | 0.5675 | 7051504 |
| 0.5267 | 0.1728 | 5400 | 0.5632 | 7321552 |
| 0.5367 | 0.1792 | 5600 | 0.5594 | 7592304 |
| 0.5104 | 0.1856 | 5800 | 0.5570 | 7865632 |
| 0.4609 | 0.1920 | 6000 | 0.5556 | 8135936 |
| 0.601 | 0.1985 | 6200 | 0.5515 | 8408624 |
| 0.5025 | 0.2049 | 6400 | 0.5481 | 8677888 |
| 0.5747 | 0.2113 | 6600 | 0.5448 | 8947120 |
| 0.4186 | 0.2177 | 6800 | 0.5422 | 9216336 |
| 0.4501 | 0.2241 | 7000 | 0.5399 | 9485568 |
| 0.4739 | 0.2305 | 7200 | 0.5383 | 9758160 |
| 0.6005 | 0.2369 | 7400 | 0.5366 | 10028256 |
| 0.5946 | 0.2433 | 7600 | 0.5359 | 10300544 |
| 0.4071 | 0.2497 | 7800 | 0.5331 | 10574192 |
| 0.7391 | 0.2561 | 8000 | 0.5299 | 10844928 |
| 0.4257 | 0.2625 | 8200 | 0.5278 | 11114800 |
| 0.5219 | 0.2689 | 8400 | 0.5254 | 11383280 |
| 0.6746 | 0.2753 | 8600 | 0.5235 | 11652336 |
| 0.5384 | 0.2817 | 8800 | 0.5215 | 11924224 |
| 0.4862 | 0.2881 | 9000 | 0.5199 | 12194800 |
| 0.4256 | 0.2945 | 9200 | 0.5187 | 12466288 |
| 0.431 | 0.3009 | 9400 | 0.5154 | 12735104 |
| 0.4597 | 0.3073 | 9600 | 0.5135 | 13003216 |
| 0.4979 | 0.3137 | 9800 | 0.5123 | 13273680 |
| 0.6121 | 0.3201 | 10000 | 0.5108 | 13545840 |
| 0.5801 | 0.3265 | 10200 | 0.5098 | 13817104 |
| 0.4489 | 0.3329 | 10400 | 0.5083 | 14088032 |
| 0.5318 | 0.3393 | 10600 | 0.5066 | 14361280 |
| 0.4673 | 0.3457 | 10800 | 0.5040 | 14631040 |
| 0.443 | 0.3521 | 11000 | 0.5028 | 14901648 |
| 0.4586 | 0.3585 | 11200 | 0.5015 | 15170800 |
| 0.6426 | 0.3649 | 11400 | 0.5003 | 15440592 |
| 0.5997 | 0.3713 | 11600 | 0.4979 | 15710608 |
| 0.3011 | 0.3777 | 11800 | 0.4967 | 15980176 |
| 0.676 | 0.3841 | 12000 | 0.4954 | 16249072 |
| 0.3801 | 0.3905 | 12200 | 0.4941 | 16522704 |
| 0.7394 | 0.3969 | 12400 | 0.4925 | 16794064 |
| 0.4433 | 0.4033 | 12600 | 0.4916 | 17062288 |
| 0.5369 | 0.4097 | 12800 | 0.4899 | 17331072 |
| 0.3812 | 0.4161 | 13000 | 0.4896 | 17599616 |
| 0.2993 | 0.4225 | 13200 | 0.4891 | 17869424 |
| 0.4166 | 0.4289 | 13400 | 0.4874 | 18141136 |
| 0.494 | 0.4353 | 13600 | 0.4855 | 18414272 |
| 0.4413 | 0.4417 | 13800 | 0.4849 | 18685264 |
| 0.4482 | 0.4481 | 14000 | 0.4831 | 18957072 |
| 0.3391 | 0.4545 | 14200 | 0.4814 | 19230480 |
| 0.3414 | 0.4609 | 14400 | 0.4816 | 19503472 |
| 0.5706 | 0.4673 | 14600 | 0.4807 | 19777344 |
| 0.3137 | 0.4737 | 14800 | 0.4797 | 20049328 |
| 0.4268 | 0.4801 | 15000 | 0.4798 | 20319488 |
| 0.5074 | 0.4865 | 15200 | 0.4789 | 20589760 |
| 0.5278 | 0.4929 | 15400 | 0.4764 | 20860624 |
| 0.4903 | 0.4993 | 15600 | 0.4754 | 21133104 |
| 0.5416 | 0.5057 | 15800 | 0.4756 | 21403072 |
| 0.3926 | 0.5121 | 16000 | 0.4761 | 21675712 |
| 0.4037 | 0.5185 | 16200 | 0.4750 | 21946528 |
| 0.4913 | 0.5249 | 16400 | 0.4724 | 22217936 |
| 0.4442 | 0.5313 | 16600 | 0.4722 | 22489168 |
| 0.3534 | 0.5377 | 16800 | 0.4720 | 22759200 |
| 0.4472 | 0.5441 | 17000 | 0.4709 | 23028128 |
| 0.3981 | 0.5505 | 17200 | 0.4705 | 23300528 |
| 0.4462 | 0.5569 | 17400 | 0.4684 | 23569728 |
| 0.4062 | 0.5633 | 17600 | 0.4680 | 23838464 |
| 0.3186 | 0.5697 | 17800 | 0.4668 | 24109808 |
| 0.4891 | 0.5761 | 18000 | 0.4663 | 24380336 |
| 0.458 | 0.5825 | 18200 | 0.4661 | 24653072 |
| 0.4189 | 0.5890 | 18400 | 0.4657 | 24924912 |
| 0.4599 | 0.5954 | 18600 | 0.4661 | 25196400 |
| 0.3823 | 0.6018 | 18800 | 0.4650 | 25468816 |
| 0.3631 | 0.6082 | 19000 | 0.4641 | 25741776 |
| 0.6834 | 0.6146 | 19200 | 0.4637 | 26017088 |
| 0.3855 | 0.6210 | 19400 | 0.4627 | 26286480 |
| 0.4292 | 0.6274 | 19600 | 0.4638 | 26557200 |
| 0.437 | 0.6338 | 19800 | 0.4627 | 26827696 |
| 0.3012 | 0.6402 | 20000 | 0.4618 | 27098112 |
| 0.3044 | 0.6466 | 20200 | 0.4609 | 27369984 |
| 0.5599 | 0.6530 | 20400 | 0.4598 | 27640768 |
| 0.3936 | 0.6594 | 20600 | 0.4592 | 27910480 |
| 0.4015 | 0.6658 | 20800 | 0.4587 | 28180240 |
| 0.5022 | 0.6722 | 21000 | 0.4579 | 28451984 |
| 0.3381 | 0.6786 | 21200 | 0.4577 | 28723904 |
| 0.6385 | 0.6850 | 21400 | 0.4576 | 28994096 |
| 0.5204 | 0.6914 | 21600 | 0.4570 | 29267904 |
| 0.3454 | 0.6978 | 21800 | 0.4570 | 29540768 |
| 0.4744 | 0.7042 | 22000 | 0.4565 | 29812480 |
| 0.3103 | 0.7106 | 22200 | 0.4558 | 30080624 |
| 0.5805 | 0.7170 | 22400 | 0.4556 | 30352256 |
| 0.4824 | 0.7234 | 22600 | 0.4552 | 30622032 |
| 0.3745 | 0.7298 | 22800 | 0.4549 | 30894016 |
| 0.5018 | 0.7362 | 23000 | 0.4545 | 31162736 |
| 0.4904 | 0.7426 | 23200 | 0.4541 | 31433344 |
| 0.5793 | 0.7490 | 23400 | 0.4533 | 31708288 |
| 0.5206 | 0.7554 | 23600 | 0.4534 | 31982128 |
| 0.4382 | 0.7618 | 23800 | 0.4533 | 32253040 |
| 0.464 | 0.7682 | 24000 | 0.4531 | 32524464 |
| 0.4827 | 0.7746 | 24200 | 0.4527 | 32794928 |
| 0.5373 | 0.7810 | 24400 | 0.4521 | 33067904 |
| 0.3557 | 0.7874 | 24600 | 0.4519 | 33336480 |
| 0.4961 | 0.7938 | 24800 | 0.4517 | 33606096 |
| 0.6283 | 0.8002 | 25000 | 0.4515 | 33878720 |
| 0.3892 | 0.8066 | 25200 | 0.4513 | 34148496 |
| 0.4803 | 0.8130 | 25400 | 0.4518 | 34421392 |
| 0.4706 | 0.8194 | 25600 | 0.4511 | 34692880 |
| 0.421 | 0.8258 | 25800 | 0.4504 | 34964656 |
| 0.5967 | 0.8322 | 26000 | 0.4504 | 35234256 |
| 0.4502 | 0.8386 | 26200 | 0.4503 | 35504864 |
| 0.5948 | 0.8450 | 26400 | 0.4500 | 35777296 |
| 0.3845 | 0.8514 | 26600 | 0.4494 | 36045376 |
| 0.5572 | 0.8578 | 26800 | 0.4491 | 36315872 |
| 0.5925 | 0.8642 | 27000 | 0.4487 | 36590336 |
| 0.3107 | 0.8706 | 27200 | 0.4486 | 36858080 |
| 0.441 | 0.8770 | 27400 | 0.4483 | 37125216 |
| 0.5021 | 0.8834 | 27600 | 0.4479 | 37397648 |
| 0.4081 | 0.8898 | 27800 | 0.4479 | 37667456 |
| 0.4245 | 0.8962 | 28000 | 0.4474 | 37935760 |
| 0.4141 | 0.9026 | 28200 | 0.4473 | 38204832 |
| 0.4092 | 0.9090 | 28400 | 0.4469 | 38475552 |
| 0.3323 | 0.9154 | 28600 | 0.4469 | 38746560 |
| 0.5035 | 0.9218 | 28800 | 0.4468 | 39016288 |
| 0.3608 | 0.9282 | 29000 | 0.4470 | 39287360 |
| 0.4684 | 0.9346 | 29200 | 0.4466 | 39557440 |
| 0.2902 | 0.9410 | 29400 | 0.4463 | 39830256 |
| 0.4933 | 0.9474 | 29600 | 0.4463 | 40102464 |
| 0.2688 | 0.9538 | 29800 | 0.4461 | 40371968 |
| 0.3888 | 0.9602 | 30000 | 0.4462 | 40643632 |
| 0.4356 | 0.9666 | 30200 | 0.4461 | 40914064 |
| 0.419 | 0.9730 | 30400 | 0.4460 | 41182128 |
| 0.3897 | 0.9795 | 30600 | 0.4457 | 41452688 |
| 0.4594 | 0.9859 | 30800 | 0.4458 | 41721056 |
| 0.3648 | 0.9923 | 31000 | 0.4454 | 41993584 |
| 0.4395 | 0.9987 | 31200 | 0.4456 | 42266304 |
| 0.3717 | 1.0051 | 31400 | 0.4454 | 42536720 |
| 0.3335 | 1.0115 | 31600 | 0.4457 | 42810528 |
| 0.603 | 1.0179 | 31800 | 0.4452 | 43081488 |
| 0.3702 | 1.0243 | 32000 | 0.4454 | 43351904 |
| 0.67 | 1.0307 | 32200 | 0.4450 | 43622640 |
| 0.4796 | 1.0371 | 32400 | 0.4450 | 43893856 |
| 0.2008 | 1.0435 | 32600 | 0.4449 | 44164592 |
| 0.4444 | 1.0499 | 32800 | 0.4447 | 44438640 |
| 0.467 | 1.0563 | 33000 | 0.4448 | 44712640 |
| 0.3529 | 1.0627 | 33200 | 0.4446 | 44980912 |
| 0.5797 | 1.0691 | 33400 | 0.4447 | 45251328 |
| 0.5556 | 1.0755 | 33600 | 0.4447 | 45523792 |
| 0.3862 | 1.0819 | 33800 | 0.4445 | 45796960 |
| 0.389 | 1.0883 | 34000 | 0.4444 | 46067712 |
| 0.3956 | 1.0947 | 34200 | 0.4443 | 46337408 |
| 0.5701 | 1.1011 | 34400 | 0.4443 | 46611232 |
| 0.4403 | 1.1075 | 34600 | 0.4443 | 46879824 |
| 0.5335 | 1.1139 | 34800 | 0.4441 | 47155008 |
| 0.339 | 1.1203 | 35000 | 0.4442 | 47426864 |
| 0.4668 | 1.1267 | 35200 | 0.4441 | 47698224 |
| 0.5801 | 1.1331 | 35400 | 0.4439 | 47967840 |
| 0.4932 | 1.1395 | 35600 | 0.4442 | 48239792 |
| 0.4009 | 1.1459 | 35800 | 0.4441 | 48514752 |
| 0.6396 | 1.1523 | 36000 | 0.4441 | 48783136 |
| 0.4138 | 1.1587 | 36200 | 0.4441 | 49052640 |
| 0.341 | 1.1651 | 36400 | 0.4439 | 49321648 |
| 0.3171 | 1.1715 | 36600 | 0.4440 | 49592352 |
| 0.404 | 1.1779 | 36800 | 0.4440 | 49863184 |
| 0.4588 | 1.1843 | 37000 | 0.4440 | 50135184 |
| 0.4547 | 1.1907 | 37200 | 0.4441 | 50407568 |
| 0.7914 | 1.1971 | 37400 | 0.4440 | 50678192 |
| 0.4438 | 1.2035 | 37600 | 0.4441 | 50953312 |
| 0.4227 | 1.2099 | 37800 | 0.4440 | 51223392 |
| 0.287 | 1.2163 | 38000 | 0.4440 | 51491824 |
| 0.3409 | 1.2227 | 38200 | 0.4439 | 51763040 |
| 0.5219 | 1.2291 | 38400 | 0.4439 | 52033392 |
| 0.4185 | 1.2355 | 38600 | 0.4439 | 52304608 |
| 0.6725 | 1.2419 | 38800 | 0.4441 | 52574352 |
| 0.3928 | 1.2483 | 39000 | 0.4442 | 52846048 |
| 0.3627 | 1.2547 | 39200 | 0.4438 | 53118576 |
| 0.3919 | 1.2611 | 39400 | 0.4439 | 53387872 |
| 0.3773 | 1.2675 | 39600 | 0.4440 | 53659856 |
| 0.4209 | 1.2739 | 39800 | 0.4440 | 53928784 |
| 0.3773 | 1.2803 | 40000 | 0.4440 | 54198768 |
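From the final row of the table, the run consumed 54,198,768 input tokens over 40,000 optimizer steps, i.e. roughly 1,355 tokens per step on average:

```python
# Rough token accounting from the final table row above.
total_tokens = 54_198_768   # "Input Tokens Seen" at step 40,000
training_steps = 40_000
tokens_per_step = total_tokens / training_steps  # average tokens per step
```

Note this is an average over the whole run; actual per-step counts vary with sequence length and padding.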
### Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1