# train_record_1745950253
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the record dataset. It achieves the following results on the evaluation set:
- Loss: 4.4571
- Num Input Tokens Seen: 54198768
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- training_steps: 40000
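The accumulation arithmetic and the learning-rate schedule above can be sanity-checked with a short sketch. The schedule shown is the standard cosine decay from the base rate to zero; treating it as warmup-free is an assumption, since no warmup setting is listed:

```python
import math

# Hyperparameters from the list above.
learning_rate = 5e-05
train_batch_size = 2
gradient_accumulation_steps = 2
training_steps = 40000

# Effective (total) train batch size = per-device batch * accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 4, matching total_train_batch_size above

def cosine_lr(step, base_lr=learning_rate, total_steps=training_steps):
    """Cosine decay from base_lr to 0 over total_steps (assumes no warmup)."""
    progress = step / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))      # base rate 5e-05 at the start
print(cosine_lr(20000))  # half the base rate at the midpoint
```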
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 3.4622 | 0.0064 | 200 | 4.9109 | 272992 |
| 4.4474 | 0.0128 | 400 | 4.6776 | 541536 |
| 4.3464 | 0.0192 | 600 | 4.5747 | 813648 |
| 4.3474 | 0.0256 | 800 | 4.5402 | 1084496 |
| 3.707 | 0.0320 | 1000 | 4.5457 | 1355472 |
| 4.4764 | 0.0384 | 1200 | 4.5150 | 1624048 |
| 4.6147 | 0.0448 | 1400 | 4.5161 | 1893968 |
| 3.7767 | 0.0512 | 1600 | 4.5053 | 2163024 |
| 4.0815 | 0.0576 | 1800 | 4.5199 | 2436032 |
| 4.5046 | 0.0640 | 2000 | 4.5114 | 2706960 |
| 6.1891 | 0.0704 | 2200 | 4.5107 | 2976144 |
| 3.7462 | 0.0768 | 2400 | 4.5104 | 3248384 |
| 4.6499 | 0.0832 | 2600 | 4.5035 | 3519088 |
| 4.3216 | 0.0896 | 2800 | 4.5189 | 3790208 |
| 4.4919 | 0.0960 | 3000 | 4.5074 | 4059472 |
| 3.3845 | 0.1024 | 3200 | 4.5056 | 4331088 |
| 4.6709 | 0.1088 | 3400 | 4.5079 | 4601728 |
| 3.8271 | 0.1152 | 3600 | 4.4911 | 4877104 |
| 4.429 | 0.1216 | 3800 | 4.4881 | 5150656 |
| 3.6511 | 0.1280 | 4000 | 4.4951 | 5422944 |
| 5.3519 | 0.1344 | 4200 | 4.4855 | 5692368 |
| 4.2255 | 0.1408 | 4400 | 4.4981 | 5965440 |
| 4.9348 | 0.1472 | 4600 | 4.4977 | 6237632 |
| 3.4591 | 0.1536 | 4800 | 4.5017 | 6506256 |
| 4.3076 | 0.1600 | 5000 | 4.4971 | 6779376 |
| 4.1459 | 0.1664 | 5200 | 4.5047 | 7051504 |
| 3.943 | 0.1728 | 5400 | 4.4887 | 7321552 |
| 3.7287 | 0.1792 | 5600 | 4.4714 | 7592304 |
| 4.5494 | 0.1856 | 5800 | 4.4840 | 7865632 |
| 3.8875 | 0.1920 | 6000 | 4.4906 | 8135936 |
| 4.7084 | 0.1985 | 6200 | 4.4994 | 8408624 |
| 3.6507 | 0.2049 | 6400 | 4.4853 | 8677888 |
| 4.6074 | 0.2113 | 6600 | 4.4882 | 8947120 |
| 3.9965 | 0.2177 | 6800 | 4.4977 | 9216336 |
| 3.3807 | 0.2241 | 7000 | 4.4911 | 9485568 |
| 4.1261 | 0.2305 | 7200 | 4.4915 | 9758160 |
| 4.3934 | 0.2369 | 7400 | 4.4811 | 10028256 |
| 5.2692 | 0.2433 | 7600 | 4.4935 | 10300544 |
| 4.0686 | 0.2497 | 7800 | 4.4813 | 10574192 |
| 4.6755 | 0.2561 | 8000 | 4.4925 | 10844928 |
| 3.8404 | 0.2625 | 8200 | 4.4825 | 11114800 |
| 4.5697 | 0.2689 | 8400 | 4.4839 | 11383280 |
| 5.3488 | 0.2753 | 8600 | 4.4747 | 11652336 |
| 4.5469 | 0.2817 | 8800 | 4.4927 | 11924224 |
| 3.6975 | 0.2881 | 9000 | 4.4993 | 12194800 |
| 3.8447 | 0.2945 | 9200 | 4.4795 | 12466288 |
| 4.2442 | 0.3009 | 9400 | 4.4930 | 12735104 |
| 4.1695 | 0.3073 | 9600 | 4.4912 | 13003216 |
| 4.1089 | 0.3137 | 9800 | 4.4999 | 13273680 |
| 5.1543 | 0.3201 | 10000 | 4.5051 | 13545840 |
| 4.2356 | 0.3265 | 10200 | 4.4709 | 13817104 |
| 4.757 | 0.3329 | 10400 | 4.4854 | 14088032 |
| 5.2015 | 0.3393 | 10600 | 4.4966 | 14361280 |
| 4.8125 | 0.3457 | 10800 | 4.4881 | 14631040 |
| 3.7125 | 0.3521 | 11000 | 4.4837 | 14901648 |
| 4.6481 | 0.3585 | 11200 | 4.4879 | 15170800 |
| 4.8716 | 0.3649 | 11400 | 4.5011 | 15440592 |
| 6.5712 | 0.3713 | 11600 | 4.4706 | 15710608 |
| 4.0938 | 0.3777 | 11800 | 4.4764 | 15980176 |
| 4.7581 | 0.3841 | 12000 | 4.4922 | 16249072 |
| 4.3731 | 0.3905 | 12200 | 4.4919 | 16522704 |
| 4.5962 | 0.3969 | 12400 | 4.4913 | 16794064 |
| 3.8551 | 0.4033 | 12600 | 4.4911 | 17062288 |
| 4.5508 | 0.4097 | 12800 | 4.4974 | 17331072 |
| 4.2258 | 0.4161 | 13000 | 4.5005 | 17599616 |
| 3.7439 | 0.4225 | 13200 | 4.4908 | 17869424 |
| 3.3628 | 0.4289 | 13400 | 4.5124 | 18141136 |
| 3.783 | 0.4353 | 13600 | 4.4945 | 18414272 |
| 5.2144 | 0.4417 | 13800 | 4.5006 | 18685264 |
| 4.1907 | 0.4481 | 14000 | 4.4800 | 18957072 |
| 3.1513 | 0.4545 | 14200 | 4.4922 | 19230480 |
| 3.8682 | 0.4609 | 14400 | 4.4971 | 19503472 |
| 4.9699 | 0.4673 | 14600 | 4.4959 | 19777344 |
| 3.1368 | 0.4737 | 14800 | 4.4740 | 20049328 |
| 4.6479 | 0.4801 | 15000 | 4.4963 | 20319488 |
| 4.9356 | 0.4865 | 15200 | 4.4824 | 20589760 |
| 4.012 | 0.4929 | 15400 | 4.4831 | 20860624 |
| 4.1237 | 0.4993 | 15600 | 4.4829 | 21133104 |
| 5.6766 | 0.5057 | 15800 | 4.4748 | 21403072 |
| 4.0869 | 0.5121 | 16000 | 4.4932 | 21675712 |
| 4.3651 | 0.5185 | 16200 | 4.4654 | 21946528 |
| 4.2341 | 0.5249 | 16400 | 4.4637 | 22217936 |
| 4.0066 | 0.5313 | 16600 | 4.4884 | 22489168 |
| 4.5365 | 0.5377 | 16800 | 4.4926 | 22759200 |
| 4.2455 | 0.5441 | 17000 | 4.4940 | 23028128 |
| 4.2426 | 0.5505 | 17200 | 4.4984 | 23300528 |
| 4.291 | 0.5569 | 17400 | 4.4735 | 23569728 |
| 4.815 | 0.5633 | 17600 | 4.5050 | 23838464 |
| 4.839 | 0.5697 | 17800 | 4.4982 | 24109808 |
| 4.8494 | 0.5761 | 18000 | 4.4909 | 24380336 |
| 4.3989 | 0.5825 | 18200 | 4.4916 | 24653072 |
| 3.9486 | 0.5890 | 18400 | 4.4571 | 24924912 |
| 4.4004 | 0.5954 | 18600 | 4.5108 | 25196400 |
| 4.237 | 0.6018 | 18800 | 4.4795 | 25468816 |
| 4.0165 | 0.6082 | 19000 | 4.4926 | 25741776 |
| 4.9951 | 0.6146 | 19200 | 4.4784 | 26017088 |
| 3.5959 | 0.6210 | 19400 | 4.4900 | 26286480 |
| 4.5629 | 0.6274 | 19600 | 4.4870 | 26557200 |
| 4.0744 | 0.6338 | 19800 | 4.4970 | 26827696 |
| 3.8765 | 0.6402 | 20000 | 4.4868 | 27098112 |
| 4.1175 | 0.6466 | 20200 | 4.4988 | 27369984 |
| 5.3047 | 0.6530 | 20400 | 4.4709 | 27640768 |
| 4.0005 | 0.6594 | 20600 | 4.4891 | 27910480 |
| 4.4139 | 0.6658 | 20800 | 4.4806 | 28180240 |
| 4.7914 | 0.6722 | 21000 | 4.4793 | 28451984 |
| 4.6492 | 0.6786 | 21200 | 4.4787 | 28723904 |
| 5.6937 | 0.6850 | 21400 | 4.4880 | 28994096 |
| 3.4293 | 0.6914 | 21600 | 4.4914 | 29267904 |
| 3.8018 | 0.6978 | 21800 | 4.4886 | 29540768 |
| 4.4407 | 0.7042 | 22000 | 4.4938 | 29812480 |
| 4.5343 | 0.7106 | 22200 | 4.5006 | 30080624 |
| 4.3453 | 0.7170 | 22400 | 4.5022 | 30352256 |
| 4.6245 | 0.7234 | 22600 | 4.4961 | 30622032 |
| 4.41 | 0.7298 | 22800 | 4.4869 | 30894016 |
| 4.678 | 0.7362 | 23000 | 4.4889 | 31162736 |
| 5.3991 | 0.7426 | 23200 | 4.4910 | 31433344 |
| 4.5956 | 0.7490 | 23400 | 4.4923 | 31708288 |
| 4.255 | 0.7554 | 23600 | 4.4687 | 31982128 |
| 4.7721 | 0.7618 | 23800 | 4.4848 | 32253040 |
| 3.9138 | 0.7682 | 24000 | 4.5084 | 32524464 |
| 3.8332 | 0.7746 | 24200 | 4.4894 | 32794928 |
| 3.9703 | 0.7810 | 24400 | 4.4828 | 33067904 |
| 4.0106 | 0.7874 | 24600 | 4.4984 | 33336480 |
| 4.3804 | 0.7938 | 24800 | 4.4618 | 33606096 |
| 4.2001 | 0.8002 | 25000 | 4.4941 | 33878720 |
| 4.4727 | 0.8066 | 25200 | 4.4828 | 34148496 |
| 4.79 | 0.8130 | 25400 | 4.5010 | 34421392 |
| 4.8489 | 0.8194 | 25600 | 4.4879 | 34692880 |
| 4.0376 | 0.8258 | 25800 | 4.4957 | 34964656 |
| 5.177 | 0.8322 | 26000 | 4.4864 | 35234256 |
| 4.6042 | 0.8386 | 26200 | 4.4710 | 35504864 |
| 4.626 | 0.8450 | 26400 | 4.5074 | 35777296 |
| 4.4373 | 0.8514 | 26600 | 4.4700 | 36045376 |
| 4.3166 | 0.8578 | 26800 | 4.4870 | 36315872 |
| 4.4594 | 0.8642 | 27000 | 4.4897 | 36590336 |
| 4.1458 | 0.8706 | 27200 | 4.4807 | 36858080 |
| 4.2357 | 0.8770 | 27400 | 4.5005 | 37125216 |
| 4.6963 | 0.8834 | 27600 | 4.4925 | 37397648 |
| 3.6524 | 0.8898 | 27800 | 4.5142 | 37667456 |
| 4.0705 | 0.8962 | 28000 | 4.4866 | 37935760 |
| 4.1514 | 0.9026 | 28200 | 4.4846 | 38204832 |
| 4.6624 | 0.9090 | 28400 | 4.5107 | 38475552 |
| 3.0155 | 0.9154 | 28600 | 4.4960 | 38746560 |
| 5.2439 | 0.9218 | 28800 | 4.4847 | 39016288 |
| 5.1229 | 0.9282 | 29000 | 4.4919 | 39287360 |
| 4.7314 | 0.9346 | 29200 | 4.4841 | 39557440 |
| 3.8531 | 0.9410 | 29400 | 4.4765 | 39830256 |
| 4.4226 | 0.9474 | 29600 | 4.4810 | 40102464 |
| 3.617 | 0.9538 | 29800 | 4.4892 | 40371968 |
| 3.6916 | 0.9602 | 30000 | 4.4873 | 40643632 |
| 3.6975 | 0.9666 | 30200 | 4.4882 | 40914064 |
| 4.0617 | 0.9730 | 30400 | 4.4910 | 41182128 |
| 4.0031 | 0.9795 | 30600 | 4.4909 | 41452688 |
| 4.8552 | 0.9859 | 30800 | 4.4880 | 41721056 |
| 3.6645 | 0.9923 | 31000 | 4.4843 | 41993584 |
| 3.9784 | 0.9987 | 31200 | 4.4770 | 42266304 |
| 3.6653 | 1.0051 | 31400 | 4.4780 | 42536720 |
| 5.1739 | 1.0115 | 31600 | 4.4908 | 42810528 |
| 4.4113 | 1.0179 | 31800 | 4.4806 | 43081488 |
| 4.0525 | 1.0243 | 32000 | 4.4966 | 43351904 |
| 4.5779 | 1.0307 | 32200 | 4.4996 | 43622640 |
| 4.4427 | 1.0371 | 32400 | 4.4902 | 43893856 |
| 3.1415 | 1.0435 | 32600 | 4.4777 | 44164592 |
| 3.9178 | 1.0499 | 32800 | 4.4812 | 44438640 |
| 4.1746 | 1.0563 | 33000 | 4.4796 | 44712640 |
| 4.973 | 1.0627 | 33200 | 4.4898 | 44980912 |
| 4.4304 | 1.0691 | 33400 | 4.4772 | 45251328 |
| 4.1773 | 1.0755 | 33600 | 4.4917 | 45523792 |
| 4.3373 | 1.0819 | 33800 | 4.4946 | 45796960 |
| 5.0644 | 1.0883 | 34000 | 4.4988 | 46067712 |
| 3.8478 | 1.0947 | 34200 | 4.4749 | 46337408 |
| 4.7971 | 1.1011 | 34400 | 4.4940 | 46611232 |
| 4.4257 | 1.1075 | 34600 | 4.4740 | 46879824 |
| 3.9397 | 1.1139 | 34800 | 4.4922 | 47155008 |
| 4.4246 | 1.1203 | 35000 | 4.5051 | 47426864 |
| 4.6721 | 1.1267 | 35200 | 4.4900 | 47698224 |
| 3.6621 | 1.1331 | 35400 | 4.4876 | 47967840 |
| 4.291 | 1.1395 | 35600 | 4.4749 | 48239792 |
| 4.2103 | 1.1459 | 35800 | 4.5065 | 48514752 |
| 4.6992 | 1.1523 | 36000 | 4.5041 | 48783136 |
| 4.4254 | 1.1587 | 36200 | 4.4957 | 49052640 |
| 5.1153 | 1.1651 | 36400 | 4.4957 | 49321648 |
| 3.9983 | 1.1715 | 36600 | 4.4960 | 49592352 |
| 3.6453 | 1.1779 | 36800 | 4.4960 | 49863184 |
| 4.6426 | 1.1843 | 37000 | 4.4960 | 50135184 |
| 4.5039 | 1.1907 | 37200 | 4.4960 | 50407568 |
| 6.2586 | 1.1971 | 37400 | 4.4960 | 50678192 |
| 4.6008 | 1.2035 | 37600 | 4.4960 | 50953312 |
| 4.0899 | 1.2099 | 37800 | 4.4960 | 51223392 |
| 4.6102 | 1.2163 | 38000 | 4.4960 | 51491824 |
| 5.2984 | 1.2227 | 38200 | 4.4960 | 51763040 |
| 3.8424 | 1.2291 | 38400 | 4.4960 | 52033392 |
| 4.7151 | 1.2355 | 38600 | 4.4960 | 52304608 |
| 4.5647 | 1.2419 | 38800 | 4.4960 | 52574352 |
| 4.2167 | 1.2483 | 39000 | 4.4960 | 52846048 |
| 4.1054 | 1.2547 | 39200 | 4.4960 | 53118576 |
| 3.7602 | 1.2611 | 39400 | 4.4960 | 53387872 |
| 3.4438 | 1.2675 | 39600 | 4.4960 | 53659856 |
| 4.9077 | 1.2739 | 39800 | 4.4960 | 53928784 |
| 4.6876 | 1.2803 | 40000 | 4.4960 | 54198768 |
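As a rough consistency check on the table, dividing the final "Input Tokens Seen" figure by the number of optimizer steps gives the average tokens consumed per step and per example. This is a back-of-the-envelope sketch; the exact per-example figure depends on how padding is counted, which the card does not specify:

```python
total_tokens = 54_198_768   # "Input Tokens Seen" at the final step (40000)
training_steps = 40_000
total_train_batch_size = 4  # from the hyperparameters above

# Average tokens per optimizer step, then per training example.
tokens_per_step = total_tokens / training_steps
tokens_per_example = tokens_per_step / total_train_batch_size
print(round(tokens_per_step), round(tokens_per_example))  # 1355 339
```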
### Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
### Model tree

Adapter repository rbelanec/train_record_1745950253, built on base model meta-llama/Meta-Llama-3-8B-Instruct.