# train_record_1745950252
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the record dataset. It achieves the following results on the evaluation set:
- Loss: 0.2577
- Num Input Tokens Seen: 54198768
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- training_steps: 40000
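With the cosine scheduler above, the learning rate decays from 5e-05 toward zero over the 40,000 training steps. A minimal sketch of the standard cosine decay, assuming no warmup phase (the card does not list any warmup steps):

```python
import math

def cosine_lr(step: int, total_steps: int = 40_000, base_lr: float = 5e-05) -> float:
    """Standard cosine decay from base_lr to 0 over total_steps.

    Assumption: no warmup, matching `lr_scheduler_type: cosine` with
    default arguments (the card does not specify warmup steps).
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0))       # base_lr at step 0
print(cosine_lr(20_000))  # half of base_lr at the midpoint
```

Note also that the reported total_train_batch_size of 4 follows from the per-device batch size of 2 times 2 gradient-accumulation steps (assuming a single device, which the card does not state explicitly).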
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3915 | 0.0064 | 200 | 0.4934 | 272992 |
| 0.4742 | 0.0128 | 400 | 0.4420 | 541536 |
| 0.527 | 0.0192 | 600 | 0.4203 | 813648 |
| 0.5682 | 0.0256 | 800 | 0.3966 | 1084496 |
| 0.3447 | 0.0320 | 1000 | 0.3893 | 1355472 |
| 0.356 | 0.0384 | 1200 | 0.3912 | 1624048 |
| 0.4385 | 0.0448 | 1400 | 0.3688 | 1893968 |
| 0.2763 | 0.0512 | 1600 | 0.3641 | 2163024 |
| 0.2436 | 0.0576 | 1800 | 0.3672 | 2436032 |
| 0.3417 | 0.0640 | 2000 | 0.3655 | 2706960 |
| 0.4767 | 0.0704 | 2200 | 0.3499 | 2976144 |
| 0.299 | 0.0768 | 2400 | 0.3573 | 3248384 |
| 0.3352 | 0.0832 | 2600 | 0.3447 | 3519088 |
| 0.3419 | 0.0896 | 2800 | 0.3351 | 3790208 |
| 0.2942 | 0.0960 | 3000 | 0.3323 | 4059472 |
| 0.2068 | 0.1024 | 3200 | 0.3287 | 4331088 |
| 0.3147 | 0.1088 | 3400 | 0.3330 | 4601728 |
| 0.1763 | 0.1152 | 3600 | 0.3282 | 4877104 |
| 0.2979 | 0.1216 | 3800 | 0.3304 | 5150656 |
| 0.206 | 0.1280 | 4000 | 0.3406 | 5422944 |
| 0.4644 | 0.1344 | 4200 | 0.3358 | 5692368 |
| 0.2814 | 0.1408 | 4400 | 0.3298 | 5965440 |
| 0.1072 | 0.1472 | 4600 | 0.3373 | 6237632 |
| 0.24 | 0.1536 | 4800 | 0.3210 | 6506256 |
| 0.3447 | 0.1600 | 5000 | 0.3419 | 6779376 |
| 0.2321 | 0.1664 | 5200 | 0.3328 | 7051504 |
| 0.346 | 0.1728 | 5400 | 0.3200 | 7321552 |
| 0.4228 | 0.1792 | 5600 | 0.3191 | 7592304 |
| 0.2372 | 0.1856 | 5800 | 0.3272 | 7865632 |
| 0.2721 | 0.1920 | 6000 | 0.3211 | 8135936 |
| 0.3942 | 0.1985 | 6200 | 0.3185 | 8408624 |
| 0.2602 | 0.2049 | 6400 | 0.3156 | 8677888 |
| 0.2708 | 0.2113 | 6600 | 0.3072 | 8947120 |
| 0.4122 | 0.2177 | 6800 | 0.3181 | 9216336 |
| 0.2382 | 0.2241 | 7000 | 0.3097 | 9485568 |
| 0.4538 | 0.2305 | 7200 | 0.3152 | 9758160 |
| 0.38 | 0.2369 | 7400 | 0.3229 | 10028256 |
| 0.4055 | 0.2433 | 7600 | 0.3086 | 10300544 |
| 0.1776 | 0.2497 | 7800 | 0.3077 | 10574192 |
| 0.3196 | 0.2561 | 8000 | 0.3043 | 10844928 |
| 0.3007 | 0.2625 | 8200 | 0.3049 | 11114800 |
| 0.233 | 0.2689 | 8400 | 0.3096 | 11383280 |
| 0.5709 | 0.2753 | 8600 | 0.3076 | 11652336 |
| 0.3855 | 0.2817 | 8800 | 0.3033 | 11924224 |
| 0.4096 | 0.2881 | 9000 | 0.3060 | 12194800 |
| 0.442 | 0.2945 | 9200 | 0.2930 | 12466288 |
| 0.2325 | 0.3009 | 9400 | 0.3004 | 12735104 |
| 0.2312 | 0.3073 | 9600 | 0.3049 | 13003216 |
| 0.3358 | 0.3137 | 9800 | 0.2986 | 13273680 |
| 0.469 | 0.3201 | 10000 | 0.2955 | 13545840 |
| 0.3545 | 0.3265 | 10200 | 0.2976 | 13817104 |
| 0.2752 | 0.3329 | 10400 | 0.3024 | 14088032 |
| 0.238 | 0.3393 | 10600 | 0.2924 | 14361280 |
| 0.1789 | 0.3457 | 10800 | 0.3007 | 14631040 |
| 0.1979 | 0.3521 | 11000 | 0.2991 | 14901648 |
| 0.295 | 0.3585 | 11200 | 0.3065 | 15170800 |
| 0.3261 | 0.3649 | 11400 | 0.3079 | 15440592 |
| 0.3484 | 0.3713 | 11600 | 0.2914 | 15710608 |
| 0.1678 | 0.3777 | 11800 | 0.2939 | 15980176 |
| 0.5446 | 0.3841 | 12000 | 0.2977 | 16249072 |
| 0.2071 | 0.3905 | 12200 | 0.3023 | 16522704 |
| 0.4169 | 0.3969 | 12400 | 0.2909 | 16794064 |
| 0.1803 | 0.4033 | 12600 | 0.2902 | 17062288 |
| 0.1969 | 0.4097 | 12800 | 0.2901 | 17331072 |
| 0.2145 | 0.4161 | 13000 | 0.2878 | 17599616 |
| 0.1481 | 0.4225 | 13200 | 0.2997 | 17869424 |
| 0.3255 | 0.4289 | 13400 | 0.2913 | 18141136 |
| 0.3929 | 0.4353 | 13600 | 0.2951 | 18414272 |
| 0.4248 | 0.4417 | 13800 | 0.2802 | 18685264 |
| 0.2256 | 0.4481 | 14000 | 0.2837 | 18957072 |
| 0.2563 | 0.4545 | 14200 | 0.2821 | 19230480 |
| 0.2458 | 0.4609 | 14400 | 0.2853 | 19503472 |
| 0.2216 | 0.4673 | 14600 | 0.2884 | 19777344 |
| 0.1657 | 0.4737 | 14800 | 0.3013 | 20049328 |
| 0.2795 | 0.4801 | 15000 | 0.2857 | 20319488 |
| 0.4623 | 0.4865 | 15200 | 0.2868 | 20589760 |
| 0.3603 | 0.4929 | 15400 | 0.2808 | 20860624 |
| 0.3257 | 0.4993 | 15600 | 0.2928 | 21133104 |
| 0.3434 | 0.5057 | 15800 | 0.2861 | 21403072 |
| 0.2552 | 0.5121 | 16000 | 0.2869 | 21675712 |
| 0.3749 | 0.5185 | 16200 | 0.2919 | 21946528 |
| 0.2644 | 0.5249 | 16400 | 0.2797 | 22217936 |
| 0.3289 | 0.5313 | 16600 | 0.2856 | 22489168 |
| 0.1999 | 0.5377 | 16800 | 0.2913 | 22759200 |
| 0.207 | 0.5441 | 17000 | 0.2813 | 23028128 |
| 0.4231 | 0.5505 | 17200 | 0.2797 | 23300528 |
| 0.311 | 0.5569 | 17400 | 0.2861 | 23569728 |
| 0.3527 | 0.5633 | 17600 | 0.2804 | 23838464 |
| 0.1523 | 0.5697 | 17800 | 0.2800 | 24109808 |
| 0.486 | 0.5761 | 18000 | 0.2760 | 24380336 |
| 0.3081 | 0.5825 | 18200 | 0.2840 | 24653072 |
| 0.1504 | 0.5890 | 18400 | 0.2765 | 24924912 |
| 0.2461 | 0.5954 | 18600 | 0.2918 | 25196400 |
| 0.28 | 0.6018 | 18800 | 0.2778 | 25468816 |
| 0.2586 | 0.6082 | 19000 | 0.2777 | 25741776 |
| 0.7858 | 0.6146 | 19200 | 0.2846 | 26017088 |
| 0.2154 | 0.6210 | 19400 | 0.2813 | 26286480 |
| 0.205 | 0.6274 | 19600 | 0.2920 | 26557200 |
| 0.2923 | 0.6338 | 19800 | 0.2749 | 26827696 |
| 0.1228 | 0.6402 | 20000 | 0.2763 | 27098112 |
| 0.2786 | 0.6466 | 20200 | 0.2743 | 27369984 |
| 0.2426 | 0.6530 | 20400 | 0.2748 | 27640768 |
| 0.1864 | 0.6594 | 20600 | 0.2746 | 27910480 |
| 0.2724 | 0.6658 | 20800 | 0.2762 | 28180240 |
| 0.2213 | 0.6722 | 21000 | 0.2757 | 28451984 |
| 0.2052 | 0.6786 | 21200 | 0.2781 | 28723904 |
| 0.4804 | 0.6850 | 21400 | 0.2734 | 28994096 |
| 0.2861 | 0.6914 | 21600 | 0.2740 | 29267904 |
| 0.163 | 0.6978 | 21800 | 0.2710 | 29540768 |
| 0.1983 | 0.7042 | 22000 | 0.2702 | 29812480 |
| 0.1187 | 0.7106 | 22200 | 0.2722 | 30080624 |
| 0.3627 | 0.7170 | 22400 | 0.2774 | 30352256 |
| 0.3135 | 0.7234 | 22600 | 0.2776 | 30622032 |
| 0.271 | 0.7298 | 22800 | 0.2708 | 30894016 |
| 0.2472 | 0.7362 | 23000 | 0.2733 | 31162736 |
| 0.3659 | 0.7426 | 23200 | 0.2691 | 31433344 |
| 0.2444 | 0.7490 | 23400 | 0.2659 | 31708288 |
| 0.2957 | 0.7554 | 23600 | 0.2670 | 31982128 |
| 0.2204 | 0.7618 | 23800 | 0.2685 | 32253040 |
| 0.1833 | 0.7682 | 24000 | 0.2660 | 32524464 |
| 0.3072 | 0.7746 | 24200 | 0.2669 | 32794928 |
| 0.2977 | 0.7810 | 24400 | 0.2628 | 33067904 |
| 0.1999 | 0.7874 | 24600 | 0.2725 | 33336480 |
| 0.2237 | 0.7938 | 24800 | 0.2661 | 33606096 |
| 0.2279 | 0.8002 | 25000 | 0.2692 | 33878720 |
| 0.2089 | 0.8066 | 25200 | 0.2716 | 34148496 |
| 0.3298 | 0.8130 | 25400 | 0.2721 | 34421392 |
| 0.5142 | 0.8194 | 25600 | 0.2703 | 34692880 |
| 0.1002 | 0.8258 | 25800 | 0.2633 | 34964656 |
| 0.3595 | 0.8322 | 26000 | 0.2655 | 35234256 |
| 0.2288 | 0.8386 | 26200 | 0.2681 | 35504864 |
| 0.3692 | 0.8450 | 26400 | 0.2708 | 35777296 |
| 0.1523 | 0.8514 | 26600 | 0.2653 | 36045376 |
| 0.3688 | 0.8578 | 26800 | 0.2660 | 36315872 |
| 0.476 | 0.8642 | 27000 | 0.2679 | 36590336 |
| 0.2499 | 0.8706 | 27200 | 0.2619 | 36858080 |
| 0.2467 | 0.8770 | 27400 | 0.2674 | 37125216 |
| 0.2922 | 0.8834 | 27600 | 0.2640 | 37397648 |
| 0.1986 | 0.8898 | 27800 | 0.2676 | 37667456 |
| 0.2582 | 0.8962 | 28000 | 0.2647 | 37935760 |
| 0.4127 | 0.9026 | 28200 | 0.2646 | 38204832 |
| 0.3923 | 0.9090 | 28400 | 0.2642 | 38475552 |
| 0.2338 | 0.9154 | 28600 | 0.2647 | 38746560 |
| 0.25 | 0.9218 | 28800 | 0.2646 | 39016288 |
| 0.1791 | 0.9282 | 29000 | 0.2655 | 39287360 |
| 0.2976 | 0.9346 | 29200 | 0.2626 | 39557440 |
| 0.2667 | 0.9410 | 29400 | 0.2579 | 39830256 |
| 0.3166 | 0.9474 | 29600 | 0.2602 | 40102464 |
| 0.1429 | 0.9538 | 29800 | 0.2577 | 40371968 |
| 0.1887 | 0.9602 | 30000 | 0.2628 | 40643632 |
| 0.401 | 0.9666 | 30200 | 0.2641 | 40914064 |
| 0.2451 | 0.9730 | 30400 | 0.2607 | 41182128 |
| 0.3551 | 0.9795 | 30600 | 0.2593 | 41452688 |
| 0.1752 | 0.9859 | 30800 | 0.2612 | 41721056 |
| 0.2196 | 0.9923 | 31000 | 0.2620 | 41993584 |
| 0.2335 | 0.9987 | 31200 | 0.2585 | 42266304 |
| 0.2282 | 1.0051 | 31400 | 0.2640 | 42536720 |
| 0.1276 | 1.0115 | 31600 | 0.2647 | 42810528 |
| 0.2286 | 1.0179 | 31800 | 0.2683 | 43081488 |
| 0.1595 | 1.0243 | 32000 | 0.2689 | 43351904 |
| 0.3386 | 1.0307 | 32200 | 0.2682 | 43622640 |
| 0.3322 | 1.0371 | 32400 | 0.2694 | 43893856 |
| 0.0575 | 1.0435 | 32600 | 0.2720 | 44164592 |
| 0.2211 | 1.0499 | 32800 | 0.2740 | 44438640 |
| 0.2167 | 1.0563 | 33000 | 0.2693 | 44712640 |
| 0.1086 | 1.0627 | 33200 | 0.2683 | 44980912 |
| 0.2857 | 1.0691 | 33400 | 0.2695 | 45251328 |
| 0.1281 | 1.0755 | 33600 | 0.2700 | 45523792 |
| 0.1722 | 1.0819 | 33800 | 0.2689 | 45796960 |
| 0.2241 | 1.0883 | 34000 | 0.2684 | 46067712 |
| 0.2439 | 1.0947 | 34200 | 0.2690 | 46337408 |
| 0.3502 | 1.1011 | 34400 | 0.2690 | 46611232 |
| 0.1975 | 1.1075 | 34600 | 0.2680 | 46879824 |
| 0.2632 | 1.1139 | 34800 | 0.2699 | 47155008 |
| 0.2396 | 1.1203 | 35000 | 0.2688 | 47426864 |
| 0.2332 | 1.1267 | 35200 | 0.2674 | 47698224 |
| 0.1031 | 1.1331 | 35400 | 0.2675 | 47967840 |
| 0.1411 | 1.1395 | 35600 | 0.2679 | 48239792 |
| 0.2452 | 1.1459 | 35800 | 0.2688 | 48514752 |
| 0.3069 | 1.1523 | 36000 | 0.2684 | 48783136 |
| 0.1366 | 1.1587 | 36200 | 0.2692 | 49052640 |
| 0.1895 | 1.1651 | 36400 | 0.2688 | 49321648 |
| 0.2217 | 1.1715 | 36600 | 0.2692 | 49592352 |
| 0.1989 | 1.1779 | 36800 | 0.2694 | 49863184 |
| 0.2309 | 1.1843 | 37000 | 0.2694 | 50135184 |
| 0.3509 | 1.1907 | 37200 | 0.2692 | 50407568 |
| 0.3252 | 1.1971 | 37400 | 0.2687 | 50678192 |
| 0.2672 | 1.2035 | 37600 | 0.2689 | 50953312 |
| 0.2967 | 1.2099 | 37800 | 0.2689 | 51223392 |
| 0.1626 | 1.2163 | 38000 | 0.2687 | 51491824 |
| 0.256 | 1.2227 | 38200 | 0.2687 | 51763040 |
| 0.2826 | 1.2291 | 38400 | 0.2687 | 52033392 |
| 0.1631 | 1.2355 | 38600 | 0.2686 | 52304608 |
| 0.2621 | 1.2419 | 38800 | 0.2686 | 52574352 |
| 0.1966 | 1.2483 | 39000 | 0.2687 | 52846048 |
| 0.1242 | 1.2547 | 39200 | 0.2685 | 53118576 |
| 0.2718 | 1.2611 | 39400 | 0.2684 | 53387872 |
| 0.1515 | 1.2675 | 39600 | 0.2686 | 53659856 |
| 0.153 | 1.2739 | 39800 | 0.2687 | 53928784 |
| 0.2217 | 1.2803 | 40000 | 0.2685 | 54198768 |
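From the final table row, the run consumed 54,198,768 input tokens over 40,000 optimizer steps. A quick back-of-the-envelope check using only numbers from the table:

```python
# Average input tokens consumed per optimizer step, taken from the
# final table row (54,198,768 tokens over 40,000 steps).
total_tokens = 54_198_768
total_steps = 40_000
tokens_per_step = total_tokens / total_steps
print(tokens_per_step)  # roughly 1355 tokens per step
```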
## Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
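Since this is a PEFT checkpoint on top of Meta-Llama-3-8B-Instruct, using it typically means loading the base model and attaching the adapter. A minimal, untested sketch, assuming the adapter repo id is `rbelanec/train_record_1745950252` and that you have access to the gated base model:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_record_1745950252"  # assumption: Hub repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
```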
Model tree for rbelanec/train_record_1745950252:

- Base model: meta-llama/Meta-Llama-3-8B-Instruct