---
library_name: transformers
tags:
- generated_from_trainer
- smallm
model-index:
- name: smallm_70_rope
  results: []
---

# smallm_70_rope

This model is a fine-tuned version of an unspecified base model on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 2.8645
- Num Input Tokens Seen: 18350080000

## Model description

More information needed

## Intended uses & limitations

More information needed
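
Pending details from the author, a minimal inference sketch is shown below. The repo id `Azrail/smallm_70_rope` is an assumption inferred from the card name, and since the checkpoint uses the custom `SmalLmForCausalLM` class, loading it likely requires `trust_remote_code=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical hub path inferred from the card name; adjust as needed.
repo_id = "Azrail/smallm_70_rope"

# The custom SmalLmForCausalLM class ships with the repo, so the Auto
# classes need permission to execute the repo's modeling code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```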

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: ADAMW_APEX_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: warmup_stable_decay
- lr_scheduler_warmup_steps: 500
- training_steps: 70000
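
For reference, these settings map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction, not the author's training script; the `output_dir` is hypothetical, and the stable/decay split of the `warmup_stable_decay` schedule is not reported on this card:

```python
from transformers import TrainingArguments

# A sketch of the reported hyperparameters. Notes: "adamw_apex_fused"
# requires NVIDIA Apex to be installed, and "warmup_stable_decay" may
# need its stable/decay step split passed via lr_scheduler_kwargs
# (those values are not reported here).
args = TrainingArguments(
    output_dir="smallm_70_rope",    # hypothetical
    learning_rate=1e-3,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # 64 * 4 = 256 sequences per optimizer step
    optim="adamw_apex_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="warmup_stable_decay",
    warmup_steps=500,
    max_steps=70_000,
)
```

The reported total_train_batch_size of 256 then follows as 64 × 4, assuming a single device (in general it is per-device batch × accumulation steps × number of devices).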

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 5.3179 | 0.0034 | 500 | 5.1793 | 131072000 |
| 4.208 | 0.0067 | 1000 | 4.1088 | 262144000 |
| 3.8864 | 0.0101 | 1500 | 3.8075 | 393216000 |
| 3.7289 | 0.0135 | 2000 | 3.6546 | 524288000 |
| 3.6424 | 0.0168 | 2500 | 3.5607 | 655360000 |
| 3.5846 | 0.0202 | 3000 | 3.5029 | 786432000 |
| 3.528 | 0.0235 | 3500 | 3.4473 | 917504000 |
| 3.4784 | 0.0269 | 4000 | 3.4037 | 1048576000 |
| 3.4509 | 0.0303 | 4500 | 3.3683 | 1179648000 |
| 3.4252 | 0.0336 | 5000 | 3.3413 | 1310720000 |
| 3.4036 | 0.0370 | 5500 | 3.3187 | 1441792000 |
| 3.3953 | 0.0404 | 6000 | 3.2934 | 1572864000 |
| 3.3625 | 0.0437 | 6500 | 3.2745 | 1703936000 |
| 3.3387 | 0.0471 | 7000 | 3.2563 | 1835008000 |
| 3.3459 | 0.0504 | 7500 | 3.2415 | 1966080000 |
| 3.3143 | 0.0538 | 8000 | 3.2275 | 2097152000 |
| 3.2975 | 0.0572 | 8500 | 3.2149 | 2228224000 |
| 3.2817 | 0.0605 | 9000 | 3.2016 | 2359296000 |
| 3.2876 | 0.0639 | 9500 | 3.1907 | 2490368000 |
| 3.2632 | 0.0673 | 10000 | 3.1775 | 2621440000 |
| 3.2577 | 0.0706 | 10500 | 3.1682 | 2752512000 |
| 3.2427 | 0.0740 | 11000 | 3.1592 | 2883584000 |
| 3.2421 | 0.0774 | 11500 | 3.1493 | 3014656000 |
| 3.2393 | 0.0807 | 12000 | 3.1432 | 3145728000 |
| 3.2386 | 0.0841 | 12500 | 3.1355 | 3276800000 |
| 3.2158 | 0.0874 | 13000 | 3.1287 | 3407872000 |
| 3.2117 | 0.0908 | 13500 | 3.1214 | 3538944000 |
| 3.2057 | 0.0942 | 14000 | 3.1152 | 3670016000 |
| 3.2121 | 0.0975 | 14500 | 3.1071 | 3801088000 |
| 3.2015 | 0.1009 | 15000 | 3.1015 | 3932160000 |
| 3.1925 | 0.1043 | 15500 | 3.0996 | 4063232000 |
| 3.1796 | 0.1076 | 16000 | 3.0902 | 4194304000 |
| 3.211 | 0.1110 | 16500 | 3.0987 | 4325376000 |
| 3.1778 | 0.1144 | 17000 | 3.0843 | 4456448000 |
| 3.1717 | 0.1177 | 17500 | 3.0752 | 4587520000 |
| 3.1597 | 0.1211 | 18000 | 3.0699 | 4718592000 |
| 3.183 | 0.1244 | 18500 | 3.0884 | 4849664000 |
| 3.1541 | 0.1278 | 19000 | 3.0668 | 4980736000 |
| 3.1499 | 0.1312 | 19500 | 3.0654 | 5111808000 |
| 3.1499 | 0.1345 | 20000 | 3.0563 | 5242880000 |
| 3.1462 | 0.1379 | 20500 | 3.0525 | 5373952000 |
| 3.15 | 0.1413 | 21000 | 3.0538 | 5505024000 |
| 3.1544 | 0.1446 | 21500 | 3.0516 | 5636096000 |
| 3.1475 | 0.1480 | 22000 | 3.0482 | 5767168000 |
| 3.1364 | 0.1513 | 22500 | 3.0421 | 5898240000 |
| 3.1564 | 0.1547 | 23000 | 3.0723 | 6029312000 |
| 3.1312 | 0.1581 | 23500 | 3.0458 | 6160384000 |
| 3.132 | 0.1614 | 24000 | 3.0352 | 6291456000 |
| 3.1358 | 0.1648 | 24500 | 3.0328 | 6422528000 |
| 3.1231 | 0.1682 | 25000 | 3.0353 | 6553600000 |
| 3.1248 | 0.1715 | 25500 | 3.0260 | 6684672000 |
| 3.118 | 0.1749 | 26000 | 3.0195 | 6815744000 |
| 3.1308 | 0.1783 | 26500 | 3.0297 | 6946816000 |
| 3.1286 | 0.1816 | 27000 | 3.0181 | 7077888000 |
| 3.1231 | 0.1850 | 27500 | 3.0236 | 7208960000 |
| 3.1399 | 0.1883 | 28000 | 3.0280 | 7340032000 |
| 3.1113 | 0.1917 | 28500 | 3.0133 | 7471104000 |
| 3.1287 | 0.1951 | 29000 | 3.0184 | 7602176000 |
| 3.108 | 0.1984 | 29500 | 3.0065 | 7733248000 |
| 3.1074 | 0.2018 | 30000 | 3.0053 | 7864320000 |
| 3.1155 | 0.2052 | 30500 | 3.0058 | 7995392000 |
| 3.0952 | 0.2085 | 31000 | 3.0034 | 8126464000 |
| 3.1095 | 0.2119 | 31500 | 3.0025 | 8257536000 |
| 3.1201 | 0.2152 | 32000 | 2.9990 | 8388608000 |
| 3.0979 | 0.2186 | 32500 | 2.9993 | 8519680000 |
| 3.1079 | 0.2220 | 33000 | 2.9947 | 8650752000 |
| 3.0888 | 0.2253 | 33500 | 2.9899 | 8781824000 |
| 3.1028 | 0.2287 | 34000 | 2.9927 | 8912896000 |
| 3.1182 | 0.2321 | 34500 | 3.0027 | 9043968000 |
| 3.0831 | 0.2354 | 35000 | 2.9875 | 9175040000 |
| 3.1019 | 0.2388 | 35500 | 2.9896 | 9306112000 |
| 3.0993 | 0.2422 | 36000 | 2.9876 | 9437184000 |
| 3.0801 | 0.2455 | 36500 | 2.9815 | 9568256000 |
| 3.0913 | 0.2489 | 37000 | 2.9841 | 9699328000 |
| 3.1105 | 0.2522 | 37500 | 2.9955 | 9830400000 |
| 3.0926 | 0.2556 | 38000 | 2.9854 | 9961472000 |
| 3.0802 | 0.2590 | 38500 | 2.9803 | 10092544000 |
| 3.0881 | 0.2623 | 39000 | 2.9857 | 10223616000 |
| 3.083 | 0.2657 | 39500 | 2.9809 | 10354688000 |
| 3.0904 | 0.2691 | 40000 | 2.9785 | 10485760000 |
| 3.0857 | 0.2724 | 40500 | 2.9742 | 10616832000 |
| 3.0675 | 0.2758 | 41000 | 2.9688 | 10747904000 |
| 3.0733 | 0.2791 | 41500 | 2.9694 | 10878976000 |
| 3.0685 | 0.2825 | 42000 | 2.9689 | 11010048000 |
| 3.0798 | 0.2859 | 42500 | 2.9728 | 11141120000 |
| 3.071 | 0.2892 | 43000 | 2.9696 | 11272192000 |
| 3.0664 | 0.2926 | 43500 | 2.9677 | 11403264000 |
| 3.0844 | 0.2960 | 44000 | 2.9880 | 11534336000 |
| 3.0591 | 0.2993 | 44500 | 2.9622 | 11665408000 |
| 3.0603 | 0.3027 | 45000 | 2.9669 | 11796480000 |
| 3.0714 | 0.3061 | 45500 | 2.9655 | 11927552000 |
| 3.0602 | 0.3094 | 46000 | 2.9600 | 12058624000 |
| 3.067 | 0.3128 | 46500 | 2.9571 | 12189696000 |
| 3.0676 | 0.3161 | 47000 | 2.9561 | 12320768000 |
| 3.0544 | 0.3195 | 47500 | 2.9534 | 12451840000 |
| 3.0489 | 0.3229 | 48000 | 2.9548 | 12582912000 |
| 3.072 | 0.3262 | 48500 | 2.9678 | 12713984000 |
| 3.0473 | 0.3296 | 49000 | 2.9521 | 12845056000 |
| 3.0573 | 0.3330 | 49500 | 2.9763 | 12976128000 |
| 3.0805 | 0.3363 | 50000 | 2.9581 | 13107200000 |
| 3.073 | 0.3397 | 50500 | 2.9553 | 13238272000 |
| 3.054 | 0.3431 | 51000 | 2.9483 | 13369344000 |
| 3.049 | 0.3464 | 51500 | 2.9457 | 13500416000 |
| 3.0509 | 0.3498 | 52000 | 2.9477 | 13631488000 |
| 3.0478 | 0.3531 | 52500 | 2.9460 | 13762560000 |
| 3.044 | 0.3565 | 53000 | 2.9570 | 13893632000 |
| 3.0444 | 0.3599 | 53500 | 2.9434 | 14024704000 |
| 3.071 | 0.3632 | 54000 | 2.9484 | 14155776000 |
| 3.0523 | 0.3666 | 54500 | 2.9419 | 14286848000 |
| 3.0524 | 0.3700 | 55000 | 2.9469 | 14417920000 |
| 3.0432 | 0.3733 | 55500 | 2.9362 | 14548992000 |
| 3.0364 | 0.3767 | 56000 | 2.9314 | 14680064000 |
| 3.0241 | 0.3800 | 56500 | 2.9202 | 14811136000 |
| 3.0101 | 0.3834 | 57000 | 2.9125 | 14942208000 |
| 3.0115 | 0.3868 | 57500 | 2.9029 | 15073280000 |
| 2.9931 | 0.3901 | 58000 | 2.8951 | 15204352000 |
| 2.9876 | 0.3935 | 58500 | 2.8888 | 15335424000 |
| 2.9856 | 0.3969 | 59000 | 2.8846 | 15466496000 |
| 2.9824 | 0.4002 | 59500 | 2.8822 | 15597568000 |
| 2.9789 | 0.4036 | 60000 | 2.8819 | 15728640000 |
| 3.0132 | 0.4070 | 60500 | 2.9149 | 15859712000 |
| 3.0125 | 0.4103 | 61000 | 2.9137 | 15990784000 |
| 3.0115 | 0.4137 | 61500 | 2.9049 | 16121856000 |
| 3.0079 | 0.4170 | 62000 | 2.9013 | 16252928000 |
| 3.0055 | 0.4204 | 62500 | 2.8968 | 16384000000 |
| 2.9823 | 0.4238 | 63000 | 2.8930 | 16515072000 |
| 3.0004 | 0.4271 | 63500 | 2.8904 | 16646144000 |
| 2.9839 | 0.4305 | 64000 | 2.8860 | 16777216000 |
| 2.9789 | 0.4339 | 64500 | 2.8814 | 16908288000 |
| 2.9876 | 0.4372 | 65000 | 2.8793 | 17039360000 |
| 2.9804 | 0.4406 | 65500 | 2.8758 | 17170432000 |
| 2.9851 | 0.4439 | 66000 | 2.8729 | 17301504000 |
| 2.9651 | 0.4473 | 66500 | 2.8710 | 17432576000 |
| 2.9704 | 0.4507 | 67000 | 2.8692 | 17563648000 |
| 2.9785 | 0.4540 | 67500 | 2.8678 | 17694720000 |
| 2.9724 | 0.4574 | 68000 | 2.8663 | 17825792000 |
| 2.9732 | 0.4608 | 68500 | 2.8653 | 17956864000 |
| 2.9622 | 0.4641 | 69000 | 2.8648 | 18087936000 |
| 2.964 | 0.4675 | 69500 | 2.8646 | 18219008000 |
| 2.9684 | 0.4709 | 70000 | 2.8645 | 18350080000 |
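
A small consistency check on the table (an inference from the numbers, not something stated on the card): every 500-step evaluation interval adds exactly 131,072,000 input tokens, which with the effective batch size of 256 implies a fixed sequence length of 1024 tokens.

```python
# Tokens-per-step arithmetic implied by the table above; the sequence
# length of 1024 is derived here, not reported as a hyperparameter.
tokens_per_interval = 131_072_000  # growth of "Input Tokens Seen" per interval
steps_per_interval = 500
effective_batch = 64 * 4           # train_batch_size * gradient_accumulation_steps

tokens_per_step = tokens_per_interval // steps_per_interval
print(tokens_per_step)                     # 262144
print(tokens_per_step // effective_batch)  # 1024 -> implied sequence length
```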

### Framework versions

- Transformers 4.50.3
- Pytorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
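
For exact reproduction it can help to confirm the local environment matches these versions; a small convenience check (not part of the original card):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card; mismatches don't necessarily break
# anything, but the reported results were obtained with these.
expected = {
    "transformers": "4.50.3",
    "torch": "2.6.0",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__.split("+")[0],  # drop the +cu126 local tag
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"expected {want}"
    print(f"{name}: {have} ({status})")
```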