LaughLM / log.txt
Duplicate from Dhiraj45/LaughLM (commit 9639af0)
Loaded dataset with 100,000,000 tokens
Model Report
────────────────────────
Total parameters: 16,013,568
Embedding parameters: 12,865,792
Parameters per layer: 786,944
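
The three counts above are consistent with each other under a simple decomposition. The sketch below checks that the non-embedding parameters divide evenly by the per-layer count. It also tests one configuration that reproduces the numbers exactly: a hidden size of 256, a 50,257-entry vocabulary, and 12*d^2 + 2*d parameters per layer. The layer count, `D_MODEL`, `VOCAB`, and the per-layer formula are inferences from the arithmetic, not values stated in the log.

```python
# Figures copied from the Model Report.
TOTAL = 16_013_568
EMBEDDING = 12_865_792
PER_LAYER = 786_944

# Everything that is not embedding should be whole layers.
non_embedding = TOTAL - EMBEDDING
n_layers, remainder = divmod(non_embedding, PER_LAYER)
print(n_layers, remainder)  # 4 0

# Assumed configuration (not stated in the log).
D_MODEL = 256
VOCAB = 50_257

embedding_matches = VOCAB * D_MODEL == EMBEDDING
# 12*d^2 covers attention (4*d^2) plus a 4x-wide MLP (8*d^2);
# 2*d would be two bias-free norm weight vectors. This split is a guess.
layer_matches = 12 * D_MODEL**2 + 2 * D_MODEL == PER_LAYER
print(embedding_matches, layer_matches)  # True True
```

Under these assumptions the embedding table holds about 80% of all parameters, which is typical for a model this small with a large vocabulary.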
Training Report
────────────────────────
Tokens per step: 512
Total training steps: 1,953
Target tokens: 1,000,000
Memory Report
────────────────────────
Parameter memory: 0.03 GB
Optimizer memory: 0.13 GB
Gradient memory: 0.03 GB
Estimated total: 0.19 GB
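
The memory figures can be reproduced from the parameter count alone. The sketch below assumes 2 bytes per parameter for weights and gradients (16-bit storage), 8 bytes per parameter for optimizer state (two 32-bit Adam-style moments), and decimal gigabytes. None of these byte widths are stated in the log; they are simply the values that match the reported numbers.

```python
N_PARAMS = 16_013_568  # "Total parameters" from the Model Report

# Assumed byte widths (not stated in the log).
BYTES_PER_WEIGHT = 2     # 16-bit weights
BYTES_PER_GRAD = 2       # 16-bit gradients
BYTES_PER_OPT_STATE = 8  # two 32-bit moments per parameter

GB = 1e9  # decimal gigabytes; binary GiB would give 0.12 for the optimizer


def to_gb(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GB


param_gb = to_gb(N_PARAMS * BYTES_PER_WEIGHT)
optim_gb = to_gb(N_PARAMS * BYTES_PER_OPT_STATE)
grad_gb = to_gb(N_PARAMS * BYTES_PER_GRAD)
total_gb = param_gb + optim_gb + grad_gb

print(f"{param_gb:.2f} {optim_gb:.2f} {grad_gb:.2f} {total_gb:.2f}")
# 0.03 0.13 0.03 0.19
```

The estimate covers only weights, gradients, and optimizer state; activation memory is not part of the reported total.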
============================================================
Training for 488 optimizer steps
Effective tokens per step: 512
============================================================
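
The step counts in the Training Report and in the banner above differ (1,953 versus 488). The sketch below reproduces both from the logged figures. Reading the factor of 4 as a gradient-accumulation setting is an assumption; the log does not name it, and the banner reports 512 effective tokens per optimizer step.

```python
# Figures copied from the log.
TARGET_TOKENS = 1_000_000
TOKENS_PER_STEP = 512
OPTIMIZER_STEPS = 488

# "Total training steps" is the target divided by tokens per step, rounded down.
total_steps = TARGET_TOKENS // TOKENS_PER_STEP
print(total_steps)  # 1953

# 488 is what remains after dividing by 4 and rounding down again.
# Assumption: 4 is a gradient-accumulation factor.
assumed_accumulation = 4
print(total_steps // assumed_accumulation)  # 488

# Tokens covered if each of the 488 optimizer steps consumes 512 tokens,
# as the banner's "Effective tokens per step" states.
tokens_covered = OPTIMIZER_STEPS * TOKENS_PER_STEP
print(tokens_covered)  # 249856
```

If the banner is taken at face value, 488 steps cover roughly a quarter of the 1,000,000-token target. This is worth keeping in mind when reading the REMAINING and ETA columns.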
  STEP  PROGRESS │    LOSS     PPL  GNORM │ LR         │  TOK/S     MFU │    SEEN  REMAINING    ETA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
*   10      2.0% │ 10.8392   51.0K    n/a │ ↑ 3.00e-05 │  2,241  0.014% │    5.1K     994.3K  7m23s
*   20      4.1% │ 10.6394   41.7K    n/a │ ↑ 6.00e-05 │ 14,593  0.089% │   10.2K     989.2K  1m07s
*   30      6.1% │ 10.3197   30.3K    n/a │ ↑ 9.00e-05 │ 18,688  0.114% │   15.4K     984.1K    52s
*   40      8.2% │ 10.0608   23.4K    n/a │ ↑ 1.20e-04 │ 20,733  0.126% │   20.5K     978.9K    47s
[checkpoint] saving step 40
[checkpoint] saved step 40
*   50     10.2% │  9.8320   18.6K    n/a │ ↑ 1.50e-04 │ 19,617  0.120% │   25.6K     973.8K    49s
*   60     12.3% │  9.4081   12.2K    n/a │ ↑ 1.80e-04 │ 20,566  0.125% │   30.7K     968.7K    47s
*   70     14.3% │  8.9761    7912    n/a │ ↑ 2.10e-04 │ 21,564  0.131% │   35.8K     963.6K    44s
*   80     16.4% │  8.6242    5565    n/a │ ↑ 2.40e-04 │ 22,321  0.136% │   41.0K     958.5K    42s
[checkpoint] saving step 80
[checkpoint] saved step 80
*   90     18.4% │  8.0923    3269    n/a │ ↑ 2.70e-04 │ 21,204  0.129% │   46.1K     953.3K    44s
*  100     20.5% │  7.8742    2629    n/a │ ↑ 3.00e-04 │ 21,187  0.129% │   51.2K     948.2K    44s
   110     22.5% │  8.2624    3875    n/a │ - 3.00e-04 │ 21,539  0.131% │   56.3K     943.1K    43s
   120     24.6% │  8.8018    6646    n/a │ - 2.98e-04 │ 21,909  0.134% │   61.4K     938.0K    42s
[checkpoint] saving step 120
[checkpoint] saved step 120
*  130     26.6% │  7.6350    2069    n/a │ - 2.96e-04 │ 21,479  0.131% │   66.6K     932.9K    43s
*  140     28.7% │  7.2791    1450    n/a │ - 2.93e-04 │ 21,780  0.133% │   71.7K     927.7K    42s
   150     30.7% │  7.5348    1872    n/a │ - 2.89e-04 │ 22,120  0.135% │   76.8K     922.6K    41s
*  160     32.8% │  7.2704    1437    n/a │ - 2.84e-04 │ 22,391  0.137% │   81.9K     917.5K    40s
[checkpoint] saving step 160
[checkpoint] saved step 160
   170     34.8% │  7.6200    2039    n/a │ - 2.79e-04 │ 21,774  0.133% │   87.0K     912.4K    41s
*  180     36.9% │  7.1109    1225    n/a │ - 2.73e-04 │ 21,763  0.133% │   92.2K     907.3K    41s
*  190     38.9% │  6.5831   722.8    n/a │ - 2.66e-04 │ 21,957  0.134% │   97.3K     902.1K    41s
   200     41.0% │  7.8899    2670    n/a │ - 2.58e-04 │ 22,053  0.134% │  102.4K     897.0K    40s
[checkpoint] saving step 200
[checkpoint] saved step 200
   210     43.0% │  7.4864    1784    n/a │ - 2.50e-04 │ 21,606  0.132% │  107.5K     891.9K    41s
   220     45.1% │  7.7538    2330    n/a │ - 2.41e-04 │ 21,455  0.131% │  112.6K     886.8K    41s
   230     47.1% │  7.0994    1211    n/a │ - 2.32e-04 │ 21,550  0.131% │  117.8K     881.7K    40s
   240     49.2% │  6.9114    1004    n/a │ - 2.22e-04 │ 21,638  0.132% │  122.9K     876.5K    40s
[checkpoint] saving step 240
[checkpoint] saved step 240
   250     51.2% │  7.7004    2209    n/a │ - 2.12e-04 │ 21,159  0.129% │  128.0K     871.4K    41s
   260     53.3% │  7.1510    1275    n/a │ - 2.02e-04 │ 21,109  0.129% │  133.1K     866.3K    41s
   270     55.3% │  7.4216    1672    n/a │ - 1.91e-04 │ 21,189  0.129% │  138.2K     861.2K    40s
   280     57.4% │  7.2410    1395    n/a │ - 1.80e-04 │ 21,361  0.130% │  143.4K     856.1K    40s
[checkpoint] saving step 280
[checkpoint] saved step 280
   290     59.4% │  7.3611    1574    n/a │ - 1.69e-04 │ 21,116  0.129% │  148.5K     850.9K    40s
   300     61.5% │  7.0222    1121    n/a │ - 1.58e-04 │ 21,209  0.129% │  153.6K     845.8K    39s
   310     63.5% │  6.6481   771.3    n/a │ - 1.48e-04 │ 21,365  0.130% │  158.7K     840.7K    39s
   320     65.6% │  7.1535    1279    n/a │ - 1.37e-04 │ 21,495  0.131% │  163.8K     835.6K    38s
[checkpoint] saving step 320
[checkpoint] saved step 320
   330     67.6% │  6.9375    1030    n/a │ - 1.26e-04 │ 21,330  0.130% │  169.0K     830.5K    38s
   340     69.7% │  7.0372    1138    n/a │ - 1.16e-04 │ 21,440  0.131% │  174.1K     825.3K    38s
   350     71.7% │  7.3495    1555    n/a │ - 1.06e-04 │ 21,521  0.131% │  179.2K     820.2K    38s
   360     73.8% │  6.9398    1033    n/a │ - 9.62e-05 │ 21,580  0.132% │  184.3K     815.1K    37s
[checkpoint] saving step 360
[checkpoint] saved step 360
   370     75.8% │  7.0623    1167    n/a │ - 8.71e-05 │ 21,329  0.130% │  189.4K     810.0K    37s
*  380     77.9% │  6.5629   708.3    n/a │ - 7.84e-05 │ 21,287  0.130% │  194.6K     804.9K    37s
   390     79.9% │  6.6633   783.1    n/a │ - 7.03e-05 │ 21,421  0.131% │  199.7K     799.7K    37s
   400     82.0% │  6.8306   925.7    n/a │ - 6.28e-05 │ 21,535  0.131% │  204.8K     794.6K    36s
[checkpoint] saving step 400
[checkpoint] saved step 400
   410     84.0% │  7.2384    1392    n/a │ - 5.60e-05 │ 21,414  0.131% │  209.9K     789.5K    36s
   420     86.1% │  7.4031    1641    n/a │ - 5.00e-05 │ 21,476  0.131% │  215.0K     784.4K    36s
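
Several columns in the progress table can be recomputed from the others, which is a quick way to sanity-check a row. The sketch below is a reconstruction under stated assumptions. PPL is taken as exp(LOSS). LR is modelled as a linear warmup to 3.00e-04 over 100 steps, followed by cosine decay to an assumed floor of 3.00e-05 at step 488. ETA is taken as REMAINING divided by TOK/S, rounded down. The floor value, the decay shape, and the rounding rule are inferred from the logged values; the training script itself is not shown.

```python
import math

PEAK_LR = 3.0e-4    # LR logged at step 100
MIN_LR = 3.0e-5     # assumed floor (10% of peak); not stated in the log
WARMUP_STEPS = 100  # last step carrying the warmup marker
TOTAL_STEPS = 488   # "Training for 488 optimizer steps"


def perplexity(loss: float) -> float:
    """PPL column: exponential of the cross-entropy loss (in nats)."""
    return math.exp(loss)


def learning_rate(step: int) -> float:
    """LR column: linear warmup, then cosine decay to MIN_LR (assumed shape)."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


def eta_seconds(remaining_tokens: float, tokens_per_second: float) -> int:
    """ETA column: remaining tokens over throughput, rounded down."""
    return int(remaining_tokens / tokens_per_second)


def format_eta(seconds: int) -> str:
    """Render seconds in the log's style, e.g. 1m07s or 52s."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs:02d}s" if minutes else f"{secs}s"


print(f"{perplexity(6.5831):.1f}")              # step 190
print(f"{learning_rate(200):.2e}")              # 2.58e-04
print(format_eta(eta_seconds(989_200, 14_593))) # 1m07s
```

The ETA is measured against the remaining-token figure, not against the 488 optimizer steps. That explains why it stays near 40 seconds even late in the run.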