pycraft-1 / sft_train.log
============================================================
PyCraft-1 Post-Training: SFT + Preference Alignment
============================================================
Base : checkpoints/step_0004000
Device : cuda
Dropout : disabled throughout fine-tuning
Precision: float32 (no BF16)
Params : 55.3M
Verifying base model loss (should be ~1.2-2.0)...
Base model loss on Python code: 0.4318
Base model verified OK.
============================================================
STAGE 1 — Supervised Fine-Tuning (SFT)
============================================================
Examples : 40,000
Batch size : 4 x 16 = 64 effective
LR : 0.0001
Steps : 400
Dropout : disabled (0.0)
Precision : float32 (no BF16 for stability)
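Note on the settings above: the arithmetic they imply can be checked with a short sketch. It assumes that "4 x 16" means a micro-batch of 4 with 16 gradient-accumulation steps; the log does not say which factor is which, and the variable names are illustrative only.

```python
# Values copied from the Stage 1 header of this log.
micro_batch = 4      # assumed: samples per forward/backward pass
accum_steps = 16     # assumed: gradient-accumulation steps per optimizer step
steps = 400          # optimizer steps
examples = 40_000    # SFT examples available

effective_batch = micro_batch * accum_steps   # samples per optimizer step
examples_seen = effective_batch * steps       # samples consumed over the run
epochs = examples_seen / examples             # fraction of the dataset covered

print(f"effective batch : {effective_batch}")
print(f"examples seen   : {examples_seen:,}")
print(f"epochs          : {epochs:.2f}")
```

Under that reading, 400 steps cover 25,600 of the 40,000 examples, so Stage 1 stops before completing one pass over the data.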
Step 10 sanity check: loss=1.3601 -- OK
sft step 10 | loss 1.3601 | ppl 3.90 | lr 1.00e-05 | grad 2.335
sft step 20 | loss 1.3061 | ppl 3.69 | lr 2.00e-05 | grad 2.274
sft step 30 | loss 1.3067 | ppl 3.69 | lr 3.00e-05 | grad 2.229
sft step 40 | loss 1.2514 | ppl 3.50 | lr 4.00e-05 | grad 3.011
sft step 50 | loss 1.2427 | ppl 3.46 | lr 5.00e-05 | grad 2.227
sft step 60 | loss 1.2233 | ppl 3.40 | lr 6.00e-05 | grad 2.230
sft step 70 | loss 1.2386 | ppl 3.45 | lr 7.00e-05 | grad 2.269
sft step 80 | loss 1.2378 | ppl 3.45 | lr 8.00e-05 | grad 2.297
sft step 90 | loss 1.2648 | ppl 3.54 | lr 9.00e-05 | grad 2.298
sft step 100 | loss 1.2663 | ppl 3.55 | lr 1.00e-04 | grad 2.281
sft step 110 | loss 1.2966 | ppl 3.66 | lr 9.98e-05 | grad 2.227
sft step 120 | loss 1.2263 | ppl 3.41 | lr 9.90e-05 | grad 2.286
sft step 130 | loss 1.2827 | ppl 3.61 | lr 9.78e-05 | grad 2.310
sft step 140 | loss 1.2585 | ppl 3.52 | lr 9.61e-05 | grad 2.223
sft step 150 | loss 1.2534 | ppl 3.50 | lr 9.40e-05 | grad 2.070
sft step 160 | loss 1.2354 | ppl 3.44 | lr 9.14e-05 | grad 2.049
sft step 170 | loss 1.2277 | ppl 3.41 | lr 8.84e-05 | grad 2.076
sft step 180 | loss 1.2714 | ppl 3.57 | lr 8.51e-05 | grad 2.075
sft step 190 | loss 1.2685 | ppl 3.56 | lr 8.15e-05 | grad 2.257
sft step 200 | loss 1.2621 | ppl 3.53 | lr 7.75e-05 | grad 2.199
SFT checkpoint saved at step 200
sft step 210 | loss 1.2853 | ppl 3.62 | lr 7.33e-05 | grad 2.122
sft step 220 | loss 1.2623 | ppl 3.53 | lr 6.89e-05 | grad 2.047
sft step 230 | loss 1.2691 | ppl 3.56 | lr 6.44e-05 | grad 1.974
sft step 240 | loss 1.2495 | ppl 3.49 | lr 5.97e-05 | grad 2.051
sft step 250 | loss 1.2426 | ppl 3.46 | lr 5.50e-05 | grad 1.973
sft step 260 | loss 1.2013 | ppl 3.32 | lr 5.03e-05 | grad 1.970
sft step 270 | loss 1.2188 | ppl 3.38 | lr 4.56e-05 | grad 1.927
sft step 280 | loss 1.2244 | ppl 3.40 | lr 4.11e-05 | grad 2.000
sft step 290 | loss 1.1766 | ppl 3.24 | lr 3.67e-05 | grad 2.091
sft step 300 | loss 1.2086 | ppl 3.35 | lr 3.25e-05 | grad 2.031
sft step 310 | loss 1.2127 | ppl 3.36 | lr 2.85e-05 | grad 2.041
sft step 320 | loss 1.1476 | ppl 3.15 | lr 2.49e-05 | grad 1.968
sft step 330 | loss 1.1834 | ppl 3.27 | lr 2.16e-05 | grad 2.018
sft step 340 | loss 1.1942 | ppl 3.30 | lr 1.86e-05 | grad 2.044
sft step 350 | loss 1.1894 | ppl 3.29 | lr 1.60e-05 | grad 1.863
sft step 360 | loss 1.2079 | ppl 3.35 | lr 1.39e-05 | grad 1.992
sft step 370 | loss 1.2100 | ppl 3.35 | lr 1.22e-05 | grad 1.958
sft step 380 | loss 1.1824 | ppl 3.26 | lr 1.10e-05 | grad 2.060
sft step 390 | loss 1.1853 | ppl 3.27 | lr 1.02e-05 | grad 2.065
sft step 400 | loss 1.1465 | ppl 3.15 | lr 1.00e-05 | grad 1.905
SFT checkpoint saved at step 400
SFT complete. Best loss: 1.1465
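Note on the Stage 1 columns: the logged lr values are consistent with a 100-step linear warmup to 1e-4 followed by cosine decay to 1e-5 at step 400, and the ppl column matches exp(loss). The sketch below reproduces both. The schedule is inferred from the numbers in this log, not taken from the training script, and the function names are hypothetical.

```python
import math

def sft_lr(step, peak=1e-4, floor=1e-5, warmup=100, total=400):
    """Inferred schedule: linear warmup, then cosine decay to a floor."""
    if step <= warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

def perplexity(loss):
    """Perplexity of a mean cross-entropy loss measured in nats."""
    return math.exp(loss)

# Compare against a few logged rows (step, loss).
for step, loss in [(10, 1.3601), (110, 1.2966), (200, 1.2621), (400, 1.1465)]:
    print(f"step {step:3d} | ppl {perplexity(loss):.2f} | lr {sft_lr(step):.2e}")
```

The same pattern, with a shorter warmup and a 2e-05 peak, would also fit the two Stage 2 lr values shown in this log, but two points are too few to confirm it.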
============================================================
STAGE 2 — Preference Alignment (ORPO-inspired margin loss)
============================================================
Novel contribution: lightweight preference alignment
on a 55M from-scratch model, no reference model needed.
Pairs : 20,000
Margin : 0.5
LR : 2e-05
Steps : 200
pref step 10 | sft 10.7388 | pref 3.0770 | gap +2.308 | lr 4.00e-06 | grad 1.802
pref step 20 | sft 10.2856 | pref 2.9803 | gap +2.441 | lr 8.00e-06 | grad 1.824