| ============================================================ | |
| PyCraft-1 Post-Training: SFT + Preference Alignment | |
| ============================================================ | |
| Base : checkpoints/step_0004000 | |
| Device : cuda | |
| Dropout : disabled throughout fine-tuning | |
| Precision: float32 (no BF16) | |
| Params : 55.3M | |
| Verifying base model loss (should be ~1.2-2.0)... | |
| Base model loss on Python code: 0.4318 | |
| Base model verified OK. | |
| ============================================================ | |
| STAGE 1 — Supervised Fine-Tuning (SFT) | |
| ============================================================ | |
| Examples : 40,000 | |
| Batch size : 4 x 16 = 64 effective | |
| LR : 0.0001 | |
| Steps : 400 | |
| Dropout : disabled (0.0) | |
| Precision : float32 (no BF16 for stability) | |
| Step 10 sanity check: loss=1.3601 -- OK | |
| sft step 10 | loss 1.3601 | ppl 3.90 | lr 1.00e-05 | grad 2.335 | |
| sft step 20 | loss 1.3061 | ppl 3.69 | lr 2.00e-05 | grad 2.274 | |
| sft step 30 | loss 1.3067 | ppl 3.69 | lr 3.00e-05 | grad 2.229 | |
| sft step 40 | loss 1.2514 | ppl 3.50 | lr 4.00e-05 | grad 3.011 | |
| sft step 50 | loss 1.2427 | ppl 3.46 | lr 5.00e-05 | grad 2.227 | |
| sft step 60 | loss 1.2233 | ppl 3.40 | lr 6.00e-05 | grad 2.230 | |
| sft step 70 | loss 1.2386 | ppl 3.45 | lr 7.00e-05 | grad 2.269 | |
| sft step 80 | loss 1.2378 | ppl 3.45 | lr 8.00e-05 | grad 2.297 | |
| sft step 90 | loss 1.2648 | ppl 3.54 | lr 9.00e-05 | grad 2.298 | |
| sft step 100 | loss 1.2663 | ppl 3.55 | lr 1.00e-04 | grad 2.281 | |
| sft step 110 | loss 1.2966 | ppl 3.66 | lr 9.98e-05 | grad 2.227 | |
| sft step 120 | loss 1.2263 | ppl 3.41 | lr 9.90e-05 | grad 2.286 | |
| sft step 130 | loss 1.2827 | ppl 3.61 | lr 9.78e-05 | grad 2.310 | |
| sft step 140 | loss 1.2585 | ppl 3.52 | lr 9.61e-05 | grad 2.223 | |
| sft step 150 | loss 1.2534 | ppl 3.50 | lr 9.40e-05 | grad 2.070 | |
| sft step 160 | loss 1.2354 | ppl 3.44 | lr 9.14e-05 | grad 2.049 | |
| sft step 170 | loss 1.2277 | ppl 3.41 | lr 8.84e-05 | grad 2.076 | |
| sft step 180 | loss 1.2714 | ppl 3.57 | lr 8.51e-05 | grad 2.075 | |
| sft step 190 | loss 1.2685 | ppl 3.56 | lr 8.15e-05 | grad 2.257 | |
| sft step 200 | loss 1.2621 | ppl 3.53 | lr 7.75e-05 | grad 2.199 | |
| SFT checkpoint saved at step 200 | |
| sft step 210 | loss 1.2853 | ppl 3.62 | lr 7.33e-05 | grad 2.122 | |
| sft step 220 | loss 1.2623 | ppl 3.53 | lr 6.89e-05 | grad 2.047 | |
| sft step 230 | loss 1.2691 | ppl 3.56 | lr 6.44e-05 | grad 1.974 | |
| sft step 240 | loss 1.2495 | ppl 3.49 | lr 5.97e-05 | grad 2.051 | |
| sft step 250 | loss 1.2426 | ppl 3.46 | lr 5.50e-05 | grad 1.973 | |
| sft step 260 | loss 1.2013 | ppl 3.32 | lr 5.03e-05 | grad 1.970 | |
| sft step 270 | loss 1.2188 | ppl 3.38 | lr 4.56e-05 | grad 1.927 | |
| sft step 280 | loss 1.2244 | ppl 3.40 | lr 4.11e-05 | grad 2.000 | |
| sft step 290 | loss 1.1766 | ppl 3.24 | lr 3.67e-05 | grad 2.091 | |
| sft step 300 | loss 1.2086 | ppl 3.35 | lr 3.25e-05 | grad 2.031 | |
| sft step 310 | loss 1.2127 | ppl 3.36 | lr 2.85e-05 | grad 2.041 | |
| sft step 320 | loss 1.1476 | ppl 3.15 | lr 2.49e-05 | grad 1.968 | |
| sft step 330 | loss 1.1834 | ppl 3.27 | lr 2.16e-05 | grad 2.018 | |
| sft step 340 | loss 1.1942 | ppl 3.30 | lr 1.86e-05 | grad 2.044 | |
| sft step 350 | loss 1.1894 | ppl 3.29 | lr 1.60e-05 | grad 1.863 | |
| sft step 360 | loss 1.2079 | ppl 3.35 | lr 1.39e-05 | grad 1.992 | |
| sft step 370 | loss 1.2100 | ppl 3.35 | lr 1.22e-05 | grad 1.958 | |
| sft step 380 | loss 1.1824 | ppl 3.26 | lr 1.10e-05 | grad 2.060 | |
| sft step 390 | loss 1.1853 | ppl 3.27 | lr 1.02e-05 | grad 2.065 | |
| sft step 400 | loss 1.1465 | ppl 3.15 | lr 1.00e-05 | grad 1.905 | |
| SFT checkpoint saved at step 400 | |
| SFT complete. Best loss: 1.1465 | |
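The `lr` and `ppl` columns in the Stage 1 log above are consistent with a linear warmup over the first 100 steps to the peak LR of 1e-4, followed by a cosine decay to a floor of 1e-5 at step 400, and with perplexity computed as `exp(loss)`. The sketch below reproduces those logged values. The warmup length, the floor value, and the function names `lr_at` and `perplexity` are inferred from the numbers or made up for illustration; they are not taken from the training script, which may implement the schedule differently.

```python
import math


def lr_at(step, peak=1e-4, floor=1e-5, warmup=100, total=400):
    """Hypothetical schedule inferred from the log: linear warmup to
    `peak` over `warmup` steps, then cosine decay to `floor` at `total`."""
    if step <= warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


def perplexity(loss):
    """Perplexity as the exponential of the mean cross-entropy loss."""
    return math.exp(loss)


if __name__ == "__main__":
    # Compare against a few logged rows (step, loss) from Stage 1.
    for step, loss in [(10, 1.3601), (100, 1.2663), (200, 1.2621), (400, 1.1465)]:
        print(f"sft step {step:4d} | loss {loss:.4f} "
              f"| ppl {perplexity(loss):.2f} | lr {lr_at(step):.2e}")
```

Under these assumptions the printed `ppl` and `lr` fields should line up with the corresponding log rows, which is a quick way to confirm that the scheduler ran as configured.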
| ============================================================ | |
| STAGE 2 — Preference Alignment (ORPO-inspired margin loss) | |
| ============================================================ | |
| Novel contribution: lightweight preference alignment | |
| on a 55M from-scratch model, no reference model needed. | |
| Pairs : 20,000 | |
| Margin : 0.5 | |
| LR : 2e-05 | |
| Steps : 200 | |
| pref step 10 | sft 10.7388 | pref 3.0770 | gap +2.308 | lr 4.00e-06 | grad 1.802 | |
| pref step 20 | sft 10.2856 | pref 2.9803 | gap +2.441 | lr 8.00e-06 | grad 1.824 | |