Auto-resume from Hub when local checkpoints are cleared by Kaggle\n\nNow --resume checks: local disk first β Hub download fallback β fresh start\nNo more lost progress across Kaggle sessions."
CRITICAL FIX: Wire up real data pipeline (was training on random tokens!)\n\nChanges:\n- Replace random.randint batch with TAPDataPipeline streaming real data\n- Add tokenizer initialization with TAP special tokens\n- Add pad_fraction diagnostic to detect data issues\n- Keep upd_rms diagnostic\n- Stage-aware curriculum switching\n- Proper pad token masking in loss (was already in build_model_mtp)\n\nThis fixes the root cause of loss stuck at ln(vocab)=10.85"