Fix bidirectional _orig_mod. key mismatch when loading checkpoints 3b04aa3 Vjeong Claude Sonnet 4.6 committed on 3 days ago
Fix _orig_mod. key prefix mismatch when loading torch.compile() checkpoints 3535442 Vjeong Claude Sonnet 4.6 committed on 3 days ago
Refactor runner.py: extract shared setup logic into _setup_and_train helper 9b6bd85 Vjeong Claude Sonnet 4.6 committed on 9 days ago
Fix LR warmup ordering and align adam_eps with Meta LLaMA af13727 Vjeong Claude Opus 4.6 committed on 11 days ago
Remove redundant detect_scenario from LossDebugger 2a50172 Vjeong Claude Opus 4.6 committed on 11 days ago
Add Code CPT pipeline for injecting Python code capability a424729 Vjeong Claude Opus 4.6 committed on 12 days ago
Fix false/inaccurate citations in LossDebugger 4322ea0 Vjeong Claude Opus 4.6 committed on 14 days ago
Fix LR reference table and batch-LR scaling guidance in LossDebugger e96b9d3 Vjeong Claude Opus 4.6 committed on 14 days ago
Fix batch size diagnostic: widen window and list multiple causes fb048e4 Vjeong Claude Opus 4.6 committed on 16 days ago
Fix gradient diagnostic thresholds with evidence-based criteria in LossDebugger 362e9ea Vjeong Claude Opus 4.6 committed on 16 days ago
Fix check_numerical_stability accuracy and completeness 2fb0306 Vjeong Claude Opus 4.6 committed on 17 days ago
Scale overfit test LR and steps by model size in LossDebugger 6b7ca0e Vjeong Claude Sonnet 4.6 committed on 17 days ago
Reduce LOSS_BOUNCE false positives with moving-average smoothing 1451cc6 Vjeong Claude Opus 4.6 committed on 18 days ago
Improve LOSS_BOUNCE detection with pre-computed bounce metrics 6c7b430 Vjeong Claude Opus 4.6 committed on 18 days ago
Add LOSS_BOUNCE detection to diagnose_status classification chain 38fd260 Vjeong Claude Opus 4.6 committed on 18 days ago
Add NaN detection to diagnose_status classification chain 8313ca8 Vjeong Claude Opus 4.6 committed on 18 days ago
Fix debugger reference data and scenario detection logic d789de8 Vjeong Claude Sonnet 4.6 committed on 19 days ago
Tighten expected loss ranges for FineWeb-Edu dataset a02e949 Vjeong Claude Opus 4.6 committed on 20 days ago
feat(training): add LossDebugger 5-level diagnostic framework 5b7ea5e Vjeong Claude Opus 4.6 committed on 27 days ago
fix(trainer): correct attribute name from total_mem to total_memory f5ab21e Vjeong Claude Sonnet 4.6 committed on about 1 month ago
refactor(runner): replace manual seed setup with set_seed utility 4733791 Vjeong Claude Sonnet 4.6 committed on Mar 11
docs: translate all Korean comments and docstrings to English 858e8b2 Vjeong Claude Sonnet 4.6 committed on Feb 27