LLM-1B-Lab / llm_lab / training

Commit History

Fix bidirectional _orig_mod. key mismatch when loading checkpoints
3b04aa3

Vjeong Claude Sonnet 4.6 committed on

Fix _orig_mod. key prefix mismatch when loading torch.compile() checkpoints
3535442

Vjeong Claude Sonnet 4.6 committed on

Refactor runner.py: extract shared setup logic into _setup_and_train helper
9b6bd85

Vjeong Claude Sonnet 4.6 committed on

Fix LR warmup ordering and align adam_eps with Meta LLaMA
af13727

Vjeong Claude Opus 4.6 committed on

Remove redundant detect_scenario from LossDebugger
2a50172

Vjeong Claude Opus 4.6 committed on

Add Code CPT pipeline for injecting Python code capability
a424729

Vjeong Claude Opus 4.6 committed on

Fix false/inaccurate citations in LossDebugger
4322ea0

Vjeong Claude Opus 4.6 committed on

Fix LR reference table and batch-LR scaling guidance in LossDebugger
e96b9d3

Vjeong Claude Opus 4.6 committed on

Fix batch size diagnostic: widen window and list multiple causes
fb048e4

Vjeong Claude Opus 4.6 committed on

Fix gradient diagnostic thresholds with evidence-based criteria in LossDebugger
362e9ea

Vjeong Claude Opus 4.6 committed on

Fix check_numerical_stability accuracy and completeness
2fb0306

Vjeong Claude Opus 4.6 committed on

Scale overfit test LR and steps by model size in LossDebugger
6b7ca0e

Vjeong Claude Sonnet 4.6 committed on

Reduce LOSS_BOUNCE false positives with moving-average smoothing
1451cc6

Vjeong Claude Opus 4.6 committed on

Improve LOSS_BOUNCE detection with pre-computed bounce metrics
6c7b430

Vjeong Claude Opus 4.6 committed on

Add LOSS_BOUNCE detection to diagnose_status classification chain
38fd260

Vjeong Claude Opus 4.6 committed on

Add NaN detection to diagnose_status classification chain
8313ca8

Vjeong Claude Opus 4.6 committed on

Fix debugger reference data and scenario detection logic
d789de8

Vjeong Claude Sonnet 4.6 committed on

Add configurable wandb log directory path
ae5f15e

Vjeong Claude Opus 4.6 committed on

Tighten expected loss ranges for FineWeb-Edu dataset
a02e949

Vjeong Claude Opus 4.6 committed on

feat(training): add LossDebugger 5-level diagnostic framework
5b7ea5e

Vjeong Claude Opus 4.6 committed on

fix(trainer): correct attribute name from total_mem to total_memory
f5ab21e

Vjeong Claude Sonnet 4.6 committed on

refactor(runner): replace manual seed setup with set_seed utility
4733791

Vjeong Claude Sonnet 4.6 committed on

docs: translate all Korean comments and docstrings to English
858e8b2

Vjeong Claude Sonnet 4.6 committed on

Initial commit: LLM-1B-Lab project setup
8a58ffe

Vjeong Claude Opus 4.6 committed on
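
Note on the two `_orig_mod.` commits at the top of this history (3b04aa3, 3535442): a module wrapped by torch.compile() reports its state-dict keys with an `_orig_mod.` prefix, so a checkpoint saved from a compiled model does not load into an uncompiled one, and vice versa. The snippet below is only a rough sketch of how a bidirectional fix could look. It is not the code from those commits. The helper name `align_state_dict_keys` and the constant `PREFIX` are hypothetical, and the helper works on plain key/value mappings so it can be read without PyTorch installed.

```python
from collections import OrderedDict

# Prefix that torch.compile() adds to the state-dict keys of a wrapped module.
PREFIX = "_orig_mod."


def align_state_dict_keys(state_dict, model_keys):
    """Return a copy of state_dict whose keys follow the prefix convention of model_keys.

    Hypothetical helper: strips the prefix when the target model is uncompiled,
    adds it when the target model is compiled, and leaves matching keys alone.
    """
    model_has_prefix = any(k.startswith(PREFIX) for k in model_keys)
    aligned = OrderedDict()
    for key, value in state_dict.items():
        key_has_prefix = key.startswith(PREFIX)
        if key_has_prefix and not model_has_prefix:
            key = key[len(PREFIX):]   # compiled checkpoint -> plain model
        elif not key_has_prefix and model_has_prefix:
            key = PREFIX + key        # plain checkpoint -> compiled model
        aligned[key] = value
    return aligned
```

Under these assumptions, a loader would pass the checkpoint's state dict and `model.state_dict().keys()` to the helper and hand the result to `model.load_state_dict(...)`. Where the state dict sits inside the checkpoint file depends on the project's layout, which this history does not show.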