Fix dead split parameter in PackedStreamingDataset._load_dataset 0cd5689 Vjeong Claude Sonnet 4.6 committed on 1 day ago
Refactor runner.py: extract shared setup logic into _setup_and_train helper 9b6bd85 Vjeong Claude Sonnet 4.6 committed on 4 days ago
Fix dtype mismatch in RoPE cos/sin for mixed precision training 331cfcd Vjeong Claude Sonnet 4.6 committed on 4 days ago
Replace F.silu with explicit SiLU implementation in SwiGLUFeedForward baf4768 Vjeong Claude Sonnet 4.6 committed on 5 days ago
Replace F.scaled_dot_product_attention with explicit implementation e072b51 Vjeong Claude Sonnet 4.6 committed on 5 days ago
Remove dead attn_dropout layer from GroupedQueryAttention 9f5773b Vjeong Claude Sonnet 4.6 committed on 5 days ago
Fix LR warmup ordering and align adam_eps with Meta LLaMA af13727 Vjeong Claude Opus 4.6 committed on 6 days ago
Remove redundant detect_scenario from LossDebugger 2a50172 Vjeong Claude Opus 4.6 committed on 6 days ago
Add Code CPT pipeline for injecting Python code capability a424729 Vjeong Claude Opus 4.6 committed on 7 days ago
Fix LR reference table and batch-LR scaling guidance in LossDebugger e96b9d3 Vjeong Claude Opus 4.6 committed on 9 days ago
Fix batch size diagnostic: widen window and list multiple causes fb048e4 Vjeong Claude Opus 4.6 committed on 11 days ago
Fix gradient clipping thresholds in dynamics and checklist modules a671953 Vjeong Claude Opus 4.6 committed on 11 days ago
Fix gradient diagnostic thresholds with evidence-based criteria in LossDebugger 362e9ea Vjeong Claude Opus 4.6 committed on 11 days ago
Fix check_numerical_stability accuracy and completeness 2fb0306 Vjeong Claude Opus 4.6 committed on 12 days ago
Scale overfit test LR and steps by model size in LossDebugger 6b7ca0e Vjeong Claude Sonnet 4.6 committed on 12 days ago
Reduce LOSS_BOUNCE false positives with moving-average smoothing 1451cc6 Vjeong Claude Opus 4.6 committed on 13 days ago
Improve LOSS_BOUNCE detection with pre-computed bounce metrics 6c7b430 Vjeong Claude Opus 4.6 committed on 13 days ago
Add LOSS_BOUNCE detection to diagnose_status classification chain 38fd260 Vjeong Claude Opus 4.6 committed on 13 days ago
Add NaN detection to diagnose_status classification chain 8313ca8 Vjeong Claude Opus 4.6 committed on 14 days ago
Fix debugger reference data and scenario detection logic d789de8 Vjeong Claude Sonnet 4.6 committed on 14 days ago
Remove unused tokenizer training code (train_bpe, load_sentencepiece, load_trained_hf) 33ba3d1 Vjeong Claude Opus 4.6 committed on 14 days ago
Tighten expected loss ranges for FineWeb-Edu dataset a02e949 Vjeong Claude Opus 4.6 committed on 15 days ago
Use LLaMA 2 pretrained tokenizer and remove tokenizer_mode option a5ca4e4 Vjeong Claude Opus 4.6 committed on 18 days ago
Fix BPE tokenizer ByteLevel decoder and update evaluation notebook 8626149 Vjeong Claude Sonnet 4.6 committed on 20 days ago
feat(training): add LossDebugger 5-level diagnostic framework 5b7ea5e Vjeong Claude Opus 4.6 committed on 22 days ago
feat(config): add scale-specific presets to TrainConfig 1c8f3e6 Vjeong Claude Sonnet 4.6 committed on 25 days ago
fix(trainer): correct attribute name from total_mem to total_memory f5ab21e Vjeong Claude Sonnet 4.6 committed on 25 days ago
refactor(runner): replace manual seed setup with set_seed utility 4733791 Vjeong Claude Sonnet 4.6 committed on 27 days ago
fix(device): correct attribute name from total_mem to total_memory f91d771 Vjeong Claude Sonnet 4.6 committed on 29 days ago
refactor(model): replace single-letter vars with descriptive names for readability 81a9145 Vjeong Claude Sonnet 4.6 committed on Mar 6
docs: translate all Korean comments and docstrings to English 858e8b2 Vjeong Claude Sonnet 4.6 committed on Feb 27
refactor(data): replace per-worker seed strategy with full sharding in IterableDataset 8a39fec Vjeong Claude Sonnet 4.6 committed on Feb 21