LLM-1B-Lab / llm_lab

Commit History

Fix dead split parameter in PackedStreamingDataset._load_dataset

0cd5689

Vjeong and Claude Sonnet 4.6 committed 1 day ago

Refactor runner.py: extract shared setup logic into _setup_and_train helper

9b6bd85

Vjeong and Claude Sonnet 4.6 committed 4 days ago

Fix dtype mismatch in RoPE cos/sin for mixed precision training

331cfcd

Vjeong and Claude Sonnet 4.6 committed 4 days ago

Replace F.silu with explicit SiLU implementation in SwiGLUFeedForward

baf4768

Vjeong and Claude Sonnet 4.6 committed 5 days ago

Replace F.scaled_dot_product_attention with explicit implementation

e072b51

Vjeong and Claude Sonnet 4.6 committed 5 days ago

Remove dead attn_dropout layer from GroupedQueryAttention

9f5773b

Vjeong and Claude Sonnet 4.6 committed 5 days ago

Fix LR warmup ordering and align adam_eps with Meta LLaMA

af13727

Vjeong and Claude Opus 4.6 committed 6 days ago

Remove redundant detect_scenario from LossDebugger

2a50172

Vjeong and Claude Opus 4.6 committed 6 days ago

Add Code CPT pipeline for injecting Python code capability

a424729

Vjeong and Claude Opus 4.6 committed 7 days ago

Fix false/inaccurate citations in LossDebugger

4322ea0

Vjeong and Claude Opus 4.6 committed 9 days ago

Fix LR reference table and batch-LR scaling guidance in LossDebugger

e96b9d3

Vjeong and Claude Opus 4.6 committed 9 days ago

Fix batch size diagnostic: widen window and list multiple causes

fb048e4

Vjeong and Claude Opus 4.6 committed 11 days ago

Fix gradient clipping thresholds in dynamics and checklist modules

a671953

Vjeong and Claude Opus 4.6 committed 11 days ago

Fix gradient diagnostic thresholds with evidence-based criteria in LossDebugger

362e9ea

Vjeong and Claude Opus 4.6 committed 11 days ago

Fix check_numerical_stability accuracy and completeness

2fb0306

Vjeong and Claude Opus 4.6 committed 12 days ago

Scale overfit test LR and steps by model size in LossDebugger

6b7ca0e

Vjeong and Claude Sonnet 4.6 committed 12 days ago

Reduce LOSS_BOUNCE false positives with moving-average smoothing

1451cc6

Vjeong and Claude Opus 4.6 committed 13 days ago

Improve LOSS_BOUNCE detection with pre-computed bounce metrics

6c7b430

Vjeong and Claude Opus 4.6 committed 13 days ago

Add LOSS_BOUNCE detection to diagnose_status classification chain

38fd260

Vjeong and Claude Opus 4.6 committed 13 days ago

Add NaN detection to diagnose_status classification chain

8313ca8

Vjeong and Claude Opus 4.6 committed 14 days ago

Fix debugger reference data and scenario detection logic

d789de8

Vjeong and Claude Sonnet 4.6 committed 14 days ago

Remove unused tokenizer training code (train_bpe, load_sentencepiece, load_trained_hf)

33ba3d1

Vjeong and Claude Opus 4.6 committed 14 days ago

Add configurable wandb log directory path

ae5f15e

Vjeong and Claude Opus 4.6 committed 14 days ago

Tighten expected loss ranges for FineWeb-Edu dataset

a02e949

Vjeong and Claude Opus 4.6 committed 15 days ago

Use LLaMA 2 pretrained tokenizer and remove tokenizer_mode option

a5ca4e4

Vjeong and Claude Opus 4.6 committed 18 days ago

Fix BPE tokenizer ByteLevel decoder and update evaluation notebook

8626149

Vjeong and Claude Sonnet 4.6 committed 20 days ago

feat(training): add LossDebugger 5-level diagnostic framework

5b7ea5e

Vjeong and Claude Opus 4.6 committed 22 days ago

feat(config): add scale-specific presets to TrainConfig

1c8f3e6

Vjeong and Claude Sonnet 4.6 committed 25 days ago

fix(trainer): correct attribute name from total_mem to total_memory

f5ab21e

Vjeong and Claude Sonnet 4.6 committed 25 days ago

refactor(runner): replace manual seed setup with set_seed utility

4733791

Vjeong and Claude Sonnet 4.6 committed 27 days ago

fix(device): correct attribute name from total_mem to total_memory

f91d771

Vjeong and Claude Sonnet 4.6 committed 29 days ago

refactor(model): replace single-letter vars with descriptive names for readability

81a9145

Vjeong and Claude Sonnet 4.6 committed on Mar 6

docs: translate all Korean comments and docstrings to English

858e8b2

Vjeong and Claude Sonnet 4.6 committed on Feb 27

refactor(data): replace per-worker seed strategy with full sharding in IterableDataset

8a39fec

Vjeong and Claude Sonnet 4.6 committed on Feb 21

Initial commit: LLM-1B-Lab project setup

8a58ffe

Vjeong and Claude Opus 4.6 committed on Feb 11
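
Commit baf4768 replaces F.silu with an explicit SiLU in SwiGLUFeedForward. SiLU is x * sigmoid(x). The sketch below is dependency-free Python that shows what that explicit form computes. It assumes the LLaMA-style SwiGLU gating silu(gate) * up. The helper names are hypothetical, and this is not the repository's code, which presumably operates on PyTorch tensors.

```python
import math
from typing import List


def silu(x: float) -> float:
    """SiLU (swish) written out explicitly: x * sigmoid(x)."""
    # Branch on the sign so math.exp never sees a large positive argument.
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return x * exp_x / (1.0 + exp_x)


def swiglu_gate(gate: List[float], up: List[float]) -> List[float]:
    """Hypothetical helper: element-wise SwiGLU gating, silu(gate) * up."""
    if len(gate) != len(up):
        raise ValueError("gate and up projections must have the same length")
    return [silu(g) * u for g, u in zip(gate, up)]


print(silu(0.0))
print(swiglu_gate([0.0, 2.0], [5.0, 3.0]))
```

Branching on the sign computes the same sigmoid either way. It only avoids overflow in math.exp for very negative inputs.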
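
Commit e072b51 swaps F.scaled_dot_product_attention for an explicit implementation. For a single head with a causal mask, that computation is softmax(Q K^T / sqrt(d) + mask) V. The sketch below writes it out in plain Python, assuming a causal mask and no attention dropout. It illustrates the math only and is not the repository's GroupedQueryAttention code.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head causal attention: softmax(q k^T / sqrt(d) + mask) v."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for i, q_row in enumerate(q):
        # Causal mask: query position i attends only to keys 0..i.
        scores = [scale * sum(a * b for a, b in zip(q_row, k[j])) for j in range(i + 1)]
        # Subtract the row max before exponentiating for numerical stability.
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value rows.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


print(causal_attention([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 4.0]]))
```

In a tensor implementation, the causal mask is usually applied by filling the scores above the diagonal with negative infinity before the softmax. Skipping those positions, as this loop does, gives the same weights.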
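
Commit 1451cc6 reduces LOSS_BOUNCE false positives with moving-average smoothing. The sketch below is one way such a check could work and is not the LossDebugger implementation. It smooths the loss curve with a trailing moving average. It then flags a bounce only if the latest smoothed value sits more than a relative threshold above the smoothed minimum. The names moving_average and looks_like_bounce, and the window and rel_threshold defaults, are hypothetical.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def looks_like_bounce(losses: List[float], window: int = 5, rel_threshold: float = 0.05) -> bool:
    """Hypothetical check: has the smoothed loss risen clearly above its smoothed minimum?"""
    if len(losses) < window:
        return False
    smoothed = moving_average(losses, window)
    return smoothed[-1] > min(smoothed) * (1.0 + rel_threshold)


noisy = [3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.6]
print(looks_like_bounce(noisy))
```

On the noisy curve above, a raw comparison would flag the final spike, since 2.6 > 2.4 * 1.05. The smoothed series still ends at its minimum, so looks_like_bounce does not flag it. A sustained rise such as [3.0, 2.5, 2.0, 1.8, 1.7, 2.5, 3.0, 3.5, 4.0] with window=3 is still flagged.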
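
Commit 8a39fec replaces a per-worker seed strategy with full sharding in the IterableDataset. The sketch below assumes the earlier approach had every DataLoader worker read the same stream under a different seed, which can hand the same examples to several workers. With sharding, each worker keeps a disjoint slice of the stream. The function shard_stream is hypothetical and is not the repository's PackedStreamingDataset code. Inside a real PyTorch IterableDataset, worker_id and num_workers would typically come from torch.utils.data.get_worker_info(), which returns None in the main process.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_stream(stream: Iterable[T], worker_id: int, num_workers: int) -> Iterator[T]:
    """Yield only this worker's items: every num_workers-th item, offset by worker_id."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    for index, item in enumerate(stream):
        if index % num_workers == worker_id:
            yield item


shards = [list(shard_stream(range(10), w, 3)) for w in range(3)]
print(shards)
```

Taken together, the shards cover the stream exactly once, so no example is duplicated across workers.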
