| 2026-03-05 03:35:28 [INFO] ======================================================================== |
| 2026-03-05 03:35:28 [INFO] FRANKENSTALLM 3B - Full Evaluation Pipeline |
| 2026-03-05 03:35:28 [INFO] ======================================================================== |
| 2026-03-05 03:35:28 [INFO] Checkpoint : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_fp8_run1/checkpoint-0057000 |
| 2026-03-05 03:35:28 [INFO] Tokenizer : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/tokenizer/korean_sp/tokenizer.json |
| 2026-03-05 03:35:28 [INFO] Data dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/data |
| 2026-03-05 03:35:28 [INFO] Output dir : eval/outputs/3b_full_eval_20260305_0318 |
| 2026-03-05 03:35:28 [INFO] GPUs : [2, 3, 4, 5, 6, 7] |
| 2026-03-05 03:35:28 [INFO] SEQ_LEN : 2048 STRIDE: 512 BATCH_SIZE: 32 |
| 2026-03-05 03:35:28 [INFO] Phases : phase0=skip phase1=run phase2=run |
| 2026-03-05 03:35:28 [INFO] |
| 2026-03-05 03:35:28 [INFO] |
| 2026-03-05 03:35:28 [INFO] PHASE 0 - HF Checkpoint Conversion |
| 2026-03-05 03:35:28 [INFO] |
| 2026-03-05 03:35:28 [INFO] Skipping Phase 0 - reusing: eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000 |
| 2026-03-05 03:35:28 [INFO] |
| 2026-03-05 03:35:28 [INFO] |
| 2026-03-05 03:35:28 [INFO] PHASE 1 - Internal Evaluation - 6 GPU Parallel |
| 2026-03-05 03:35:28 [INFO] |
| 2026-03-05 03:35:28 [INFO] Spawning: GPU 5 - Calibration + Token NLL (GPU 5) |
| 2026-03-05 03:35:28 [INFO] Spawning: GPU 6 - Generation (15 prompts × 4 temps) (GPU 6) |
| 2026-03-05 03:35:28 [INFO] Spawning: GPU 7 - Repetition grid (12 × 5) (GPU 7) |
| 2026-03-05 03:35:28 [INFO] Spawning: GPU 2 - PPL: 3b_val.bin (GPU 2) |
| 2026-03-05 03:35:28 [INFO] Spawning: GPU 3 - PPL: korean_c4 + korean_val (GPU 3) |
| 2026-03-05 03:35:28 [INFO] Spawning: GPU 4 - PPL: hplt_ko + cc100_ko + PPL: 7 cosmo files + PPL: 7 remaining files (GPU 4) |
| 2026-03-05 03:35:52 [INFO] [DONE] GPU 5 - Calibration + Token NLL |
| 2026-03-05 03:38:04 [INFO] [DONE] GPU 6 - Generation (15 prompts × 4 temps) |
| 2026-03-05 03:38:22 [INFO] [DONE] GPU 7 - Repetition grid (12 × 5) |
| 2026-03-05 03:51:46 [INFO] [DONE] GPU 3 - PPL: korean_c4 + korean_val |
| 2026-03-05 04:01:10 [INFO] [DONE] GPU 4 - PPL: hplt_ko + cc100_ko + PPL: 7 cosmo files + PPL: 7 remaining files |
| 2026-03-05 04:12:48 [INFO] [DONE] GPU 2 - PPL: 3b_val.bin |
| 2026-03-05 04:12:48 [INFO] Complete: 6 succeeded, 0 failed |
| 2026-03-05 04:12:48 [INFO] Phase 1 results saved: eval/outputs/3b_full_eval_20260305_0318/phase1_results.json |
| 2026-03-05 04:12:48 [INFO] Generation samples saved: eval/outputs/3b_full_eval_20260305_0318/generation_samples.json |
| 2026-03-05 04:12:48 [INFO] Phase 1 complete in 37m 20s. |
| 2026-03-05 04:12:48 [INFO] |
| 2026-03-05 04:12:48 [INFO] |
| 2026-03-05 04:12:48 [INFO] PHASE 2 - lm-eval Benchmarks - 6 GPU Parallel |
| 2026-03-05 04:12:48 [INFO] |
| 2026-03-05 04:12:48 [INFO] Running 0-shot benchmarks on 6 GPUs ... |
| 2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 2 - KoBEST: boolq + copa (GPU 2) |
| 2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 3 - KoBEST: hellaswag + sentineg (GPU 3) |
| 2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 4 - KoBEST: wic (GPU 4) |
| 2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 5 - HAE-RAE (all subtasks) (GPU 5) |
| 2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 6 - MMLU-KO part 1/2 (GPU 6) |
| 2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 7 - MMLU-KO part 2/2 (GPU 7) |
| 2026-03-05 04:12:52 [INFO] [DONE] [0-shot] GPU 6 - MMLU-KO part 1/2 |
| 2026-03-05 04:12:52 [INFO] [DONE] [0-shot] GPU 7 - MMLU-KO part 2/2 |
| 2026-03-05 04:13:10 [INFO] [DONE] [0-shot] GPU 4 - KoBEST: wic |
| 2026-03-05 04:13:18 [INFO] [DONE] [0-shot] GPU 3 - KoBEST: hellaswag + sentineg |
| 2026-03-05 04:13:20 [INFO] [DONE] [0-shot] GPU 2 - KoBEST: boolq + copa |
| 2026-03-05 04:13:28 [INFO] [DONE] [0-shot] GPU 5 - HAE-RAE (all subtasks) |
| 2026-03-05 04:13:28 [INFO] Complete: 6 succeeded, 0 failed |
| 2026-03-05 04:13:28 [INFO] Phase 2 (0-shot) complete. |
| 2026-03-05 04:13:28 [INFO] Attempting 5-shot benchmarks ... |
| 2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 2 - KoBEST: boolq + copa (GPU 2) |
| 2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 3 - KoBEST: hellaswag + sentineg (GPU 3) |
| 2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 4 - KoBEST: wic (GPU 4) |
| 2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 5 - HAE-RAE (all subtasks) (GPU 5) |
| 2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 6 - MMLU-KO part 1/2 (GPU 6) |
| 2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 7 - MMLU-KO part 2/2 (GPU 7) |
| 2026-03-05 04:13:32 [INFO] [DONE] [5-shot] GPU 6 - MMLU-KO part 1/2 |
| 2026-03-05 04:13:32 [INFO] [DONE] [5-shot] GPU 7 - MMLU-KO part 2/2 |
| 2026-03-05 04:14:24 [INFO] [DONE] [5-shot] GPU 4 - KoBEST: wic |
| 2026-03-05 04:14:36 [INFO] [DONE] [5-shot] GPU 3 - KoBEST: hellaswag + sentineg |
| 2026-03-05 04:14:58 [INFO] [DONE] [5-shot] GPU 2 - KoBEST: boolq + copa |
| 2026-03-05 04:15:04 [INFO] [DONE] [5-shot] GPU 5 - HAE-RAE (all subtasks) |
| 2026-03-05 04:15:04 [INFO] Complete: 6 succeeded, 0 failed |
| 2026-03-05 04:15:04 [INFO] Phase 2 (5-shot) complete. |
| 2026-03-05 04:15:04 [INFO] Phase 2 results saved: eval/outputs/3b_full_eval_20260305_0318/phase2_results.json |
| 2026-03-05 04:15:04 [INFO] Phase 2 complete in 2m 16s. |
| 2026-03-05 04:15:04 [INFO] |
| 2026-03-05 04:15:04 [INFO] |
| 2026-03-05 04:15:04 [INFO] PHASE 3 - Report Generation |
| 2026-03-05 04:15:04 [INFO] |
| 2026-03-05 04:15:04 [INFO] Report saved: eval/outputs/3b_full_eval_20260305_0318/full_eval_report.md |
| 2026-03-05 04:15:04 [INFO] Phase 3 complete in 0s. |
| 2026-03-05 04:15:04 [INFO] ======================================================================== |
| 2026-03-05 04:15:04 [INFO] PIPELINE COMPLETE |
| 2026-03-05 04:15:04 [INFO] ======================================================================== |
| 2026-03-05 04:15:04 [INFO] Total time : 39m 36s |
| 2026-03-05 04:15:04 [INFO] Output dir : eval/outputs/3b_full_eval_20260305_0318 |
| 2026-03-05 04:15:04 [INFO] Phase 1 results : eval/outputs/3b_full_eval_20260305_0318/phase1_results.json |
| 2026-03-05 04:15:04 [INFO] Phase 2 results : eval/outputs/3b_full_eval_20260305_0318/phase2_results.json |
| 2026-03-05 04:15:04 [INFO] Gen samples : eval/outputs/3b_full_eval_20260305_0318/generation_samples.json |
| 2026-03-05 04:15:04 [INFO] Report : eval/outputs/3b_full_eval_20260305_0318/full_eval_report.md |
| 2026-03-05 04:15:04 [INFO] Phase 1 tasks : 6 OK / 0 failed |
| 2026-03-05 04:15:04 [INFO] Phase 2 tasks : 6 OK / 0 failed |
| 2026-03-05 04:15:04 [INFO] ======================================================================== |