| 2026-03-06 15:36:10 [INFO] ======================================================================== |
| 2026-03-06 15:36:10 [INFO] FRANKENSTALLM 3B - SFT Evaluation Pipeline |
| 2026-03-06 15:36:10 [INFO] ======================================================================== |
| 2026-03-06 15:36:10 [INFO] SFT Checkpoint : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_sft_v1/checkpoint-best |
| 2026-03-06 15:36:10 [INFO] Tokenizer : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_sft_v1/tokenizer.json |
| 2026-03-06 15:36:10 [INFO] Base Results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451 |
| 2026-03-06 15:36:10 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536 |
| 2026-03-06 15:36:10 [INFO] GPUs : [0, 1, 2, 3, 4, 5, 6, 7] |
| 2026-03-06 15:36:10 [INFO] Phases : phase0=skip phase1=run phase2=run |
| 2026-03-06 15:36:10 [INFO] |
| 2026-03-06 15:36:10 [INFO] |
| 2026-03-06 15:36:10 [INFO] PHASE 0 - SFT Checkpoint → HuggingFace Conversion |
| 2026-03-06 15:36:10 [INFO] |
| 2026-03-06 15:36:10 [INFO] Using pre-converted HF model: eval/outputs/hf_3b_sft_best |
| 2026-03-06 15:36:10 [INFO] |
| 2026-03-06 15:36:10 [INFO] |
| 2026-03-06 15:36:10 [INFO] PHASE 1 - SFT Internal Evaluation - 8 GPU Parallel |
| 2026-03-06 15:36:10 [INFO] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 5 - Calibration + Token NLL (GPU 5) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 6 - Generation (15 prompts × 4 temps) (GPU 6) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 7 - Repetition grid (12 × 5) (GPU 7) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 0 - PPL: 3b_val.bin (GPU 0) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 1 - PPL: korean_c4 + korean_val (GPU 1) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 2 - PPL: hplt_ko + cc100_ko (GPU 2) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 3 - PPL: 7 cosmo files (GPU 3) [SFT] |
| 2026-03-06 15:36:10 [INFO] Spawning: GPU 4 - PPL: 7 remaining files (GPU 4) [SFT] |
| 2026-03-06 15:36:48 [INFO] [DONE] GPU 5 - Calibration + Token NLL |
| 2026-03-06 15:37:50 [INFO] [DONE] GPU 7 - Repetition grid (12 × 5) |
| 2026-03-06 15:38:00 [INFO] [DONE] GPU 6 - Generation (15 prompts × 4 temps) |
| 2026-03-06 15:43:50 [INFO] [DONE] GPU 3 - PPL: 7 cosmo files |
| 2026-03-06 15:44:44 [INFO] [DONE] GPU 4 - PPL: 7 remaining files |
| 2026-03-06 15:46:48 [INFO] [DONE] GPU 2 - PPL: hplt_ko + cc100_ko |
| 2026-03-06 15:52:52 [INFO] [DONE] GPU 1 - PPL: korean_c4 + korean_val |
| 2026-03-06 16:13:30 [INFO] [DONE] GPU 0 - PPL: 3b_val.bin |
| 2026-03-06 16:13:30 [INFO] Complete: 8 succeeded, 0 failed |
| 2026-03-06 16:13:30 [INFO] Phase 1 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase1_results.json |
| 2026-03-06 16:13:30 [INFO] Generation samples saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/generation_samples.json |
| 2026-03-06 16:13:30 [INFO] Phase 1 complete in 37m 20s. |
| 2026-03-06 16:13:30 [INFO] |
| 2026-03-06 16:13:30 [INFO] |
| 2026-03-06 16:13:30 [INFO] PHASE 2 - SFT Benchmarks - 8 GPU Parallel |
| 2026-03-06 16:13:30 [INFO] |
| 2026-03-06 16:13:30 [INFO] Running 0-shot benchmarks on 8 GPUs ... |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 0 - KoBEST: boolq + copa + wic (GPU 0) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 2 - HAE-RAE (all subtasks) (GPU 2) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 3 - MMLU-KO (57 subtasks) (GPU 3) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge (GPU 4) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 5 - EN: winogrande + piqa (GPU 5) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences (GPU 6) [SFT] |
| 2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 7 - MMLU-EN: stem + other (GPU 7) [SFT] |
| 2026-03-06 16:14:42 [INFO] [DONE] [0-shot] GPU 1 - KoBEST: hellaswag + sentineg |
| 2026-03-06 16:14:44 [INFO] [DONE] [0-shot] GPU 3 - MMLU-KO (57 subtasks) |
| 2026-03-06 16:14:58 [INFO] [DONE] [0-shot] GPU 5 - EN: winogrande + piqa |
| 2026-03-06 16:15:00 [INFO] [DONE] [0-shot] GPU 2 - HAE-RAE (all subtasks) |
| 2026-03-06 16:15:02 [INFO] [DONE] [0-shot] GPU 0 - KoBEST: boolq + copa + wic |
| 2026-03-06 16:16:16 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN: stem + other |
| 2026-03-06 16:16:18 [INFO] [DONE] [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences |
| 2026-03-06 16:16:34 [INFO] [DONE] [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge |
| 2026-03-06 16:16:34 [INFO] Complete: 8 succeeded, 0 failed |
| 2026-03-06 16:16:34 [INFO] Phase 2 (0-shot) complete. |
| 2026-03-06 16:16:34 [INFO] Attempting 5-shot benchmarks ... |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 0 - KoBEST: boolq + copa + wic (GPU 0) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 2 - HAE-RAE (all subtasks) (GPU 2) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 3 - MMLU-KO (57 subtasks) (GPU 3) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge (GPU 4) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 5 - EN: winogrande + piqa (GPU 5) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 6 - MMLU-EN: humanities + social_sciences (GPU 6) [SFT] |
| 2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 7 - MMLU-EN: stem + other (GPU 7) [SFT] |
| 2026-03-06 16:18:06 [INFO] [DONE] [5-shot] GPU 1 - KoBEST: hellaswag + sentineg |
| 2026-03-06 16:18:13 [INFO] [DONE] [5-shot] GPU 3 - MMLU-KO (57 subtasks) |
| 2026-03-06 16:19:05 [INFO] [DONE] [5-shot] GPU 5 - EN: winogrande + piqa |
| 2026-03-06 16:19:21 [INFO] [DONE] [5-shot] GPU 2 - HAE-RAE (all subtasks) |
| 2026-03-06 16:19:45 [INFO] [DONE] [5-shot] GPU 0 - KoBEST: boolq + copa + wic |
| 2026-03-06 16:22:29 [INFO] [DONE] [5-shot] GPU 7 - MMLU-EN: stem + other |
| 2026-03-06 16:23:35 [INFO] [DONE] [5-shot] GPU 6 - MMLU-EN: humanities + social_sciences |
| 2026-03-06 16:25:28 [INFO] [DONE] [5-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge |
| 2026-03-06 16:25:28 [INFO] Complete: 8 succeeded, 0 failed |
| 2026-03-06 16:25:28 [INFO] Phase 2 (5-shot) complete. |
| 2026-03-06 16:25:32 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase2_results.json |
| 2026-03-06 16:25:32 [INFO] Phase 2 complete in 12m 1s. |
| 2026-03-06 16:25:32 [INFO] |
| 2026-03-06 16:25:32 [INFO] |
| 2026-03-06 16:25:32 [INFO] PHASE 3 - Base vs SFT Comparison Report |
| 2026-03-06 16:25:32 [INFO] |
| 2026-03-06 16:25:33 [ERROR] Phase 3 report generation failed: |
| Traceback (most recent call last): |
| File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/sft_eval_pipeline.py", line 279, in run_sft_phase3 |
| report_path = generate_comparison_report( |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/report_generator.py", line 775, in generate_comparison_report |
| lines.append(f"- **총 소요 시간**: {_fmt_seconds(total_elapsed_sec)}") |
| ^^^^^^^^^^^^ |
| NameError: name '_fmt_seconds' is not defined |
|
|
| 2026-03-06 16:25:37 [INFO] Fallback summary saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/sft_eval_summary.json |
| 2026-03-06 16:25:37 [INFO] Phase 3 complete in 5s. |
| 2026-03-06 16:25:37 [INFO] ======================================================================== |
| 2026-03-06 16:25:37 [INFO] SFT EVALUATION PIPELINE COMPLETE |
| 2026-03-06 16:25:37 [INFO] ======================================================================== |
| 2026-03-06 16:25:37 [INFO] Total time : 49m 27s |
| 2026-03-06 16:25:37 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536 |
| 2026-03-06 16:25:37 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase1_results.json |
| 2026-03-06 16:25:37 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase2_results.json |
| 2026-03-06 16:25:37 [INFO] Report : N/A |
| 2026-03-06 16:25:37 [INFO] ======================================================================== |
|
|