2026-03-06 15:36:10 [INFO] ========================================================================
2026-03-06 15:36:10 [INFO] FRANKENSTALLM 3B — SFT Evaluation Pipeline
2026-03-06 15:36:10 [INFO] ========================================================================
2026-03-06 15:36:10 [INFO] SFT Checkpoint : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_sft_v1/checkpoint-best
2026-03-06 15:36:10 [INFO] Tokenizer : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_sft_v1/tokenizer.json
2026-03-06 15:36:10 [INFO] Base Results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451
2026-03-06 15:36:10 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536
2026-03-06 15:36:10 [INFO] GPUs : [0, 1, 2, 3, 4, 5, 6, 7]
2026-03-06 15:36:10 [INFO] Phases : phase0=skip phase1=run phase2=run
2026-03-06 15:36:10 [INFO]
2026-03-06 15:36:10 [INFO] ------------------------------------------------------------------------
2026-03-06 15:36:10 [INFO] PHASE 0 — SFT Checkpoint → HuggingFace Conversion
2026-03-06 15:36:10 [INFO] ------------------------------------------------------------------------
2026-03-06 15:36:10 [INFO] Using pre-converted HF model: eval/outputs/hf_3b_sft_best
2026-03-06 15:36:10 [INFO]
2026-03-06 15:36:10 [INFO] ------------------------------------------------------------------------
2026-03-06 15:36:10 [INFO] PHASE 1 — SFT Internal Evaluation — 8 GPU Parallel
2026-03-06 15:36:10 [INFO] ------------------------------------------------------------------------
2026-03-06 15:36:10 [INFO] Spawning: GPU 5 — Calibration + Token NLL (GPU 5) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 6 — Generation (15 prompts × 4 temps) (GPU 6) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 7 — Repetition grid (12 × 5) (GPU 7) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 0 — PPL: 3b_val.bin (GPU 0) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 1 — PPL: korean_c4 + korean_val (GPU 1) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 2 — PPL: hplt_ko + cc100_ko (GPU 2) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 3 — PPL: 7 cosmo files (GPU 3) [SFT]
2026-03-06 15:36:10 [INFO] Spawning: GPU 4 — PPL: 7 remaining files (GPU 4) [SFT]
2026-03-06 15:36:48 [INFO] [DONE] GPU 5 — Calibration + Token NLL
2026-03-06 15:37:50 [INFO] [DONE] GPU 7 — Repetition grid (12 × 5)
2026-03-06 15:38:00 [INFO] [DONE] GPU 6 — Generation (15 prompts × 4 temps)
2026-03-06 15:43:50 [INFO] [DONE] GPU 3 — PPL: 7 cosmo files
2026-03-06 15:44:44 [INFO] [DONE] GPU 4 — PPL: 7 remaining files
2026-03-06 15:46:48 [INFO] [DONE] GPU 2 — PPL: hplt_ko + cc100_ko
2026-03-06 15:52:52 [INFO] [DONE] GPU 1 — PPL: korean_c4 + korean_val
2026-03-06 16:13:30 [INFO] [DONE] GPU 0 — PPL: 3b_val.bin
2026-03-06 16:13:30 [INFO] Complete: 8 succeeded, 0 failed
2026-03-06 16:13:30 [INFO] Phase 1 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase1_results.json
2026-03-06 16:13:30 [INFO] Generation samples saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/generation_samples.json
2026-03-06 16:13:30 [INFO] Phase 1 complete in 37m 20s.
2026-03-06 16:13:30 [INFO]
2026-03-06 16:13:30 [INFO] ------------------------------------------------------------------------
2026-03-06 16:13:30 [INFO] PHASE 2 — SFT Benchmarks — 8 GPU Parallel
2026-03-06 16:13:30 [INFO] ------------------------------------------------------------------------
2026-03-06 16:13:30 [INFO] Running 0-shot benchmarks on 8 GPUs ...
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 0 — KoBEST: boolq + copa + wic (GPU 0) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 1 — KoBEST: hellaswag + sentineg (GPU 1) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 2 — HAE-RAE (all subtasks) (GPU 2) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 3 — MMLU-KO (57 subtasks) (GPU 3) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 4 — EN: hellaswag + arc_easy + arc_challenge (GPU 4) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 5 — EN: winogrande + piqa (GPU 5) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 6 — MMLU-EN: humanities + social_sciences (GPU 6) [SFT]
2026-03-06 16:13:30 [INFO] Spawning: [0-shot] GPU 7 — MMLU-EN: stem + other (GPU 7) [SFT]
2026-03-06 16:14:42 [INFO] [DONE] [0-shot] GPU 1 — KoBEST: hellaswag + sentineg
2026-03-06 16:14:44 [INFO] [DONE] [0-shot] GPU 3 — MMLU-KO (57 subtasks)
2026-03-06 16:14:58 [INFO] [DONE] [0-shot] GPU 5 — EN: winogrande + piqa
2026-03-06 16:15:00 [INFO] [DONE] [0-shot] GPU 2 — HAE-RAE (all subtasks)
2026-03-06 16:15:02 [INFO] [DONE] [0-shot] GPU 0 — KoBEST: boolq + copa + wic
2026-03-06 16:16:16 [INFO] [DONE] [0-shot] GPU 7 — MMLU-EN: stem + other
2026-03-06 16:16:18 [INFO] [DONE] [0-shot] GPU 6 — MMLU-EN: humanities + social_sciences
2026-03-06 16:16:34 [INFO] [DONE] [0-shot] GPU 4 — EN: hellaswag + arc_easy + arc_challenge
2026-03-06 16:16:34 [INFO] Complete: 8 succeeded, 0 failed
2026-03-06 16:16:34 [INFO] Phase 2 (0-shot) complete.
2026-03-06 16:16:34 [INFO] Attempting 5-shot benchmarks ...
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 0 — KoBEST: boolq + copa + wic (GPU 0) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 1 — KoBEST: hellaswag + sentineg (GPU 1) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 2 — HAE-RAE (all subtasks) (GPU 2) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 3 — MMLU-KO (57 subtasks) (GPU 3) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 4 — EN: hellaswag + arc_easy + arc_challenge (GPU 4) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 5 — EN: winogrande + piqa (GPU 5) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 6 — MMLU-EN: humanities + social_sciences (GPU 6) [SFT]
2026-03-06 16:16:34 [INFO] Spawning: [5-shot] GPU 7 — MMLU-EN: stem + other (GPU 7) [SFT]
2026-03-06 16:18:06 [INFO] [DONE] [5-shot] GPU 1 — KoBEST: hellaswag + sentineg
2026-03-06 16:18:13 [INFO] [DONE] [5-shot] GPU 3 — MMLU-KO (57 subtasks)
2026-03-06 16:19:05 [INFO] [DONE] [5-shot] GPU 5 — EN: winogrande + piqa
2026-03-06 16:19:21 [INFO] [DONE] [5-shot] GPU 2 — HAE-RAE (all subtasks)
2026-03-06 16:19:45 [INFO] [DONE] [5-shot] GPU 0 — KoBEST: boolq + copa + wic
2026-03-06 16:22:29 [INFO] [DONE] [5-shot] GPU 7 — MMLU-EN: stem + other
2026-03-06 16:23:35 [INFO] [DONE] [5-shot] GPU 6 — MMLU-EN: humanities + social_sciences
2026-03-06 16:25:28 [INFO] [DONE] [5-shot] GPU 4 — EN: hellaswag + arc_easy + arc_challenge
2026-03-06 16:25:28 [INFO] Complete: 8 succeeded, 0 failed
2026-03-06 16:25:28 [INFO] Phase 2 (5-shot) complete.
2026-03-06 16:25:32 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase2_results.json
2026-03-06 16:25:32 [INFO] Phase 2 complete in 12m 1s.
2026-03-06 16:25:32 [INFO]
2026-03-06 16:25:32 [INFO] ------------------------------------------------------------------------
2026-03-06 16:25:32 [INFO] PHASE 3 — Base vs SFT Comparison Report
2026-03-06 16:25:32 [INFO] ------------------------------------------------------------------------
2026-03-06 16:25:33 [ERROR] Phase 3 report generation failed:
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/sft_eval_pipeline.py", line 279, in run_sft_phase3
    report_path = generate_comparison_report(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/report_generator.py", line 775, in generate_comparison_report
    lines.append(f"- **총 소요 시간**: {_fmt_seconds(total_elapsed_sec)}")
                                   ^^^^^^^^^^^^
NameError: name '_fmt_seconds' is not defined
2026-03-06 16:25:37 [INFO] Fallback summary saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/sft_eval_summary.json
2026-03-06 16:25:37 [INFO] Phase 3 complete in 5s.
2026-03-06 16:25:37 [INFO] ========================================================================
2026-03-06 16:25:37 [INFO] SFT EVALUATION PIPELINE COMPLETE
2026-03-06 16:25:37 [INFO] ========================================================================
2026-03-06 16:25:37 [INFO] Total time : 49m 27s
2026-03-06 16:25:37 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536
2026-03-06 16:25:37 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase1_results.json
2026-03-06 16:25:37 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_sft_eval_20260306_1536/phase2_results.json
2026-03-06 16:25:37 [INFO] Report : N/A
2026-03-06 16:25:37 [INFO] ========================================================================