2026-03-05 03:35:28 [INFO] ========================================================================
2026-03-05 03:35:28 [INFO] FRANKENSTALLM 3B - Full Evaluation Pipeline
2026-03-05 03:35:28 [INFO] ========================================================================
2026-03-05 03:35:28 [INFO] Checkpoint : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_fp8_run1/checkpoint-0057000
2026-03-05 03:35:28 [INFO] Tokenizer : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/tokenizer/korean_sp/tokenizer.json
2026-03-05 03:35:28 [INFO] Data dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/data
2026-03-05 03:35:28 [INFO] Output dir : eval/outputs/3b_full_eval_20260305_0318
2026-03-05 03:35:28 [INFO] GPUs : [2, 3, 4, 5, 6, 7]
2026-03-05 03:35:28 [INFO] SEQ_LEN : 2048 STRIDE: 512 BATCH_SIZE: 32
2026-03-05 03:35:28 [INFO] Phases : phase0=skip phase1=run phase2=run
2026-03-05 03:35:28 [INFO]
2026-03-05 03:35:28 [INFO] ------------------------------------------------------------------------
2026-03-05 03:35:28 [INFO] PHASE 0 - HF Checkpoint Conversion
2026-03-05 03:35:28 [INFO] ------------------------------------------------------------------------
2026-03-05 03:35:28 [INFO] Skipping Phase 0 - reusing: eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
2026-03-05 03:35:28 [INFO]
2026-03-05 03:35:28 [INFO] ------------------------------------------------------------------------
2026-03-05 03:35:28 [INFO] PHASE 1 - Internal Evaluation - 6 GPU Parallel
2026-03-05 03:35:28 [INFO] ------------------------------------------------------------------------
2026-03-05 03:35:28 [INFO] Spawning: GPU 5 - Calibration + Token NLL (GPU 5)
2026-03-05 03:35:28 [INFO] Spawning: GPU 6 - Generation (15 prompts × 4 temps) (GPU 6)
2026-03-05 03:35:28 [INFO] Spawning: GPU 7 - Repetition grid (12 × 5) (GPU 7)
2026-03-05 03:35:28 [INFO] Spawning: GPU 2 - PPL: 3b_val.bin (GPU 2)
2026-03-05 03:35:28 [INFO] Spawning: GPU 3 - PPL: korean_c4 + korean_val (GPU 3)
2026-03-05 03:35:28 [INFO] Spawning: GPU 4 - PPL: hplt_ko + cc100_ko + PPL: 7 cosmo files + PPL: 7 remaining files (GPU 4)
2026-03-05 03:35:52 [INFO] [DONE] GPU 5 - Calibration + Token NLL
2026-03-05 03:38:04 [INFO] [DONE] GPU 6 - Generation (15 prompts × 4 temps)
2026-03-05 03:38:22 [INFO] [DONE] GPU 7 - Repetition grid (12 × 5)
2026-03-05 03:51:46 [INFO] [DONE] GPU 3 - PPL: korean_c4 + korean_val
2026-03-05 04:01:10 [INFO] [DONE] GPU 4 - PPL: hplt_ko + cc100_ko + PPL: 7 cosmo files + PPL: 7 remaining files
2026-03-05 04:12:48 [INFO] [DONE] GPU 2 - PPL: 3b_val.bin
2026-03-05 04:12:48 [INFO] Complete: 6 succeeded, 0 failed
2026-03-05 04:12:48 [INFO] Phase 1 results saved: eval/outputs/3b_full_eval_20260305_0318/phase1_results.json
2026-03-05 04:12:48 [INFO] Generation samples saved: eval/outputs/3b_full_eval_20260305_0318/generation_samples.json
2026-03-05 04:12:48 [INFO] Phase 1 complete in 37m 20s.
2026-03-05 04:12:48 [INFO]
2026-03-05 04:12:48 [INFO] ------------------------------------------------------------------------
2026-03-05 04:12:48 [INFO] PHASE 2 - lm-eval Benchmarks - 6 GPU Parallel
2026-03-05 04:12:48 [INFO] ------------------------------------------------------------------------
2026-03-05 04:12:48 [INFO] Running 0-shot benchmarks on 6 GPUs ...
2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 2 - KoBEST: boolq + copa (GPU 2)
2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 3 - KoBEST: hellaswag + sentineg (GPU 3)
2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 4 - KoBEST: wic (GPU 4)
2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 5 - HAE-RAE (all subtasks) (GPU 5)
2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 6 - MMLU-KO part 1/2 (GPU 6)
2026-03-05 04:12:48 [INFO] Spawning: [0-shot] GPU 7 - MMLU-KO part 2/2 (GPU 7)
2026-03-05 04:12:52 [INFO] [DONE] [0-shot] GPU 6 - MMLU-KO part 1/2
2026-03-05 04:12:52 [INFO] [DONE] [0-shot] GPU 7 - MMLU-KO part 2/2
2026-03-05 04:13:10 [INFO] [DONE] [0-shot] GPU 4 - KoBEST: wic
2026-03-05 04:13:18 [INFO] [DONE] [0-shot] GPU 3 - KoBEST: hellaswag + sentineg
2026-03-05 04:13:20 [INFO] [DONE] [0-shot] GPU 2 - KoBEST: boolq + copa
2026-03-05 04:13:28 [INFO] [DONE] [0-shot] GPU 5 - HAE-RAE (all subtasks)
2026-03-05 04:13:28 [INFO] Complete: 6 succeeded, 0 failed
2026-03-05 04:13:28 [INFO] Phase 2 (0-shot) complete.
2026-03-05 04:13:28 [INFO] Attempting 5-shot benchmarks ...
2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 2 - KoBEST: boolq + copa (GPU 2)
2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 3 - KoBEST: hellaswag + sentineg (GPU 3)
2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 4 - KoBEST: wic (GPU 4)
2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 5 - HAE-RAE (all subtasks) (GPU 5)
2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 6 - MMLU-KO part 1/2 (GPU 6)
2026-03-05 04:13:28 [INFO] Spawning: [5-shot] GPU 7 - MMLU-KO part 2/2 (GPU 7)
2026-03-05 04:13:32 [INFO] [DONE] [5-shot] GPU 6 - MMLU-KO part 1/2
2026-03-05 04:13:32 [INFO] [DONE] [5-shot] GPU 7 - MMLU-KO part 2/2
2026-03-05 04:14:24 [INFO] [DONE] [5-shot] GPU 4 - KoBEST: wic
2026-03-05 04:14:36 [INFO] [DONE] [5-shot] GPU 3 - KoBEST: hellaswag + sentineg
2026-03-05 04:14:58 [INFO] [DONE] [5-shot] GPU 2 - KoBEST: boolq + copa
2026-03-05 04:15:04 [INFO] [DONE] [5-shot] GPU 5 - HAE-RAE (all subtasks)
2026-03-05 04:15:04 [INFO] Complete: 6 succeeded, 0 failed
2026-03-05 04:15:04 [INFO] Phase 2 (5-shot) complete.
2026-03-05 04:15:04 [INFO] Phase 2 results saved: eval/outputs/3b_full_eval_20260305_0318/phase2_results.json
2026-03-05 04:15:04 [INFO] Phase 2 complete in 2m 16s.
2026-03-05 04:15:04 [INFO]
2026-03-05 04:15:04 [INFO] ------------------------------------------------------------------------
2026-03-05 04:15:04 [INFO] PHASE 3 - Report Generation
2026-03-05 04:15:04 [INFO] ------------------------------------------------------------------------
2026-03-05 04:15:04 [INFO] Report saved: eval/outputs/3b_full_eval_20260305_0318/full_eval_report.md
2026-03-05 04:15:04 [INFO] Phase 3 complete in 0s.
2026-03-05 04:15:04 [INFO] ========================================================================
2026-03-05 04:15:04 [INFO] PIPELINE COMPLETE
2026-03-05 04:15:04 [INFO] ========================================================================
2026-03-05 04:15:04 [INFO] Total time : 39m 36s
2026-03-05 04:15:04 [INFO] Output dir : eval/outputs/3b_full_eval_20260305_0318
2026-03-05 04:15:04 [INFO] Phase 1 results : eval/outputs/3b_full_eval_20260305_0318/phase1_results.json
2026-03-05 04:15:04 [INFO] Phase 2 results : eval/outputs/3b_full_eval_20260305_0318/phase2_results.json
2026-03-05 04:15:04 [INFO] Gen samples : eval/outputs/3b_full_eval_20260305_0318/generation_samples.json
2026-03-05 04:15:04 [INFO] Report : eval/outputs/3b_full_eval_20260305_0318/full_eval_report.md
2026-03-05 04:15:04 [INFO] Phase 1 tasks : 6 OK / 0 failed
2026-03-05 04:15:04 [INFO] Phase 2 tasks : 6 OK / 0 failed
2026-03-05 04:15:04 [INFO] ========================================================================
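
The log records only spawn and completion timestamps, so per-task wall-clock time has to be derived by pairing each "Spawning:" line with its "[DONE]" line. The following is a minimal sketch of that pairing, not part of the pipeline itself: the function name `task_durations` and the `SAMPLE` excerpt are hypothetical, and it assumes the timestamp and `[INFO]` layout shown above.

```python
import re
from datetime import datetime

# Assumed line layout: "YYYY-MM-DD HH:MM:SS [INFO] <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[INFO\] (.*)$")

# Hypothetical excerpt in the same layout as the log above.
SAMPLE = """\
2026-03-05 03:35:28 [INFO] Spawning: GPU 5 - Calibration + Token NLL (GPU 5)
2026-03-05 03:35:28 [INFO] Spawning: GPU 2 - PPL: 3b_val.bin (GPU 2)
2026-03-05 03:35:52 [INFO] [DONE] GPU 5 - Calibration + Token NLL
2026-03-05 04:12:48 [INFO] [DONE] GPU 2 - PPL: 3b_val.bin
"""


def task_durations(log_text):
    """Return {task label: elapsed seconds} by pairing Spawning and [DONE] lines."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # separators, blank [INFO] lines, and non-log text
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2).strip()
        if message.startswith("Spawning: "):
            # Only Spawning lines carry a trailing " (GPU n)" suffix; drop it
            # so the label matches the one on the [DONE] line.
            label = re.sub(r" \(GPU \d+\)$", "", message[len("Spawning: "):])
            started[label] = stamp
        elif message.startswith("[DONE] "):
            label = message[len("[DONE] "):]
            if label in started:
                durations[label] = int((stamp - started[label]).total_seconds())
    return durations


if __name__ == "__main__":
    for label, seconds in task_durations(SAMPLE).items():
        print(f"{seconds:>6}s  {label}")
```

Run against the full log, this kind of pairing makes it easy to spot the longest task in each phase, as well as tasks that finish within a few seconds of being spawned and may deserve a closer look in the results files.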