2026-03-05 10:57:53 [INFO] ========================================================================
2026-03-05 10:57:53 [INFO] FRANKENSTALLM 3B - Re-evaluation Pipeline
2026-03-05 10:57:53 [INFO] ========================================================================
2026-03-05 10:57:53 [INFO] Previous run : eval/outputs/3b_full_eval_20260305_0318
2026-03-05 10:57:53 [INFO] HF checkpoint: eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
2026-03-05 10:57:53 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057
2026-03-05 10:57:53 [INFO] GPUs : [0, 1, 2, 3, 4, 5, 6, 7]
2026-03-05 10:57:53 [INFO] Skip Phase 2 : False
2026-03-05 10:57:53 [INFO]
2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------
2026-03-05 10:57:53 [INFO] PHASE 1 - Copy from Previous Run
2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------
2026-03-05 10:57:53 [INFO] Copied phase1_results.json from previous run
2026-03-05 10:57:53 [INFO] Copied generation_samples.json
2026-03-05 10:57:53 [INFO]
2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------
2026-03-05 10:57:53 [INFO] PHASE 2 - Corrected Benchmarks - 8 GPU Parallel
2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------
2026-03-05 10:57:53 [INFO] Running 0-shot benchmarks on 8 GPUs ...
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 0 - KoBEST: boolq + copa (GPU 0)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 2 - KoBEST: wic (GPU 2)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 3 - HAE-RAE (all subtasks) (GPU 3)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 4 - MMLU-KO (57 subtasks) (GPU 4)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 5 - EN: hellaswag + arc_easy + arc_challenge (GPU 5)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 6 - EN: winogrande + piqa (GPU 6)
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 7 - MMLU-EN (57 subtasks) (GPU 7)
2026-03-05 10:59:21 [INFO] [DONE] [0-shot] GPU 2 - KoBEST: wic
2026-03-05 10:59:33 [INFO] [DONE] [0-shot] GPU 1 - KoBEST: hellaswag + sentineg
2026-03-05 10:59:35 [INFO] [DONE] [0-shot] GPU 4 - MMLU-KO (57 subtasks)
2026-03-05 10:59:41 [INFO] [DONE] [0-shot] GPU 0 - KoBEST: boolq + copa
2026-03-05 10:59:45 [INFO] [DONE] [0-shot] GPU 3 - HAE-RAE (all subtasks)
2026-03-05 10:59:47 [INFO] [DONE] [0-shot] GPU 6 - EN: winogrande + piqa
2026-03-05 11:00:41 [INFO] [DONE] [0-shot] GPU 5 - EN: hellaswag + arc_easy + arc_challenge
2026-03-05 11:01:30 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN (57 subtasks)
2026-03-05 11:01:30 [INFO] Complete: 8 succeeded, 0 failed
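The 0-shot block above spawns one task group per GPU at the same timestamp, logs completions in the order they finish, and then prints a success/failure tally. A minimal sketch of that dispatch pattern follows. It assumes that each job is a separate process pinned to one device through `CUDA_VISIBLE_DEVICES` and that a zero exit code means success. The pipeline's actual launcher is not shown in this log. `run_job`, `run_parallel` and `placeholder_cmd` are hypothetical names, and the real benchmark command is replaced by a no-op.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed

# GPU -> task-group label, copied from the 0-shot "Spawning" lines above.
ZERO_SHOT_JOBS = {
    0: "KoBEST: boolq + copa",
    1: "KoBEST: hellaswag + sentineg",
    2: "KoBEST: wic",
    3: "HAE-RAE (all subtasks)",
    4: "MMLU-KO (57 subtasks)",
    5: "EN: hellaswag + arc_easy + arc_challenge",
    6: "EN: winogrande + piqa",
    7: "MMLU-EN (57 subtasks)",
}


def placeholder_cmd(gpu, label):
    """Hypothetical stand-in for the real benchmark command: a no-op Python process."""
    return [sys.executable, "-c", "pass"]


def run_job(gpu, label, cmd):
    """Run one command pinned to a single GPU (assumed); success = exit code 0."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    proc = subprocess.run(cmd, env=env)
    return gpu, label, proc.returncode == 0


def run_parallel(jobs, make_cmd, shots):
    """Launch all jobs at once, log each as it finishes, return (succeeded, failed)."""
    succeeded = failed = 0
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = []
        for gpu, label in jobs.items():
            print(f"Spawning: [{shots}-shot] GPU {gpu} - {label}")
            futures.append(pool.submit(run_job, gpu, label, make_cmd(gpu, label)))
        # as_completed yields in finish order, matching the out-of-order [DONE] lines.
        for fut in as_completed(futures):
            gpu, label, ok = fut.result()
            print(f"[{'DONE' if ok else 'FAIL'}] [{shots}-shot] GPU {gpu} - {label}")
            if ok:
                succeeded += 1
            else:
                failed += 1
    print(f"Complete: {succeeded} succeeded, {failed} failed")
    return succeeded, failed
```

The Korean-only 5-shot pass that follows would reuse the same helper with GPUs 0-4 only, for example `run_parallel({g: l for g, l in ZERO_SHOT_JOBS.items() if g <= 4}, placeholder_cmd, 5)`. Threads are enough here because each worker only waits on a child process.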
2026-03-05 11:01:30 [INFO] 0-shot complete.
2026-03-05 11:01:31 [INFO] Running 5-shot benchmarks (Korean only) ...
2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 0 - KoBEST: boolq + copa (GPU 0)
2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1)
2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 2 - KoBEST: wic (GPU 2)
2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 3 - HAE-RAE (all subtasks) (GPU 3)
2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 4 - MMLU-KO (57 subtasks) (GPU 4)
2026-03-05 11:02:07 [INFO] [DONE] [5-shot] GPU 0 - KoBEST: boolq + copa
2026-03-05 11:02:31 [INFO] [DONE] [5-shot] GPU 2 - KoBEST: wic
2026-03-05 11:02:33 [INFO] [DONE] [5-shot] GPU 4 - MMLU-KO (57 subtasks)
2026-03-05 11:02:40 [INFO] [DONE] [5-shot] GPU 1 - KoBEST: hellaswag + sentineg
2026-03-05 11:02:58 [INFO] [DONE] [5-shot] GPU 3 - HAE-RAE (all subtasks)
2026-03-05 11:02:58 [INFO] Complete: 5 succeeded, 0 failed
2026-03-05 11:02:58 [INFO] 5-shot complete.
2026-03-05 11:03:00 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/phase2_results.json
2026-03-05 11:03:00 [INFO] Phase 2 complete in 5m 6s.
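Per-job wall-clock times can be read off the timestamps above by pairing each `Spawning` line with its matching `[DONE]` line, keyed on shot count and GPU index. The following is a minimal sketch of such a parser, assuming the `YYYY-MM-DD HH:MM:SS [INFO] message` layout used throughout this log. The function name `job_durations` is hypothetical. The regexes stop at the GPU number, so they do not depend on the separator character that follows it.

```python
import re
from datetime import datetime

TS_FMT = "%Y-%m-%d %H:%M:%S"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[INFO\] (.*)$")
SPAWN_RE = re.compile(r"^Spawning: \[(\d+)-shot\] GPU (\d+)")
DONE_RE = re.compile(r"^\[DONE\] \[(\d+)-shot\] GPU (\d+)")


def job_durations(lines):
    """Return {(shots, gpu): seconds} from each Spawning line to its [DONE] line."""
    started, durations = {}, {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or non-matching lines
        ts = datetime.strptime(m.group(1), TS_FMT)
        msg = m.group(2)
        if (s := SPAWN_RE.match(msg)):
            started[(int(s.group(1)), int(s.group(2)))] = ts
        elif (d := DONE_RE.match(msg)):
            key = (int(d.group(1)), int(d.group(2)))
            if key in started:
                durations[key] = int((ts - started[key]).total_seconds())
    return durations


SAMPLE = """\
2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 7 - MMLU-EN (57 subtasks) (GPU 7)
2026-03-05 11:01:30 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN (57 subtasks)
2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 0 - KoBEST: boolq + copa (GPU 0)
2026-03-05 11:02:07 [INFO] [DONE] [5-shot] GPU 0 - KoBEST: boolq + copa
"""

print(job_durations(SAMPLE.splitlines()))  # {(0, 7): 217, (5, 0): 36}
```

Because the log has one-second resolution, these durations are accurate only to the second. For the 0-shot pass, the slowest job (MMLU-EN on GPU 7, 217 s) sets the pass's wall-clock time.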
2026-03-05 11:03:00 [INFO]
2026-03-05 11:03:00 [INFO] ------------------------------------------------------------------------
2026-03-05 11:03:00 [INFO] PHASE 3 - Report Generation (Fixed)
2026-03-05 11:03:00 [INFO] ------------------------------------------------------------------------
2026-03-05 11:03:00 [INFO] Report saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/full_eval_report.md
2026-03-05 11:03:00 [INFO] Individual reports: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/reports
2026-03-05 11:03:00 [INFO] Phase 3 complete in 0s.
2026-03-05 11:03:00 [INFO] ========================================================================
2026-03-05 11:03:00 [INFO] RE-EVALUATION COMPLETE
2026-03-05 11:03:00 [INFO] ========================================================================
2026-03-05 11:03:00 [INFO] Total time : 5m 6s
2026-03-05 11:03:00 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057
2026-03-05 11:03:00 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/phase1_results.json
2026-03-05 11:03:00 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/phase2_results.json
2026-03-05 11:03:00 [INFO] Report : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/full_eval_report.md
2026-03-05 11:03:00 [INFO] Reports dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/reports
2026-03-05 11:03:00 [INFO] Phase 2 (0-shot): 8 OK / 0 failed
2026-03-05 11:03:00 [INFO] Phase 2 (5-shot): 5 OK
2026-03-05 11:03:00 [INFO] ========================================================================