| 2026-03-05 10:57:53 [INFO] ======================================================================== |
| 2026-03-05 10:57:53 [INFO] FRANKENSTALLM 3B - Re-evaluation Pipeline |
| 2026-03-05 10:57:53 [INFO] ======================================================================== |
| 2026-03-05 10:57:53 [INFO] Previous run : eval/outputs/3b_full_eval_20260305_0318 |
| 2026-03-05 10:57:53 [INFO] HF checkpoint: eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000 |
| 2026-03-05 10:57:53 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057 |
| 2026-03-05 10:57:53 [INFO] GPUs : [0, 1, 2, 3, 4, 5, 6, 7] |
| 2026-03-05 10:57:53 [INFO] Skip Phase 2 : False |
| 2026-03-05 10:57:53 [INFO] |
| 2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 10:57:53 [INFO] PHASE 1 - Copy from Previous Run |
| 2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 10:57:53 [INFO] Copied phase1_results.json from previous run |
| 2026-03-05 10:57:53 [INFO] Copied generation_samples.json |
| 2026-03-05 10:57:53 [INFO] |
| 2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 10:57:53 [INFO] PHASE 2 - Corrected Benchmarks - 8 GPU Parallel |
| 2026-03-05 10:57:53 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 10:57:53 [INFO] Running 0-shot benchmarks on 8 GPUs ... |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 0 - KoBEST: boolq + copa (GPU 0) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 2 - KoBEST: wic (GPU 2) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 3 - HAE-RAE (all subtasks) (GPU 3) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 4 - MMLU-KO (57 subtasks) (GPU 4) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 5 - EN: hellaswag + arc_easy + arc_challenge (GPU 5) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 6 - EN: winogrande + piqa (GPU 6) |
| 2026-03-05 10:57:53 [INFO] Spawning: [0-shot] GPU 7 - MMLU-EN (57 subtasks) (GPU 7) |
| 2026-03-05 10:59:21 [INFO] [DONE] [0-shot] GPU 2 - KoBEST: wic |
| 2026-03-05 10:59:33 [INFO] [DONE] [0-shot] GPU 1 - KoBEST: hellaswag + sentineg |
| 2026-03-05 10:59:35 [INFO] [DONE] [0-shot] GPU 4 - MMLU-KO (57 subtasks) |
| 2026-03-05 10:59:41 [INFO] [DONE] [0-shot] GPU 0 - KoBEST: boolq + copa |
| 2026-03-05 10:59:45 [INFO] [DONE] [0-shot] GPU 3 - HAE-RAE (all subtasks) |
| 2026-03-05 10:59:47 [INFO] [DONE] [0-shot] GPU 6 - EN: winogrande + piqa |
| 2026-03-05 11:00:41 [INFO] [DONE] [0-shot] GPU 5 - EN: hellaswag + arc_easy + arc_challenge |
| 2026-03-05 11:01:30 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN (57 subtasks) |
| 2026-03-05 11:01:30 [INFO] Complete: 8 succeeded, 0 failed |
| 2026-03-05 11:01:30 [INFO] 0-shot complete. |
| 2026-03-05 11:01:31 [INFO] Running 5-shot benchmarks (Korean only) ... |
| 2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 0 - KoBEST: boolq + copa (GPU 0) |
| 2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1) |
| 2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 2 - KoBEST: wic (GPU 2) |
| 2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 3 - HAE-RAE (all subtasks) (GPU 3) |
| 2026-03-05 11:01:31 [INFO] Spawning: [5-shot] GPU 4 - MMLU-KO (57 subtasks) (GPU 4) |
| 2026-03-05 11:02:07 [INFO] [DONE] [5-shot] GPU 0 - KoBEST: boolq + copa |
| 2026-03-05 11:02:31 [INFO] [DONE] [5-shot] GPU 2 - KoBEST: wic |
| 2026-03-05 11:02:33 [INFO] [DONE] [5-shot] GPU 4 - MMLU-KO (57 subtasks) |
| 2026-03-05 11:02:40 [INFO] [DONE] [5-shot] GPU 1 - KoBEST: hellaswag + sentineg |
| 2026-03-05 11:02:58 [INFO] [DONE] [5-shot] GPU 3 - HAE-RAE (all subtasks) |
| 2026-03-05 11:02:58 [INFO] Complete: 5 succeeded, 0 failed |
| 2026-03-05 11:02:58 [INFO] 5-shot complete. |
| 2026-03-05 11:03:00 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/phase2_results.json |
| 2026-03-05 11:03:00 [INFO] Phase 2 complete in 5m 6s. |
| 2026-03-05 11:03:00 [INFO] |
| 2026-03-05 11:03:00 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 11:03:00 [INFO] PHASE 3 - Report Generation (Fixed) |
| 2026-03-05 11:03:00 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 11:03:00 [INFO] Report saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/full_eval_report.md |
| 2026-03-05 11:03:00 [INFO] Individual reports: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/reports |
| 2026-03-05 11:03:00 [INFO] Phase 3 complete in 0s. |
| 2026-03-05 11:03:00 [INFO] ======================================================================== |
| 2026-03-05 11:03:00 [INFO] RE-EVALUATION COMPLETE |
| 2026-03-05 11:03:00 [INFO] ======================================================================== |
| 2026-03-05 11:03:00 [INFO] Total time : 5m 6s |
| 2026-03-05 11:03:00 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057 |
| 2026-03-05 11:03:00 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/phase1_results.json |
| 2026-03-05 11:03:00 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/phase2_results.json |
| 2026-03-05 11:03:00 [INFO] Report : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/full_eval_report.md |
| 2026-03-05 11:03:00 [INFO] Reports dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1057/reports |
| 2026-03-05 11:03:00 [INFO] Phase 2 (0-shot): 8 OK / 0 failed |
| 2026-03-05 11:03:00 [INFO] Phase 2 (5-shot): 5 OK |
| 2026-03-05 11:03:00 [INFO] ======================================================================== |