| 2026-03-05 14:51:36 [INFO] ======================================================================== |
| 2026-03-05 14:51:36 [INFO] FRANKENSTALLM 3B - Re-evaluation Pipeline |
| 2026-03-05 14:51:36 [INFO] ======================================================================== |
| 2026-03-05 14:51:36 [INFO] Previous run : eval/outputs/3b_full_eval_20260305_0318 |
| 2026-03-05 14:51:36 [INFO] HF checkpoint: eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000 |
| 2026-03-05 14:51:36 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451 |
| 2026-03-05 14:51:36 [INFO] GPUs : [0, 1, 2, 3, 4, 5, 6, 7] |
| 2026-03-05 14:51:36 [INFO] Skip Phase 2 : False |
| 2026-03-05 14:51:36 [INFO] |
| 2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 14:51:36 [INFO] PHASE 1 - Copy from Previous Run |
| 2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 14:51:36 [INFO] Copied phase1_results.json from previous run |
| 2026-03-05 14:51:36 [INFO] Copied generation_samples.json |
| 2026-03-05 14:51:36 [INFO] |
| 2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 14:51:36 [INFO] PHASE 2 - Corrected Benchmarks - 8 GPU Parallel |
| 2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 0 - KoBEST: boolq + copa + wic (GPU 0) |
| 2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1) |
| 2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 2 - HAE-RAE (all subtasks) (GPU 2) |
| 2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 3 - MMLU-KO (57 subtasks) (GPU 3) |
| 2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge (GPU 4) |
| 2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 5 - EN: winogrande + piqa (GPU 5) |
| 2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences (GPU 6) |
| 2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 7 - MMLU-EN: stem + other (GPU 7) |
| 2026-03-05 14:51:36 [INFO] Spawned 8 GPUs (Korean GPUs run 0+5-shot pipeline, EN GPUs 0-shot only). |
| 2026-03-05 14:52:58 [INFO] [DONE] [0-shot] GPU 5 - EN: winogrande + piqa |
| 2026-03-05 14:53:20 [INFO] [DONE] [pipeline 0+5shot] GPU 3 - MMLU-KO (57 subtasks) |
| 2026-03-05 14:53:24 [INFO] [DONE] [pipeline 0+5shot] GPU 1 - KoBEST: hellaswag + sentineg |
| 2026-03-05 14:54:54 [INFO] [DONE] [pipeline 0+5shot] GPU 2 - HAE-RAE (all subtasks) |
| 2026-03-05 14:55:10 [INFO] [DONE] [pipeline 0+5shot] GPU 0 - KoBEST: boolq + copa + wic |
| 2026-03-05 14:55:21 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN: stem + other |
| 2026-03-05 14:55:27 [INFO] [DONE] [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences |
| 2026-03-05 14:55:51 [INFO] [DONE] [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge |
| 2026-03-05 14:55:51 [INFO] Complete: 8 succeeded, 0 failed |
| 2026-03-05 14:55:53 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/phase2_results.json |
| 2026-03-05 14:55:53 [INFO] Phase 2 complete in 4m 16s. |
| 2026-03-05 14:55:53 [INFO] |
| 2026-03-05 14:55:53 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 14:55:53 [INFO] PHASE 3 - Report Generation (Fixed) |
| 2026-03-05 14:55:53 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 14:55:53 [INFO] Report saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/full_eval_report.md |
| 2026-03-05 14:55:53 [INFO] Individual reports: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/reports |
| 2026-03-05 14:55:53 [INFO] Phase 3 complete in 0s. |
| 2026-03-05 14:55:53 [INFO] ======================================================================== |
| 2026-03-05 14:55:53 [INFO] RE-EVALUATION COMPLETE |
| 2026-03-05 14:55:53 [INFO] ======================================================================== |
| 2026-03-05 14:55:53 [INFO] Total time : 4m 16s |
| 2026-03-05 14:55:53 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451 |
| 2026-03-05 14:55:53 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/phase1_results.json |
| 2026-03-05 14:55:53 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/phase2_results.json |
| 2026-03-05 14:55:53 [INFO] Report : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/full_eval_report.md |
| 2026-03-05 14:55:53 [INFO] Reports dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/reports |
| 2026-03-05 14:55:53 [INFO] Phase 2 (0-shot): 8 OK / 0 failed |
| 2026-03-05 14:55:53 [INFO] Phase 2 (5-shot): 4 OK |
| 2026-03-05 14:55:53 [INFO] ======================================================================== |
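Per-GPU wall times for Phase 2 are only implicit in the timestamps above. A minimal sketch for recovering them follows, assuming the `YYYY-MM-DD HH:MM:SS [INFO] message` line format shown in this log stays stable. `gpu_wall_times` is a hypothetical helper, not part of the re-evaluation pipeline. It measures each job from the `Spawned 8 GPUs` line to that GPU's `[DONE]` line at one-second resolution, so it can differ by a second from the pipeline's own `4m 16s` figure.

```python
import re
from datetime import datetime

TS_FMT = "%Y-%m-%d %H:%M:%S"
# Timestamp, literal "[INFO]", then the rest of the message.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[INFO\] (.*)$")
# A completion line such as "[DONE] [0-shot] GPU 5 ..." -> captures the GPU id.
DONE_RE = re.compile(r"\[DONE\].*?GPU (\d+)")

# Phase 2 lines copied from the log above (task separators normalized to " - ").
LOG = """\
| 2026-03-05 14:51:36 [INFO] Spawned 8 GPUs (Korean GPUs run 0+5-shot pipeline, EN GPUs 0-shot only). |
| 2026-03-05 14:52:58 [INFO] [DONE] [0-shot] GPU 5 - EN: winogrande + piqa |
| 2026-03-05 14:53:20 [INFO] [DONE] [pipeline 0+5shot] GPU 3 - MMLU-KO (57 subtasks) |
| 2026-03-05 14:53:24 [INFO] [DONE] [pipeline 0+5shot] GPU 1 - KoBEST: hellaswag + sentineg |
| 2026-03-05 14:54:54 [INFO] [DONE] [pipeline 0+5shot] GPU 2 - HAE-RAE (all subtasks) |
| 2026-03-05 14:55:10 [INFO] [DONE] [pipeline 0+5shot] GPU 0 - KoBEST: boolq + copa + wic |
| 2026-03-05 14:55:21 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN: stem + other |
| 2026-03-05 14:55:27 [INFO] [DONE] [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences |
| 2026-03-05 14:55:51 [INFO] [DONE] [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge |
"""


def gpu_wall_times(log_text: str) -> dict[int, int]:
    """Hypothetical helper: seconds from the 'Spawned' line to each GPU's [DONE] line."""
    spawn = None
    times: dict[int, int] = {}
    for raw in log_text.splitlines():
        # Drop the surrounding "| ... |" wrapper before matching.
        m = LINE_RE.match(raw.strip().strip("|").strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), TS_FMT)
        msg = m.group(2)
        if msg.startswith("Spawned"):  # not "Spawning:", which is per-job
            spawn = ts
        done = DONE_RE.search(msg)
        if done and spawn is not None:
            times[int(done.group(1))] = int((ts - spawn).total_seconds())
    return times


if __name__ == "__main__":
    # Print jobs from fastest to slowest.
    for gpu, secs in sorted(gpu_wall_times(LOG).items(), key=lambda kv: kv[1]):
        print(f"GPU {gpu}: {secs // 60}m {secs % 60:02d}s")
```

On this run the slowest job is the English 0-shot set on GPU 4, so that job, not the Korean 0+5-shot pipelines, bounds the Phase 2 wall time.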