2026-03-05 14:51:36 [INFO] ========================================================================
2026-03-05 14:51:36 [INFO] FRANKENSTALLM 3B - Re-evaluation Pipeline
2026-03-05 14:51:36 [INFO] ========================================================================
2026-03-05 14:51:36 [INFO] Previous run : eval/outputs/3b_full_eval_20260305_0318
2026-03-05 14:51:36 [INFO] HF checkpoint: eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
2026-03-05 14:51:36 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451
2026-03-05 14:51:36 [INFO] GPUs : [0, 1, 2, 3, 4, 5, 6, 7]
2026-03-05 14:51:36 [INFO] Skip Phase 2 : False
2026-03-05 14:51:36 [INFO]
2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------
2026-03-05 14:51:36 [INFO] PHASE 1 - Copy from Previous Run
2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------
2026-03-05 14:51:36 [INFO] Copied phase1_results.json from previous run
2026-03-05 14:51:36 [INFO] Copied generation_samples.json
2026-03-05 14:51:36 [INFO]
2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------
2026-03-05 14:51:36 [INFO] PHASE 2 - Corrected Benchmarks - 8 GPU Parallel
2026-03-05 14:51:36 [INFO] ------------------------------------------------------------------------
2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 0 - KoBEST: boolq + copa + wic (GPU 0)
2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 1 - KoBEST: hellaswag + sentineg (GPU 1)
2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 2 - HAE-RAE (all subtasks) (GPU 2)
2026-03-05 14:51:36 [INFO] Spawning: [pipeline 0+5shot] GPU 3 - MMLU-KO (57 subtasks) (GPU 3)
2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge (GPU 4)
2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 5 - EN: winogrande + piqa (GPU 5)
2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences (GPU 6)
2026-03-05 14:51:36 [INFO] Spawning: [0-shot] GPU 7 - MMLU-EN: stem + other (GPU 7)
2026-03-05 14:51:36 [INFO] Spawned 8 GPUs (Korean GPUs run 0+5-shot pipeline, EN GPUs 0-shot only).
2026-03-05 14:52:58 [INFO] [DONE] [0-shot] GPU 5 - EN: winogrande + piqa
2026-03-05 14:53:20 [INFO] [DONE] [pipeline 0+5shot] GPU 3 - MMLU-KO (57 subtasks)
2026-03-05 14:53:24 [INFO] [DONE] [pipeline 0+5shot] GPU 1 - KoBEST: hellaswag + sentineg
2026-03-05 14:54:54 [INFO] [DONE] [pipeline 0+5shot] GPU 2 - HAE-RAE (all subtasks)
2026-03-05 14:55:10 [INFO] [DONE] [pipeline 0+5shot] GPU 0 - KoBEST: boolq + copa + wic
2026-03-05 14:55:21 [INFO] [DONE] [0-shot] GPU 7 - MMLU-EN: stem + other
2026-03-05 14:55:27 [INFO] [DONE] [0-shot] GPU 6 - MMLU-EN: humanities + social_sciences
2026-03-05 14:55:51 [INFO] [DONE] [0-shot] GPU 4 - EN: hellaswag + arc_easy + arc_challenge
2026-03-05 14:55:51 [INFO] Complete: 8 succeeded, 0 failed
2026-03-05 14:55:53 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/phase2_results.json
2026-03-05 14:55:53 [INFO] Phase 2 complete in 4m 16s.
2026-03-05 14:55:53 [INFO]
2026-03-05 14:55:53 [INFO] ------------------------------------------------------------------------
2026-03-05 14:55:53 [INFO] PHASE 3 - Report Generation (Fixed)
2026-03-05 14:55:53 [INFO] ------------------------------------------------------------------------
2026-03-05 14:55:53 [INFO] Report saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/full_eval_report.md
2026-03-05 14:55:53 [INFO] Individual reports: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/reports
2026-03-05 14:55:53 [INFO] Phase 3 complete in 0s.
2026-03-05 14:55:53 [INFO] ========================================================================
2026-03-05 14:55:53 [INFO] RE-EVALUATION COMPLETE
2026-03-05 14:55:53 [INFO] ========================================================================
2026-03-05 14:55:53 [INFO] Total time : 4m 16s
2026-03-05 14:55:53 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451
2026-03-05 14:55:53 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/phase1_results.json
2026-03-05 14:55:53 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/phase2_results.json
2026-03-05 14:55:53 [INFO] Report : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/full_eval_report.md
2026-03-05 14:55:53 [INFO] Reports dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_reeval_20260305_1451/reports
2026-03-05 14:55:53 [INFO] Phase 2 (0-shot): 8 OK / 0 failed
2026-03-05 14:55:53 [INFO] Phase 2 (5-shot): 4 OK
2026-03-05 14:55:53 [INFO] ========================================================================