2026-03-05 03:18:13 [INFO] ========================================================================
2026-03-05 03:18:13 [INFO] FRANKENSTALLM 3B – Full Evaluation Pipeline
2026-03-05 03:18:13 [INFO] ========================================================================
2026-03-05 03:18:13 [INFO] Checkpoint : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_fp8_run1/checkpoint-0057000
2026-03-05 03:18:13 [INFO] Tokenizer : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/tokenizer/korean_sp/tokenizer.json
2026-03-05 03:18:13 [INFO] Data dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/data
2026-03-05 03:18:13 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318
2026-03-05 03:18:13 [INFO] GPUs : [2, 3, 4, 5, 6, 7]
2026-03-05 03:18:13 [INFO] SEQ_LEN : 2048 STRIDE: 512 BATCH_SIZE: 32
2026-03-05 03:18:13 [INFO] Phases : phase0=run phase1=run phase2=run
2026-03-05 03:18:13 [INFO]
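The SEQ_LEN/STRIDE pair in the header implies sliding-window perplexity: each 2048-token window advances by 512 tokens, and only the newly uncovered tokens in a window are scored, so every token is scored once with as much left context as possible. A minimal sketch of the windowing arithmetic (the function name and interface are illustrative assumptions, not the pipeline's actual code):

```python
def strided_nll_windows(n_tokens, seq_len=2048, stride=512):
    """Yield (start, end, n_scored) windows over a token stream.

    Only the tokens past the previous window's end are scored in each
    window, so scored counts sum to n_tokens exactly.
    """
    windows = []
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + seq_len, n_tokens)
        n_scored = end - prev_end  # tokens not yet scored by earlier windows
        windows.append((start, end, n_scored))
        prev_end = end
        if end == n_tokens:
            break
    return windows
```

With these defaults the first window scores all 2048 tokens, and each later window scores only its last 512 (fewer for the final partial window).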
2026-03-05 03:18:13 [INFO] ------------------------------------------------------------------------
2026-03-05 03:18:13 [INFO] PHASE 0 – HF Checkpoint Conversion
2026-03-05 03:18:13 [INFO] ------------------------------------------------------------------------
2026-03-05 03:18:13 [INFO] Running: /usr/bin/python /PROJECT/0325120031_A/ghong/taketimes/llm-bang/scripts/convert_to_hf.py --checkpoint /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_fp8_run1/checkpoint-0057000 --output /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000 --tokenizer /PROJECT/0325120031_A/ghong/taketimes/llm-bang/tokenizer/korean_sp/tokenizer.json
/usr/local/lib/python3.12/dist-packages/torch/library.py:356: UserWarning: Warning only once for all operators, other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
    operator: flash_attn::_flash_attn_backward(Tensor dout, Tensor q, Tensor k, Tensor v, Tensor out, Tensor softmax_lse, Tensor(a6!)? dq, Tensor(a7!)? dk, Tensor(a8!)? dv, float dropout_p, float softmax_scale, bool causal, SymInt window_size_left, SymInt window_size_right, float softcap, Tensor? alibi_slopes, bool deterministic, Tensor? rng_state=None) -> Tensor
    registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922
    dispatch key: ADInplaceOrView
    previous kernel: no debug info
    new kernel: registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922 (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:208.)
  self.m.impl(
Checkpoint : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_fp8_run1/checkpoint-0057000
Output : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
Model : d_model=3072, n_layers=28, vocab_size=64000, use_fp8=True
Loading model.pt ...
Source keys: 255
Remapping weight names ...
Destination keys: 171
Saving model.safetensors ...
Saved config.json
Copied tokenizer: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/tokenizer/korean_sp/tokenizer.json -> /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000/tokenizer.json

Done! HF model saved to: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
Verify: ls -lh /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
2026-03-05 03:18:30 [INFO] HF checkpoint saved to: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000
2026-03-05 03:18:30 [INFO] Phase 0 complete in 17s.
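Phase 0 maps 255 training-state keys down to 171 HF keys. A hedged sketch of what such a remap typically looks like; the prefix renames and the dropped `_scale`/`_amax` FP8 buffers below are guesses for illustration, not the converter's actual rules:

```python
def remap_state_dict(src):
    """Rename training-style keys to HF-style names and drop FP8
    bookkeeping tensors (hypothetical rules for illustration)."""
    dst = {}
    for name, tensor in src.items():
        # Assumed: FP8 calibration buffers are not HF weights, so the
        # destination key count is smaller than the source key count.
        if name.endswith(("_scale", "_amax")):
            continue
        hf_name = name.replace("module.", "").replace("transformer.h.", "model.layers.")
        dst[hf_name] = tensor
    return dst
```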
2026-03-05 03:18:30 [INFO]
2026-03-05 03:18:30 [INFO] ------------------------------------------------------------------------
2026-03-05 03:18:30 [INFO] PHASE 1 – Internal Evaluation – 6 GPU Parallel
2026-03-05 03:18:30 [INFO] ------------------------------------------------------------------------
2026-03-05 03:18:30 [INFO] Submitted: GPU 5 – Calibration + Token NLL
2026-03-05 03:18:30 [INFO] Submitted: GPU 6 – Generation (15 prompts × 4 temps)
2026-03-05 03:18:30 [INFO] Submitted: GPU 7 – Repetition grid (12 × 5)
2026-03-05 03:18:30 [INFO] Submitted: GPU 2 – PPL: 3b_val.bin
2026-03-05 03:18:30 [INFO] Submitted: GPU 3 – PPL: korean_c4 + korean_val
2026-03-05 03:18:30 [INFO] Submitted: GPU 4 – PPL: hplt_ko + cc100_ko + PPL: 7 cosmo files + PPL: 7 remaining files
[PPL cuda:2] Loading model for 3b...
[PPL cuda:2] 3b: 75,681,623 tokens, 151.4 MB
[PPL_MULTI cuda:3] Loading model once for 2 files...
[PPL cuda:3] korean_c4: 15,159,838 tokens, 30.3 MB
[CALIB cuda:5] Loading model...
[CALIB cuda:5] Using 50,000 tokens from 3b_val.bin
[PPL_MULTI cuda:4] Loading model once for 16 files...
[PPL cuda:4] hplt_ko: 16,165,735 tokens, 32.3 MB
[DCTN-0301014756:3083641:0:3084057] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
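The bare "Caught signal 11" above carries no Python-level context. Enabling `faulthandler` in each worker would dump the Python stack of the crashing thread to stderr when the segfault hits; this is a standard-library facility, shown here as a suggestion rather than something the pipeline currently does:

```python
import faulthandler

# Dump Python tracebacks to stderr on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS, and SIGILL, so a native crash still identifies the Python
# call site (e.g. which eval task was running on which device).
faulthandler.enable()
```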
2026-03-05 03:18:48 [ERROR] [FAILED] GPU 5 – Calibration + Token NLL
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/full_eval_pipeline.py", line 515, in run_phase1
    result = fut.result()
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

2026-03-05 03:18:48 [ERROR] [FAILED] GPU 6 – Generation (15 prompts × 4 temps)
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/full_eval_pipeline.py", line 515, in run_phase1
    result = fut.result()
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

2026-03-05 03:18:48 [ERROR] [FAILED] GPU 7 – Repetition grid (12 × 5)
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/full_eval_pipeline.py", line 515, in run_phase1
    result = fut.result()
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

2026-03-05 03:18:48 [ERROR] [FAILED] GPU 2 – PPL: 3b_val.bin
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/full_eval_pipeline.py", line 515, in run_phase1
    result = fut.result()
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

2026-03-05 03:18:48 [ERROR] [FAILED] GPU 3 – PPL: korean_c4 + korean_val
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/full_eval_pipeline.py", line 515, in run_phase1
    result = fut.result()
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

2026-03-05 03:18:48 [ERROR] [FAILED] GPU 4 – PPL: hplt_ko + cc100_ko + PPL: 7 cosmo files + PPL: 7 remaining files
Traceback (most recent call last):
  File "/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/full_eval_pipeline.py", line 515, in run_phase1
    result = fut.result()
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

2026-03-05 03:18:50 [INFO] Phase 1 complete: 0 succeeded, 6 failed
2026-03-05 03:18:50 [INFO] Phase 1 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/phase1_results.json
2026-03-05 03:18:50 [INFO] Phase 1 complete in 19s.
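All six Phase 1 tasks surface the same BrokenProcessPool: once the segfaulting worker dies, every future in the shared pool is abandoned, and each `fut.result()` call at full_eval_pipeline.py:515 re-raises it. A sketch of per-future error capture that records each failure instead of letting it propagate (function and names are illustrative, not the pipeline's code):

```python
from concurrent.futures import ProcessPoolExecutor

def run_jobs(jobs, worker):
    """Submit each job to a process pool; record success or the raised
    exception per job, so one dead worker does not abort collection of
    the remaining results."""
    results = {}
    with ProcessPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {pool.submit(worker, job): job for job in jobs}
        for fut, job in futures.items():
            try:
                results[job] = ("ok", fut.result())
            except Exception as exc:  # includes BrokenProcessPool
                results[job] = ("failed", repr(exc))
    return results
```

Note that a crashed worker still poisons the whole pool; isolating each task in its own pool (or subprocess) is what actually limits the blast radius of a segfault.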
2026-03-05 03:18:50 [INFO]
2026-03-05 03:18:50 [INFO] ------------------------------------------------------------------------
2026-03-05 03:18:50 [INFO] PHASE 2 – lm-eval Benchmarks – 6 GPU Parallel
2026-03-05 03:18:50 [INFO] ------------------------------------------------------------------------
2026-03-05 03:18:50 [INFO] Running 0-shot benchmarks on 6 GPUs ...
2026-03-05 03:18:50 [INFO] Submitted: [0-shot] GPU 2 – KoBEST: boolq + copa
2026-03-05 03:18:50 [INFO] Submitted: [0-shot] GPU 3 – KoBEST: hellaswag + sentineg
2026-03-05 03:18:50 [INFO] Submitted: [0-shot] GPU 4 – KoBEST: wic
2026-03-05 03:18:50 [INFO] Submitted: [0-shot] GPU 5 – HAE-RAE (all subtasks)
2026-03-05 03:18:50 [INFO] Submitted: [0-shot] GPU 6 – MMLU-KO part 1/2
2026-03-05 03:18:50 [INFO] Submitted: [0-shot] GPU 7 – MMLU-KO part 2/2
[Phase 2 [0-shot] GPU 2 – KoBEST: boolq + copa] Starting on cuda:2 – tasks: ['kobest_boolq', 'kobest_copa'], 0-shot
[Phase 2 [0-shot] GPU 3 – KoBEST: hellaswag + sentineg] Starting on cuda:3 – tasks: ['kobest_hellaswag', 'kobest_sentineg'], 0-shot
[Phase 2 [0-shot] GPU 4 – KoBEST: wic] Starting on cuda:4 – tasks: ['kobest_wic'], 0-shot
[Phase 2 [0-shot] GPU 6 – MMLU-KO part 1/2] Starting on cuda:6 – tasks: ['global_mmlu_ko_abstract_algebra', 'global_mmlu_ko_anatomy', 'global_mmlu_ko_astronomy', 'global_mmlu_ko_business_ethics', 'global_mmlu_ko_clinical_knowledge', 'global_mmlu_ko_college_biology', 'global_mmlu_ko_college_chemistry', 'global_mmlu_ko_college_computer_science', 'global_mmlu_ko_college_mathematics', 'global_mmlu_ko_college_medicine', 'global_mmlu_ko_college_physics', 'global_mmlu_ko_computer_security', 'global_mmlu_ko_conceptual_physics', 'global_mmlu_ko_econometrics', 'global_mmlu_ko_electrical_engineering', 'global_mmlu_ko_elementary_mathematics', 'global_mmlu_ko_formal_logic', 'global_mmlu_ko_global_facts', 'global_mmlu_ko_high_school_biology', 'global_mmlu_ko_high_school_chemistry', 'global_mmlu_ko_high_school_computer_science', 'global_mmlu_ko_high_school_european_history', 'global_mmlu_ko_high_school_geography', 'global_mmlu_ko_high_school_government_and_politics', 'global_mmlu_ko_high_school_macroeconomics', 'global_mmlu_ko_high_school_mathematics', 'global_mmlu_ko_high_school_microeconomics', 'global_mmlu_ko_high_school_physics', 'global_mmlu_ko_high_school_psychology'], 0-shot
[Phase 2 [0-shot] GPU 5 – HAE-RAE (all subtasks)] Starting on cuda:5 – tasks: ['haerae'], 0-shot
[Phase 2 [0-shot] GPU 7 – MMLU-KO part 2/2] Starting on cuda:7 – tasks: ['global_mmlu_ko_high_school_statistics', 'global_mmlu_ko_high_school_us_history', 'global_mmlu_ko_high_school_world_history', 'global_mmlu_ko_human_aging', 'global_mmlu_ko_human_sexuality', 'global_mmlu_ko_international_law', 'global_mmlu_ko_jurisprudence', 'global_mmlu_ko_logical_fallacies', 'global_mmlu_ko_machine_learning', 'global_mmlu_ko_management', 'global_mmlu_ko_marketing', 'global_mmlu_ko_medical_genetics', 'global_mmlu_ko_miscellaneous', 'global_mmlu_ko_moral_disputes', 'global_mmlu_ko_moral_scenarios', 'global_mmlu_ko_nutrition', 'global_mmlu_ko_philosophy', 'global_mmlu_ko_prehistory', 'global_mmlu_ko_professional_accounting', 'global_mmlu_ko_professional_law', 'global_mmlu_ko_professional_medicine', 'global_mmlu_ko_professional_psychology', 'global_mmlu_ko_public_relations', 'global_mmlu_ko_security_studies', 'global_mmlu_ko_sociology', 'global_mmlu_ko_us_foreign_policy', 'global_mmlu_ko_virology', 'global_mmlu_ko_world_religions'], 0-shot
2026-03-05 03:18:50 [INFO] TensorFlow version 2.20.0 available.
2026-03-05 03:18:53 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2026-03-05 03:18:53 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'}
2026-03-05 03:18:53 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2026-03-05 03:18:53 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'}
2026-03-05 03:18:53 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_statistics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_us_history' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'}
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_world_history' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_human_aging' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_human_sexuality' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_international_law' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_jurisprudence' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_logical_fallacies' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_machine_learning' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_management' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_marketing' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_medical_genetics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_miscellaneous' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_moral_disputes' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_moral_scenarios' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_nutrition' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_philosophy' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_prehistory' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_accounting' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_law' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_medicine' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_psychology' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_public_relations' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_security_studies' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_sociology' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_us_foreign_policy' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_virology' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_world_religions' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_abstract_algebra' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_anatomy' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_astronomy' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_business_ethics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_clinical_knowledge' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_biology' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_chemistry' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_computer_science' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_mathematics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_medicine' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_physics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_computer_security' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_conceptual_physics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_econometrics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_electrical_engineering' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_elementary_mathematics' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_formal_logic' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_global_facts' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_biology' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_chemistry' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_computer_science' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_european_history' not found in lm_eval registry – skipping.
2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_geography' not found in lm_eval registry – skipping.
| [LM_EVAL] Starting on cuda:7 (CUDA_VISIBLE_DEVICES=7), tasks=['global_mmlu_ko_high_school_statistics', 'global_mmlu_ko_high_school_us_history', 'global_mmlu_ko_high_school_world_history', 'global_mmlu_ko_human_aging', 'global_mmlu_ko_human_sexuality', 'global_mmlu_ko_international_law', 'global_mmlu_ko_jurisprudence', 'global_mmlu_ko_logical_fallacies', 'global_mmlu_ko_machine_learning', 'global_mmlu_ko_management', 'global_mmlu_ko_marketing', 'global_mmlu_ko_medical_genetics', 'global_mmlu_ko_miscellaneous', 'global_mmlu_ko_moral_disputes', 'global_mmlu_ko_moral_scenarios', 'global_mmlu_ko_nutrition', 'global_mmlu_ko_philosophy', 'global_mmlu_ko_prehistory', 'global_mmlu_ko_professional_accounting', 'global_mmlu_ko_professional_law', 'global_mmlu_ko_professional_medicine', 'global_mmlu_ko_professional_psychology', 'global_mmlu_ko_public_relations', 'global_mmlu_ko_security_studies', 'global_mmlu_ko_sociology', 'global_mmlu_ko_us_foreign_policy', 'global_mmlu_ko_virology', 'global_mmlu_ko_world_religions'], num_fewshot=0 |
| [LM_EVAL] No valid tasks to evaluate. |
| [Phase 2 [0-shot] GPU 7 – MMLU-KO part 2/2] Done. |
| 2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_government_and_politics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_macroeconomics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_mathematics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_microeconomics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_physics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:53 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_psychology' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:53 [INFO] [DONE] [0-shot] GPU 7 – MMLU-KO part 2/2 |
| [LM_EVAL] Starting on cuda:6 (CUDA_VISIBLE_DEVICES=6), tasks=['global_mmlu_ko_abstract_algebra', 'global_mmlu_ko_anatomy', 'global_mmlu_ko_astronomy', 'global_mmlu_ko_business_ethics', 'global_mmlu_ko_clinical_knowledge', 'global_mmlu_ko_college_biology', 'global_mmlu_ko_college_chemistry', 'global_mmlu_ko_college_computer_science', 'global_mmlu_ko_college_mathematics', 'global_mmlu_ko_college_medicine', 'global_mmlu_ko_college_physics', 'global_mmlu_ko_computer_security', 'global_mmlu_ko_conceptual_physics', 'global_mmlu_ko_econometrics', 'global_mmlu_ko_electrical_engineering', 'global_mmlu_ko_elementary_mathematics', 'global_mmlu_ko_formal_logic', 'global_mmlu_ko_global_facts', 'global_mmlu_ko_high_school_biology', 'global_mmlu_ko_high_school_chemistry', 'global_mmlu_ko_high_school_computer_science', 'global_mmlu_ko_high_school_european_history', 'global_mmlu_ko_high_school_geography', 'global_mmlu_ko_high_school_government_and_politics', 'global_mmlu_ko_high_school_macroeconomics', 'global_mmlu_ko_high_school_mathematics', 'global_mmlu_ko_high_school_microeconomics', 'global_mmlu_ko_high_school_physics', 'global_mmlu_ko_high_school_psychology'], num_fewshot=0 |
| [LM_EVAL] No valid tasks to evaluate. |
| [Phase 2 [0-shot] GPU 6 – MMLU-KO part 1/2] Done. |
| 2026-03-05 03:18:53 [INFO] [DONE] [0-shot] GPU 6 – MMLU-KO part 1/2 |
| 2026-03-05 03:18:53 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:53 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:54 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:18:54 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:54 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:54 [WARNING] [LM_EVAL] Task 'kobest_wic' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:4 (CUDA_VISIBLE_DEVICES=4), tasks=['kobest_wic'], num_fewshot=0 |
| [LM_EVAL] Evaluating 1 task(s) together: ['kobest_wic'] |
| [LM_EVAL] Evaluating task 'kobest_wic' individually... |
| [LM_EVAL] Evaluation complete in 1.6s |
| [LM_EVAL] Skipped tasks: ['kobest_wic'] |
| [Phase 2 [0-shot] GPU 4 – KoBEST: wic] Done. |
| 2026-03-05 03:18:54 [INFO] [DONE] [0-shot] GPU 4 – KoBEST: wic |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:18:55 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:55 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Task 'kobest_hellaswag' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| 2026-03-05 03:18:55 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:55 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Task 'kobest_sentineg' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:3 (CUDA_VISIBLE_DEVICES=3), tasks=['kobest_hellaswag', 'kobest_sentineg'], num_fewshot=0 |
| [LM_EVAL] Evaluating 2 task(s) together: ['kobest_hellaswag', 'kobest_sentineg'] |
| [LM_EVAL] Evaluating task 'kobest_hellaswag' individually... |
| [LM_EVAL] Evaluating task 'kobest_sentineg' individually... |
| [LM_EVAL] Evaluation complete in 1.6s |
| [LM_EVAL] Skipped tasks: ['kobest_hellaswag', 'kobest_sentineg'] |
| [Phase 2 [0-shot] GPU 3 – KoBEST: hellaswag + sentineg] Done. |
| 2026-03-05 03:18:55 [INFO] [DONE] [0-shot] GPU 3 – KoBEST: hellaswag + sentineg |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:18:55 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:55 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Task 'kobest_boolq' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| 2026-03-05 03:18:55 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:55 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Task 'kobest_copa' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:2 (CUDA_VISIBLE_DEVICES=2), tasks=['kobest_boolq', 'kobest_copa'], num_fewshot=0 |
| [LM_EVAL] Evaluating 2 task(s) together: ['kobest_boolq', 'kobest_copa'] |
| [LM_EVAL] Evaluating task 'kobest_boolq' individually... |
| [LM_EVAL] Evaluating task 'kobest_copa' individually... |
| [LM_EVAL] Evaluation complete in 1.6s |
| [LM_EVAL] Skipped tasks: ['kobest_boolq', 'kobest_copa'] |
| [Phase 2 [0-shot] GPU 2 – KoBEST: boolq + copa] Done. |
| 2026-03-05 03:18:55 [INFO] [DONE] [0-shot] GPU 2 – KoBEST: boolq + copa |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:18:55 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:55 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:55 [WARNING] [LM_EVAL] Task 'haerae' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:5 (CUDA_VISIBLE_DEVICES=5), tasks=['haerae'], num_fewshot=0 |
| [LM_EVAL] Evaluating 1 task(s) together: ['haerae'] |
| [LM_EVAL] Evaluating task 'haerae' individually... |
| [LM_EVAL] Evaluation complete in 1.6s |
| [LM_EVAL] Skipped tasks: ['haerae'] |
| [Phase 2 [0-shot] GPU 5 – HAE-RAE (all subtasks)] Done. |
| 2026-03-05 03:18:55 [INFO] [DONE] [0-shot] GPU 5 – HAE-RAE (all subtasks) |
| 2026-03-05 03:18:55 [INFO] Phase 2 (0-shot) complete: 6 succeeded, 0 failed |
| 2026-03-05 03:18:55 [INFO] Attempting 5-shot benchmarks ... |
| 2026-03-05 03:18:55 [INFO] Submitted: [5-shot] GPU 2 – KoBEST: boolq + copa |
| 2026-03-05 03:18:55 [INFO] Submitted: [5-shot] GPU 3 – KoBEST: hellaswag + sentineg |
| 2026-03-05 03:18:55 [INFO] Submitted: [5-shot] GPU 4 – KoBEST: wic |
| 2026-03-05 03:18:55 [INFO] Submitted: [5-shot] GPU 5 – HAE-RAE (all subtasks) |
| 2026-03-05 03:18:55 [INFO] Submitted: [5-shot] GPU 6 – MMLU-KO part 1/2 |
| 2026-03-05 03:18:55 [INFO] Submitted: [5-shot] GPU 7 – MMLU-KO part 2/2 |
| [Phase 2 [5-shot] GPU 2 – KoBEST: boolq + copa] Starting on cuda:2 – tasks: ['kobest_boolq', 'kobest_copa'], 5-shot |
| [Phase 2 [5-shot] GPU 3 – KoBEST: hellaswag + sentineg] Starting on cuda:3 – tasks: ['kobest_hellaswag', 'kobest_sentineg'], 5-shot |
| [Phase 2 [5-shot] GPU 4 – KoBEST: wic] Starting on cuda:4 – tasks: ['kobest_wic'], 5-shot |
| [Phase 2 [5-shot] GPU 5 – HAE-RAE (all subtasks)] Starting on cuda:5 – tasks: ['haerae'], 5-shot |
| [Phase 2 [5-shot] GPU 6 – MMLU-KO part 1/2] Starting on cuda:6 – tasks: ['global_mmlu_ko_abstract_algebra', 'global_mmlu_ko_anatomy', 'global_mmlu_ko_astronomy', 'global_mmlu_ko_business_ethics', 'global_mmlu_ko_clinical_knowledge', 'global_mmlu_ko_college_biology', 'global_mmlu_ko_college_chemistry', 'global_mmlu_ko_college_computer_science', 'global_mmlu_ko_college_mathematics', 'global_mmlu_ko_college_medicine', 'global_mmlu_ko_college_physics', 'global_mmlu_ko_computer_security', 'global_mmlu_ko_conceptual_physics', 'global_mmlu_ko_econometrics', 'global_mmlu_ko_electrical_engineering', 'global_mmlu_ko_elementary_mathematics', 'global_mmlu_ko_formal_logic', 'global_mmlu_ko_global_facts', 'global_mmlu_ko_high_school_biology', 'global_mmlu_ko_high_school_chemistry', 'global_mmlu_ko_high_school_computer_science', 'global_mmlu_ko_high_school_european_history', 'global_mmlu_ko_high_school_geography', 'global_mmlu_ko_high_school_government_and_politics', 'global_mmlu_ko_high_school_macroeconomics', 'global_mmlu_ko_high_school_mathematics', 'global_mmlu_ko_high_school_microeconomics', 'global_mmlu_ko_high_school_physics', 'global_mmlu_ko_high_school_psychology'], 5-shot |
| [Phase 2 [5-shot] GPU 7 – MMLU-KO part 2/2] Starting on cuda:7 – tasks: ['global_mmlu_ko_high_school_statistics', 'global_mmlu_ko_high_school_us_history', 'global_mmlu_ko_high_school_world_history', 'global_mmlu_ko_human_aging', 'global_mmlu_ko_human_sexuality', 'global_mmlu_ko_international_law', 'global_mmlu_ko_jurisprudence', 'global_mmlu_ko_logical_fallacies', 'global_mmlu_ko_machine_learning', 'global_mmlu_ko_management', 'global_mmlu_ko_marketing', 'global_mmlu_ko_medical_genetics', 'global_mmlu_ko_miscellaneous', 'global_mmlu_ko_moral_disputes', 'global_mmlu_ko_moral_scenarios', 'global_mmlu_ko_nutrition', 'global_mmlu_ko_philosophy', 'global_mmlu_ko_prehistory', 'global_mmlu_ko_professional_accounting', 'global_mmlu_ko_professional_law', 'global_mmlu_ko_professional_medicine', 'global_mmlu_ko_professional_psychology', 'global_mmlu_ko_public_relations', 'global_mmlu_ko_security_studies', 'global_mmlu_ko_sociology', 'global_mmlu_ko_us_foreign_policy', 'global_mmlu_ko_virology', 'global_mmlu_ko_world_religions'], 5-shot |
| 2026-03-05 03:18:56 [INFO] TensorFlow version 2.20.0 available. |
| 2026-03-05 03:18:56 [INFO] TensorFlow version 2.20.0 available. |
| 2026-03-05 03:18:56 [INFO] TensorFlow version 2.20.0 available. |
| 2026-03-05 03:18:56 [INFO] TensorFlow version 2.20.0 available. |
| 2026-03-05 03:18:56 [INFO] TensorFlow version 2.20.0 available. |
| 2026-03-05 03:18:56 [INFO] TensorFlow version 2.20.0 available. |
| 2026-03-05 03:18:58 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:58 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:58 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:58 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:58 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:58 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:58 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:18:58 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_statistics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_us_history' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_world_history' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_human_aging' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_human_sexuality' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_international_law' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_jurisprudence' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_logical_fallacies' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_machine_learning' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_management' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_marketing' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_medical_genetics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_miscellaneous' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_moral_disputes' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_moral_scenarios' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_nutrition' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_philosophy' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_prehistory' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_accounting' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_law' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_medicine' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_professional_psychology' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_public_relations' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_security_studies' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_sociology' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_us_foreign_policy' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_virology' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_world_religions' not found in lm_eval registry – skipping. |
| [LM_EVAL] Starting on cuda:7 (CUDA_VISIBLE_DEVICES=7), tasks=['global_mmlu_ko_high_school_statistics', 'global_mmlu_ko_high_school_us_history', 'global_mmlu_ko_high_school_world_history', 'global_mmlu_ko_human_aging', 'global_mmlu_ko_human_sexuality', 'global_mmlu_ko_international_law', 'global_mmlu_ko_jurisprudence', 'global_mmlu_ko_logical_fallacies', 'global_mmlu_ko_machine_learning', 'global_mmlu_ko_management', 'global_mmlu_ko_marketing', 'global_mmlu_ko_medical_genetics', 'global_mmlu_ko_miscellaneous', 'global_mmlu_ko_moral_disputes', 'global_mmlu_ko_moral_scenarios', 'global_mmlu_ko_nutrition', 'global_mmlu_ko_philosophy', 'global_mmlu_ko_prehistory', 'global_mmlu_ko_professional_accounting', 'global_mmlu_ko_professional_law', 'global_mmlu_ko_professional_medicine', 'global_mmlu_ko_professional_psychology', 'global_mmlu_ko_public_relations', 'global_mmlu_ko_security_studies', 'global_mmlu_ko_sociology', 'global_mmlu_ko_us_foreign_policy', 'global_mmlu_ko_virology', 'global_mmlu_ko_world_religions'], num_fewshot=5 |
| [LM_EVAL] No valid tasks to evaluate. |
| [Phase 2 [5-shot] GPU 7 – MMLU-KO part 2/2] Done. |
| 2026-03-05 03:18:58 [INFO] [DONE] [5-shot] GPU 7 – MMLU-KO part 2/2 |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_abstract_algebra' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_anatomy' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_astronomy' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_business_ethics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_clinical_knowledge' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_biology' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_chemistry' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_computer_science' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_mathematics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_medicine' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_college_physics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_computer_security' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_conceptual_physics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_econometrics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_electrical_engineering' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_elementary_mathematics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_formal_logic' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_global_facts' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_biology' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_chemistry' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_computer_science' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_european_history' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_geography' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_government_and_politics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_macroeconomics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_mathematics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_microeconomics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_physics' not found in lm_eval registry – skipping. |
| 2026-03-05 03:18:58 [WARNING] [LM_EVAL] Task 'global_mmlu_ko_high_school_psychology' not found in lm_eval registry – skipping. |
| [LM_EVAL] Starting on cuda:6 (CUDA_VISIBLE_DEVICES=6), tasks=['global_mmlu_ko_abstract_algebra', 'global_mmlu_ko_anatomy', 'global_mmlu_ko_astronomy', 'global_mmlu_ko_business_ethics', 'global_mmlu_ko_clinical_knowledge', 'global_mmlu_ko_college_biology', 'global_mmlu_ko_college_chemistry', 'global_mmlu_ko_college_computer_science', 'global_mmlu_ko_college_mathematics', 'global_mmlu_ko_college_medicine', 'global_mmlu_ko_college_physics', 'global_mmlu_ko_computer_security', 'global_mmlu_ko_conceptual_physics', 'global_mmlu_ko_econometrics', 'global_mmlu_ko_electrical_engineering', 'global_mmlu_ko_elementary_mathematics', 'global_mmlu_ko_formal_logic', 'global_mmlu_ko_global_facts', 'global_mmlu_ko_high_school_biology', 'global_mmlu_ko_high_school_chemistry', 'global_mmlu_ko_high_school_computer_science', 'global_mmlu_ko_high_school_european_history', 'global_mmlu_ko_high_school_geography', 'global_mmlu_ko_high_school_government_and_politics', 'global_mmlu_ko_high_school_macroeconomics', 'global_mmlu_ko_high_school_mathematics', 'global_mmlu_ko_high_school_microeconomics', 'global_mmlu_ko_high_school_physics', 'global_mmlu_ko_high_school_psychology'], num_fewshot=5 |
| [LM_EVAL] No valid tasks to evaluate. |
| [Phase 2 [5-shot] GPU 6 – MMLU-KO part 1/2] Done. |
| 2026-03-05 03:18:58 [INFO] [DONE] [5-shot] GPU 6 – MMLU-KO part 1/2 |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:19:00 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:19:00 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Task 'kobest_hellaswag' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| 2026-03-05 03:19:00 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:19:00 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Task 'kobest_sentineg' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:3 (CUDA_VISIBLE_DEVICES=3), tasks=['kobest_hellaswag', 'kobest_sentineg'], num_fewshot=5 |
| [LM_EVAL] Evaluating 2 task(s) together: ['kobest_hellaswag', 'kobest_sentineg'] |
| [LM_EVAL] Evaluating task 'kobest_hellaswag' individually... |
| [LM_EVAL] Evaluating task 'kobest_sentineg' individually... |
| [LM_EVAL] Evaluation complete in 1.6s |
| [LM_EVAL] Skipped tasks: ['kobest_hellaswag', 'kobest_sentineg'] |
| [Phase 2 [5-shot] GPU 3 – KoBEST: hellaswag + sentineg] Done. |
| 2026-03-05 03:19:00 [INFO] [DONE] [5-shot] GPU 3 – KoBEST: hellaswag + sentineg |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:19:00 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:19:00 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Task 'kobest_boolq' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| 2026-03-05 03:19:00 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:19:00 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Task 'kobest_copa' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:2 (CUDA_VISIBLE_DEVICES=2), tasks=['kobest_boolq', 'kobest_copa'], num_fewshot=5 |
| [LM_EVAL] Evaluating 2 task(s) together: ['kobest_boolq', 'kobest_copa'] |
| [LM_EVAL] Evaluating task 'kobest_boolq' individually... |
| [LM_EVAL] Evaluating task 'kobest_copa' individually... |
| [LM_EVAL] Evaluation complete in 1.6s |
| [LM_EVAL] Skipped tasks: ['kobest_boolq', 'kobest_copa'] |
| [Phase 2 [5-shot] GPU 2 – KoBEST: boolq + copa] Done. |
| 2026-03-05 03:19:00 [INFO] [DONE] [5-shot] GPU 2 – KoBEST: boolq + copa |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:19:00 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:19:00 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Task 'haerae' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:5 (CUDA_VISIBLE_DEVICES=5), tasks=['haerae'], num_fewshot=5 |
| [LM_EVAL] Evaluating 1 task(s) together: ['haerae'] |
| [LM_EVAL] Evaluating task 'haerae' individually... |
| [LM_EVAL] Evaluation complete in 1.7s |
| [LM_EVAL] Skipped tasks: ['haerae'] |
| [Phase 2 [5-shot] GPU 5 – HAE-RAE (all subtasks)] Done. |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Batch evaluation failed (lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'). Falling back to per-task evaluation. |
| 2026-03-05 03:19:00 [INFO] [DONE] [5-shot] GPU 5 – HAE-RAE (all subtasks) |
| 2026-03-05 03:19:00 [INFO] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2026-03-05 03:19:00 [INFO] Initializing hf model, with arguments: {'pretrained': '/PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/hf_3b_checkpoint-0057000', 'dtype': 'bfloat16', 'device': 'cuda:0'} |
| 2026-03-05 03:19:00 [WARNING] [LM_EVAL] Task 'kobest_wic' failed: lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device' |
| [LM_EVAL] Starting on cuda:4 (CUDA_VISIBLE_DEVICES=4), tasks=['kobest_wic'], num_fewshot=5 |
| [LM_EVAL] Evaluating 1 task(s) together: ['kobest_wic'] |
| [LM_EVAL] Evaluating task 'kobest_wic' individually... |
| [LM_EVAL] Evaluation complete in 1.7s |
| [LM_EVAL] Skipped tasks: ['kobest_wic'] |
| [Phase 2 [5-shot] GPU 4 – KoBEST: wic] Done. |
| 2026-03-05 03:19:00 [INFO] [DONE] [5-shot] GPU 4 – KoBEST: wic |
| 2026-03-05 03:19:01 [INFO] Phase 2 (5-shot) complete: 6 succeeded, 0 failed |
| 2026-03-05 03:19:01 [INFO] Phase 2 results saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/phase2_results.json |
| 2026-03-05 03:19:01 [INFO] Phase 2 complete in 11s. |
| 2026-03-05 03:19:01 [INFO] |
| 2026-03-05 03:19:01 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 03:19:01 [INFO] PHASE 3 – Report Generation |
| 2026-03-05 03:19:01 [INFO] ------------------------------------------------------------------------ |
| 2026-03-05 03:19:01 [INFO] Report saved: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/full_eval_report.md |
| 2026-03-05 03:19:01 [INFO] Phase 3 complete in 0s. |
| 2026-03-05 03:19:01 [INFO] ======================================================================== |
| 2026-03-05 03:19:01 [INFO] PIPELINE COMPLETE |
| 2026-03-05 03:19:01 [INFO] ======================================================================== |
| 2026-03-05 03:19:01 [INFO] Total time : 47s |
| 2026-03-05 03:19:01 [INFO] Output dir : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318 |
| 2026-03-05 03:19:01 [INFO] Phase 1 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/phase1_results.json |
| 2026-03-05 03:19:01 [INFO] Phase 2 results : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/phase2_results.json |
| 2026-03-05 03:19:01 [INFO] Gen samples : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/generation_samples.json |
| 2026-03-05 03:19:01 [INFO] Report : /PROJECT/0325120031_A/ghong/taketimes/llm-bang/eval/outputs/3b_full_eval_20260305_0318/full_eval_report.md |
| 2026-03-05 03:19:01 [INFO] Phase 1 tasks : 0 OK / 6 failed |
| 2026-03-05 03:19:01 [INFO] Phase 2 tasks : 6 OK / 0 failed |
| 2026-03-05 03:19:01 [INFO] ======================================================================== |
| /usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 60 leaked semaphore objects to clean up at shutdown |
| warnings.warn('resource_tracker: There appear to be %d ' |
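The recurring Phase 2 failure above, `lm_eval.models.huggingface.HFLM() got multiple values for keyword argument 'device'`, is the standard Python TypeError raised when the same keyword reaches a callable twice: here, most plausibly once inside the unpacked model-args dict (which the log shows already contains `'device': 'cuda:0'`) and once as an explicit `device=` keyword from the per-GPU dispatcher. A minimal sketch of that failure mode; `HFLM` and `init_model` below are stand-ins with an assumed signature, not the real lm_eval implementation or this pipeline's wrapper:

```python
# Minimal reproduction of "got multiple values for keyword argument 'device'".
# HFLM is a stand-in class with a similar constructor signature, NOT the real
# lm_eval.models.huggingface.HFLM.

class HFLM:
    def __init__(self, pretrained, device="cuda:0", dtype="float32"):
        self.pretrained = pretrained
        self.device = device
        self.dtype = dtype

# Hypothetical wrapper mirroring a per-GPU dispatch: it forwards the parsed
# model-args dict AND an explicit per-worker device.
def init_model(model_args, device):
    # 'device' is both inside model_args and an explicit keyword, so Python
    # raises TypeError while assembling the call, before __init__ runs.
    return HFLM(**model_args, device=device)

model_args = {
    "pretrained": "hf_3b_checkpoint-0057000",
    "dtype": "bfloat16",
    "device": "cuda:0",   # duplicate source of 'device'
}

try:
    init_model(model_args, device="cuda:5")
except TypeError as e:
    print(e)  # e.g. "HFLM() got multiple values for keyword argument 'device'"
```

If the pipeline's wrapper does pass `device` in both places, removing it from one side (e.g. `model_args.pop("device", None)` before forwarding the explicit per-GPU value) would let batch evaluation proceed instead of falling back to the per-task path that skipped every task here.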