| ====================================================================== |
| FRANKENSTALLM 3B - 6-GPU parallel comprehensive evaluation |
| Checkpoint: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_3b_fp8_run1/checkpoint-0057000 |
| Batch size: 32, Seq len: 2048, Stride: 512 |
| ====================================================================== |
| /usr/local/lib/python3.12/dist-packages/torch/library.py:356: UserWarning: Warning only once for all operators, other operators may also be overridden. |
| Overriding a previously registered kernel for the same operator and the same dispatch key |
| operator: flash_attn::_flash_attn_backward(Tensor dout, Tensor q, Tensor k, Tensor v, Tensor out, Tensor softmax_lse, Tensor(a6!)? dq, Tensor(a7!)? dk, Tensor(a8!)? dv, float dropout_p, float softmax_scale, bool causal, SymInt window_size_left, SymInt window_size_right, float softcap, Tensor? alibi_slopes, bool deterministic, Tensor? rng_state=None) -> Tensor |
| registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922 |
| dispatch key: ADInplaceOrView |
| previous kernel: no debug info |
| new kernel: registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922 (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:208.) |
| self.m.impl( |
| [PPL cuda:1] Loading model for korean_c4... |
| [PPL cuda:1] korean_c4: 15,159,838 tokens, 30.3MB |
| [PPL cuda:2] Loading model for korean_namuwiki... |
| [PPL cuda:2] korean_namuwiki: 2,166,179 tokens, 4.3MB |
| [PPL cuda:0] Loading model for 3b... |
| [PPL cuda:0] 3b: 75,681,623 tokens, 151.4MB |
| [CALIB cuda:3] Loading model... |
| [PPL cuda:2] korean_namuwiki: batch 50/133, running PPL=25.7009, 28s |
| [PPL cuda:2] korean_namuwiki: batch 100/133, running PPL=25.8650, 52s |
| [PPL cuda:2] ✓ korean_namuwiki: PPL=25.8814, BPT=4.6938, 67.4s |
| [PPL cuda:2] Loading model for korean_wiki... |
| [PPL cuda:2] korean_wiki: 524,561 tokens, 1.0MB |
| /usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 25 leaked semaphore objects to clean up at shutdown |
| warnings.warn('resource_tracker: There appear to be %d ' |
|
|
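The figures in the log above can be cross-checked with a little arithmetic. The sketch below is not taken from the evaluation script; it assumes that BPT is the base-2 logarithm of PPL and that the windows are laid out as a plain sliding window using the logged `Seq len` and `Stride`. The helper names `bits_per_token` and `num_batches` are hypothetical.

```python
import math


def bits_per_token(ppl: float) -> float:
    """Convert perplexity to bits per token, assuming BPT = log2(PPL)."""
    return math.log2(ppl)


def num_batches(n_tokens: int, seq_len: int, stride: int, batch_size: int) -> int:
    """Count batches for a sliding-window pass over n_tokens.

    Assumed scheme: the first window covers seq_len tokens and each later
    window advances by stride tokens; a trailing partial window is dropped.
    """
    n_windows = max(0, (n_tokens - seq_len) // stride) + 1
    return math.ceil(n_windows / batch_size)


# korean_namuwiki values from the log: PPL=25.8814, 2,166,179 tokens.
print(round(bits_per_token(25.8814), 2))
print(num_batches(2_166_179, 2048, 512, 32))
```

Under these assumptions the batch count for korean_namuwiki comes out at 133, matching the `batch .../133` progress lines, and the converted perplexity agrees with the logged BPT to the precision shown. The evaluation script may still window the data differently, so this is a consistency check only.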