frankenstallm/source/eval/outputs/ppl_3b_val.txt (uploaded by pathcosmos, commit 5b1ff4d: "Upload folder using huggingface_hub (#29)")
/usr/local/lib/python3.12/dist-packages/torch/library.py:356: UserWarning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: flash_attn::_flash_attn_backward(Tensor dout, Tensor q, Tensor k, Tensor v, Tensor out, Tensor softmax_lse, Tensor(a6!)? dq, Tensor(a7!)? dk, Tensor(a8!)? dv, float dropout_p, float softmax_scale, bool causal, SymInt window_size_left, SymInt window_size_right, float softcap, Tensor? alibi_slopes, bool deterministic, Tensor? rng_state=None) -> Tensor
registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922
dispatch key: ADInplaceOrView
previous kernel: no debug info
new kernel: registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922 (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:208.)
self.m.impl(
Loading model from: checkpoints/korean_3b_fp8_run1/checkpoint-0057000
Model parameters: 3015.4M
Perplexity config: seq_len=2048, stride=512, batch_size=32
Loaded 75,681,623 tokens from data/3b_val.bin
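The 4,620 batches in the progress bar can be reconciled with the configuration above under one assumption: the evaluator uses the common strided sliding-window scheme, as described in the Hugging Face "Perplexity of fixed-length models" guide. In that scheme, each window spans at most 2,048 tokens, windows start every 512 tokens, and each window keeps the loss only for tokens the previous window did not cover. The log does not show the evaluation script. The sketch is therefore a hypothetical reconstruction, and the helper names `sliding_windows`, `count_batches` and `perplexity` are illustrative.

```python
import math
from typing import Iterator, Tuple


def sliding_windows(n_tokens: int, seq_len: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, n_new) for each strided window.

    Hypothetical helper: windows start every `stride` tokens and span at most
    `seq_len` tokens; n_new counts the tokens not covered by the previous
    window, i.e. the ones whose loss this window would contribute.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:  # this window already reaches the end of the data
            break


def count_batches(n_tokens: int, seq_len: int, stride: int, batch_size: int) -> Tuple[int, int]:
    """Return (number of windows, number of batches) under the scheme above."""
    n_windows = sum(1 for _ in sliding_windows(n_tokens, seq_len, stride))
    return n_windows, math.ceil(n_windows / batch_size)


def perplexity(total_nll: float, total_scored_tokens: int) -> float:
    """Perplexity = exp(mean negative log-likelihood per scored token)."""
    return math.exp(total_nll / total_scored_tokens)


# Values from the log: 75,681,623 tokens, seq_len=2048, stride=512, batch_size=32.
print(count_batches(75_681_623, 2048, 512, 32))  # (147813, 4620)
```

Without the early stop, the loop would produce ceil(75,681,623 / 512) = 147,816 windows, which also rounds up to 4,620 batches of 32. The batch count alone therefore does not reveal which variant the script uses.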
Evaluating perplexity: 2%|▏ | 112/4620 [04:00<2:40:24, 2.13s/batch]
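The time estimate in the last progress update (batch 112 of 4620 at 2.13 s/batch, about 2:40:24 remaining) can be checked by hand as roughly (total - done) × seconds per batch. The sketch uses the rounded rate printed in the log, so its result lands near tqdm's figure rather than exactly on it. tqdm itself uses an exponentially smoothed rate (its `smoothing` parameter, default 0.3). The helper names are illustrative.

```python
def remaining_seconds(done: int, total: int, sec_per_batch: float) -> float:
    """Naive ETA: batches left times the per-batch rate."""
    return (total - done) * sec_per_batch


def as_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS (the form tqdm uses once a duration exceeds an hour)."""
    s = round(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Last update in the log: 112/4620 batches at about 2.13 s/batch.
print(as_hms(remaining_seconds(112, 4620, 2.13)))  # 2:40:02
print(as_hms(4620 * 2.13))                         # 2:44:01 for the full pass
```

At that pace, a full 4,620-batch pass takes roughly 2 h 44 min.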