[17:59:11] STREAM_CE device cuda V 8192 train 3671502 valid 159631
[18:00:00] sv [165720.1 39223.6 29888.8 26770.9]
[18:00:18] sv [175787.4 39518.2 29955.8 27073.4]
[18:00:39] sv [186342.7 39784. 30000.3 27357.4]
[18:01:04] sv [197372.5 40017.1 30020.3 27622.6]
[18:01:32] sv [208562.8 40183.9 29996.6 27869.2]
[18:02:03] sv [219681. 40294. 29939. 28097.2]
[18:02:38] sv [230503.7 40357.3 29856. 28304.1]
[18:03:16] sv [240817.9 40382.5 29755.6 28485.4]
[18:03:58] sv [250420.1 40377.3 29644.3 28633.7]
[18:04:43] sv [259114.6 40346.8 29527.6 28740.9]
[18:27:36] resumed /workspace/oneshot/logs/glm_d896_readout/stream_ce_lam10.pt ppl 136.3978129937227
[18:27:37] init_eval_start D 17409
[18:27:42] STREAM_CE init_ppl=136.40
/workspace/oneshot/torch_ce_stream_readout.py:138: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior. Consider using tensor.detach() first. (Triggered internally at /pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:836.)
  log(f"step={step} loss={float(loss):.4f} ppl={ppl:.2f} best={best:.2f}")
[18:28:18] step=200 loss=4.3944 ppl=118.12 best=118.12
[18:28:54] step=400 loss=4.3815 ppl=113.37 best=113.37
[18:29:31] step=600 loss=4.3709 ppl=109.14 best=109.14
[18:30:08] step=800 loss=4.9107 ppl=106.80 best=106.80
[18:30:44] step=1000 loss=4.4804 ppl=104.45 best=104.45
[18:31:20] step=1200 loss=4.4660 ppl=102.33 best=102.33
[18:31:57] step=1400 loss=4.4507 ppl=101.31 best=101.31
[18:32:33] step=1600 loss=4.3290 ppl=99.47 best=99.47
[18:33:09] step=1800 loss=4.2714 ppl=98.49 best=98.49
[18:33:46] step=2000 loss=4.2895 ppl=96.73 best=96.73
[18:34:22] step=2200 loss=4.2743 ppl=96.07 best=96.07
[18:34:59] step=2400 loss=3.7439 ppl=94.83 best=94.83
[18:35:35] step=2600 loss=4.1255 ppl=93.89 best=93.89
[18:36:12] step=2800 loss=3.2027 ppl=93.24 best=93.24
[18:36:48] step=3000 loss=3.7933 ppl=92.75 best=92.75
[18:36:48] STREAM_CE best_ppl=92.75