2025-10-26 08:04:21,184 - root - INFO - Starting training.
2025-10-26 08:04:21,185 - root - INFO - Starting training.
2025-10-26 08:04:21,186 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,186 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,189 - root - INFO - Starting training.
2025-10-26 08:04:21,189 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,408 - root - INFO - Starting training.
2025-10-26 08:04:21,408 - root - INFO - Starting training.
2025-10-26 08:04:21,408 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,408 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,408 - root - INFO - Starting training.
2025-10-26 08:04:21,408 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,411 - root - INFO - Starting training.
2025-10-26 08:04:21,411 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:21,414 - root - INFO - Starting training.
2025-10-26 08:04:21,414 - root - INFO - Loading config from jobs/munin-7b-open-stage3/config.json
2025-10-26 08:04:22,227 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:22,229 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:22,229 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:22,591 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0)
2025-10-26 08:04:22,640 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings)
2025-10-26 08:04:22,641 - root - INFO - GPU capacity: NVIDIA B200 (0) with 178.36GiB memory
2025-10-26 08:04:22,645 - root - INFO - Compiling each TransformerBlock with torch.compile
2025-10-26 08:04:22,683 - root - INFO - Applied FSDP to the model
2025-10-26 08:04:22,684 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1):
FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, 
out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): 
FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, 
out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): 
Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) 
(wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): 
Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): 
FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, 
out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): 
FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) )
2025-10-26 08:04:22,953 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:22,955 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:22,955 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,007 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:23,008 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:23,009 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,142 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:23,144 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:23,144 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,302 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0)
2025-10-26 08:04:23,323 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0)
2025-10-26 08:04:23,352 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings)
2025-10-26 08:04:23,352 - root - INFO - GPU capacity: NVIDIA B200 (4) with 178.36GiB memory
2025-10-26 08:04:23,356 - root - INFO - Compiling each TransformerBlock with torch.compile
2025-10-26 08:04:23,373 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings)
2025-10-26 08:04:23,373 - root - INFO - GPU capacity: NVIDIA B200 (7) with 178.36GiB memory
2025-10-26 08:04:23,377 - root - INFO - Compiling each TransformerBlock with torch.compile
2025-10-26 08:04:23,389 - root - INFO - Applied FSDP to the model
2025-10-26 08:04:23,389 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1):
Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) 
(attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, 
bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, 
out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, 
bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): 
TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): 
Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, 
out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): 
RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) 
(wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) )
2025-10-26 08:04:23,413 - root - INFO - Applied FSDP to the model
2025-10-26 08:04:23,414 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): 
FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, 
out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): 
FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, 
out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): 
Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) 
(wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): 
Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): 
FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, 
out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): 
FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) )
2025-10-26 08:04:23,464 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0)
2025-10-26 08:04:23,477 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:23,479 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:23,479 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,485 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:23,487 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:23,487 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,488 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:23,490 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:23,490 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,495 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config
2025-10-26 08:04:23,497 - root - INFO - Building 1-D device mesh with ['dp'], [8]
2025-10-26 08:04:23,498 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',))
2025-10-26 08:04:23,517 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings)
2025-10-26 08:04:23,517 - root - INFO - GPU capacity: NVIDIA B200 (6) with 178.36GiB memory
2025-10-26 08:04:23,520 - root - INFO - Compiling each TransformerBlock with torch.compile
2025-10-26 08:04:23,554 - root - INFO - Applied FSDP to the model
2025-10-26 08:04:23,555 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): 
Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) 
(feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): 
Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): 
Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, 
out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, 
bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): 
RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): 
Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) 
(feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): 
Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) ) 2025-10-26 08:04:23,797 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) 2025-10-26 08:04:23,812 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) 2025-10-26 08:04:23,813 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, 
mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) 2025-10-26 08:04:23,838 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) 2025-10-26 08:04:23,850 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) 2025-10-26 08:04:23,850 - root - INFO - GPU capacity: NVIDIA B200 (2) with 178.36GiB memory 2025-10-26 08:04:23,854 - root - INFO - Compiling each TransformerBlock with torch.compile 2025-10-26 08:04:23,861 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) 2025-10-26 08:04:23,862 - root - INFO - GPU capacity: NVIDIA B200 (3) with 178.36GiB memory 2025-10-26 08:04:23,864 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) 2025-10-26 08:04:23,864 - root - INFO - GPU capacity: NVIDIA B200 (1) with 178.36GiB memory 2025-10-26 08:04:23,865 - root - INFO - Compiling each TransformerBlock with torch.compile 2025-10-26 08:04:23,868 - root - INFO - Compiling each TransformerBlock with torch.compile 2025-10-26 08:04:23,889 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) 2025-10-26 08:04:23,889 - root - INFO - GPU capacity: NVIDIA B200 (5) with 178.36GiB memory 2025-10-26 08:04:23,890 - root - INFO - Applied FSDP to the model 2025-10-26 08:04:23,891 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, 
bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, 
out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, 
bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): 
TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): 
Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, 
out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): 
RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) 
(wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, 
bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): 
Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) )
2025-10-26 08:04:23,892 - root - INFO - Compiling each TransformerBlock with torch.compile
2025-10-26 08:04:23,898 - root - INFO - Applied FSDP to the model
2025-10-26 08:04:23,898 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): 
Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, 
out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( 
(_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) 
(wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): 
Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, 
bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, 
out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): 
Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, 
out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( 
(_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) )
2025-10-26 08:04:23,901 - root - INFO - Applied FSDP to the model
2025-10-26 08:04:23,901 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): 
Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): 
Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, 
bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): 
Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): 
FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, 
out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): 
FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, 
out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): 
Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) 
(wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) ) 2025-10-26 08:04:23,926 - root - INFO - Applied FSDP to the model 2025-10-26 08:04:23,926 - root - INFO - Model after parallelization model=FSDPTransformer( (tok_embeddings): Embedding(64256, 4096) (layers): ModuleDict( (0): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (1): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, 
bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (2): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (3): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (4): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, 
out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (5): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (6): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (7): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, 
bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (8): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (9): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (10): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (11): FSDPOptimizedModule( (_orig_mod): 
TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (12): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (13): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (14): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): 
Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (15): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (16): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (17): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, 
out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (18): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (19): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (20): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): 
RMSNorm() (ffn_norm): RMSNorm() ) ) (21): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (22): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (23): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (24): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) 
(wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (25): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (26): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (27): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, 
bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (28): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (29): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (30): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): 
Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) (31): FSDPOptimizedModule( (_orig_mod): TransformerBlock( (attention): Attention( (wq): Linear(in_features=4096, out_features=4096, bias=False) (wk): Linear(in_features=4096, out_features=4096, bias=False) (wv): Linear(in_features=4096, out_features=4096, bias=False) (wo): Linear(in_features=4096, out_features=4096, bias=False) ) (feed_forward): FeedForward( (w1): Linear(in_features=4096, out_features=11008, bias=False) (w2): Linear(in_features=11008, out_features=4096, bias=False) (w3): Linear(in_features=4096, out_features=11008, bias=False) ) (attention_norm): RMSNorm() (ffn_norm): RMSNorm() ) ) ) (norm): RMSNorm() (output): Linear(in_features=4096, out_features=64256, bias=False) ) 2025-10-26 08:04:48,288 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,288 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,288 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,288 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,288 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,288 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,289 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) 2025-10-26 08:04:48,289 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
warnings.warn( # warn only once 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00010156631469726562 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00010609626770019531 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00010180473327636719 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00010585784912109375 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00010085105895996094 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00010728836059570312 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.00011420249938964844 seconds 2025-10-26 08:04:48,842 - root - INFO - Loaded cached document counts in 0.0001277923583984375 seconds 2025-10-26 08:04:48,843 - root - INFO - Worker 0 responsible for docs: [('/work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet', 0, 945398)] 2025-10-26 08:04:48,843 - root - INFO - Total docs: 945399 2025-10-26 08:04:48,843 - root - INFO - Worker 0 assembled subdataset iterator for /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/, 1 of 1 No valid checkpoint detected at jobs/munin-7b-open-stage3/checkpoints/dataloader, dataset starting from scratch. 
2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: tok_embeddings.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.attention.wq.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.attention.wk.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.attention.wv.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.attention.wo.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: layers.0._orig_mod.attention_norm.weight 2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: layers.0._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.attention.wq.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.attention.wk.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.attention.wv.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.attention.wo.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: layers.1._orig_mod.attention_norm.weight 2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: layers.1._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.2._orig_mod.attention.wq.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.2._orig_mod.attention.wk.weight 2025-10-26 08:04:48,844 - root - INFO - 
Decay weight: layers.2._orig_mod.attention.wv.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.2._orig_mod.attention.wo.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,844 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: layers.2._orig_mod.attention_norm.weight 2025-10-26 08:04:48,844 - root - INFO - Nodecay weight: layers.2._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.3._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.3._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.4._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.4._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.4._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.4._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: 
layers.4._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.4._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.4._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.5._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.5._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.6._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: 
layers.6._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.7._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.7._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.8._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.8._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.9._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.9._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: 
layers.9._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.9._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.9._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.9._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.10._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.10._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.11._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.11._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.11._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.11._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: 
layers.11._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.11._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.11._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.12._orig_mod.attention_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Nodecay weight: layers.12._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.attention.wq.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.attention.wk.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.attention.wv.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.attention.wo.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,845 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.13._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay 
weight: layers.13._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.14._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.14._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.15._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.15._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.16._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.16._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: 
layers.16._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.16._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.16._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.16._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.17._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.17._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.18._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.18._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.18._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.18._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: 
layers.18._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.18._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.18._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.19._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.19._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.20._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay 
weight: layers.20._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.21._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.21._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.22._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.22._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.23._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.23._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: 
layers.23._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.23._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.23._orig_mod.attention_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Nodecay weight: layers.23._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.attention.wq.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.attention.wk.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.attention.wv.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.attention.wo.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,846 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.24._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.24._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.25._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.25._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.25._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.25._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: 
layers.25._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.25._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.25._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.26._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.26._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.27._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay 
weight: layers.27._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.28._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.28._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.29._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.29._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.30._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.30._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: 
layers.30._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.30._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.30._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.30._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.attention.wq.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.attention.wk.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.attention.wv.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.attention.wo.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w1.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w2.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w3.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.31._orig_mod.attention_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: layers.31._orig_mod.ffn_norm.weight 2025-10-26 08:04:48,847 - root - INFO - Nodecay weight: norm.weight 2025-10-26 08:04:48,847 - root - INFO - Decay weight: output.weight 2025-10-26 08:04:49,455 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints 2025-10-26 08:04:49,458 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints 2025-10-26 08:04:49,463 - root - INFO - Checkpointing active. 
Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints
2025-10-26 08:04:49,470 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints
2025-10-26 08:04:49,480 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints
2025-10-26 08:04:49,480 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints
2025-10-26 08:04:49,500 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints
2025-10-26 08:04:49,501 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage3/checkpoints
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:49,528 - root - INFO - Forcing load from /work/training/maester/jobs/munin-7b-open-pt/checkpoints/step-18926/
2025-10-26 08:04:54,056 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,057 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,057 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,057 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,057 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,058 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,058 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,058 - root - INFO - Loaded model-only checkpoint from forced path in 4.53 seconds
2025-10-26 08:04:54,078 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,079 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,079 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,079 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,079 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,079 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,079 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,079 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,079 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,080 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,079 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,080 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,080 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,080 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,081 - root - INFO - Training starts at step 0
2025-10-26 08:04:54,082 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage3/traces
2025-10-26 08:04:54,091 - root - INFO - ParquetDataset: entering epoch 0
2025-10-26 08:04:54,091 - root - INFO - Worker 0 opening new file /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Step 1: lr=4.00E-08, loss= 1.0624 (max= 1.3836), tps=3137, mfu=6.54%, memory: 150.54GiB(84.40%) time/data_loading=2.06s (max=2.63s, 25.16%)
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:14,979 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
2025-10-26 08:05:43,993 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:43,994 - root - INFO - Step 10: lr=2.20E-07, loss= 1.1209 (max= 1.8882), tps=20331, mfu=42.36%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:05:44,248 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,254 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,261 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,266 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,268 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,270 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,272 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,272 - root - INFO - Dumping traces at step 10
2025-10-26 08:05:44,336 - root - INFO - Finished dumping traces in 0.09 seconds
2025-10-26 08:05:44,343 - root - INFO - Finished dumping traces in 0.09 seconds
2025-10-26 08:05:44,350 - root - INFO - Finished dumping traces in 0.09 seconds
2025-10-26 08:05:44,353 - root - INFO - Finished dumping traces in 0.09 seconds
2025-10-26 08:05:44,354 - root - INFO - Finished dumping traces in 0.09 seconds
2025-10-26 08:05:44,354 - root - INFO - Finished dumping traces in 0.08 seconds
2025-10-26 08:05:44,356 - root - INFO - Finished dumping traces in 0.08 seconds
2025-10-26 08:05:44,363 - root - INFO - Finished dumping traces in 0.09 seconds
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,279 - root - INFO - Step 20: lr=4.20E-07, loss= 1.1622 (max= 1.8160), tps=20301, mfu=42.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:16,377 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,380 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,381 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,381 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,382 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,384 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,384 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,385 - root - INFO - Dumping traces at step 20
2025-10-26 08:06:16,481 - root - INFO - Finished dumping traces in 0.10 seconds
2025-10-26 08:06:16,496 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:16,501 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:16,501 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:16,501 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:16,502 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:16,502 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:16,503 - root - INFO - Finished dumping traces in 0.12 seconds
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,380 - root - INFO - Step 30: lr=6.20E-07, loss= 1.0990 (max= 1.5918), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:06:48,482 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,482 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,482 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,484 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,484 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,484 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,486 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,490 - root - INFO - Dumping traces at step 30
2025-10-26 08:06:48,582 - root - INFO - Finished dumping traces in 0.10 seconds
2025-10-26 08:06:48,582 - root - INFO - Finished dumping traces in 0.10 seconds
2025-10-26 08:06:48,585 - root - INFO - Finished dumping traces in 0.10 seconds
2025-10-26 08:06:48,588 - root - INFO - Finished dumping traces in 0.11 seconds
2025-10-26 08:06:48,588 - root - INFO - Finished dumping traces in 0.10 seconds
2025-10-26 08:06:48,589 - root - INFO - Finished dumping traces in 0.10 seconds
2025-10-26 08:06:48,594 - root - INFO - Finished dumping traces in 0.11 seconds
2025-10-26 08:06:48,597 - root - INFO - Finished dumping traces in 0.11 seconds
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,530 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:20,531 - root - INFO - Step 40: lr=8.20E-07, loss= 1.0653 (max= 1.5685), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:37,010 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2675277
2025-10-26 08:07:52,346 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,346 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,346 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,346 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,346 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,347 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,347 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:52,347 - root - INFO - Step 50: lr=1.02E-06, loss= 1.1051 (max= 1.5264), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:07:56,140 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:1038922
2025-10-26 08:08:24,147 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,147 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,147 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,147 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,147 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,147 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,148 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:24,148 - root - INFO - Step 60: lr=1.22E-06, loss= 1.1003 (max= 1.6855), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,049 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,049 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,049 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,049 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,049 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,050 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,050 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:08:56,050 - root - INFO - Step 70: lr=1.42E-06, loss= 1.1037 (max= 1.4950), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:27,927 - root - INFO - Step 80: lr=1.62E-06, loss= 1.0921 (max= 1.6041), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,845 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,846 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,846 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,846 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,846 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,846 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,846 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:09:59,847 - root - INFO - Step 90: lr=1.82E-06, loss= 1.0904 (max= 1.5440), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,714 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,714 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,714 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,714 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,714 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,715 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,715 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:10:31,715 - root - INFO - Step 100: lr=2.02E-06, loss= 1.0931 (max= 1.5012), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,545 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:03,546 - root - INFO - Step 110: lr=2.22E-06, loss= 1.0952 (max= 1.5936), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,430 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,430 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,430 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,430 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,430 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,431 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,431 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:11:35,431 - root - INFO - Step 120: lr=2.42E-06, loss= 1.0700 (max= 1.8315), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:07,331 - root - INFO - Step 130: lr=2.62E-06, loss= 1.0891 (max= 1.6651), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,111 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,111 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,111 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,111 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,111 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,111 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,112 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:12:39,112 - root - INFO - Step 140: lr=2.82E-06, loss= 1.0727 (max= 1.3975), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,997 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:10,998 - root - INFO - Step 150: lr=3.02E-06, loss= 1.0863 (max= 1.5581), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,897 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:13:42,898 - root - INFO - Step 160: lr=3.22E-06, loss= 1.0914 (max= 1.5260), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,840 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:14,841 - root - INFO - Step 170: lr=3.42E-06, loss= 1.0767 (max= 1.5706), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,697 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:14:46,698 - root - INFO - Step 180: lr=3.62E-06, loss= 1.0754 (max= 1.4136), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,576 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,576 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,576 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,577 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,576 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,577 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,577 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:18,577 - root - INFO - Step 190: lr=3.82E-06, loss= 1.1015 (max= 1.8270), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,488 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:15:50,489 - root - INFO - Step 200: lr=4.02E-06, loss= 1.0990 (max= 1.5309), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26
08:16:22,429 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,429 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,429 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,429 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,429 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,429 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,430 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:22,430 - root - INFO - Step 210: lr=4.22E-06, loss= 1.1050 (max= 1.5858), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,318 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:16:54,319 - root - INFO - Step 220: lr=4.42E-06, loss= 1.1097 (max= 1.6805), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,166 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,166 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,167 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,167 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,167 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,167 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,167 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:26,167 - root - INFO - Step 230: lr=4.62E-06, loss= 1.0832 (max= 2.0076), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:52,250 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:4931720 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,094 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:17:58,095 - root - INFO - Step 240: lr=4.82E-06, loss= 1.0952 (max= 1.7687), tps=20529, mfu=42.77%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:18:29,893 - root - INFO - Step 250: lr=5.02E-06, loss= 1.1195 (max= 1.8170), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, 
mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:01,718 - root - INFO - Step 260: lr=5.22E-06, loss= 1.0949 (max= 1.5200), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 
1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:19:33,596 - root - INFO - Step 270: lr=5.42E-06, loss= 1.1166 (max= 1.8454), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:05,525 - root - INFO - Step 280: lr=5.62E-06, loss= 1.1074 (max= 1.4890), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 
1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:20:37,353 - root - INFO - Step 290: lr=5.82E-06, loss= 1.1033 (max= 1.5730), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:00,177 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:4942172 2025-10-26 08:21:09,199 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,199 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,199 - root - INFO - Step 300: 
lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,199 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,199 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,199 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,199 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:09,200 - root - INFO - Step 300: lr=6.02E-06, loss= 1.1012 (max= 1.7718), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - 
INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:21:41,094 - root - INFO - Step 310: lr=6.22E-06, loss= 1.1120 (max= 1.5047), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,928 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:12,929 - root - INFO - Step 320: lr=6.42E-06, loss= 1.0991 (max= 1.5859), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:22:44,807 - root - INFO - Step 330: lr=6.62E-06, loss= 1.0943 (max= 1.4759), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,647 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:16,648 - root - INFO - Step 340: lr=6.82E-06, loss= 1.1229 (max= 1.6761), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:23:48,471 - root - INFO - Step 350: lr=7.02E-06, loss= 1.0841 (max= 1.6141), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,349 - root - INFO - Step 360: lr=7.22E-06, loss= 1.0948 (max= 1.6921), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:20,922 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:164484 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:24:52,178 - root - INFO - Step 370: lr=7.42E-06, loss= 1.0933 (max= 1.4432), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: 
lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,042 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:24,043 - root - INFO - Step 380: lr=7.62E-06, loss= 1.0607 (max= 1.5788), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - 
INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:25:55,879 - root - INFO - Step 390: lr=7.82E-06, loss= 1.1006 (max= 1.6443), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,728 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,728 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,728 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,728 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,728 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,728 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,729 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:27,729 - root - INFO - Step 400: lr=8.02E-06, loss= 1.0693 (max= 1.5246), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:26:59,548 - root - INFO - Step 410: lr=8.22E-06, loss= 1.0607 (max= 1.5058), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,500 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,500 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,500 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 08:27:31,500 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,500 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,500 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,501 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:27:31,501 - root - INFO - Step 420: lr=8.42E-06, loss= 1.0753 (max= 1.5686), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,344 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:03,345 - root - INFO - Step 430: lr=8.62E-06, loss= 1.0988 (max= 1.5644), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,168 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:28:35,169 - root - INFO - Step 440: lr=8.82E-06, loss= 1.0863 (max= 1.5076), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,968 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:06,969 - root - INFO - Step 450: lr=9.02E-06, loss= 1.1012 (max= 1.4983), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, 
mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:29:38,868 - root - INFO - Step 460: lr=9.22E-06, loss= 1.0725 (max= 1.3704), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,713 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,713 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,713 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,714 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,714 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,714 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,714 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 
1.4763), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:10,714 - root - INFO - Step 470: lr=9.42E-06, loss= 1.1006 (max= 1.4763), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,546 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:30:42,547 - root - INFO - Step 480: lr=9.62E-06, loss= 1.0846 (max= 1.5602), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 
1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:14,411 - root - INFO - Step 490: lr=9.82E-06, loss= 1.0926 (max= 1.6155), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: 
lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:31:46,323 - root - INFO - Step 500: lr=1.00E-05, loss= 1.0899 (max= 1.5437), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:18,131 - root - 
INFO - Step 510: lr=9.77E-06, loss= 1.1063 (max= 1.5637), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.02%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:32:49,977 - root - INFO - Step 520: lr=9.67E-06, loss= 1.0922 (max= 1.5875), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:21,918 - root - INFO - Step 530: lr=9.60E-06, loss= 1.0842 (max= 1.5073), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:33:53,802 - root - INFO - Step 540: lr=9.53E-06, loss= 1.0866 (max= 1.8305), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,692 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:25,693 - root - INFO - Step 550: lr=9.48E-06, loss= 1.1070 (max= 1.6031), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,539 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:34:57,540 - root - INFO - Step 560: lr=9.43E-06, loss= 1.0948 (max= 1.5106), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:35:29,437 - root - INFO - Step 570: lr=9.38E-06, loss= 1.1014 (max= 1.7429), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, 
mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:01,305 - root - INFO - Step 580: lr=9.34E-06, loss= 1.0661 (max= 1.4901), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,133 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,133 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,133 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,134 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,134 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,134 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,134 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:36:33,134 - root - INFO - Step 590: lr=9.30E-06, loss= 1.0927 (max= 1.6634), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 
1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:04,969 - root - INFO - Step 600: lr=9.26E-06, loss= 1.0947 (max= 1.4955), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:36,880 - root - INFO - Step 610: lr=9.23E-06, loss= 1.0874 (max= 1.4869), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:36,880 - root - INFO - Step 610: lr=9.23E-06, loss= 1.0874 (max= 1.4869), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:36,880 - root - INFO - Step 610: lr=9.23E-06, loss= 1.0874 (max= 1.4869), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:37:36,880 - root - INFO - Step 610: lr=9.23E-06, loss= 
1.0874 (max= 1.4869), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:37:36,880 - root - INFO - Step 610: lr=9.23E-06, loss= 1.0874 (max= 1.4869), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:38:08,741 - root - INFO - Step 620: lr=9.19E-06, loss= 1.0928 (max= 1.5578), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:38:40,592 - root - INFO - Step 630: lr=9.16E-06, loss= 1.0976 (max= 1.5681), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:38:40,593 - root - INFO - Step 630: lr=9.16E-06, loss= 1.0976 (max= 1.5681), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:39:12,417 - root - INFO - Step 640: lr=9.13E-06, loss= 1.0984 (max= 1.5015), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:39:44,269 - root - INFO - Step 650: lr=9.10E-06, loss= 1.0884 (max= 1.4770), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:40:16,174 - root - INFO - Step 660: lr=9.07E-06, loss= 1.1108 (max= 1.4755), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:40:48,053 - root - INFO - Step 670: lr=9.04E-06, loss= 1.0793 (max= 1.5382), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:41:19,870 - root - INFO - Step 680: lr=9.01E-06, loss= 1.0909 (max= 1.6036), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:41:19,870 - root - INFO - Step 680: lr=9.01E-06, loss= 1.0909 (max= 1.6036), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:41:51,803 - root - INFO - Step 690: lr=8.98E-06, loss= 1.0959 (max= 1.5658), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:41:51,804 - root - INFO - Step 690: lr=8.98E-06, loss= 1.0959 (max= 1.5658), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:23,587 - root - INFO - Step 700: lr=8.96E-06, loss= 1.0883 (max= 1.5716), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:23,587 - root - INFO - Step 700: lr=8.96E-06, loss= 1.0883 (max= 1.5716), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:23,588 - root - INFO - Step 700: lr=8.96E-06, loss= 1.0883 (max= 1.5716), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:55,527 - root - INFO - Step 710: lr=8.93E-06, loss= 1.0755 (max= 1.4037), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:55,527 - root - INFO - Step 710: lr=8.93E-06, loss= 1.0755 (max= 1.4037), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:55,528 - root - INFO - Step 710: lr=8.93E-06, loss= 1.0755 (max= 1.4037), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:42:55,528 - root - INFO - Step 710: lr=8.93E-06, loss= 1.0755 (max= 1.4037), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:27,438 - root - INFO - Step 720: lr=8.91E-06, loss= 1.0866 (max= 1.5035), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:27,438 - root - INFO - Step 720: lr=8.91E-06, loss= 1.0866 (max= 1.5035), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:27,439 - root - INFO - Step 720: lr=8.91E-06, loss= 1.0866 (max= 1.5035), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:27,439 - root - INFO - Step 720: lr=8.91E-06, loss= 1.0866 (max= 1.5035), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:27,439 - root - INFO - Step 720: lr=8.91E-06, loss= 1.0866 (max= 1.5035), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:59,255 - root - INFO - Step 730: lr=8.88E-06, loss= 1.0845 (max= 1.5433), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:43:59,255 - root - INFO - Step 730: lr=8.88E-06, loss= 1.0845 (max= 1.5433), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:44:31,080 - root - INFO - Step 740: lr=8.86E-06, loss= 1.0968 (max= 1.5570), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:45:02,929 - root - INFO - Step 750: lr=8.84E-06, loss= 1.0715 (max= 1.5308), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:45:02,930 - root - INFO - Step 750: lr=8.84E-06, loss= 1.0715 (max= 1.5308), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:45:34,755 - root - INFO - Step 760: lr=8.81E-06, loss= 1.0903 (max= 1.6426), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:46:06,648 - root - INFO - Step 770: lr=8.79E-06, loss= 1.0840 (max= 1.5900), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:46:06,649 - root - INFO - Step 770: lr=8.79E-06, loss= 1.0840 (max= 1.5900), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:46:06,649 - root - INFO - Step 770: lr=8.79E-06, loss= 1.0840 (max= 1.5900), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:46:38,536 - root - INFO - Step 780: lr=8.77E-06, loss= 1.0881 (max= 1.5647), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:47:10,402 - root - INFO - Step 790: lr=8.75E-06, loss= 1.0848 (max= 1.4988), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:47:10,403 - root - INFO - Step 790: lr=8.75E-06, loss= 1.0848 (max= 1.4988), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:47:42,329 - root - INFO - Step 800: lr=8.72E-06, loss= 1.0742 (max= 1.6286), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:47:42,329 - root - INFO - Step 800: lr=8.72E-06, loss= 1.0742 (max= 1.6286), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:48:14,164 - root - INFO - Step 810: lr=8.70E-06, loss= 1.0868 (max= 1.4251), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:48:14,165 - root - INFO - Step 810: lr=8.70E-06, loss= 1.0868 (max= 1.4251), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:48:46,086 - root - INFO - Step 820: lr=8.68E-06, loss= 1.0815 (max= 1.5389), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:48:46,087 - root - INFO - Step 820: lr=8.68E-06, loss= 1.0815 (max= 1.5389), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:49:17,916 - root - INFO - Step 830: lr=8.66E-06, loss= 1.0914 (max= 1.6065), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:49:17,917 - root - INFO - Step 830: lr=8.66E-06, loss= 1.0914 (max= 1.6065), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:49:49,762 - root - INFO - Step 840: lr=8.64E-06, loss= 1.0855 (max= 1.4726), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:49:49,762 - root - INFO - Step 840: lr=8.64E-06, loss= 1.0855 (max= 1.4726), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:50:21,680 - root - INFO - Step 850: lr=8.62E-06, loss= 1.0981 (max= 1.6991), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:50:53,631 - root - INFO - Step 860: lr=8.60E-06, loss= 1.1088 (max= 1.4734), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:50:53,632 - root - INFO - Step 860: lr=8.60E-06, loss= 1.1088 (max= 1.4734), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:50:53,632 - root - INFO - Step 860: lr=8.60E-06, loss= 1.1088 (max= 1.4734), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:51:25,464 - root - INFO - Step 870: lr=8.58E-06, loss= 1.0915 (max= 1.5132), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:51:25,464 - root - INFO - Step 870: lr=8.58E-06, loss= 1.0915 (max= 1.5132), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:51:25,465 - root - INFO - Step 870: lr=8.58E-06, loss= 1.0915 (max= 1.5132), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:51:57,382 - root - INFO - Step 880: lr=8.56E-06, loss= 1.0971 (max= 1.6139), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:51:57,383 - root - INFO - Step 880: lr=8.56E-06, loss= 1.0971 (max= 1.6139), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:51:57,383 - root - INFO - Step 880: lr=8.56E-06, loss= 1.0971 (max= 1.6139), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 08:52:29,249 - root - INFO - Step 890: lr=8.55E-06, loss= 1.0822 (max= 1.6344), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 
08:53:01,140 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,140 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,140 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,140 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,141 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,141 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,141 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:01,141 - root - INFO - Step 900: lr=8.53E-06, loss= 1.0714 (max= 1.6066), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:53:33,063 - root - INFO - Step 910: lr=8.51E-06, loss= 1.0788 (max= 1.4147), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:04,934 - root - INFO - Step 920: lr=8.49E-06, loss= 1.0632 (max= 1.5408), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,700 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:54:36,701 - root - INFO - Step 930: lr=8.47E-06, loss= 1.0833 (max= 1.5074), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20582, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,546 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:08,547 - root - INFO - Step 940: lr=8.45E-06, loss= 1.0853 (max= 1.6089), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,403 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,403 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,403 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,404 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,404 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,404 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,404 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:55:40,404 - root - INFO - Step 950: lr=8.44E-06, loss= 1.0800 (max= 1.5025), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,194 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,194 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,194 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,194 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,195 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,195 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,195 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 
1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:12,195 - root - INFO - Step 960: lr=8.42E-06, loss= 1.0845 (max= 1.5063), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:56:44,048 - root - INFO - Step 970: lr=8.40E-06, loss= 1.0746 (max= 1.6603), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 
1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:15,931 - root - INFO - Step 980: lr=8.39E-06, loss= 1.0590 (max= 1.5234), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: 
lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:57:47,811 - root - INFO - Step 990: lr=8.37E-06, loss= 1.0794 (max= 1.4295), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-1000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-1000! Save time: 4.5463526248931885 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - 
CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Step 1000: lr=8.35E-06, loss= 1.0770 (max= 1.5369), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:19,713 - root - INFO - Saving a full checkpoint at step 1000 2025-10-26 08:58:19,713 
- root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 08:58:44,128 - root - INFO - Finished saving the checkpoint in 24.42 seconds 2025-10-26 08:58:44,137 - root - INFO - Finished saving the checkpoint in 24.42 seconds 2025-10-26 08:58:44,138 - root - INFO - Finished saving the checkpoint in 24.42 seconds 2025-10-26 08:58:44,138 - root - INFO - Finished saving the checkpoint in 24.42 seconds 2025-10-26 08:58:44,138 - root - INFO - Finished saving the checkpoint in 24.42 seconds 2025-10-26 08:58:44,138 - root - INFO - Finished saving the checkpoint in 24.43 seconds 2025-10-26 08:58:44,138 - root - INFO - Finished saving the checkpoint in 24.43 seconds 2025-10-26 08:58:44,139 - root - INFO - Finished saving the checkpoint in 24.43 seconds 2025-10-26 08:59:15,911 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,911 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,911 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,911 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,911 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,911 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,912 - root - INFO - Step 1010: lr=8.34E-06, loss= 
1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:15,912 - root - INFO - Step 1010: lr=8.34E-06, loss= 1.0957 (max= 1.4569), tps=11662, mfu=24.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 08:59:47,868 - root - INFO - Step 1020: lr=8.32E-06, loss= 1.0705 (max= 1.5974), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 
1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:19,745 - root - INFO - Step 1030: lr=8.30E-06, loss= 1.0569 (max= 1.4928), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:00:51,572 - root - INFO - Step 1040: lr=8.29E-06, loss= 1.0682 (max= 1.5997), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:23,528 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 09:01:23,529 - root - INFO - Step 1050: lr=8.27E-06, loss= 1.0566 (max= 1.5796), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:01:55,298 - root - INFO - Step 1060: lr=8.26E-06, loss= 1.0460 (max= 1.5104), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,134 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,134 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,134 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,134 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,134 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,135 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,135 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:27,135 - root - INFO - Step 1070: lr=8.24E-06, loss= 1.0747 (max= 1.8513), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,921 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,921 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,921 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,922 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,922 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,922 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,922 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:02:58,922 - root - INFO - Step 1080: lr=8.23E-06, loss= 1.0710 (max= 1.5317), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,927 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,927 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,927 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,927 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,927 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,928 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,928 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:03:30,928 - root - INFO - Step 1090: lr=8.21E-06, loss= 1.0530 (max= 
1.4799), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:02,898 - root - INFO - Step 1100: lr=8.20E-06, loss= 1.0459 (max= 1.4916), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20339, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20338, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: 
lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20339, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20338, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20339, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20339, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20339, mfu=42.38%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:04:35,125 - root - INFO - Step 1110: lr=8.18E-06, loss= 1.0761 (max= 1.4577), tps=20338, mfu=42.37%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,953 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,953 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,953 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,954 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,954 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,954 - 
root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,954 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:06,954 - root - INFO - Step 1120: lr=8.17E-06, loss= 1.0584 (max= 1.4464), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:05:38,766 - root - INFO - Step 1130: lr=8.15E-06, loss= 1.0563 (max= 1.5417), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:06:10,601 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,601 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,601 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,602 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,602 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,602 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,602 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:10,602 - root - INFO - Step 1140: lr=8.14E-06, loss= 1.0584 (max= 1.5878), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:06:42,422 - root - INFO - Step 1150: lr=8.12E-06, loss= 1.0463 (max= 1.4399), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:14,287 - root - INFO - Step 1160: lr=8.11E-06, loss= 1.0723 (max= 1.4733), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,200 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,200 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,201 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,201 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,201 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,201 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,201 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:07:46,201 - root - INFO - Step 1170: lr=8.09E-06, loss= 1.0561 (max= 1.6901), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 
1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:18,109 - root - INFO - Step 1180: lr=8.08E-06, loss= 1.0720 (max= 1.6009), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: 
lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:08:49,957 - root - INFO - Step 1190: lr=8.06E-06, loss= 1.0689 (max= 1.4970), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,749 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,749 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,749 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,749 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,749 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,749 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,750 - 
root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:21,750 - root - INFO - Step 1200: lr=8.05E-06, loss= 1.0476 (max= 1.4065), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,564 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,564 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,564 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,565 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,565 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,565 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,565 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:09:53,565 - root - INFO - Step 1210: lr=8.04E-06, loss= 1.0598 (max= 1.7333), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:25,386 - root - INFO - Step 1220: lr=8.02E-06, loss= 1.0485 (max= 1.5921), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,241 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,241 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,241 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,241 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,242 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,242 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,242 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:10:57,242 - root - INFO - Step 1230: lr=8.01E-06, loss= 1.0671 (max= 1.4946), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,252 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,252 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,252 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,252 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,252 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,253 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,253 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:29,253 - root - INFO - Step 1240: lr=8.00E-06, loss= 1.0533 (max= 1.6632), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:11:50,716 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:4518290 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:01,153 - root - INFO - Step 1250: lr=7.98E-06, loss= 1.0617 (max= 1.5597), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 
1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:12:33,094 - root - INFO - Step 1260: lr=7.97E-06, loss= 1.0721 (max= 1.6075), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: 
lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,897 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:04,898 - root - INFO - Step 1270: lr=7.96E-06, loss= 1.0543 (max= 1.5668), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - 
root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:13:36,949 - root - INFO - Step 1280: lr=7.94E-06, loss= 1.0907 (max= 1.5609), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,843 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,843 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,843 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,844 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,844 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,844 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,844 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:08,844 - root - INFO - Step 1290: lr=7.93E-06, loss= 1.0501 (max= 1.5699), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:14:40,733 - root - INFO - Step 1300: lr=7.92E-06, loss= 1.0705 (max= 1.6221), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:12,556 - root - INFO - Step 1310: lr=7.90E-06, loss= 1.0584 (max= 1.4673), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:15:44,523 - root - INFO - Step 1320: lr=7.89E-06, loss= 1.0673 (max= 1.6615), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:16,316 - root - INFO - Step 1330: lr=7.88E-06, loss= 1.0527 (max= 1.5453), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 
1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:16:48,197 - root - INFO - Step 1340: lr=7.86E-06, loss= 1.0483 (max= 1.4781), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: 
lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:20,047 - root - INFO - Step 1350: lr=7.85E-06, loss= 1.0403 (max= 1.5531), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20545, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20545, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20545, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,950 - root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:17:51,951 - 
root - INFO - Step 1360: lr=7.84E-06, loss= 1.0780 (max= 1.7087), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:23,783 - root - INFO - Step 1370: lr=7.83E-06, loss= 1.0459 (max= 1.6740), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,636 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,636 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:18:55,636 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,636 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,636 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,637 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,637 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:18:55,637 - root - INFO - Step 1380: lr=7.81E-06, loss= 1.0520 (max= 1.5044), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:27,432 - root - INFO - Step 1390: lr=7.80E-06, loss= 1.0404 (max= 1.5683), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:19:59,408 - root - INFO - Step 1400: lr=7.79E-06, loss= 1.0739 (max= 1.7161), tps=20498, mfu=42.71%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:20:31,232 - root - INFO - Step 1410: lr=7.78E-06, loss= 1.0460 (max= 1.5843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 
1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,095 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:03,096 - root - INFO - Step 1420: lr=7.77E-06, loss= 1.0569 (max= 1.4858), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: 
lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:21:34,954 - root - INFO - Step 1430: lr=7.75E-06, loss= 1.0705 (max= 1.5658), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:06,745 - root - INFO - Step 1440: lr=7.74E-06, loss= 1.0299 (max= 1.3891), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,621 - 
root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,621 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,621 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,621 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,621 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,621 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,622 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:22:38,622 - root - INFO - Step 1450: lr=7.73E-06, loss= 1.0661 (max= 1.4963), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:10,462 - root - INFO - Step 1460: lr=7.72E-06, loss= 1.0805 (max= 1.5118), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,316 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,316 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,317 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,317 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,317 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,317 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,317 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:23:42,317 - root - INFO - Step 1470: lr=7.71E-06, loss= 1.0459 (max= 1.6607), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,374 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,374 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,374 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,375 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,375 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,375 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,375 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:14,375 - root - INFO - Step 1480: lr=7.69E-06, loss= 1.0485 (max= 1.4897), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:24:46,248 - root - INFO - Step 1490: lr=7.68E-06, loss= 1.0620 (max= 1.5111), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,109 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,109 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,110 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,110 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 
1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,110 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,110 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,110 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:18,110 - root - INFO - Step 1500: lr=7.67E-06, loss= 1.0506 (max= 1.6106), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,037 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,037 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,038 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,038 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,038 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,038 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:25:50,038 - root - INFO - Step 1510: 
lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:25:50,038 - root - INFO - Step 1510: lr=7.66E-06, loss= 1.0607 (max= 1.5015), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:26:21,866 - root - INFO - Step 1520: lr=7.65E-06, loss= 1.0804 (max= 1.4500), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:26:53,727 - root - INFO - Step 1530: lr=7.64E-06, loss= 1.0790 (max= 1.6023), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:27:25,627 - root - INFO - Step 1540: lr=7.62E-06, loss= 1.1118 (max= 1.5048), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:27:25,627 - root - INFO - Step 1540: lr=7.62E-06, loss= 1.1118 (max= 1.5048), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:27:57,467 - root - INFO - Step 1550: lr=7.61E-06, loss= 1.0518 (max= 1.4690), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:27:57,467 - root - INFO - Step 1550: lr=7.61E-06, loss= 1.0518 (max= 1.4690), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:28:29,350 - root - INFO - Step 1560: lr=7.60E-06, loss= 1.0781 (max= 1.5374), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:29:01,170 - root - INFO - Step 1570: lr=7.59E-06, loss= 1.0499 (max= 1.5281), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:29:01,171 - root - INFO - Step 1570: lr=7.59E-06, loss= 1.0499 (max= 1.5281), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:29:33,225 - root - INFO - Step 1580: lr=7.58E-06, loss= 1.0705 (max= 1.4521), tps=20447, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:29:33,226 - root - INFO - Step 1580: lr=7.58E-06, loss= 1.0705 (max= 1.4521), tps=20447, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:30:05,065 - root - INFO - Step 1590: lr=7.57E-06, loss= 1.0647 (max= 1.5078), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:30:05,065 - root - INFO - Step 1590: lr=7.57E-06, loss= 1.0647 (max= 1.5078), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:30:36,948 - root - INFO - Step 1600: lr=7.56E-06, loss= 1.0748 (max= 1.5834), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:30:36,948 - root - INFO - Step 1600: lr=7.56E-06, loss= 1.0748 (max= 1.5834), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:31:08,784 - root - INFO - Step 1610: lr=7.55E-06, loss= 1.0740 (max= 1.8074), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:31:08,785 - root - INFO - Step 1610: lr=7.55E-06, loss= 1.0740 (max= 1.8074), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:31:08,785 - root - INFO - Step 1610: lr=7.55E-06, loss= 1.0740 (max= 1.8074), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:31:40,724 - root - INFO - Step 1620: lr=7.53E-06, loss= 1.0530 (max= 1.5649), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:32:12,582 - root - INFO - Step 1630: lr=7.52E-06, loss= 1.0338 (max= 1.4569), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:32:44,551 - root - INFO - Step 1640: lr=7.51E-06, loss= 1.0509 (max= 1.5192), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:33:16,613 - root - INFO - Step 1650: lr=7.50E-06, loss= 1.0407 (max= 1.5201), tps=20443, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:33:16,613 - root - INFO - Step 1650: lr=7.50E-06, loss= 1.0407 (max= 1.5201), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:33:48,463 - root - INFO - Step 1660: lr=7.49E-06, loss= 1.0610 (max= 1.4714), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:34:20,385 - root - INFO - Step 1670: lr=7.48E-06, loss= 1.0519 (max= 1.4947), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:34:20,386 - root - INFO - Step 1670: lr=7.48E-06, loss= 1.0519 (max= 1.4947), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:34:52,392 - root - INFO - Step 1680: lr=7.47E-06, loss= 1.0573 (max= 1.5479), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:34:52,392 - root - INFO - Step 1680: lr=7.47E-06, loss= 1.0573 (max= 1.5479), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:35:24,460 - root - INFO - Step 1690: lr=7.46E-06, loss= 1.0635 (max= 1.5943), tps=20439, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:35:56,256 - root - INFO - Step 1700: lr=7.45E-06, loss= 1.0633 (max= 1.4521), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:36:28,369 - root - INFO - Step 1710: lr=7.44E-06, loss= 1.0696 (max= 1.4721), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:36:28,369 - root - INFO - Step 1710: lr=7.44E-06, loss= 1.0696 (max= 1.4721), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:36:28,370 - root - INFO - Step 1710: lr=7.44E-06, loss= 1.0696 (max= 1.4721), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:37:00,121 - root - INFO - Step 1720: lr=7.43E-06, loss= 1.0632 (max= 1.5180), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:37:00,122 - root - INFO - Step 1720: lr=7.43E-06, loss= 1.0632 (max= 1.5180), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:37:32,003 - root - INFO - Step 1730: lr=7.42E-06, loss= 1.0830 (max= 1.5811), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:37:32,003 - root - INFO - Step 1730: lr=7.42E-06, loss= 1.0830 (max= 1.5811), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:38:03,834 - root - INFO - Step 1740: lr=7.41E-06, loss= 1.0552 (max= 1.4731), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:38:03,834 - root - INFO - Step 1740: lr=7.41E-06, loss= 1.0552 (max= 1.4731), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:38:35,824 - root - INFO - Step 1750: lr=7.40E-06, loss= 1.0520 (max= 1.6438), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:38:35,824 - root - INFO - Step 1750: lr=7.40E-06, loss= 1.0520 (max= 1.6438), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:39:07,719 - root - INFO - Step 1760: lr=7.39E-06, loss= 1.0567 (max= 1.4025), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:39:39,495 - root - INFO - Step 1770: lr=7.37E-06, loss= 1.0406 (max= 1.4786), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:40:04,177 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5233938
2025-10-26 09:40:11,405 - root - INFO - Step 1780: lr=7.36E-06, loss= 1.0582 (max= 1.4471), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:40:43,274 - root - INFO - Step 1790: lr=7.35E-06, loss= 1.0650 (max= 1.4225), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:40:43,274 - root - INFO - Step 1790: lr=7.35E-06, loss= 1.0650 (max= 1.4225), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 09:41:15,096 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%)
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,096 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,096 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,097 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,097 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,097 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,097 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:15,097 - root - INFO - Step 1800: lr=7.34E-06, loss= 1.0517 (max= 1.5343), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:41:46,877 - root - INFO - Step 1810: lr=7.33E-06, loss= 1.0401 (max= 1.4841), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20665, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20666, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20666, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20666, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20666, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20666, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 
1.4838), tps=20666, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:18,593 - root - INFO - Step 1820: lr=7.32E-06, loss= 1.0475 (max= 1.4838), tps=20665, mfu=43.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:20,922 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:3770236 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:42:50,483 - root - INFO - Step 1830: lr=7.31E-06, loss= 1.0678 (max= 1.5416), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: 
lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:22,416 - root - INFO - Step 1840: lr=7.30E-06, loss= 1.0738 (max= 1.5775), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - 
root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:43:54,287 - root - INFO - Step 1850: lr=7.29E-06, loss= 1.0593 (max= 1.7218), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,214 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,214 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,214 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,215 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,215 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,215 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:44:26,215 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:26,215 - root - INFO - Step 1860: lr=7.28E-06, loss= 1.0599 (max= 1.4709), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:44:58,052 - root - INFO - Step 1870: lr=7.27E-06, loss= 1.0394 (max= 1.5619), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:45:29,813 - root - INFO - Step 1880: lr=7.26E-06, loss= 1.0594 (max= 1.4527), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,601 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,601 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,601 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,601 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,601 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,601 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,602 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:01,602 - root - INFO - Step 1890: lr=7.25E-06, loss= 1.0403 (max= 1.5477), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:11,721 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:1770577 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 
1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:46:33,426 - root - INFO - Step 1900: lr=7.24E-06, loss= 1.0291 (max= 1.7971), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:05,254 - root - INFO - Step 1910: lr=7.23E-06, loss= 1.0553 (max= 1.4159), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: 
lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,256 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:47:37,257 - root - INFO - Step 1920: lr=7.22E-06, loss= 1.0334 (max= 1.4925), tps=20481, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,269 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,269 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,269 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,269 - 
root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,269 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,269 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,270 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:09,270 - root - INFO - Step 1930: lr=7.21E-06, loss= 1.0539 (max= 1.5230), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:48:41,138 - root - INFO - Step 1940: lr=7.20E-06, loss= 1.0336 (max= 1.3799), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,903 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:12,904 - root - INFO - Step 1950: lr=7.19E-06, loss= 1.0289 (max= 1.4994), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:49:44,742 - root - INFO - Step 1960: lr=7.19E-06, loss= 1.0362 (max= 1.5701), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:16,587 - root - INFO - Step 1970: lr=7.18E-06, loss= 1.0267 (max= 1.6228), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 
1.4227), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:50:48,478 - root - INFO - Step 1980: lr=7.17E-06, loss= 1.0561 (max= 1.4227), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:20,475 - root - INFO - Step 1990: lr=7.16E-06, loss= 1.0457 (max= 1.4917), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-2000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-2000! 
Save time: 4.420971155166626 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 
'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:51:52,289 - root - INFO - Step 2000: lr=7.15E-06, loss= 1.0496 (max= 1.4344), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:51:52,289 - root - INFO - Saving a full checkpoint at step 2000 2025-10-26 09:51:52,289 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 09:52:06,985 - root - INFO - Finished saving the checkpoint in 14.70 seconds 2025-10-26 09:52:06,993 - root - INFO - Finished saving the checkpoint in 14.70 seconds 2025-10-26 09:52:06,993 - root - INFO - Finished saving the checkpoint in 14.70 seconds 2025-10-26 09:52:06,993 - root - INFO - Finished saving the checkpoint in 14.70 seconds 2025-10-26 09:52:06,994 - root - INFO - Finished saving the checkpoint in 14.70 seconds 2025-10-26 09:52:06,994 - root - INFO - Finished saving the checkpoint in 14.70 seconds 2025-10-26 09:52:06,995 - root - INFO - Finished saving the checkpoint in 14.71 seconds 2025-10-26 09:52:06,996 - root - INFO - Finished saving the checkpoint in 14.71 seconds 2025-10-26 09:52:38,734 - root - 
INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14111, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14111, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14112, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14112, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14112, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14112, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14112, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:52:38,734 - root - INFO - Step 2010: lr=7.14E-06, loss= 1.0545 (max= 1.5761), tps=14111, mfu=29.40%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:10,571 - root - INFO - Step 2020: lr=7.13E-06, loss= 1.0557 (max= 1.5064), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:53:42,460 - root - INFO - Step 2030: lr=7.12E-06, loss= 1.0425 (max= 1.4649), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,458 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,458 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,459 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,459 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,459 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,459 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,459 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:14,459 - root - INFO - Step 2040: lr=7.11E-06, loss= 1.0214 (max= 1.4301), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20458, mfu=42.62%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:54:46,498 - root - INFO - Step 2050: lr=7.10E-06, loss= 1.0226 (max= 1.5610), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 
1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:18,530 - root - INFO - Step 2060: lr=7.09E-06, loss= 1.0173 (max= 1.4511), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: 
lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:55:50,466 - root - INFO - Step 2070: lr=7.08E-06, loss= 1.0642 (max= 1.5546), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,360 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,360 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,361 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,361 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,361 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,361 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,361 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:22,361 - root - INFO - Step 2080: lr=7.07E-06, loss= 1.0249 (max= 1.4841), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - 
root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,189 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:56:54,190 - root - INFO - Step 2090: lr=7.06E-06, loss= 1.0308 (max= 1.4442), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:25,986 - root - INFO - Step 2100: lr=7.05E-06, loss= 1.0594 (max= 1.5523), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:57:57,830 - root - INFO - Step 2110: lr=7.04E-06, loss= 1.0707 (max= 1.6994), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.55%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.55%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:29,928 - root - INFO - Step 2120: lr=7.03E-06, loss= 1.0457 (max= 1.5830), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:58:32,277 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:631713 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:01,810 - root - INFO - Step 2130: lr=7.03E-06, loss= 1.0469 (max= 1.4882), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:21,608 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5917317 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 
1.6451), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 09:59:33,839 - root - INFO - Step 2140: lr=7.02E-06, loss= 1.0408 (max= 1.6451), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: 
lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:05,751 - root - INFO - Step 2150: lr=7.01E-06, loss= 1.0568 (max= 1.4761), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:00:37,577 - root - INFO - Step 2160: lr=7.00E-06, loss= 1.0419 (max= 1.4250), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,346 - 
root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,346 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,346 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,346 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,346 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,347 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,347 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:09,347 - root - INFO - Step 2170: lr=6.99E-06, loss= 1.0414 (max= 1.6424), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,215 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:01:41,216 - root - INFO - Step 2180: lr=6.98E-06, loss= 1.0585 (max= 1.4539), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,056 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:13,057 - root - INFO - Step 2190: lr=6.97E-06, loss= 1.0532 (max= 1.5465), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:02:44,847 - root - INFO - Step 2200: lr=6.96E-06, loss= 1.0615 (max= 1.5839), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,719 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,719 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,720 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,720 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,720 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,720 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,720 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:16,720 - root - INFO - Step 2210: lr=6.95E-06, loss= 1.0645 (max= 1.5040), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 
1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:48,602 - root - INFO - Step 2220: lr=6.94E-06, loss= 1.0254 (max= 1.4864), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:03:58,730 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2856944 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: 
lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:20,435 - root - INFO - Step 2230: lr=6.94E-06, loss= 1.0582 (max= 1.4750), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:04:52,172 - root - INFO - Step 2240: lr=6.93E-06, loss= 1.0614 (max= 1.5488), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - 
root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:24,138 - root - INFO - Step 2250: lr=6.92E-06, loss= 1.0353 (max= 1.6638), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:05:56,205 - root - INFO - Step 2260: lr=6.91E-06, loss= 1.0513 (max= 1.5201), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:28,126 - root - INFO - Step 2270: lr=6.90E-06, loss= 1.0450 (max= 1.5409), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,949 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,949 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,949 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,950 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,950 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,950 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,950 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:06:59,950 - root - INFO - Step 2280: lr=6.89E-06, loss= 1.0469 (max= 1.5008), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:07:32,064 - root - INFO - Step 2290: lr=6.88E-06, loss= 1.0509 (max= 1.4961), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,034 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,034 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,035 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,035 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 
1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,035 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,035 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,035 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:04,035 - root - INFO - Step 2300: lr=6.87E-06, loss= 1.0332 (max= 1.4816), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,920 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,920 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,921 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,921 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,921 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,921 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,921 - root - INFO - Step 2310: 
lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:08:35,921 - root - INFO - Step 2310: lr=6.87E-06, loss= 1.0243 (max= 1.5086), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,775 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,775 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,775 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,776 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,776 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,776 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,776 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:07,776 - root - INFO - Step 2320: lr=6.86E-06, loss= 1.0535 (max= 1.5737), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,614 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,614 - 
root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,614 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,615 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,615 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,615 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,615 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:09:39,615 - root - INFO - Step 2330: lr=6.85E-06, loss= 1.0626 (max= 1.5454), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:11,408 - root - INFO - Step 2340: lr=6.84E-06, loss= 1.0509 (max= 1.4751), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,404 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,404 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,404 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,404 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,404 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,404 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,405 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:10:43,405 - root - INFO - Step 2350: lr=6.83E-06, loss= 1.0164 (max= 1.4705), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:15,254 - root - INFO - Step 2360: lr=6.82E-06, loss= 1.0426 (max= 1.4931), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,156 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,156 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,157 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,157 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,157 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,157 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,157 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:11:47,157 - root - INFO - Step 2370: lr=6.81E-06, loss= 1.0746 (max= 1.4891), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 
1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:19,099 - root - INFO - Step 2380: lr=6.81E-06, loss= 1.0610 (max= 1.5467), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:12:50,867 - root - INFO - Step 2390: 
lr=6.80E-06, loss= 1.0462 (max= 1.5274), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,697 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,697 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,697 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,697 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,697 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,698 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,698 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:22,698 - root - INFO - Step 2400: lr=6.79E-06, loss= 1.0844 (max= 1.6421), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,598 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - 
root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:13:54,599 - root - INFO - Step 2410: lr=6.78E-06, loss= 1.0280 (max= 1.3941), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,439 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,439 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,439 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,440 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,440 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:14:26,440 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,440 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:26,440 - root - INFO - Step 2420: lr=6.77E-06, loss= 1.0438 (max= 1.5596), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,352 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:14:58,353 - root - INFO - Step 2430: lr=6.76E-06, loss= 1.0693 (max= 1.5058), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,246 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,246 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,247 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,247 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,247 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,247 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,247 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:15:30,247 - root - INFO - Step 2440: lr=6.76E-06, loss= 1.0322 (max= 1.5560), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,171 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20531, mfu=42.78%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:02,172 - root - INFO - Step 2450: lr=6.75E-06, loss= 1.0514 (max= 1.7367), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 
1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:16:34,044 - root - INFO - Step 2460: lr=6.74E-06, loss= 1.0362 (max= 1.4300), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,098 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,098 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,098 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,098 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,099 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,099 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,099 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:06,099 - root - INFO - Step 2470: lr=6.73E-06, loss= 1.0525 (max= 1.5093), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: 
lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:17:38,051 - root - INFO - Step 2480: lr=6.72E-06, loss= 1.0466 (max= 1.6817), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - 
root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:09,978 - root - INFO - Step 2490: lr=6.71E-06, loss= 1.0671 (max= 1.5115), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,842 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,842 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,843 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,843 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,843 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,843 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:18:41,843 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:18:41,843 - root - INFO - Step 2500: lr=6.71E-06, loss= 1.0590 (max= 1.4621), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:13,699 - root - INFO - Step 2510: lr=6.70E-06, loss= 1.0422 (max= 1.8024), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:19:45,494 - root - INFO - Step 2520: lr=6.69E-06, loss= 1.0581 (max= 1.8696), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:17,302 - root - INFO - Step 2530: lr=6.68E-06, loss= 1.0467 (max= 1.4846), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 
1.6519), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:20:49,080 - root - INFO - Step 2540: lr=6.67E-06, loss= 1.0671 (max= 1.6519), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:20,835 - root - INFO - Step 2550: lr=6.66E-06, loss= 1.0660 (max= 1.4808), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: 
lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:21:52,737 - root - INFO - Step 2560: lr=6.66E-06, loss= 1.0380 (max= 1.7410), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,685 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,685 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,685 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,686 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,686 - 
root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,686 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,686 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:24,686 - root - INFO - Step 2570: lr=6.65E-06, loss= 1.0628 (max= 1.4398), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:22:56,610 - root - INFO - Step 2580: lr=6.64E-06, loss= 1.0614 (max= 1.4631), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,517 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,517 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,517 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,517 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,517 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,517 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,518 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:23:28,518 - root - INFO - Step 2590: lr=6.63E-06, loss= 1.0414 (max= 1.6138), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:00,492 - root - INFO - Step 2600: lr=6.62E-06, loss= 1.0502 (max= 1.9278), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:01,064 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6008421 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,307 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:24:32,308 - root - INFO - Step 2610: lr=6.62E-06, loss= 1.0720 (max= 1.5814), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 
1.5386), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:04,107 - root - INFO - Step 2620: lr=6.61E-06, loss= 1.0414 (max= 1.5386), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,096 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,096 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,097 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,097 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,097 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,097 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,097 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:25:36,097 - root - INFO - Step 2630: lr=6.60E-06, loss= 1.0493 (max= 1.6273), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:26:08,030 - root - INFO - Step 2640: lr=6.59E-06, loss= 1.0523 (max= 1.6290), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:26:08,030 - root - INFO - Step 2640: 
lr=6.59E-06, loss= 1.0523 (max= 1.6290), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:26:39,965 - root - INFO - Step 2650: lr=6.58E-06, loss= 1.0586 (max= 1.4334), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:27:11,852 - root - INFO - Step 2660: lr=6.58E-06, loss= 1.0600 (max= 1.5372), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:27:43,690 - root - INFO - Step 2670: lr=6.57E-06, loss= 1.0657 (max= 1.4984), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:28:15,578 - root - INFO - Step 2680: lr=6.56E-06, loss= 1.0456 (max= 1.4571), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:28:47,441 - root - INFO - Step 2690: lr=6.55E-06, loss= 1.0521 (max= 1.5200), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:29:19,292 - root - INFO - Step 2700: lr=6.54E-06, loss= 1.0407 (max= 1.5892), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:29:51,150 - root - INFO - Step 2710: lr=6.54E-06, loss= 1.0623 (max= 1.5190), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:30:23,065 - root - INFO - Step 2720: lr=6.53E-06, loss= 1.0438 (max= 1.5160), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:30:54,919 - root - INFO - Step 2730: lr=6.52E-06, loss= 1.1001 (max= 1.9484), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:31:26,770 - root - INFO - Step 2740: lr=6.51E-06, loss= 1.0309 (max= 1.6309), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:31:58,660 - root - INFO - Step 2750: lr=6.51E-06, loss= 1.0790 (max= 1.6429), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:32:30,462 - root - INFO - Step 2760: lr=6.50E-06, loss= 1.0621 (max= 1.4743), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:33:02,296 - root - INFO - Step 2770: lr=6.49E-06, loss= 1.0812 (max= 1.4995), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:33:34,034 - root - INFO - Step 2780: lr=6.48E-06, loss= 1.0320 (max= 1.4928), tps=20651, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:34:06,050 - root - INFO - Step 2790: lr=6.47E-06, loss= 1.0837 (max= 1.5628), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:34:37,847 - root - INFO - Step 2800: lr=6.47E-06, loss= 1.0505 (max= 1.5053), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:35:09,634 - root - INFO - Step 2810: lr=6.46E-06, loss= 1.0636 (max= 1.4640), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:35:41,508 - root - INFO - Step 2820: lr=6.45E-06, loss= 1.0848 (max= 1.3680), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:36:13,295 - root - INFO - Step 2830: lr=6.44E-06, loss= 1.0678 (max= 1.5056), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:36:45,199 - root - INFO - Step 2840: lr=6.44E-06, loss= 1.0353 (max= 1.4687), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:37:17,063 - root - INFO - Step 2850: lr=6.43E-06, loss= 1.0612 (max= 1.5103), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:37:52,232 - root - INFO - Step 2860: lr=6.42E-06, loss= 1.0813 (max= 1.5717), tps=18637, mfu=38.83%, memory: 154.31GiB(86.51%) time/data_loading=0.03s (max=0.22s, 12.49%)
2025-10-26 10:38:24,059 - root - INFO - Step 2870: lr=6.41E-06, loss= 1.0741 (max= 1.4769), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:38:55,880 - root - INFO - Step 2880: lr=6.41E-06, loss= 1.0487 (max= 1.5092), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:39:27,664 - root - INFO - Step 2890: lr=6.40E-06, loss= 1.0579 (max= 1.5730), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:39:37,785 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6438263
2025-10-26 10:39:59,757 - root - INFO - Step 2900: lr=6.39E-06, loss= 1.0604 (max= 1.5474), tps=20423, mfu=42.55%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:40:31,888 - root - INFO - Step 2910: lr=6.38E-06, loss= 1.0565 (max= 1.6429), tps=20398, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 10:41:03,747 - root - INFO - Step 2920: lr=6.38E-06, loss= 1.0714 (max= 1.9175), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:03,748 - root - INFO - Step 2920: lr=6.38E-06, loss= 1.0714 (max= 1.9175), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:03,748 - root - INFO - Step 2920: lr=6.38E-06, loss= 1.0714 (max= 1.9175), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:03,748 - root - INFO - Step 2920: lr=6.38E-06, loss= 1.0714 (max= 1.9175), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:03,748 - root - INFO - Step 2920: lr=6.38E-06, loss= 1.0714 (max= 1.9175), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,592 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,592 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,592 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,592 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,592 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,593 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,593 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:41:35,593 - root - INFO - Step 2930: lr=6.37E-06, loss= 1.0634 (max= 1.4663), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,388 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,388 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,388 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,388 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,388 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,388 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,389 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:07,389 - root - INFO - Step 2940: lr=6.36E-06, loss= 1.0500 (max= 1.4279), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,270 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,270 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 
1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,270 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,270 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,270 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,270 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,271 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:42:39,271 - root - INFO - Step 2950: lr=6.35E-06, loss= 1.0328 (max= 1.5200), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,105 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,105 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,105 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,105 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,106 - root - INFO - Step 2960: 
lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,106 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,106 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:11,106 - root - INFO - Step 2960: lr=6.35E-06, loss= 1.0677 (max= 1.4968), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,946 - root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:43:42,947 - 
root - INFO - Step 2970: lr=6.34E-06, loss= 1.0599 (max= 1.5056), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:14,781 - root - INFO - Step 2980: lr=6.33E-06, loss= 1.0543 (max= 1.6550), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,616 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,616 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:44:46,616 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,617 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,617 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,617 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,617 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:44:46,617 - root - INFO - Step 2990: lr=6.32E-06, loss= 1.0405 (max= 1.4667), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-3000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-3000! 
Save time: 4.4142937660217285 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,434 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,434 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,434 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:18,435 - root - INFO - Step 3000: lr=6.32E-06, loss= 1.0613 (max= 1.5729), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:45:18,435 - root - INFO - Saving a full checkpoint at step 3000 2025-10-26 10:45:18,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 10:45:33,879 - root - INFO - Finished saving the checkpoint in 15.44 seconds 2025-10-26 10:45:33,882 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:45:33,884 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:45:33,884 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:45:33,884 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:45:33,884 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:45:33,884 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:45:33,885 - root - INFO - Finished saving the checkpoint in 15.45 seconds 2025-10-26 10:46:05,822 - root - 
INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:05,822 - root - INFO - Step 3010: lr=6.31E-06, loss= 1.0360 (max= 1.5447), tps=13831, mfu=28.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,828 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:46:37,829 - root - INFO - Step 3020: lr=6.30E-06, loss= 1.0647 (max= 1.6363), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:09,656 - root - INFO - Step 3030: lr=6.29E-06, loss= 1.0592 (max= 1.4650), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:47:41,527 - root - INFO - Step 3040: lr=6.29E-06, loss= 1.0700 (max= 1.5105), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:13,477 - root - INFO - Step 3050: lr=6.28E-06, loss= 1.0479 (max= 1.4562), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 
1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:48:45,323 - root - INFO - Step 3060: lr=6.27E-06, loss= 1.0583 (max= 1.5962), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,287 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,287 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,287 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,288 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,288 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,288 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,288 - root - INFO - Step 3070: 
lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:17,288 - root - INFO - Step 3070: lr=6.27E-06, loss= 1.0903 (max= 1.8388), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:49:49,135 - root - INFO - Step 3080: lr=6.26E-06, loss= 1.0490 (max= 1.5550), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,952 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - 
root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:20,953 - root - INFO - Step 3090: lr=6.25E-06, loss= 1.0535 (max= 1.5812), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,844 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,844 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,844 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,844 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:50:52,844 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,845 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,845 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:50:52,845 - root - INFO - Step 3100: lr=6.24E-06, loss= 1.0613 (max= 1.4250), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:24,683 - root - INFO - Step 3110: lr=6.24E-06, loss= 1.0606 (max= 1.4426), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:51:56,541 - root - INFO - Step 3120: lr=6.23E-06, loss= 1.0786 (max= 1.4491), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,483 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,483 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,483 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,483 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,483 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,483 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,484 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:52:28,484 - root - INFO - Step 3130: lr=6.22E-06, loss= 1.0450 (max= 1.4196), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 
1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,420 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:00,421 - root - INFO - Step 3140: lr=6.21E-06, loss= 1.0618 (max= 1.6114), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:53:32,380 - root - INFO - Step 3150: 
lr=6.21E-06, loss= 1.0805 (max= 1.5488), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:04,144 - root - INFO - Step 3160: lr=6.20E-06, loss= 1.0564 (max= 1.5595), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - 
root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,934 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:54:35,935 - root - INFO - Step 3170: lr=6.19E-06, loss= 1.0762 (max= 1.5639), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:07,789 - root - INFO - Step 3180: lr=6.19E-06, loss= 1.0624 (max= 1.5708), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20525, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20525, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:55:39,721 - root - INFO - Step 3190: lr=6.18E-06, loss= 1.0863 (max= 1.4823), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:11,530 - root - INFO - Step 3200: lr=6.17E-06, loss= 1.0291 (max= 1.4284), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:56:43,383 - root - INFO - Step 3210: lr=6.16E-06, loss= 1.0639 (max= 1.4570), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 
1.4561), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:15,191 - root - INFO - Step 3220: lr=6.16E-06, loss= 1.0702 (max= 1.4561), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:57:47,181 - root - INFO - Step 3230: lr=6.15E-06, loss= 1.0753 (max= 1.5573), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,038 - root - INFO - Step 3240: 
lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,038 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,038 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,038 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,038 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,038 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,039 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:19,039 - root - INFO - Step 3240: lr=6.14E-06, loss= 1.0823 (max= 1.6625), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - 
root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:58:51,020 - root - INFO - Step 3250: lr=6.14E-06, loss= 1.0939 (max= 1.7768), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:22,927 - root - INFO - Step 3260: lr=6.13E-06, loss= 1.0809 (max= 1.6590), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 10:59:54,805 - root - INFO - Step 3270: lr=6.12E-06, loss= 1.0816 (max= 1.8677), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,712 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,712 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,712 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,712 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,712 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,713 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,713 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:26,713 - root - INFO - Step 3280: lr=6.12E-06, loss= 1.0564 (max= 1.6069), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:00:58,859 - root - INFO - Step 3290: lr=6.11E-06, loss= 1.0911 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 
1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:01:30,659 - root - INFO - Step 3300: lr=6.10E-06, loss= 1.0557 (max= 1.5310), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,445 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:02,446 - root - INFO - Step 3310: lr=6.09E-06, loss= 1.0823 (max= 1.5774), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:02:34,301 - root - INFO - Step 3320: 
lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:02:34,301 - root - INFO - Step 3320: lr=6.09E-06, loss= 1.0705 (max= 1.6010), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,108 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,108 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,108 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,109 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,109 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,109 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,109 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:06,109 - root - INFO - Step 3330: lr=6.08E-06, loss= 1.0823 (max= 1.5168), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:03:37,998 - root - INFO - Step 3340: lr=6.07E-06, loss= 1.0878 (max= 1.5124), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:09,880 - root - INFO - Step 3350: lr=6.07E-06, loss= 1.0750 (max= 1.5666), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:04:41,744 - root - INFO - Step 3360: lr=6.06E-06, loss= 1.0687 (max= 1.7089), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:13,679 - root - INFO - Step 3370: lr=6.05E-06, loss= 1.0808 (max= 1.5525), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:05:45,432 - root - INFO - Step 3380: lr=6.05E-06, loss= 1.0568 (max= 1.5564), tps=20642, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:17,381 - root - INFO - Step 3390: lr=6.04E-06, loss= 1.0796 (max= 1.6880), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,223 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:06:49,224 - root - INFO - Step 3400: lr=6.03E-06, loss= 1.0523 (max= 1.6573), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:21,028 - root - INFO - Step 3410: lr=6.03E-06, loss= 1.0622 (max= 1.6197), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,998 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,998 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,998 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,998 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,998 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,999 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,999 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:07:52,999 - root - INFO - Step 3420: lr=6.02E-06, loss= 1.0998 (max= 1.7061), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,851 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,851 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,851 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,851 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,852 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,852 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,852 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:24,852 - root - INFO - Step 3430: lr=6.01E-06, loss= 1.0504 (max= 1.6755), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:08:56,667 - root - INFO - Step 3440: lr=6.01E-06, loss= 1.0765 (max= 1.4929), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,582 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,582 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,582 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,582 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,582 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,583 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,583 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:09:28,583 - root - INFO - Step 3450: lr=6.00E-06, loss= 1.0186 (max= 1.5147), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,560 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,560 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,560 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,560 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,561 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,561 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,561 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:00,561 - root - INFO - Step 3460: lr=5.99E-06, loss= 1.0492 (max= 1.5002), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:10:32,446 - root - INFO - Step 3470: lr=5.99E-06, loss= 1.0726 (max= 1.5606), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:04,230 - root - INFO - Step 3480: lr=5.98E-06, loss= 1.0825 (max= 1.4473), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,085 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,085 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,085 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,085 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,085 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,086 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,086 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:11:36,086 - root - INFO - Step 3490: lr=5.97E-06, loss= 1.0875 (max= 1.6736), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,915 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,915 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,915 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,915 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,915 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,916 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,916 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:07,916 - root - INFO - Step 3500: lr=5.96E-06, loss= 1.0732 (max= 1.7166), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:12:39,753 - root - INFO - Step 3510: lr=5.96E-06, loss= 1.0558 (max= 1.5517), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:11,641 - root - INFO - Step 3520: lr=5.95E-06, loss= 1.0666 (max= 1.5716), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:13:43,536 - root - INFO - Step 3530: lr=5.94E-06, loss= 1.0747 (max= 1.9695), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:15,357 - root - INFO - Step 3540: lr=5.94E-06, loss= 1.0733 (max= 1.5128), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,143 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,143 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,143 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,143 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,143 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,143 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,144 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:14:47,144 - root - INFO - Step 3550: lr=5.93E-06, loss= 1.1007 (max= 1.5162), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:19,068 - root - INFO - Step 3560: lr=5.92E-06, loss= 1.0728 (max= 1.4681), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 11:15:50,907 - 
root - INFO - Step 3570: lr=5.92E-06, loss= 1.0646 (max= 1.7237), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,884 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,884 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,884 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,884 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,884 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,885 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,885 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:22,885 - root - INFO - Step 3580: lr=5.91E-06, loss= 1.0703 (max= 1.5240), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:16:54,682 - root - INFO - Step 3590: lr=5.90E-06, loss= 1.0794 (max= 1.5053), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,663 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,663 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,663 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,663 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,664 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,664 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,664 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:26,664 - root - INFO - Step 3600: lr=5.90E-06, loss= 1.0421 (max= 1.6292), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:17:58,523 - root - INFO - Step 3610: lr=5.89E-06, loss= 1.0676 (max= 1.5739), tps=20573, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,288 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,288 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,288 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,288 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,288 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,289 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,289 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:18:30,289 - root - INFO - Step 3620: lr=5.89E-06, loss= 1.0716 (max= 1.7052), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 
1.4617), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:02,102 - root - INFO - Step 3630: lr=5.88E-06, loss= 1.0509 (max= 1.4617), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,018 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,018 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,019 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,019 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,019 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,019 - root - INFO - Step 3640: 
lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,019 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:19:34,019 - root - INFO - Step 3640: lr=5.87E-06, loss= 1.0638 (max= 1.5558), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,832 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:05,833 - root - INFO - Step 3650: lr=5.87E-06, loss= 1.0548 (max= 1.4589), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - 
root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:20:37,728 - root - INFO - Step 3660: lr=5.86E-06, loss= 1.0517 (max= 1.4581), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,876 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,876 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,877 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:21:09,877 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,877 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,877 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,877 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20387, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:09,877 - root - INFO - Step 3670: lr=5.85E-06, loss= 1.0823 (max= 1.4253), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:21:41,681 - root - INFO - Step 3680: lr=5.85E-06, loss= 1.0716 (max= 1.4722), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:13,525 - root - INFO - Step 3690: lr=5.84E-06, loss= 1.0578 (max= 1.4665), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,436 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:22:45,437 - root - INFO - Step 3700: lr=5.83E-06, loss= 1.1144 (max= 1.8410), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 
1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:17,364 - root - INFO - Step 3710: lr=5.83E-06, loss= 1.0788 (max= 1.4908), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,209 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: 
lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:23:49,210 - root - INFO - Step 3720: lr=5.82E-06, loss= 1.0830 (max= 1.5990), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,032 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,032 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,032 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,033 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,033 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,033 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,033 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:21,033 - root - INFO - Step 3730: lr=5.81E-06, loss= 1.0892 (max= 1.6412), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - 
root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,918 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:24:52,919 - root - INFO - Step 3740: lr=5.81E-06, loss= 1.0802 (max= 1.7472), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:24,779 - root - INFO - Step 3750: lr=5.80E-06, loss= 1.0624 (max= 1.5440), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,653 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,653 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,653 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,653 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,653 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,654 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,654 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:25:56,654 - root - INFO - Step 3760: lr=5.79E-06, loss= 1.0808 (max= 1.7296), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:26:28,491 - root - INFO - Step 3770: lr=5.79E-06, loss= 1.0494 (max= 1.5161), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,527 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:00,528 - root - INFO - Step 3780: lr=5.78E-06, loss= 1.0674 (max= 1.5729), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,401 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,401 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,401 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,401 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,402 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 
1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,402 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,402 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:27:32,402 - root - INFO - Step 3790: lr=5.77E-06, loss= 1.0586 (max= 1.6090), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:04,285 - root - INFO - Step 3800: 
lr=5.77E-06, loss= 1.0609 (max= 1.6836), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,168 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:28:36,169 - root - INFO - Step 3810: lr=5.76E-06, loss= 1.0599 (max= 1.5624), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - 
root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,276 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:08,277 - root - INFO - Step 3820: lr=5.76E-06, loss= 1.0744 (max= 1.5171), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,153 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,153 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,153 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,153 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,153 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:29:40,153 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,154 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:29:40,154 - root - INFO - Step 3830: lr=5.75E-06, loss= 1.0737 (max= 1.5992), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:12,004 - root - INFO - Step 3840: lr=5.74E-06, loss= 1.0876 (max= 1.5841), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:30:43,991 - root - INFO - Step 3850: lr=5.74E-06, loss= 1.0620 (max= 1.4884), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,969 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,969 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,969 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,970 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,970 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,970 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,970 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:15,970 - root - INFO - Step 3860: lr=5.73E-06, loss= 1.0514 (max= 1.6348), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 
1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:31:47,887 - root - INFO - Step 3870: lr=5.72E-06, loss= 1.0351 (max= 1.4586), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,848 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,848 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,848 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,848 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,848 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,848 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,849 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:19,849 - root - INFO - Step 3880: lr=5.72E-06, loss= 1.0536 (max= 1.5299), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,827 - root - INFO - Step 3890: 
lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,827 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,827 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,827 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,827 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,827 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,828 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:32:51,828 - root - INFO - Step 3890: lr=5.71E-06, loss= 1.0564 (max= 1.4507), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,787 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,787 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,787 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,787 - 
root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,788 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,788 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,788 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:23,788 - root - INFO - Step 3900: lr=5.70E-06, loss= 1.0665 (max= 1.4659), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,651 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:33:55,652 - root - INFO - Step 3910: lr=5.70E-06, loss= 1.0846 (max= 1.5159), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:27,520 - root - INFO - Step 3920: lr=5.69E-06, loss= 1.0987 (max= 1.5001), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:34:59,392 - root - INFO - Step 3930: lr=5.69E-06, loss= 1.0871 (max= 1.5210), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,357 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,357 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,357 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,357 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,357 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,357 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,358 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:35:31,358 - root - INFO - Step 3940: lr=5.68E-06, loss= 1.0786 (max= 1.5440), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,226 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,226 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,227 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,227 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,227 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,227 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,227 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 
1.6643), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:03,227 - root - INFO - Step 3950: lr=5.67E-06, loss= 1.0728 (max= 1.6643), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,050 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,050 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,050 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,050 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,051 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,051 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,051 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:36:35,051 - root - INFO - Step 3960: lr=5.67E-06, loss= 1.0742 (max= 1.5063), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,027 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,027 - root - INFO - Step 3970: 
lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,027 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,027 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,027 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,027 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,028 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:07,028 - root - INFO - Step 3970: lr=5.66E-06, loss= 1.0651 (max= 1.4738), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,047 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,047 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,047 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,047 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,047 - 
root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20470, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,047 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,048 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:37:39,048 - root - INFO - Step 3980: lr=5.65E-06, loss= 1.0856 (max= 1.5696), tps=20470, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:10,913 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:38:10,914 - root - INFO - Step 3990: lr=5.65E-06, loss= 1.0805 (max= 1.4527), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-4000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-4000! Save time: 4.416459560394287 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - 
Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - Step 4000: lr=5.64E-06, loss= 1.0774 (max= 1.6556), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:42,723 - root - INFO - Saving a full checkpoint at step 4000 2025-10-26 11:38:42,723 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 11:38:56,505 - root - INFO - Finished saving the checkpoint in 13.78 seconds 2025-10-26 11:38:56,512 - root - INFO - Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:38:56,512 - root - INFO - Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:38:56,513 - root - INFO - Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:38:56,513 - root - INFO - 
Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:38:56,513 - root - INFO - Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:38:56,513 - root - INFO - Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:38:56,514 - root - INFO - Finished saving the checkpoint in 13.79 seconds 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:39:28,492 - root - INFO - Step 4010: lr=5.64E-06, loss= 1.0686 (max= 1.4399), tps=14320, mfu=29.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,281 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,281 - root 
- INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,281 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,281 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,281 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,282 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,282 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:00,282 - root - INFO - Step 4020: lr=5.63E-06, loss= 1.0873 (max= 1.8162), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,094 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:40:32,095 - root - INFO - Step 4030: lr=5.62E-06, loss= 1.0577 (max= 1.4985), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,915 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,915 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,915 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,916 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,916 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,916 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,916 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:03,916 - root - INFO - Step 4040: lr=5.62E-06, loss= 1.0921 (max= 1.5696), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:41:35,745 - root - INFO - Step 4050: lr=5.61E-06, loss= 1.0824 (max= 1.4714), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20604, mfu=42.93%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:07,557 - root - INFO - Step 4060: lr=5.60E-06, loss= 1.0881 (max= 1.7422), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 
1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:42:39,422 - root - INFO - Step 4070: lr=5.60E-06, loss= 1.0897 (max= 1.5211), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:11,294 - root - INFO - Step 4080: 
lr=5.59E-06, loss= 1.0694 (max= 1.5559), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:43:43,174 - root - INFO - Step 4090: lr=5.59E-06, loss= 1.0929 (max= 1.5101), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - 
root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:14,992 - root - INFO - Step 4100: lr=5.58E-06, loss= 1.0826 (max= 1.4845), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,819 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,819 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,819 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,819 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,819 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:44:46,819 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,820 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:44:46,820 - root - INFO - Step 4110: lr=5.57E-06, loss= 1.0957 (max= 1.7441), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:18,678 - root - INFO - Step 4120: lr=5.57E-06, loss= 1.0920 (max= 1.7143), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,532 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:45:50,533 - root - INFO - Step 4130: lr=5.56E-06, loss= 1.0952 (max= 1.6664), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:22,401 - root - INFO - Step 4140: lr=5.56E-06, loss= 1.0541 (max= 1.4596), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,154 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,154 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,154 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,155 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,155 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,155 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 
1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,155 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:46:54,155 - root - INFO - Step 4150: lr=5.55E-06, loss= 1.0800 (max= 1.4736), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,060 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,060 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,060 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,060 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,060 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,060 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,061 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:26,061 - root - INFO - Step 4160: lr=5.54E-06, loss= 1.0878 (max= 1.5615), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: 
lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:47:57,827 - root - INFO - Step 4170: lr=5.54E-06, loss= 1.0865 (max= 1.8890), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,629 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,629 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,629 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,629 - 
root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,629 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,629 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,630 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:48:29,630 - root - INFO - Step 4180: lr=5.53E-06, loss= 1.0639 (max= 1.5470), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,437 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,437 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,437 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,438 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,438 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,438 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:49:01,438 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:01,438 - root - INFO - Step 4190: lr=5.52E-06, loss= 1.0605 (max= 1.6069), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,360 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:49:33,361 - root - INFO - Step 4200: lr=5.52E-06, loss= 1.0823 (max= 1.5123), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:05,268 - root - INFO - Step 4210: lr=5.51E-06, loss= 1.0615 (max= 1.5011), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,118 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,118 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,119 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,119 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.87%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,119 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,119 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,119 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:50:37,119 - root - INFO - Step 4220: lr=5.51E-06, loss= 1.0970 (max= 1.5426), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,040 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,040 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,040 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,041 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,041 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,041 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,041 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 
1.7283), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:09,041 - root - INFO - Step 4230: lr=5.50E-06, loss= 1.1025 (max= 1.7283), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:51:40,978 - root - INFO - Step 4240: lr=5.49E-06, loss= 1.0718 (max= 1.5051), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,822 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,822 - root - INFO - Step 4250: 
lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,822 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,822 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,822 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,822 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,823 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:12,823 - root - INFO - Step 4250: lr=5.49E-06, loss= 1.0912 (max= 1.5798), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,576 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,576 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,576 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,576 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,576 - 
root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,577 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,577 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:52:44,577 - root - INFO - Step 4260: lr=5.48E-06, loss= 1.1041 (max= 1.6649), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,527 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,527 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,528 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,528 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,528 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,528 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:16,528 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:53:16,528 - root - INFO - Step 4270: lr=5.48E-06, loss= 1.0964 (max= 1.4707), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:53:48,439 - root - INFO - Step 4280: lr=5.47E-06, loss= 1.0677 (max= 1.5934), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:20,521 - root - INFO - Step 4290: lr=5.46E-06, loss= 1.0687 (max= 1.4953), tps=20430, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20555, mfu=42.83%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:54:52,408 - root - INFO - Step 4300: lr=5.46E-06, loss= 1.1075 (max= 1.6585), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,333 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,333 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,333 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,333 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,334 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,334 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,334 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:24,334 - root - INFO - Step 4310: lr=5.45E-06, loss= 1.0796 (max= 
1.4892), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,140 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,140 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,140 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,140 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,141 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,141 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,141 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:55:56,141 - root - INFO - Step 4320: lr=5.45E-06, loss= 1.0865 (max= 1.5942), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,087 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,087 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,087 - root - INFO - Step 4330: 
lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,088 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,088 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,088 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,088 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:56:28,088 - root - INFO - Step 4330: lr=5.44E-06, loss= 1.1023 (max= 1.6219), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - 
root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,236 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:00,237 - root - INFO - Step 4340: lr=5.43E-06, loss= 1.0816 (max= 1.5596), tps=20388, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,085 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,085 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,085 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,086 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,086 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,086 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,086 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:57:32,086 - root - INFO - Step 4350: lr=5.43E-06, loss= 1.0760 (max= 1.5403), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,969 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:03,970 - root - INFO - Step 4360: lr=5.42E-06, loss= 1.0633 (max= 1.5602), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:35,999 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.63%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:58:36,000 - root - INFO - Step 4370: lr=5.42E-06, loss= 1.0780 (max= 1.5230), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,913 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:07,914 - root - INFO - Step 4380: lr=5.41E-06, loss= 1.0852 (max= 1.6461), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,872 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 11:59:39,873 - root - INFO - Step 4390: lr=5.41E-06, loss= 1.0764 (max= 1.6261), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,707 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 
1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,707 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,707 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,707 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,707 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,707 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,708 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:11,708 - root - INFO - Step 4400: lr=5.40E-06, loss= 1.0668 (max= 1.4441), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,824 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,824 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,824 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,824 - root - INFO - Step 4410: 
lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,825 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,825 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,825 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:00:43,826 - root - INFO - Step 4410: lr=5.39E-06, loss= 1.0680 (max= 1.5254), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,703 - 
root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:15,704 - root - INFO - Step 4420: lr=5.39E-06, loss= 1.0705 (max= 1.5299), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,520 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:01:47,521 - root - INFO - Step 4430: lr=5.38E-06, loss= 1.0978 (max= 1.6301), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,359 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:19,360 - root - INFO - Step 4440: lr=5.38E-06, loss= 1.0793 (max= 1.4617), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,257 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:02:51,258 - root - INFO - Step 4450: lr=5.37E-06, loss= 1.0931 (max= 1.6633), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,239 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20494, mfu=42.70%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:23,240 - root - INFO - Step 4460: lr=5.36E-06, loss= 1.0789 (max= 1.5429), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,058 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:03:55,059 - root - INFO - Step 4470: lr=5.36E-06, loss= 1.0725 (max= 1.7384), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,933 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 
1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:26,934 - root - INFO - Step 4480: lr=5.35E-06, loss= 1.0756 (max= 1.6009), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,719 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,719 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,719 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,719 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,719 - root - INFO - Step 4490: 
lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,719 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,720 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:04:58,720 - root - INFO - Step 4490: lr=5.35E-06, loss= 1.0530 (max= 1.4348), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,672 - root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:05:30,673 - 
root - INFO - Step 4500: lr=5.34E-06, loss= 1.0770 (max= 1.5611), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,439 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:02,440 - root - INFO - Step 4510: lr=5.33E-06, loss= 1.0836 (max= 1.6840), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,261 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:06:34,262 - root - INFO - Step 4520: lr=5.33E-06, loss= 1.0656 (max= 1.4893), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:06,031 - root - INFO - Step 4530: lr=5.32E-06, loss= 1.0487 (max= 1.4312), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,054 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:07:38,055 - root - INFO - Step 4540: lr=5.32E-06, loss= 1.0742 (max= 1.6825), tps=20467, mfu=42.64%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,103 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:10,104 - root - INFO - Step 4550: lr=5.31E-06, loss= 1.0759 (max= 1.4967), tps=20451, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,201 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 
1.7007), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20419, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20420, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20420, mfu=42.55%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:08:42,202 - root - INFO - Step 4560: lr=5.31E-06, loss= 1.0778 (max= 1.7007), tps=20420, mfu=42.55%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: 
lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:13,969 - root - INFO - Step 4570: lr=5.30E-06, loss= 1.0829 (max= 1.4852), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,824 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,824 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,824 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,824 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,824 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,825 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,825 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:09:45,825 - root - INFO - Step 4580: lr=5.29E-06, loss= 1.0610 (max= 1.4865), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - 
root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:17,655 - root - INFO - Step 4590: lr=5.29E-06, loss= 1.0826 (max= 1.5564), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:10:49,464 - root - INFO - Step 4600: lr=5.28E-06, loss= 1.0808 (max= 1.4319), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:21,328 - root - INFO - Step 4610: lr=5.28E-06, loss= 1.0991 (max= 1.4882), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:44,237 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:3694913 2025-10-26 12:11:53,214 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,214 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,214 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,214 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,214 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,215 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,215 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:11:53,215 - root - INFO - Step 4620: lr=5.27E-06, loss= 1.0553 (max= 1.4886), tps=20555, mfu=42.83%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:24,971 - root - INFO - Step 4630: lr=5.27E-06, loss= 1.0668 (max= 1.6827), tps=20640, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,787 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,787 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,788 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 
1.7625), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,788 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,788 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,788 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,788 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:12:56,788 - root - INFO - Step 4640: lr=5.26E-06, loss= 1.0798 (max= 1.7625), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,702 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,702 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,702 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,702 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,702 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,703 - root - INFO - Step 4650: 
lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,703 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:13:28,703 - root - INFO - Step 4650: lr=5.25E-06, loss= 1.1032 (max= 1.8818), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,532 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,532 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,532 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,532 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,532 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,532 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,533 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:00,533 - root - INFO - Step 4660: lr=5.25E-06, loss= 1.0886 (max= 1.8068), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - 
root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:14:32,300 - root - INFO - Step 4670: lr=5.24E-06, loss= 1.0823 (max= 1.6570), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:04,171 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:04,172 - root - INFO - Step 4680: lr=5.24E-06, loss= 1.0734 (max= 1.5668), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,033 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,033 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,033 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,034 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,034 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,034 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,034 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:15:36,034 - root - INFO - Step 4690: lr=5.23E-06, loss= 1.0722 (max= 1.6014), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,844 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,844 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,845 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,845 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,845 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,845 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,845 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:07,845 - root - INFO - Step 4700: lr=5.23E-06, loss= 1.0828 (max= 1.5539), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:16:39,681 - root - INFO - Step 4710: lr=5.22E-06, loss= 1.0627 (max= 1.5149), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,528 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:11,529 - root - INFO - Step 4720: lr=5.21E-06, loss= 1.0332 (max= 1.5293), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:17:43,396 - root - INFO - Step 4730: lr=5.21E-06, loss= 1.0754 (max= 1.5339), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:08,042 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6486819
2025-10-26 12:18:15,295 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,295 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,295 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,295 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,295 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,296 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,296 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:15,296 - root - INFO - Step 4740: lr=5.20E-06, loss= 1.0779 (max= 1.6167), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,105 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:18:47,106 - root - INFO - Step 4750: lr=5.20E-06, loss= 1.0699 (max= 1.5708), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:18,871 - root - INFO - Step 4760: lr=5.19E-06, loss= 1.0541 (max= 1.4512), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,821 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:19:50,822 - root - INFO - Step 4770: lr=5.19E-06, loss= 1.0706 (max= 1.5637), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,638 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:22,639 - root - INFO - Step 4780: lr=5.18E-06, loss= 1.0727 (max= 1.7640), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:20:54,442 - root - INFO - Step 4790: lr=5.17E-06, loss= 1.0682 (max= 1.6382), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,314 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,314 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,314 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,314 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,315 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,315 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,315 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:26,315 - root - INFO - Step 4800: lr=5.17E-06, loss= 1.0786 (max= 1.5221), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:21:58,234 - root - INFO - Step 4810: lr=5.16E-06, loss= 1.0968 (max= 1.7084), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,098 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,098 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,098 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,098 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,098 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,099 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,099 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:22:30,099 - root - INFO - Step 4820: lr=5.16E-06, loss= 1.0903 (max= 1.5521), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,067 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,067 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,067 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,067 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,067 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,068 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,068 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:02,068 - root - INFO - Step 4830: lr=5.15E-06, loss= 1.0634 (max= 1.5161), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,954 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:23:33,955 - root - INFO - Step 4840: lr=5.15E-06, loss= 1.0768 (max= 1.6256), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:05,740 - root - INFO - Step 4850: lr=5.14E-06, loss= 1.0819 (max= 1.6046), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,623 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:24:37,625 - root - INFO - Step 4860: lr=5.14E-06, loss= 1.0895 (max= 1.6468), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:09,599 - root - INFO - Step 4870: lr=5.13E-06, loss= 1.0845 (max= 1.5898), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,457 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,457 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,457 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,457 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,457 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,458 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,458 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:25:41,458 - root - INFO - Step 4880: lr=5.12E-06, loss= 1.0956 (max= 1.4073), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:13,416 - root - INFO - Step 4890: lr=5.12E-06, loss= 1.1004 (max= 2.0463), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,628 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,628 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,628 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,628 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,629 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,629 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,629 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:26:45,629 - root - INFO - Step 4900: lr=5.11E-06, loss= 1.0963 (max= 2.0531), tps=20347, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:17,457 - root - INFO - Step 4910: lr=5.11E-06, loss= 1.0914 (max= 1.5040), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:49,300 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:49,300 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:49,300 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:49,300 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:49,301 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 12:27:49,301 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:27:49,301 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:27:49,301 - root - INFO - Step 4920: lr=5.10E-06, loss= 1.0673 (max= 1.4734), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,109 - root - INFO - Step 4930: lr=5.10E-06, loss= 1.0633 (max= 1.6728), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:21,691 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5262024 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:28:52,961 - root - INFO - Step 4940: lr=5.09E-06, loss= 1.0858 (max= 1.6317), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:24,780 - root - INFO - Step 4950: lr=5.09E-06, loss= 1.1124 (max= 1.5581), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,560 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,560 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,560 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,560 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,560 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 
1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,561 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,561 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:29:56,561 - root - INFO - Step 4960: lr=5.08E-06, loss= 1.1016 (max= 1.4951), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,354 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,355 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,355 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,355 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,355 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,355 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,355 - root - INFO - Step 4970: lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:30:28,356 - root - INFO - Step 4970: 
lr=5.07E-06, loss= 1.0815 (max= 1.8154), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,269 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:00,270 - root - INFO - Step 4980: lr=5.07E-06, loss= 1.1121 (max= 1.6733), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,115 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,115 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,115 - 
root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,115 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,115 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,115 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,116 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:31:32,116 - root - INFO - Step 4990: lr=5.06E-06, loss= 1.0708 (max= 1.4749), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-5000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-5000! 
Save time: 4.400207757949829 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - 
CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - Step 5000: lr=5.06E-06, loss= 1.1001 (max= 1.5945), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:03,990 - root - INFO - Saving a full checkpoint at step 5000 2025-10-26 12:32:03,990 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 12:32:19,870 - root - INFO - Finished saving the checkpoint in 15.88 seconds 2025-10-26 12:32:19,877 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:19,877 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:19,878 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:19,878 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:19,879 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:19,880 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:19,881 - root - INFO - Finished saving the checkpoint in 15.89 seconds 2025-10-26 12:32:51,584 - 
root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,584 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,584 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,584 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,584 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,584 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,584 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:32:51,585 - root - INFO - Step 5010: lr=5.05E-06, loss= 1.0988 (max= 1.7976), tps=13771, mfu=28.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,447 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,447 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,447 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:33:23,447 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,447 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,448 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,448 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:23,448 - root - INFO - Step 5020: lr=5.05E-06, loss= 1.0858 (max= 1.5213), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:33:55,305 - root - INFO - Step 5030: lr=5.04E-06, loss= 1.0875 (max= 1.4532), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:27,090 - root - INFO - Step 5040: lr=5.04E-06, loss= 1.1134 (max= 1.5597), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20594, mfu=42.91%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:34:58,916 - root - INFO - Step 5050: lr=5.03E-06, loss= 1.0831 (max= 1.4996), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,687 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,687 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,687 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,687 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 
1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,687 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,687 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,688 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:35:30,688 - root - INFO - Step 5060: lr=5.03E-06, loss= 1.0985 (max= 1.4658), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,494 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,494 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,494 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,494 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,494 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,494 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,495 - root - INFO - Step 5070: 
lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:02,495 - root - INFO - Step 5070: lr=5.02E-06, loss= 1.0939 (max= 1.5897), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,377 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,377 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,378 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,378 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,378 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,378 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,378 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:36:34,378 - root - INFO - Step 5080: lr=5.01E-06, loss= 1.1107 (max= 1.6454), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - 
root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:06,163 - root - INFO - Step 5090: lr=5.01E-06, loss= 1.0963 (max= 1.5427), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,976 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,976 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,977 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,977 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:37:37,977 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,977 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,977 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:37:37,977 - root - INFO - Step 5100: lr=5.00E-06, loss= 1.1155 (max= 1.5640), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,850 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:09,851 - root - INFO - Step 5110: lr=5.00E-06, loss= 1.0759 (max= 1.5127), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:38:41,762 - root - INFO - Step 5120: lr=4.99E-06, loss= 1.1088 (max= 1.4584), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,704 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,704 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,705 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,705 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,705 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,705 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,705 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:13,705 - root - INFO - Step 5130: lr=4.99E-06, loss= 1.0895 (max= 1.5569), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,515 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,515 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,515 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,515 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,515 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 
1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,515 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,516 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:39:45,516 - root - INFO - Step 5140: lr=4.98E-06, loss= 1.1141 (max= 1.5172), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,388 - root - INFO - Step 5150: lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:17,389 - root - INFO - Step 5150: 
lr=4.98E-06, loss= 1.0903 (max= 1.7562), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:40:49,204 - root - INFO - Step 5160: lr=4.97E-06, loss= 1.1018 (max= 1.5430), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - 
root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,067 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:21,068 - root - INFO - Step 5170: lr=4.97E-06, loss= 1.1002 (max= 1.6867), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,952 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:41:52,953 - root - INFO - Step 5180: lr=4.96E-06, loss= 1.0959 (max= 1.5774), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:24,855 - root - INFO - Step 5190: lr=4.95E-06, loss= 1.0791 (max= 1.5360), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:42:56,659 - root - INFO - Step 5200: lr=4.95E-06, loss= 1.0915 (max= 1.5936), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20582, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:43:28,503 - root - INFO - Step 5210: lr=4.94E-06, loss= 1.0978 (max= 1.4628), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 
1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:00,411 - root - INFO - Step 5220: lr=4.94E-06, loss= 1.1087 (max= 1.6295), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,278 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,278 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,279 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,279 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,279 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,279 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,279 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:44:32,280 - root - INFO - Step 5230: lr=4.93E-06, loss= 1.1233 (max= 1.5318), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: 
lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:04,220 - root - INFO - Step 5240: lr=4.93E-06, loss= 1.0871 (max= 1.6330), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,041 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,041 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,041 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,042 - 
root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,042 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,042 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,042 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:45:36,042 - root - INFO - Step 5250: lr=4.92E-06, loss= 1.1053 (max= 1.5801), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,836 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,836 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,836 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,836 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,836 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,837 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:46:07,837 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:07,837 - root - INFO - Step 5260: lr=4.92E-06, loss= 1.1050 (max= 1.7944), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:46:39,639 - root - INFO - Step 5270: lr=4.91E-06, loss= 1.0979 (max= 1.5230), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:11,504 - root - INFO - Step 5280: lr=4.91E-06, loss= 1.1067 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:47:43,403 - root - INFO - Step 5290: lr=4.90E-06, loss= 1.0860 (max= 1.5583), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,197 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,197 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,198 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,198 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,198 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,198 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,198 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 
1.6469), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:15,198 - root - INFO - Step 5300: lr=4.90E-06, loss= 1.0733 (max= 1.6469), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:48:47,064 - root - INFO - Step 5310: lr=4.89E-06, loss= 1.0988 (max= 1.6231), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: 
lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:18,874 - root - INFO - Step 5320: lr=4.89E-06, loss= 1.0870 (max= 1.9369), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - 
root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:49:50,813 - root - INFO - Step 5330: lr=4.88E-06, loss= 1.0787 (max= 1.5354), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:50:22,657 - root - INFO - Step 5340: lr=4.87E-06, loss= 1.0954 (max= 1.5100), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,572 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:50:54,573 - root - INFO - Step 5350: lr=4.87E-06, loss= 1.0813 (max= 1.4816), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:20,620 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:347816 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,415 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:26,416 - root - INFO - Step 5360: lr=4.86E-06, loss= 1.0805 (max= 1.4564), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:51:58,252 - root - INFO - Step 5370: lr=4.86E-06, loss= 1.0964 (max= 1.5534), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,192 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 
1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:52:30,193 - root - INFO - Step 5380: lr=4.85E-06, loss= 1.0709 (max= 1.5966), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,990 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,990 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,990 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,991 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,991 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,991 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,991 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:01,991 - root - INFO - Step 5390: lr=4.85E-06, loss= 1.0899 (max= 1.6676), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:15,268 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5458622 2025-10-26 12:53:34,016 - root - INFO - Step 5400: 
lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,016 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,016 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,016 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,016 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,016 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,016 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:53:34,017 - root - INFO - Step 5400: lr=4.84E-06, loss= 1.0811 (max= 1.4859), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,896 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - 
root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:05,897 - root - INFO - Step 5410: lr=4.84E-06, loss= 1.0702 (max= 1.4180), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:54:37,728 - root - INFO - Step 5420: lr=4.83E-06, loss= 1.0782 (max= 1.5717), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:09,527 - root - INFO - Step 5430: lr=4.83E-06, loss= 1.0762 (max= 1.5585), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,356 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,356 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,356 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,357 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,357 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,357 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,357 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:55:41,357 - root - INFO - Step 5440: lr=4.82E-06, loss= 1.0692 (max= 1.6091), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:13,172 - root - INFO - Step 5450: lr=4.82E-06, loss= 1.1068 (max= 1.4918), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,991 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,991 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,991 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,992 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,992 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,992 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,992 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 
1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:56:44,992 - root - INFO - Step 5460: lr=4.81E-06, loss= 1.0905 (max= 1.4850), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,899 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,899 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,899 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,900 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,900 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,900 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,900 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:16,900 - root - INFO - Step 5470: lr=4.81E-06, loss= 1.0823 (max= 1.9793), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,780 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,780 - root - INFO - Step 5480: 
lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,780 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,780 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,781 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,781 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,781 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:57:48,781 - root - INFO - Step 5480: lr=4.80E-06, loss= 1.0969 (max= 1.4567), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,621 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,621 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,622 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,622 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,622 - 
root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,622 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,622 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:20,622 - root - INFO - Step 5490: lr=4.80E-06, loss= 1.0879 (max= 1.5816), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 12:58:52,471 - root - INFO - Step 5500: lr=4.79E-06, loss= 1.0823 (max= 1.6187), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:24,341 - root - INFO - Step 5510: lr=4.79E-06, loss= 1.0672 (max= 1.6475), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,169 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,169 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,170 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,170 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,170 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,170 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,170 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 12:59:56,170 - root - INFO - Step 5520: lr=4.78E-06, loss= 1.0870 (max= 1.5834), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,892 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,892 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,892 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,892 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,893 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,893 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,893 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:27,893 - root - INFO - Step 5530: lr=4.78E-06, loss= 1.0709 (max= 1.5309), tps=20661, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:00:59,739 - root - INFO - Step 5540: lr=4.77E-06, loss= 1.0822 (max= 
1.4717), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:01:31,726 - root - INFO - Step 5550: lr=4.76E-06, loss= 1.0584 (max= 1.5405), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: 
lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,772 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:03,773 - root - INFO - Step 5560: lr=4.76E-06, loss= 1.0522 (max= 1.5247), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - 
root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,589 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:02:35,590 - root - INFO - Step 5570: lr=4.75E-06, loss= 1.0375 (max= 1.4855), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,440 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,440 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,440 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,441 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,441 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,441 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,441 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:07,441 - root - INFO - Step 5580: lr=4.75E-06, loss= 1.0601 (max= 1.5230), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,410 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:03:39,411 - root - INFO - Step 5590: lr=4.74E-06, loss= 1.0799 (max= 1.5147), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,238 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,238 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,238 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,238 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,238 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,238 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,239 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:11,239 - root - INFO - Step 5600: lr=4.74E-06, loss= 1.0779 (max= 1.6304), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,079 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:04:43,080 - root - INFO - Step 5610: lr=4.73E-06, loss= 1.0528 (max= 1.5076), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,877 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:14,878 - root - INFO - Step 5620: lr=4.73E-06, loss= 1.0407 (max= 1.5191), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 
1.5738), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:05:46,715 - root - INFO - Step 5630: lr=4.72E-06, loss= 1.0760 (max= 1.5738), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,514 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: 
lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:18,515 - root - INFO - Step 5640: lr=4.72E-06, loss= 1.0728 (max= 1.7261), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,298 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,298 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,299 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,299 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,299 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,299 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,299 - 
root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:06:50,299 - root - INFO - Step 5650: lr=4.71E-06, loss= 1.0698 (max= 1.4965), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20408, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:22,415 - root - INFO - Step 5660: lr=4.71E-06, loss= 1.0657 (max= 1.4809), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,283 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:07:54,284 - root - INFO - Step 5670: lr=4.70E-06, loss= 1.0656 (max= 1.5561), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,132 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,132 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,133 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,133 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,133 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,133 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,133 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:26,133 - root - INFO - Step 5680: lr=4.70E-06, loss= 1.0967 (max= 1.4670), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,915 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20622, mfu=42.97%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:08:57,916 - root - INFO - Step 5690: lr=4.69E-06, loss= 1.0614 (max= 1.4399), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20621, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:09:29,699 - root - INFO - Step 5700: lr=4.69E-06, loss= 1.0613 (max= 1.5481), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 
1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,526 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:01,527 - root - INFO - Step 5710: lr=4.68E-06, loss= 1.0566 (max= 1.5524), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: 
lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:10:33,381 - root - INFO - Step 5720: lr=4.68E-06, loss= 1.0509 (max= 1.4152), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,179 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,179 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,179 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,179 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,179 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,179 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,180 - root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:05,180 - 
root - INFO - Step 5730: lr=4.67E-06, loss= 1.0752 (max= 1.6026), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,013 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:11:37,014 - root - INFO - Step 5740: lr=4.67E-06, loss= 1.0544 (max= 1.7724), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,910 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:08,911 - root - INFO - Step 5750: lr=4.66E-06, loss= 1.0791 (max= 1.6898), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:12:40,778 - root - INFO - Step 5760: lr=4.66E-06, loss= 1.0778 (max= 1.6114), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,737 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,737 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,737 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,737 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,737 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,737 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,738 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:12,738 - root - INFO - Step 5770: lr=4.65E-06, loss= 1.0566 (max= 1.3876), tps=20508, mfu=42.73%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,568 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:13:44,569 - root - INFO - Step 5780: lr=4.65E-06, loss= 1.0594 (max= 1.4655), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 
1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,347 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:16,348 - root - INFO - Step 5790: lr=4.64E-06, loss= 1.0501 (max= 1.4505), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:48,211 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:48,211 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:48,211 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:48,211 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:48,211 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:14:48,211 - root - INFO - Step 5800: 
lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:14:48,211 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:14:48,212 - root - INFO - Step 5800: lr=4.64E-06, loss= 1.0552 (max= 1.5247), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,095 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,095 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,095 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,095 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,095 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,096 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,096 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:20,096 - root - INFO - Step 5810: lr=4.63E-06, loss= 1.0665 (max= 1.6252), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,907 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,907 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,907 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,907 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,907 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,907 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,908 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:15:51,908 - root - INFO - Step 5820: lr=4.63E-06, loss= 1.0562 (max= 1.6348), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:23,815 - root - INFO - Step 5830: lr=4.62E-06, loss= 1.0718 (max= 1.5262), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,588 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:16:55,589 - root - INFO - Step 5840: lr=4.62E-06, loss= 1.0599 (max= 1.4673), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,701 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:27,702 - root - INFO - Step 5850: lr=4.61E-06, loss= 1.0786 (max= 1.4474), tps=20411, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,625 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:17:59,626 - root - INFO - Step 5860: lr=4.61E-06, loss= 1.0720 (max= 1.7381), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,390 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,390 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,390 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,391 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,391 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,391 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,391 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:18:31,391 - root - INFO - Step 5870: lr=4.60E-06, loss= 1.0667 (max= 1.4907), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,186 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,186 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,186 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,187 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,187 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,187 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,187 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:03,187 - root - INFO - Step 5880: lr=4.60E-06, loss= 1.0687 (max= 1.5360), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:19:35,079 - root - INFO - Step 5890: lr=4.59E-06, loss= 1.0613 (max= 1.3997), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,869 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:06,870 - root - INFO - Step 5900: lr=4.59E-06, loss= 1.0538 (max= 1.5336), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,716 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,716 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,716 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,717 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,717 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,717 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,717 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:20:38,717 - root - INFO - Step 5910: lr=4.58E-06, loss= 1.0766 (max= 1.7007), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,519 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:10,520 - root - INFO - Step 5920: lr=4.58E-06, loss= 1.0630 (max= 1.4950), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,363 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:21:42,364 - root - INFO - Step 5930: lr=4.57E-06, loss= 1.0688 (max= 1.7277), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,189 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,189 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,189 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,190 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,190 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,190 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,190 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:14,190 - root - INFO - Step 5940: lr=4.57E-06, loss= 1.0555 (max= 1.5060), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:26,055 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:7517588
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:45,999 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:22:46,000 - root - INFO - Step 5950: lr=4.56E-06, loss= 1.0353 (max= 1.5079), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:17,821 - root - INFO - Step 5960: lr=4.56E-06, loss= 1.0672 (max= 1.4850), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,629 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:23:49,630 - root - INFO - Step 5970: lr=4.55E-06, loss= 1.0759 (max= 1.5027), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,425 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,425 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,425 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,426 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,426 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,426 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,426 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:21,426 - root - INFO - Step 5980: lr=4.55E-06, loss= 1.0475 (max= 1.5083), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,308 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,308 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,308 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,308 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,309 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,309 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,309 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:24:53,309 - root - INFO - Step 5990: lr=4.54E-06, loss= 1.0564 (max= 1.5108), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-6000
Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-6000!
Save time: 4.406543970108032
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,184 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:25,184 - root - INFO - Step 6000: lr=4.54E-06, loss= 1.0675 (max= 1.4515), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:25:25,185 - root - INFO - Saving a full checkpoint at step 6000
2025-10-26 13:25:25,185 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 13:25:39,675 - root - INFO - Finished saving the checkpoint in 14.49 seconds
2025-10-26 13:25:39,682 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:25:39,682 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:25:39,682 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:25:39,683 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:25:39,683 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:25:39,685 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:25:39,685 - root - INFO - Finished saving the checkpoint in 14.50 seconds
2025-10-26 13:26:11,418 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,418 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,418 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,419 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,419 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,419 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,419 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:11,419 - root - INFO - Step 6010: lr=4.53E-06, loss= 1.0567 (max= 1.5275), tps=14176, mfu=29.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,259 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:26:43,260 - root - INFO - Step 6020: lr=4.53E-06, loss= 1.0572 (max= 1.4449), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,213 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:15,214 - root - INFO - Step 6030: lr=4.52E-06, loss= 1.0658 (max= 1.5646), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%)
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:27:47,133 - root - INFO - Step 6040: lr=4.52E-06, loss= 1.0507 (max= 1.4871), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:19,022 - root - INFO - Step 6050: lr=4.51E-06, loss= 1.0428 (max= 1.6427), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:28:50,881 - root - INFO - Step 6060: lr=4.51E-06, loss= 1.0678 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,744 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,744 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 
1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,745 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,745 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,745 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,745 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,745 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:22,745 - root - INFO - Step 6070: lr=4.50E-06, loss= 1.0562 (max= 1.5218), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: 
lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,569 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:29:54,570 - root - INFO - Step 6080: lr=4.50E-06, loss= 1.0545 (max= 1.5519), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,446 - root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,447 - 
root - INFO - Step 6090: lr=4.49E-06, loss= 1.0470 (max= 1.5046), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:26,452 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:7367368 2025-10-26 13:30:58,368 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:30:58,369 - root - INFO - Step 6100: lr=4.49E-06, loss= 1.0708 (max= 1.4896), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,248 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:31:30,249 - root - INFO - Step 6110: lr=4.48E-06, loss= 1.0726 (max= 1.4978), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,125 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:02,126 - root - INFO - Step 6120: lr=4.48E-06, loss= 1.0471 (max= 1.4916), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,981 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,981 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,981 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,981 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,981 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,982 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,981 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:32:33,982 - root - INFO - Step 6130: lr=4.47E-06, loss= 1.0816 (max= 1.7579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,869 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,869 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,869 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,869 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,870 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,870 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,870 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:05,870 - root - INFO - Step 6140: lr=4.47E-06, loss= 1.0587 (max= 1.6790), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 
1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,732 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:33:37,733 - root - INFO - Step 6150: lr=4.46E-06, loss= 1.0657 (max= 1.5468), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: 
lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:09,567 - root - INFO - Step 6160: lr=4.46E-06, loss= 1.0751 (max= 1.4585), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,574 - root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:34:41,575 - 
root - INFO - Step 6170: lr=4.45E-06, loss= 1.0538 (max= 1.5069), tps=20478, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,487 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:13,488 - root - INFO - Step 6180: lr=4.45E-06, loss= 1.0676 (max= 1.7076), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,308 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:35:45,309 - root - INFO - Step 6190: lr=4.44E-06, loss= 1.0489 (max= 1.6769), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,454 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:17,455 - root - INFO - Step 6200: lr=4.44E-06, loss= 1.0778 (max= 1.5468), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:36:49,976 - root - INFO - Step 6210: lr=4.43E-06, loss= 1.0696 (max= 1.5834), tps=20154, mfu=41.99%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,868 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:21,869 - root - INFO - Step 6220: lr=4.43E-06, loss= 1.0518 (max= 1.5709), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,736 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 
1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:37:53,737 - root - INFO - Step 6230: lr=4.42E-06, loss= 1.0853 (max= 1.4656), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: 
lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:25,557 - root - INFO - Step 6240: lr=4.42E-06, loss= 1.0570 (max= 1.5224), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,435 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,435 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,435 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,435 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,435 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,435 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,436 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:38:57,436 - root - INFO - Step 6250: lr=4.41E-06, loss= 1.0501 (max= 1.4437), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,252 - 
root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:39:29,253 - root - INFO - Step 6260: lr=4.41E-06, loss= 1.0821 (max= 1.5237), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,261 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:01,262 - root - INFO - Step 6270: lr=4.40E-06, loss= 1.0643 (max= 1.5737), tps=20477, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,063 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,063 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,063 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,063 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,064 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,064 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,064 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:40:33,064 - root - INFO - Step 6280: lr=4.40E-06, loss= 1.0878 (max= 1.5901), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,918 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:04,919 - root - INFO - Step 6290: lr=4.39E-06, loss= 1.0764 (max= 1.4992), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,768 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:41:36,769 - root - INFO - Step 6300: lr=4.39E-06, loss= 1.0694 (max= 1.4652), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 
1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,711 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:08,712 - root - INFO - Step 6310: lr=4.38E-06, loss= 1.0545 (max= 1.4956), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,563 - root - INFO - Step 6320: 
lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:42:40,564 - root - INFO - Step 6320: lr=4.38E-06, loss= 1.0560 (max= 1.4503), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.03s, 2.00%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:12,419 - root - INFO - Step 6330: lr=4.38E-06, loss= 1.0601 (max= 1.4414), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - 
root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:43:44,313 - root - INFO - Step 6340: lr=4.37E-06, loss= 1.0490 (max= 1.4794), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,168 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,168 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,168 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,168 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:44:16,169 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,169 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,169 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:16,169 - root - INFO - Step 6350: lr=4.37E-06, loss= 1.0605 (max= 1.5399), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,053 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:44:48,055 - root - INFO - Step 6360: lr=4.36E-06, loss= 1.0707 (max= 1.4714), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,880 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,880 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,880 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,880 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,881 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,881 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,881 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:19,881 - root - INFO - Step 6370: lr=4.36E-06, loss= 1.0607 (max= 1.5157), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,646 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,646 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,646 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,646 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,646 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,646 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,647 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:45:51,647 - root - INFO - Step 6380: lr=4.35E-06, loss= 1.0530 (max= 1.4706), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,459 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,459 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,459 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,459 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,460 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 
1.4973), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,460 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,460 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:23,460 - root - INFO - Step 6390: lr=4.35E-06, loss= 1.0747 (max= 1.4973), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:46:55,393 - root - INFO - Step 6400: 
lr=4.34E-06, loss= 1.0627 (max= 1.5092), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,283 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:27,284 - root - INFO - Step 6410: lr=4.34E-06, loss= 1.0706 (max= 1.6087), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - 
root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,402 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:47:59,403 - root - INFO - Step 6420: lr=4.33E-06, loss= 1.0649 (max= 1.6236), tps=20406, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,223 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:48:31,224 - root - INFO - Step 6430: lr=4.33E-06, loss= 1.0820 (max= 1.6410), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:03,049 - root - INFO - Step 6440: lr=4.32E-06, loss= 1.0717 (max= 1.4918), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,896 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:49:34,897 - root - INFO - Step 6450: lr=4.32E-06, loss= 1.0604 (max= 1.4719), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:50:06,938 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:50:06,938 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 13:50:06,938 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:06,939 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:06,939 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:06,939 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:06,939 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:06,939 - root - INFO - Step 6460: lr=4.31E-06, loss= 1.0643 (max= 1.8788), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:29,940 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2578770
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,042 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:50:39,043 - root - INFO - Step 6470: lr=4.31E-06, loss= 1.0812 (max= 1.4813), tps=20416, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,973 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,973 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,974 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,974 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,974 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,974 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,974 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:10,974 - root - INFO - Step 6480: lr=4.30E-06, loss= 1.0767 (max= 1.5999), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,850 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:51:42,851 - root - INFO - Step 6490: lr=4.30E-06, loss= 1.0535 (max= 1.6823), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,707 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:14,708 - root - INFO - Step 6500: lr=4.29E-06, loss= 1.0819 (max= 1.5129), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:52:46,522 - root - INFO - Step 6510: lr=4.29E-06, loss= 1.0863 (max= 1.4776), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,349 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:18,350 - root - INFO - Step 6520: lr=4.28E-06, loss= 1.0618 (max= 1.5085), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,187 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,188 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,188 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,188 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,188 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,188 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,188 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:53:50,189 - root - INFO - Step 6530: lr=4.28E-06, loss= 1.0613 (max= 1.5181), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,011 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,011 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,011 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,012 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,012 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,012 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,012 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:22,012 - root - INFO - Step 6540: lr=4.27E-06, loss= 1.0433 (max= 1.4439), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,914 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:54:53,915 - root - INFO - Step 6550: lr=4.27E-06, loss= 1.0892 (max= 1.5348), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20458, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20458, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20458, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20458, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20458, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,951 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:25,952 - root - INFO - Step 6560: lr=4.27E-06, loss= 1.0715 (max= 1.6483), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,827 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:55:57,828 - root - INFO - Step 6570: lr=4.26E-06, loss= 1.0627 (max= 1.4796), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,635 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:56:29,636 - root - INFO - Step 6580: lr=4.26E-06, loss= 1.0438 (max= 1.5411), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20325, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20325, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20325, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20325, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20325, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20324, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20324, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:01,884 - root - INFO - Step 6590: lr=4.25E-06, loss= 1.0850 (max= 1.5135), tps=20325, mfu=42.35%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:57:33,742 - root - INFO - Step 6600: lr=4.25E-06, loss= 1.0630 (max= 1.6354), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,615 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:05,616 - root - INFO - Step 6610: lr=4.24E-06, loss= 1.0614 (max= 1.4235), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,459 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:58:37,460 - root - INFO - Step 6620: lr=4.24E-06, loss= 1.0625 (max= 1.5532), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,222 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,222 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,222 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,222 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,222 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,222 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,223 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:09,223 - root - INFO - Step 6630: lr=4.23E-06, loss= 1.0604 (max= 1.4068), tps=20636, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,073 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 13:59:41,074 - root - INFO - Step 6640: lr=4.23E-06, loss= 1.0657 (max= 1.4879), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,008 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:13,009 - root - INFO - Step 6650: lr=4.22E-06, loss= 1.0564 (max= 1.5157), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:00:44,805 - root - INFO - Step 6660: lr=4.22E-06, loss= 1.0795 (max= 1.5092), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,634 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:16,635 - root - INFO - Step 6670: lr=4.21E-06, loss= 1.0531 (max= 1.4235), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,446 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,446 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,446 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,446 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,446 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,446 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,447 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:01:48,447 - root - INFO - Step 6680: lr=4.21E-06, loss= 1.0702 (max= 1.5731), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,234 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:20,235 - root - INFO - Step 6690: lr=4.20E-06, loss= 1.0675 (max= 1.4915), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,162 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:02:52,163 - root - INFO - Step 6700: lr=4.20E-06, loss= 1.0482 (max= 1.5681), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:24,016 - root - INFO - Step 6710: lr=4.19E-06, loss= 1.0409 (max= 1.7842), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,764 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:03:55,765 - root - INFO - Step 6720: lr=4.19E-06, loss= 1.0573 (max= 1.5641), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,582 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,582 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,583 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,583 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,583 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,583 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,583 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:27,583 - root - INFO - Step 6730: lr=4.19E-06, loss= 1.0538 (max= 1.5267), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:59,516 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:59,516 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:59,516 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:59,516 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:59,516 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:04:59,516 -
root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:04:59,516 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:04:59,517 - root - INFO - Step 6740: lr=4.18E-06, loss= 1.0299 (max= 1.3808), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:05:31,342 - root - INFO - Step 6750: lr=4.18E-06, loss= 1.0861 (max= 1.4823), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:06:03,188 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,188 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,188 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,188 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,188 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,188 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,189 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:03,189 - root - INFO - Step 6760: lr=4.17E-06, loss= 1.0304 (max= 1.5836), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:06:35,008 - root - INFO - Step 6770: lr=4.17E-06, loss= 1.0527 (max= 1.4520), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:06,902 - root - INFO - Step 6780: lr=4.16E-06, loss= 1.0558 (max= 1.5906), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,756 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:07:38,757 - root - INFO - Step 6790: lr=4.16E-06, loss= 1.0542 (max= 1.5008), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 
1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,788 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:10,789 - root - INFO - Step 6800: lr=4.15E-06, loss= 1.0878 (max= 1.5170), tps=20462, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: 
lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:08:42,594 - root - INFO - Step 6810: lr=4.15E-06, loss= 1.0462 (max= 1.4625), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,433 - 
root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:14,434 - root - INFO - Step 6820: lr=4.14E-06, loss= 1.0485 (max= 1.5935), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,283 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,283 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,283 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,283 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,283 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,283 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,284 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:09:46,284 - root - INFO - Step 6830: lr=4.14E-06, loss= 1.0493 (max= 1.4354), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:18,117 - root - INFO - Step 6840: lr=4.13E-06, loss= 1.0321 (max= 1.4765), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:49,888 - root - INFO - Step 6850: lr=4.13E-06, loss= 1.0224 (max= 1.4965), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:10:56,841 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5775907 2025-10-26 14:11:21,727 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,727 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,727 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,727 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,727 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,727 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,728 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:21,728 - root - INFO - Step 6860: lr=4.12E-06, loss= 1.0616 (max= 1.6786), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,578 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,578 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,578 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,579 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,579 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,579 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,579 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:11:53,579 - root - INFO - Step 6870: lr=4.12E-06, loss= 1.0796 (max= 1.4339), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,456 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 
1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,456 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,456 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,456 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,456 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,456 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,457 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:25,457 - root - INFO - Step 6880: lr=4.12E-06, loss= 1.0285 (max= 1.4314), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,263 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,263 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,263 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,263 - root - INFO - Step 6890: 
lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,263 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,264 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,264 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:12:57,264 - root - INFO - Step 6890: lr=4.11E-06, loss= 1.0619 (max= 1.5909), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,094 - 
root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:13:29,095 - root - INFO - Step 6900: lr=4.11E-06, loss= 1.0583 (max= 1.6060), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,924 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,924 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,924 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,924 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,924 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,925 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,925 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:00,925 - root - INFO - Step 6910: lr=4.10E-06, loss= 1.0662 (max= 1.5094), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:26,938 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5818749 
2025-10-26 14:14:32,750 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,750 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,750 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,750 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,750 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,751 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,751 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:14:32,751 - root - INFO - Step 6920: lr=4.10E-06, loss= 1.0765 (max= 1.7578), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,609 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,609 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,609 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,609 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,609 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,609 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,610 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:04,610 - root - INFO - Step 6930: lr=4.09E-06, loss= 1.0665 (max= 1.5126), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,505 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:15:36,506 - root - INFO - Step 6940: lr=4.09E-06, loss= 1.0452 (max= 1.5161), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:08,341 - root - INFO - Step 6950: lr=4.08E-06, loss= 1.0635 (max= 1.5471), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 
1.5368), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,197 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:16:40,198 - root - INFO - Step 6960: lr=4.08E-06, loss= 1.0588 (max= 1.5368), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: 
lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,002 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:12,003 - root - INFO - Step 6970: lr=4.07E-06, loss= 1.0652 (max= 1.4951), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - 
root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:17:43,788 - root - INFO - Step 6980: lr=4.07E-06, loss= 1.0320 (max= 1.4135), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,575 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:15,576 - root - INFO - Step 6990: lr=4.07E-06, loss= 1.0706 (max= 1.5879), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:21,085 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6873421 
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-7000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-7000! Save time: 4.384554862976074 2025-10-26 14:18:47,434 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - Saving a full 
checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:18:47,435 - root - INFO - Step 7000: lr=4.06E-06, loss= 1.0598 (max= 1.4800), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:18:47,435 - root - INFO - Saving a full checkpoint at step 7000 2025-10-26 14:18:47,435 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 14:19:01,035 - root - INFO - Finished saving the checkpoint in 13.60 seconds 2025-10-26 14:19:01,042 - root - INFO - Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:01,042 - root - INFO - Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:01,042 - root - INFO - Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:01,042 - root - INFO - Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:01,043 - root - INFO - Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:01,043 - root - INFO - 
Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:01,043 - root - INFO - Finished saving the checkpoint in 13.61 seconds 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,829 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:19:32,830 - root - INFO - Step 7010: lr=4.06E-06, loss= 1.0612 (max= 1.5473), tps=14438, mfu=30.08%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 
- root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,782 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:04,783 - root - INFO - Step 7020: lr=4.05E-06, loss= 1.0592 (max= 1.6130), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,608 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:20:36,609 - root - INFO - Step 7030: lr=4.05E-06, loss= 1.0531 (max= 1.5016), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,506 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,506 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,507 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,507 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,507 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,507 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,507 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:08,507 - root - INFO - Step 7040: lr=4.04E-06, loss= 1.0289 (max= 1.4635), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,369 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:21:40,370 - root - INFO - Step 7050: lr=4.04E-06, loss= 1.0613 (max= 1.4428), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,171 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:12,172 - root - INFO - Step 7060: lr=4.03E-06, loss= 1.0547 (max= 1.5094), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 
1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,991 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:22:43,992 - root - INFO - Step 7070: lr=4.03E-06, loss= 1.0591 (max= 1.4636), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,820 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:15,821 - root - INFO - Step 7080: lr=4.02E-06, loss= 1.0801 (max= 1.5793), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: 
lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,793 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:23:47,794 - root - INFO - Step 7090: lr=4.02E-06, loss= 1.0414 (max= 1.4411), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - 
root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:19,731 - root - INFO - Step 7100: lr=4.02E-06, loss= 1.0681 (max= 1.6134), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:24:51,621 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:24:51,622 - root - INFO - Step 7110: lr=4.01E-06, loss= 1.0679 (max= 1.5678), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,547 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:23,548 - root - INFO - Step 7120: lr=4.01E-06, loss= 1.0579 (max= 1.5054), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,391 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,391 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,392 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,392 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,392 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,392 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,392 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:25:55,392 - root - INFO - Step 7130: lr=4.00E-06, loss= 1.0722 (max= 1.4711), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,187 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,187 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,187 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,187 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,187 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,188 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,188 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:27,188 - root - INFO - Step 7140: lr=4.00E-06, loss= 1.0453 (max= 1.4359), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,322 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,322 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,322 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,322 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,323 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,323 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,323 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 
1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:26:59,323 - root - INFO - Step 7150: lr=3.99E-06, loss= 1.0665 (max= 1.5037), tps=20396, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:27:31,250 - root - INFO - Step 7160: lr=3.99E-06, loss= 1.0851 (max= 1.4765), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: 
lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,174 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:03,175 - root - INFO - Step 7170: lr=3.98E-06, loss= 1.0563 (max= 1.5255), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:05,508 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5622267 2025-10-26 14:28:35,067 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,067 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,068 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,068 - 
root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,068 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,068 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,068 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:28:35,068 - root - INFO - Step 7180: lr=3.98E-06, loss= 1.0559 (max= 1.4743), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:06,978 - root - INFO - Step 7190: lr=3.97E-06, loss= 1.0522 (max= 1.4665), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:29:38,792 - root - INFO - Step 7200: lr=3.97E-06, loss= 1.0649 (max= 1.5112), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:10,723 - root - INFO - Step 7210: lr=3.97E-06, loss= 1.0857 (max= 1.5165), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,584 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:30:42,585 - root - INFO - Step 7220: lr=3.96E-06, loss= 1.0544 (max= 1.6370), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,467 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,467 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,468 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,468 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,468 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,468 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,468 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 
1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:14,468 - root - INFO - Step 7230: lr=3.96E-06, loss= 1.0503 (max= 1.5952), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,404 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:31:46,405 - root - INFO - Step 7240: lr=3.95E-06, loss= 1.0427 (max= 1.6224), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,240 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,240 - root - INFO - Step 7250: 
lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,240 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,241 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,241 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,241 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,241 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:18,241 - root - INFO - Step 7250: lr=3.95E-06, loss= 1.0577 (max= 1.5567), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,028 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - 
root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:32:50,029 - root - INFO - Step 7260: lr=3.94E-06, loss= 1.0462 (max= 1.5750), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,831 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:33:21,832 - root - INFO - Step 7270: lr=3.94E-06, loss= 1.0631 (max= 1.5413), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,671 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:33:53,672 - root - INFO - Step 7280: lr=3.93E-06, loss= 1.0443 (max= 1.4811), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,489 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:25,490 - root - INFO - Step 7290: lr=3.93E-06, loss= 1.0487 (max= 1.4872), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,283 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,283 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,283 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,283 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,283 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20616, mfu=42.95%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,283 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,284 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:34:57,284 - root - INFO - Step 7300: lr=3.93E-06, loss= 1.0586 (max= 1.5401), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 1.4956), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:35:29,095 - root - INFO - Step 7310: lr=3.92E-06, loss= 1.0498 (max= 
1.4956), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,865 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,865 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,865 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,865 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,865 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,865 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,866 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:00,866 - root - INFO - Step 7320: lr=3.92E-06, loss= 1.0366 (max= 1.5221), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: 
lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:36:32,835 - root - INFO - Step 7330: lr=3.91E-06, loss= 1.0306 (max= 1.4533), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,714 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,714 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,714 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,714 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,715 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,715 - 
root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,715 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:04,715 - root - INFO - Step 7340: lr=3.91E-06, loss= 1.0406 (max= 1.5568), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,540 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,540 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,540 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,540 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,540 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,540 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,541 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:37:36,541 - root - INFO - Step 7350: lr=3.90E-06, loss= 1.0696 (max= 1.5219), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:38:08,396 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:08,397 - root - INFO - Step 7360: lr=3.90E-06, loss= 1.0048 (max= 1.5759), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:38:40,285 - root - INFO - Step 7370: lr=3.89E-06, loss= 1.0517 (max= 1.5094), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:12,054 - root - INFO - Step 7380: lr=3.89E-06, loss= 1.0449 (max= 1.4448), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:39:43,843 - root - INFO - Step 7390: lr=3.89E-06, loss= 1.0371 (max= 1.6583), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,747 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:15,748 - root - INFO - Step 7400: lr=3.88E-06, loss= 1.0599 (max= 1.4582), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,700 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:40:47,701 - root - INFO - Step 7410: lr=3.88E-06, loss= 1.0273 (max= 1.6741), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,475 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:19,476 - root - INFO - Step 7420: lr=3.87E-06, loss= 1.0431 (max= 1.6621), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,296 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:41:51,297 - root - INFO - Step 7430: lr=3.87E-06, loss= 1.0236 (max= 1.5485), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,155 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:23,156 - root - INFO - Step 7440: lr=3.86E-06, loss= 1.0551 (max= 1.4601), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,043 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:42:55,044 - root - INFO - Step 7450: lr=3.86E-06, loss= 1.0338 (max= 1.4493), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:26,935 - root - INFO - Step 7460: lr=3.85E-06, loss= 1.0496 (max= 1.4365), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,819 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,820 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,820 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,820 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,820 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,820 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,820 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:43:58,821 - root - INFO - Step 7470: lr=3.85E-06, loss= 1.0395 (max= 1.5340), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,686 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,686 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,686 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,686 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,687 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,687 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,687 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:44:30,687 - root - INFO - Step 7480: lr=3.85E-06, loss= 1.0514 (max= 1.4664), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,545 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:02,546 - root - INFO - Step 7490: lr=3.84E-06, loss= 1.0455 (max= 1.3925), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.43%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.43%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.43%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,726 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.43%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:45:34,727 - root - INFO - Step 7500: lr=3.84E-06, loss= 1.0586 (max= 1.5334), tps=20367, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:06,509 - root - INFO - Step 7510: lr=3.83E-06, loss= 1.0404 (max= 1.5106), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:46:38,361 - root - INFO - Step 7520: lr=3.83E-06, loss= 1.0586 (max= 1.4340), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,278 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:10,279 - root - INFO - Step 7530: lr=3.82E-06, loss= 1.0455 (max= 1.4401), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,184 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,184 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,184 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,185 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,185 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,185 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,185 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:47:42,185 - root - INFO - Step 7540: lr=3.82E-06, loss= 1.0595 (max= 1.4939), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:14,041 - root - INFO - Step 7550: lr=3.81E-06, loss= 1.0539 (max= 1.4439), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,872 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,872 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,873 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,873 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,873 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,873 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,873 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:48:45,873 - root - INFO - Step 7560: lr=3.81E-06, loss= 1.0364 (max= 1.5505), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,769 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:17,770 - root - INFO - Step 7570: lr=3.81E-06, loss= 1.0490 (max= 1.5723), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,666 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,666 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,667 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,667 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,667 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,667 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,667 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:49:49,667 - root - INFO - Step 7580: lr=3.80E-06, loss= 1.0327 (max= 1.5662), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,502 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:21,503 - root - INFO - Step 7590: lr=3.80E-06, loss= 1.0545 (max= 1.6033), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,364 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:50:53,365 - root - INFO - Step 7600: lr=3.79E-06, loss= 1.0539 (max= 1.4606), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,198 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:25,199 - root - INFO - Step 7610: lr=3.79E-06, loss= 1.0406 (max= 1.5842), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:51:56,999 - root - INFO - Step 7620: lr=3.78E-06, loss= 1.0383 (max= 1.4344), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,964 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:52:28,965 - root - INFO - Step 7630: lr=3.78E-06, loss= 1.0376 (max= 1.4215), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 
1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:00,916 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:00,917 - root - INFO - Step 7640: lr=3.78E-06, loss= 1.0502 (max= 1.4716), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,712 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,712 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,713 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,713 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,713 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,713 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,713 - root - INFO - Step 7650: 
lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:53:32,713 - root - INFO - Step 7650: lr=3.77E-06, loss= 1.0397 (max= 1.7084), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,692 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:04,693 - root - INFO - Step 7660: lr=3.77E-06, loss= 1.0494 (max= 1.5907), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - 
root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,565 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:54:36,566 - root - INFO - Step 7670: lr=3.76E-06, loss= 1.0539 (max= 1.3579), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,396 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,396 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,397 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,397 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:55:08,397 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,397 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,397 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:08,397 - root - INFO - Step 7680: lr=3.76E-06, loss= 1.0596 (max= 1.4605), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,331 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,332 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,332 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,332 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,332 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,332 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,332 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:55:40,333 - root - INFO - Step 7690: lr=3.75E-06, loss= 1.0497 (max= 1.6003), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,302 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:12,303 - root - INFO - Step 7700: lr=3.75E-06, loss= 1.0646 (max= 1.4958), tps=20501, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,183 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:56:44,184 - root - INFO - Step 7710: lr=3.74E-06, loss= 1.0394 (max= 1.4915), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 
1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,066 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:16,067 - root - INFO - Step 7720: lr=3.74E-06, loss= 1.0401 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,893 - root - INFO - Step 7730: lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:57:47,894 - root - INFO - Step 7730: 
lr=3.74E-06, loss= 1.0534 (max= 1.4843), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,761 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,761 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,762 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,762 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,762 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,762 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,762 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:19,763 - root - INFO - Step 7740: lr=3.73E-06, loss= 1.0593 (max= 1.4007), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,798 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,798 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,798 - 
root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,799 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,799 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,799 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,800 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:58:51,800 - root - INFO - Step 7750: lr=3.73E-06, loss= 1.0701 (max= 1.5656), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,662 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,662 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,662 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,663 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,663 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 14:59:23,663 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,663 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:23,663 - root - INFO - Step 7760: lr=3.72E-06, loss= 1.0411 (max= 1.5617), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,596 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,596 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,597 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,597 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,597 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,597 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,597 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 14:59:55,597 - root - INFO - Step 7770: lr=3.72E-06, loss= 1.0477 (max= 1.4199), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,407 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:27,408 - root - INFO - Step 7780: lr=3.71E-06, loss= 1.0430 (max= 1.6112), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,301 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:00:59,302 - root - INFO - Step 7790: lr=3.71E-06, loss= 1.0242 (max= 1.5323), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,138 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,138 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,138 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,138 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,138 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,139 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 
1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,139 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:01:31,139 - root - INFO - Step 7800: lr=3.71E-06, loss= 1.0654 (max= 1.3911), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,969 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:02,970 - root - INFO - Step 7810: lr=3.70E-06, loss= 1.0483 (max= 1.5026), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: 
lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,863 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:02:34,864 - root - INFO - Step 7820: lr=3.70E-06, loss= 1.0928 (max= 1.5477), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - 
root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,837 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:06,838 - root - INFO - Step 7830: lr=3.69E-06, loss= 1.0470 (max= 1.5223), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:03:38,734 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:03:38,735 - root - INFO - Step 7840: lr=3.69E-06, loss= 1.0911 (max= 1.6081), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,557 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,557 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,557 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,557 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,557 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,557 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,558 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:10,558 - root - INFO - Step 7850: lr=3.68E-06, loss= 1.0524 (max= 1.4210), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,423 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:04:42,424 - root - INFO - Step 7860: lr=3.68E-06, loss= 1.0517 (max= 1.4947), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,272 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:14,273 - root - INFO - Step 7870: lr=3.68E-06, loss= 1.0676 (max= 1.4644), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,097 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:05:46,098 - root - INFO - Step 7880: lr=3.67E-06, loss= 1.0297 (max= 1.4663), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:17,932 - root - INFO - Step 7890: lr=3.67E-06, loss= 1.0764 (max= 1.5811), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,763 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:06:49,764 - root - INFO - Step 7900: lr=3.66E-06, loss= 1.0495 (max= 1.5647), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,641 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:21,642 - root - INFO - Step 7910: lr=3.66E-06, loss= 1.0277 (max= 1.4246), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,572 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,572 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,572 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,572 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,573 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,573 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,573 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:07:53,573 - root - INFO - Step 7920: lr=3.65E-06, loss= 1.0389 (max= 1.5077), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,401 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,401 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,401 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,401 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,401 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,401 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,402 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:25,402 - root - INFO - Step 7930: lr=3.65E-06, loss= 1.0359 (max= 1.5833), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,309 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:08:57,310 - root - INFO - Step 7940: lr=3.65E-06, loss= 1.0749 (max= 1.7031), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,186 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,186 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,187 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,187 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,187 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,187 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,187 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:09:29,187 - root - INFO - Step 7950: lr=3.64E-06, loss= 1.0667 (max= 1.5388), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,030 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,030 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,030 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,031 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,031 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,031 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,031 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:01,031 - root - INFO - Step 7960: lr=3.64E-06, loss= 1.0421 (max= 1.4667), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,916 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:10:32,917 - root - INFO - Step 7970: lr=3.63E-06, loss= 1.0453 (max= 1.4421), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,759 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:04,760 - root - INFO - Step 7980: lr=3.63E-06, loss= 1.0395 (max= 1.5626), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,585 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:11:36,586 - root - INFO - Step 7990: lr=3.62E-06, loss= 1.0583 (max= 1.4864), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-8000
Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-8000!
Save time: 4.3947155475616455
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,419 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,419 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,419 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,419 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,419 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,419 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,419 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,420 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,420 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,420 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:08,420 - root - INFO - Step 8000: lr=3.62E-06, loss= 1.0618 (max= 1.4598), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:08,420 - root - INFO - Saving a full checkpoint at step 8000
2025-10-26 15:12:08,420 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 15:12:22,072 - root - INFO - Finished saving the checkpoint in 13.65 seconds
2025-10-26 15:12:22,080 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:22,080 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:22,080 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:22,080 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:22,081 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:22,083 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:22,083 - root - INFO - Finished saving the checkpoint in 13.66 seconds
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,849 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:12:53,850 - root - INFO - Step 8010: lr=3.62E-06, loss= 1.0624 (max= 1.6494), tps=14427, mfu=30.06%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:25,686 - root - INFO - Step 8020: lr=3.61E-06, loss= 1.0607 (max= 1.5552), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,665 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:13:57,666 - root - INFO - Step 8030: lr=3.61E-06, loss= 1.0507 (max= 1.5650), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,476 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:14:29,477 - root - INFO - Step 8040: lr=3.60E-06, loss= 1.0592 (max= 1.4388), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,284 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,284 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,285 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,285 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,285 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,285 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,285 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:01,285 - root - INFO - Step 8050: lr=3.60E-06, loss= 1.0579 (max= 1.5729), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:15:33,221 - root - INFO - Step 8060: lr=3.59E-06, loss= 1.0748 (max= 1.5067), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,127 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:05,128 - root - INFO - Step 8070: lr=3.59E-06, loss= 1.0542 (max= 1.4036), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 
1.5653), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:16:37,083 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:16:37,084 - root - INFO - Step 8080: lr=3.59E-06, loss= 1.0788 (max= 1.5653), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,158 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.57%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:09,159 - root - INFO - Step 8090: lr=3.58E-06, loss= 1.0623 (max= 1.5219), tps=20434, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: 
lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:17:40,966 - root - INFO - Step 8100: lr=3.58E-06, loss= 1.0568 (max= 1.5708), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - 
root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,821 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:12,822 - root - INFO - Step 8110: lr=3.57E-06, loss= 1.0520 (max= 1.6374), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:18:44,698 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:18:44,699 - root - INFO - Step 8120: lr=3.57E-06, loss= 1.0573 (max= 1.5449), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,523 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,523 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,523 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,523 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,524 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,524 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,524 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:16,524 - root - INFO - Step 8130: lr=3.57E-06, loss= 1.0738 (max= 1.4468), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,427 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:19:48,428 - root - INFO - Step 8140: lr=3.56E-06, loss= 1.0611 (max= 1.5112), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,269 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,269 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,269 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,269 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,269 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,269 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,270 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:20,270 - root - INFO - Step 8150: lr=3.56E-06, loss= 1.0826 (max= 1.4755), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,097 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,097 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,097 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,098 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,098 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,098 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,098 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 
1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:20:52,098 - root - INFO - Step 8160: lr=3.55E-06, loss= 1.0679 (max= 1.4770), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,943 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:23,944 - root - INFO - Step 8170: lr=3.55E-06, loss= 1.0760 (max= 1.5411), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,796 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,796 - root - INFO - Step 8180: 
lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,796 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,796 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,796 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,796 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,797 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:21:55,797 - root - INFO - Step 8180: lr=3.54E-06, loss= 1.0540 (max= 1.4540), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,741 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,741 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,741 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,741 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,742 - 
root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,742 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,742 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:27,742 - root - INFO - Step 8190: lr=3.54E-06, loss= 1.0725 (max= 1.5434), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,736 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,736 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,736 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,736 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,737 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,737 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:22:59,737 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:22:59,737 - root - INFO - Step 8200: lr=3.54E-06, loss= 1.0483 (max= 1.6842), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:23:31,719 - root - INFO - Step 8210: lr=3.53E-06, loss= 1.0735 (max= 1.5444), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,605 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,605 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,605 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,605 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,605 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,605 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,606 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:03,606 - root - INFO - Step 8220: lr=3.53E-06, loss= 1.0543 (max= 1.4679), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:24:35,437 - root - INFO - Step 8230: lr=3.52E-06, loss= 1.0456 (max= 1.4651), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,476 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:07,477 - root - INFO - Step 8240: lr=3.52E-06, loss= 1.0570 (max= 
1.4784), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:25:39,275 - root - INFO - Step 8250: lr=3.51E-06, loss= 1.0491 (max= 1.4377), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: 
lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,177 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:11,178 - root - INFO - Step 8260: lr=3.51E-06, loss= 1.0865 (max= 1.5229), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,058 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,058 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,058 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,058 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,059 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,059 - 
root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,059 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:26:43,059 - root - INFO - Step 8270: lr=3.51E-06, loss= 1.0746 (max= 1.5451), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,806 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:14,807 - root - INFO - Step 8280: lr=3.50E-06, loss= 1.0881 (max= 1.6036), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:27:46,687 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,687 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,687 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,687 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,688 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,688 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,688 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:27:46,688 - root - INFO - Step 8290: lr=3.50E-06, loss= 1.0546 (max= 1.5769), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:18,567 - root - INFO - Step 8300: lr=3.49E-06, loss= 1.0662 (max= 1.5138), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:28:50,461 - root - INFO - Step 8310: lr=3.49E-06, loss= 1.0764 (max= 1.4519), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:22,343 - root - INFO - Step 8320: lr=3.49E-06, loss= 1.0691 (max= 1.5814), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 
1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:29:54,182 - root - INFO - Step 8330: lr=3.48E-06, loss= 1.0840 (max= 1.4360), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: 
lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,076 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:26,077 - root - INFO - Step 8340: lr=3.48E-06, loss= 1.0709 (max= 1.5143), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,936 - 
root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:30:57,937 - root - INFO - Step 8350: lr=3.47E-06, loss= 1.0659 (max= 1.4486), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,792 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:31:29,793 - root - INFO - Step 8360: lr=3.47E-06, loss= 1.0680 (max= 1.4303), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,609 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:01,610 - root - INFO - Step 8370: lr=3.46E-06, loss= 1.0700 (max= 1.5667), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,485 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,485 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,485 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,486 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,486 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,486 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,486 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:32:33,486 - root - INFO - Step 8380: lr=3.46E-06, loss= 1.0762 (max= 1.6343), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,322 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:05,323 - root - INFO - Step 8390: lr=3.46E-06, loss= 1.0500 (max= 1.4872), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,120 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:33:37,121 - root - INFO - Step 8400: lr=3.45E-06, loss= 1.0764 (max= 1.5791), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 
1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,857 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:08,858 - root - INFO - Step 8410: lr=3.45E-06, loss= 1.0759 (max= 1.4339), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,710 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,710 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,710 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,710 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,710 - root - INFO - Step 8420: 
lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,711 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,711 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:34:40,711 - root - INFO - Step 8420: lr=3.44E-06, loss= 1.0522 (max= 1.5812), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,627 - root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:12,628 - 
root - INFO - Step 8430: lr=3.44E-06, loss= 1.0707 (max= 1.5874), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:35:44,426 - root - INFO - Step 8440: lr=3.44E-06, loss= 1.0550 (max= 1.6130), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:16,290 - root - INFO - Step 8450: lr=3.43E-06, loss= 1.0709 (max= 1.6142), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:36:48,119 - root - INFO - Step 8460: lr=3.43E-06, loss= 1.0722 (max= 1.4706), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,940 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,940 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,940 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,940 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,940 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,940 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,941 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:19,941 - root - INFO - Step 8470: lr=3.42E-06, loss= 1.0628 (max= 1.5310), tps=20597, mfu=42.91%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,775 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,775 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,776 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,776 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,776 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,776 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,776 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:37:51,776 - root - INFO - Step 8480: lr=3.42E-06, loss= 1.0748 (max= 1.4740), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,607 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,607 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,607 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 
1.6454), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,608 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,608 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,608 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,608 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:23,608 - root - INFO - Step 8490: lr=3.41E-06, loss= 1.0724 (max= 1.6454), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:55,417 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:55,417 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:55,417 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:55,417 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:55,417 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:38:55,417 - root - INFO - Step 8500: 
lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:38:55,417 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:38:55,418 - root - INFO - Step 8500: lr=3.41E-06, loss= 1.0582 (max= 1.5734), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,279 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:27,280 - root - INFO - Step 8510: lr=3.41E-06, loss= 1.0809 (max= 1.4536), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,040 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,040 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,040 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,040 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,040 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,040 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,041 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:39:59,041 - root - INFO - Step 8520: lr=3.40E-06, loss= 1.0678 (max= 2.3330), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,907 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:40:30,908 - root - INFO - Step 8530: lr=3.40E-06, loss= 1.0589 (max= 1.4951), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,705 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:02,706 - root - INFO - Step 8540: lr=3.39E-06, loss= 1.0822 (max= 1.4061), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:41:34,611 - root - INFO - Step 8550: lr=3.39E-06, loss= 1.0783 (max= 1.4970), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,422 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:06,423 - root - INFO - Step 8560: lr=3.39E-06, loss= 1.0524 (max= 1.4299), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:42:38,268 - root - INFO - Step 8570: lr=3.38E-06, loss= 1.0641 (max= 1.4031), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,101 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:10,102 - root - INFO - Step 8580: lr=3.38E-06, loss= 1.0475 (max= 1.5244), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,949 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:43:41,950 - root - INFO - Step 8590: lr=3.37E-06, loss= 1.0929 (max= 1.4678), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,841 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,842 - root - INFO - Step 8600: lr=3.37E-06, loss= 1.0503 (max= 1.5317), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:13,877 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6460572
2025-10-26 15:44:45,834 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,834 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,834 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,834 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,835 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,835 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,835 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:44:45,835 - root - INFO - Step 8610: lr=3.37E-06, loss= 1.0442 (max= 1.4262), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,678 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,678 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,678 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,678 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,679 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,679 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,679 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:17,679 - root - INFO - Step 8620: lr=3.36E-06, loss= 1.0660 (max= 1.4589), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,418 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:45:49,419 - root - INFO - Step 8630: lr=3.36E-06, loss= 1.0782 (max= 1.6969), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,183 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,183 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,183 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,184 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,184 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,184 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,184 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:21,184 - root - INFO - Step 8640: lr=3.35E-06, loss= 1.0692 (max= 1.4594), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,292 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:46:53,293 - root - INFO - Step 8650: lr=3.35E-06, loss= 1.0608 (max= 1.5861), tps=20413, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:25,096 - root - INFO - Step 8660: lr=3.35E-06, loss= 1.0619 (max= 1.7268), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,150 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:47:57,151 - root - INFO - Step 8670: lr=3.34E-06, loss= 1.0674 (max= 1.5563), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,119 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:48:29,120 - root - INFO - Step 8680: lr=3.34E-06, loss= 1.0258 (max= 1.5011), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,963 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:00,964 - root - INFO - Step 8690: lr=3.33E-06, loss= 1.0492 (max= 1.5314), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,888 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,888 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,888 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,889 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,889 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,889 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,889 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:49:32,889 - root - INFO - Step 8700: lr=3.33E-06, loss= 1.0512 (max= 1.7488), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,730 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,730 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,730 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,731 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,731 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,731 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,731 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:04,731 - root - INFO - Step 8710: lr=3.32E-06, loss= 1.0560 (max= 1.5470), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,698 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,698 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,698 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,698 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,698 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,698 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,699 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:50:36,699 - root - INFO - Step 8720: lr=3.32E-06, loss= 1.0646 (max= 1.4843), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,660 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:08,661 - root - INFO - Step 8730: lr=3.32E-06, loss= 1.0554 (max= 1.5394), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,633 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:51:40,634 - root - INFO - Step 8740: lr=3.31E-06, loss= 1.0702 (max= 1.6138), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,460 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,460 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,460 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,460 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,460 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,460 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,461 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:12,461 - root - INFO - Step 8750: lr=3.31E-06, loss= 1.0466 (max= 1.4885), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:44,364 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:44,364 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 15:52:44,364 - 
root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:52:44,364 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:52:44,364 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:52:44,364 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:52:44,365 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:52:44,365 - root - INFO - Step 8760: lr=3.30E-06, loss= 1.0517 (max= 1.7183), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,581 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,581 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,581 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,581 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,582 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:53:16,582 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,582 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:16,582 - root - INFO - Step 8770: lr=3.30E-06, loss= 1.0920 (max= 1.6524), tps=20344, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,483 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,483 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,483 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,483 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,483 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,483 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,484 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:53:48,484 - root - INFO - Step 8780: lr=3.30E-06, loss= 1.0896 (max= 1.6347), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,295 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:20,296 - root - INFO - Step 8790: lr=3.29E-06, loss= 1.0517 (max= 1.4944), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20625, mfu=42.97%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,073 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:54:52,074 - root - INFO - Step 8800: lr=3.29E-06, loss= 1.0749 (max= 1.5338), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 
1.7290), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,916 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:23,917 - root - INFO - Step 8810: lr=3.28E-06, loss= 1.0823 (max= 1.7290), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,712 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:55:55,713 - root - INFO - Step 8820: lr=3.28E-06, loss= 1.0472 (max= 1.5238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: 
lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,516 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:27,517 - root - INFO - Step 8830: lr=3.28E-06, loss= 1.0698 (max= 1.5257), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - 
root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,445 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:56:59,446 - root - INFO - Step 8840: lr=3.27E-06, loss= 1.0557 (max= 1.5064), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 15:57:31,247 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:57:31,248 - root - INFO - Step 8850: lr=3.27E-06, loss= 1.0769 (max= 2.3198), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,187 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,187 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,187 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,187 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,187 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,188 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20521, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,188 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:03,188 - root - INFO - Step 8860: lr=3.26E-06, loss= 1.0559 (max= 1.5829), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,297 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:58:35,298 - root - INFO - Step 8870: lr=3.26E-06, loss= 1.0557 (max= 1.4161), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,193 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:07,194 - root - INFO - Step 8880: lr=3.26E-06, loss= 1.0578 (max= 1.5009), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 
1.4936), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 15:59:39,339 - root - INFO - Step 8890: lr=3.25E-06, loss= 1.0460 (max= 1.4936), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,065 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:11,066 - root - INFO - Step 8900: lr=3.25E-06, loss= 1.0801 (max= 1.5101), tps=20659, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: 
lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,930 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:00:42,931 - root - INFO - Step 8910: lr=3.24E-06, loss= 1.0491 (max= 1.5036), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:12,188 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2680781 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - 
root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:14,810 - root - INFO - Step 8920: lr=3.24E-06, loss= 1.0668 (max= 1.4729), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:01:46,644 - root - INFO - Step 8930: lr=3.24E-06, loss= 1.0569 (max= 1.4532), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,439 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:18,440 - root - INFO - Step 8940: lr=3.23E-06, loss= 1.0828 (max= 1.4778), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:36,755 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6280204 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,618 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:02:50,619 - root - INFO - Step 8950: lr=3.23E-06, loss= 1.0527 (max= 1.4963), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 
8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,532 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:22,533 - root - INFO - Step 8960: lr=3.22E-06, loss= 1.0634 (max= 1.5136), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,412 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:03:54,413 - root - INFO - Step 8970: lr=3.22E-06, loss= 1.0580 (max= 1.4054), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,447 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:26,448 - root - INFO - Step 8980: lr=3.22E-06, loss= 1.0641 (max= 1.4742), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 16:04:58,294 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,294 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,294 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,294 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,294 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,294 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,295 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:04:58,295 - root - INFO - Step 8990: lr=3.21E-06, loss= 1.0413 (max= 1.3918), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-9000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-9000! 
Save time: 4.490308523178101 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,158 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,158 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,158 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:30,159 - root - INFO - Step 9000: lr=3.21E-06, loss= 1.0696 (max= 1.6031), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:05:30,159 - root - INFO - Saving a full checkpoint at step 9000 2025-10-26 16:05:30,159 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 16:05:45,056 - root - INFO - Finished saving the checkpoint in 14.90 seconds 2025-10-26 16:05:45,064 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 16:05:45,064 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 16:05:45,065 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 16:05:45,065 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 16:05:45,065 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 16:05:45,066 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 16:05:45,066 - root - INFO - Finished saving the checkpoint in 14.91 seconds 2025-10-26 
16:06:16,876 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:16,877 - root - INFO - Step 9010: lr=3.20E-06, loss= 1.0605 (max= 1.5834), tps=14029, mfu=29.23%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,768 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,768 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,768 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 16:06:48,768 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,768 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,769 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,769 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:06:48,769 - root - INFO - Step 9020: lr=3.20E-06, loss= 1.0502 (max= 1.4262), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:20,697 - root - INFO - Step 9030: lr=3.20E-06, loss= 1.0572 (max= 1.6243), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:27,650 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5261724 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,538 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:07:52,539 - root - INFO - Step 9040: lr=3.19E-06, loss= 1.0481 (max= 1.4632), tps=20584, mfu=42.89%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,347 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:24,348 - root - INFO - Step 9050: lr=3.19E-06, loss= 1.0681 (max= 1.4937), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 
1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,091 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:08:56,092 - root - INFO - Step 9060: lr=3.18E-06, loss= 1.0179 (max= 1.4513), tps=20648, mfu=43.02%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20480, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20480, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20480, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: 
lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20480, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,095 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20480, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:09:28,096 - root - INFO - Step 9070: lr=3.18E-06, loss= 1.0557 (max= 1.6546), tps=20480, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,195 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20418, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:00,196 - root - INFO - Step 9080: lr=3.18E-06, loss= 1.0843 (max= 1.4485), tps=20419, mfu=42.54%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - 
root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:10:32,098 - root - INFO - Step 9090: lr=3.17E-06, loss= 1.0624 (max= 1.6885), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:03,929 - root - INFO - Step 9100: lr=3.17E-06, loss= 1.0649 (max= 1.5462), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,783 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:11:35,784 - root - INFO - Step 9110: lr=3.16E-06, loss= 1.0587 (max= 1.8393), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,638 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,638 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,639 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,639 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,639 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,639 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,639 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:07,639 - root - INFO - Step 9120: lr=3.16E-06, loss= 1.0802 (max= 1.7565), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,527 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,527 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,528 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,528 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,528 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,528 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,528 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:12:39,528 - root - INFO - Step 9130: lr=3.16E-06, loss= 1.0686 (max= 1.6755), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 
1.4651), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,524 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20485, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:11,525 - root - INFO - Step 9140: lr=3.15E-06, loss= 1.0735 (max= 1.4651), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.05%) 2025-10-26 16:13:43,380 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,380 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,380 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,381 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,381 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,381 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,381 - root - INFO - Step 9150: 
lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:13:43,381 - root - INFO - Step 9150: lr=3.15E-06, loss= 1.0713 (max= 1.4579), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,509 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20400, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:15,510 - root - INFO - Step 9160: lr=3.14E-06, loss= 1.0823 (max= 1.5009), tps=20401, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,639 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,639 - 
root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,639 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,639 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,640 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,640 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,640 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:14:47,640 - root - INFO - Step 9170: lr=3.14E-06, loss= 1.0620 (max= 1.5556), tps=20399, mfu=42.50%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,462 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,462 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,462 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,462 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:15:19,463 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,463 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,463 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:19,463 - root - INFO - Step 9180: lr=3.14E-06, loss= 1.0683 (max= 1.4720), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,428 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,428 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,428 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,428 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,429 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,429 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,429 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:15:51,429 - root - INFO - Step 9190: lr=3.13E-06, loss= 1.0907 (max= 1.5754), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,253 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,253 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,253 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,253 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,254 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,254 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,254 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:23,254 - root - INFO - Step 9200: lr=3.13E-06, loss= 1.0696 (max= 1.5117), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,187 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:16:55,188 - root - INFO - Step 9210: lr=3.12E-06, loss= 1.0636 (max= 1.5722), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,070 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 
1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:27,071 - root - INFO - Step 9220: lr=3.12E-06, loss= 1.0714 (max= 1.5461), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:17:58,933 - root - INFO - Step 9230: 
lr=3.12E-06, loss= 1.0748 (max= 1.4833), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,802 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,802 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,802 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,802 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,802 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,802 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,803 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:18:30,803 - root - INFO - Step 9240: lr=3.11E-06, loss= 1.0691 (max= 1.4266), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - 
root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,691 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:02,692 - root - INFO - Step 9250: lr=3.11E-06, loss= 1.0645 (max= 1.6940), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,579 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,579 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,580 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,580 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,580 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:19:34,580 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,580 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:19:34,580 - root - INFO - Step 9260: lr=3.10E-06, loss= 1.0747 (max= 1.5103), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:06,398 - root - INFO - Step 9270: lr=3.10E-06, loss= 1.0779 (max= 1.4105), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,254 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:20:38,255 - root - INFO - Step 9280: lr=3.10E-06, loss= 1.0789 (max= 1.5632), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,117 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:10,118 - root - INFO - Step 9290: lr=3.09E-06, loss= 1.0586 (max= 1.7066), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 
1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,848 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:21:41,849 - root - INFO - Step 9300: lr=3.09E-06, loss= 1.0920 (max= 1.5222), tps=20656, mfu=43.04%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,973 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,973 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,973 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,974 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,974 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,974 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,974 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20402, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:13,974 - root - INFO - Step 9310: lr=3.09E-06, loss= 1.0953 (max= 1.5949), tps=20403, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,786 - root - INFO - Step 9320: 
lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,786 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,786 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,787 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,787 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,787 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,787 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:22:45,787 - root - INFO - Step 9320: lr=3.08E-06, loss= 1.0665 (max= 1.4575), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,635 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,635 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,635 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,636 - 
root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,636 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,636 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,636 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:17,637 - root - INFO - Step 9330: lr=3.08E-06, loss= 1.0494 (max= 1.4100), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,465 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,465 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,465 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,465 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,465 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,466 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:23:49,466 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:23:49,466 - root - INFO - Step 9340: lr=3.07E-06, loss= 1.0607 (max= 1.5071), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,288 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:21,289 - root - INFO - Step 9350: lr=3.07E-06, loss= 1.0786 (max= 1.5627), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,112 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:24:53,113 - root - INFO - Step 9360: lr=3.07E-06, loss= 1.0833 (max= 1.6866), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,984 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:24,985 - root - INFO - Step 9370: lr=3.06E-06, loss= 1.0520 (max= 1.5119), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,884 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,884 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,884 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,885 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,885 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,885 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,885 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 
1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:25:56,885 - root - INFO - Step 9380: lr=3.06E-06, loss= 1.0858 (max= 1.4853), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,716 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,716 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,716 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,717 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,717 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,717 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,717 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:26:28,717 - root - INFO - Step 9390: lr=3.05E-06, loss= 1.0663 (max= 1.5527), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: 
lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,590 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:00,591 - root - INFO - Step 9400: lr=3.05E-06, loss= 1.0721 (max= 1.8765), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - 
root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,449 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:27:32,450 - root - INFO - Step 9410: lr=3.05E-06, loss= 1.0514 (max= 1.4837), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:04,274 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:28:04,275 - root - INFO - Step 9420: lr=3.04E-06, loss= 1.0907 (max= 1.5621), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:19,397 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:3352295 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,142 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:28:36,143 - root - INFO - Step 9430: lr=3.04E-06, loss= 1.0449 (max= 1.4581), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:07,966 - root - INFO - Step 9440: lr=3.03E-06, loss= 1.0742 (max= 1.8378), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,843 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,843 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,843 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,843 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,843 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,844 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,844 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:29:39,844 - root - INFO - Step 9450: lr=3.03E-06, loss= 1.0534 (max= 1.5651), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,689 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,689 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,689 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,689 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,689 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,689 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,690 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 
1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:11,690 - root - INFO - Step 9460: lr=3.03E-06, loss= 1.0532 (max= 1.5014), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,618 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,618 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,618 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,618 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,618 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,618 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,619 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:30:43,619 - root - INFO - Step 9470: lr=3.02E-06, loss= 1.0689 (max= 1.6476), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: 
lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,499 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:15,500 - root - INFO - Step 9480: lr=3.02E-06, loss= 1.0448 (max= 1.5817), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,504 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,504 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,504 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,504 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,504 - 
root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,505 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,505 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:31:47,505 - root - INFO - Step 9490: lr=3.02E-06, loss= 1.0592 (max= 1.5514), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:19,291 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:32:19,292 - root - INFO - Step 9500: lr=3.01E-06, loss= 1.0550 (max= 2.0485), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,168 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:32:51,169 - root - INFO - Step 9510: lr=3.01E-06, loss= 1.0371 (max= 1.7186), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:23,030 - root - INFO - Step 9520: lr=3.00E-06, loss= 1.0560 (max= 1.5137), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,811 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:33:54,812 - root - INFO - Step 9530: lr=3.00E-06, loss= 1.0437 (max= 1.5328), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:26,751 - root - INFO - Step 9540: lr=3.00E-06, loss= 1.0523 (max= 
1.4663), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:34:58,548 - root - INFO - Step 9550: lr=2.99E-06, loss= 1.0662 (max= 1.6266), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: 
lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,447 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:35:30,448 - root - INFO - Step 9560: lr=2.99E-06, loss= 1.0411 (max= 1.4598), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - 
root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:02,375 - root - INFO - Step 9570: lr=2.98E-06, loss= 1.0558 (max= 1.4578), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:36:34,158 - root - INFO - Step 9580: lr=2.98E-06, loss= 1.0308 (max= 1.3446), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,075 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:06,076 - root - INFO - Step 9590: lr=2.98E-06, loss= 1.0799 (max= 1.7792), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,046 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,046 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,046 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,046 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,046 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,047 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,047 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:37:38,047 - root - INFO - Step 9600: lr=2.97E-06, loss= 1.0661 (max= 1.4613), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19940, mfu=41.55%, 
memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,915 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:10,916 - root - INFO - Step 9610: lr=2.97E-06, loss= 1.0510 (max= 1.6250), tps=19941, mfu=41.55%, memory: 154.31GiB(86.51%) time/data_loading=0.01s (max=0.10s, 5.84%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,731 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:38:42,732 - root - INFO - Step 9620: lr=2.96E-06, loss= 1.0861 (max= 1.6122), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 
1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,596 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:14,597 - root - INFO - Step 9630: lr=2.96E-06, loss= 1.0715 (max= 1.6042), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: 
lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,554 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:39:46,555 - root - INFO - Step 9640: lr=2.96E-06, loss= 1.0528 (max= 1.4145), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,329 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,329 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,329 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,329 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,329 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,329 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,330 - 
root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:18,330 - root - INFO - Step 9650: lr=2.95E-06, loss= 1.0704 (max= 1.5140), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,166 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:40:50,167 - root - INFO - Step 9660: lr=2.95E-06, loss= 1.0554 (max= 1.4769), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20380, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20379, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20380, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20380, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20380, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20380, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,328 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20379, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:22,329 - root - INFO - Step 9670: lr=2.95E-06, loss= 1.0525 (max= 1.5223), tps=20380, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:41:54,254 - root - INFO - Step 9680: lr=2.94E-06, loss= 1.0480 (max= 1.4809), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,018 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20634, mfu=42.99%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:26,019 - root - INFO - Step 9690: lr=2.94E-06, loss= 1.0476 (max= 1.4239), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,878 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:42:57,879 - root - INFO - Step 9700: lr=2.93E-06, loss= 1.0432 (max= 1.4900), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 
1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,717 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:43:29,718 - root - INFO - Step 9710: lr=2.93E-06, loss= 1.0467 (max= 1.5922), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: 
lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,534 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:01,535 - root - INFO - Step 9720: lr=2.93E-06, loss= 1.0622 (max= 1.5670), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,359 - root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:44:33,360 - 
root - INFO - Step 9730: lr=2.92E-06, loss= 1.0510 (max= 1.5569), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,423 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:05,424 - root - INFO - Step 9740: lr=2.92E-06, loss= 1.0599 (max= 1.6085), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,262 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:45:37,263 - root - INFO - Step 9750: lr=2.91E-06, loss= 1.0699 (max= 1.6582), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,147 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,147 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,147 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,148 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,148 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,148 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,148 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:09,148 - root - INFO - Step 9760: lr=2.91E-06, loss= 1.0531 (max= 1.4693), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20376, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20375, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20376, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20375, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20376, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20376, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,315 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20376, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:46:41,316 - root - INFO - Step 9770: lr=2.91E-06, loss= 1.0430 (max= 1.4751), tps=20376, mfu=42.45%, 
memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,246 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,246 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,246 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,246 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,246 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,247 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,247 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:13,247 - root - INFO - Step 9780: lr=2.90E-06, loss= 1.0646 (max= 1.5078), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 
1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,063 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:47:45,064 - root - INFO - Step 9790: lr=2.90E-06, loss= 1.0652 (max= 1.7110), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: 
lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,200 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:17,201 - root - INFO - Step 9800: lr=2.90E-06, loss= 1.0701 (max= 1.5425), tps=20395, mfu=42.49%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,014 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:48:49,015 - root - INFO - Step 9810: lr=2.89E-06, loss= 1.0642 (max= 1.6802), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - 
root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,990 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:20,991 - root - INFO - Step 9820: lr=2.89E-06, loss= 1.0737 (max= 1.5022), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,893 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:49:52,894 - root - INFO - Step 9830: lr=2.88E-06, loss= 1.0712 (max= 1.4985), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:50:24,643 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:50:24,643 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:50:24,644 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:50:24,644 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:50:24,644 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 16:50:24,644 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:24,644 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:24,644 - root - INFO - Step 9840: lr=2.88E-06, loss= 1.0530 (max= 1.5741), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,512 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,512 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,512 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,512 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,512 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,512 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,513 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:50:56,513 - root - INFO - Step 9850: lr=2.88E-06, loss= 1.0711 (max= 1.5271), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,398 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:51:28,399 - root - INFO - Step 9860: lr=2.87E-06, loss= 1.0496 (max= 1.4587), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,224 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:00,225 - root - INFO - Step 9870: lr=2.87E-06, loss= 1.0717 (max= 1.5728), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,095 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,095 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,095 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,096 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,096 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,096 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,096 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:52:32,096 - root - INFO - Step 9880: lr=2.87E-06, loss= 1.0930 (max= 1.5245), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,981 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,981 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,982 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,982 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,982 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,982 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,982 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:03,982 - root - INFO - Step 9890: lr=2.86E-06, loss= 1.0946 (max= 1.5992), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,826 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:53:35,827 - root - INFO - Step 9900: lr=2.86E-06, loss= 1.0743 (max= 1.4748), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,879 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20448, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:07,880 - root - INFO - Step 9910: lr=2.85E-06, loss= 1.0786 (max= 1.5461), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,702 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:54:39,703 - root - INFO - Step 9920: lr=2.85E-06, loss= 1.0670 (max= 1.5264), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,718 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,718 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,718 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,718 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,718 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,719 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,719 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:11,719 - root - INFO - Step 9930: lr=2.85E-06, loss= 1.0611 (max= 1.5274), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,495 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,495 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,496 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,496 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,496 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,496 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,496 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:55:43,496 - root - INFO - Step 9940: lr=2.84E-06, loss= 1.0655 (max= 1.4244), tps=20626, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,319 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:15,320 - root - INFO - Step 9950: lr=2.84E-06, loss= 1.0826 (max= 1.4541), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,229 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:56:47,230 - root - INFO - Step 9960: lr=2.83E-06, loss= 1.0796 (max= 1.5928), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20514, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,179 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:19,180 - root - INFO - Step 9970: lr=2.83E-06, loss= 1.0799 (max= 1.4792), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,148 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:57:51,149 - root - INFO - Step 9980: lr=2.83E-06, loss= 1.1011 (max= 1.7895), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,208 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,208 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,208 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,208 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,208 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,209 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,209 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:23,209 - root - INFO - Step 9990: lr=2.82E-06, loss= 1.0694 (max= 1.4576), tps=20444, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-10000
Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-10000!
Save time: 4.416930437088013
2025-10-26 16:58:55,023 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,023 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,023 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,023 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,023 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,024 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,024 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:58:55,024 - root - INFO - Step 10000: lr=2.82E-06, loss= 1.0723 (max= 1.7757), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:58:55,024 - root - INFO - Saving a full checkpoint at step 10000
2025-10-26 16:58:55,024 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 16:59:10,010 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,017 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,018 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,018 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,018 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,018 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,018 - root - INFO - Finished saving the checkpoint in 14.99 seconds
2025-10-26 16:59:10,021 - root - INFO - Finished saving the checkpoint in 15.00 seconds
2025-10-26 16:59:41,766 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,766 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,766 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,766 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,766 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,766 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,767 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 16:59:41,767 - root - INFO - Step 10010: lr=2.82E-06, loss= 1.0589 (max= 1.5711), tps=14022, mfu=29.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,580 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:13,581 - root - INFO - Step 10020: lr=2.81E-06, loss= 1.0975 (max= 1.4841), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,404 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,404 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,405 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,405 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,405 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,405 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,405 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:00:45,405 - root - INFO - Step 10030: lr=2.81E-06, loss= 1.0713 (max= 1.5112), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,302 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,302 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,302 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,302 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,302 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,302 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,303 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:17,303 - root - INFO - Step 10040: lr=2.80E-06, loss= 1.1003 (max= 1.8043), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,103 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:01:49,104 - root - INFO - Step 10050: lr=2.80E-06, loss= 1.0732 (max= 1.7268), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,910 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:20,911 - root - INFO - Step 10060: lr=2.80E-06, loss= 1.0632 (max= 1.8745), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,800 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,800 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,800 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,800 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,800 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,800 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,801 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:02:52,801 - root - INFO - Step 10070: lr=2.79E-06, loss= 1.0993 (max= 1.6491), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,601 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,601 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,601 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,601 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,601 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,601 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,602 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:24,602 - root - INFO - Step 10080: lr=2.79E-06, loss= 1.0835 (max= 1.6413), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,494 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,494 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,494 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,494 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,494 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,494 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,495 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:03:56,495 - root - INFO - Step 10090: lr=2.79E-06, loss= 1.1006 (max= 1.8111), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,334 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:04:28,335 - root - INFO - Step 10100: lr=2.78E-06, loss= 1.0867 (max= 1.6208), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:05:00,272 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:05:00,272 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:05:00,272 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 
1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:00,272 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:00,272 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:00,273 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:00,273 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:00,274 - root - INFO - Step 10110: lr=2.78E-06, loss= 1.0617 (max= 1.4289), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,161 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,161 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,161 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,161 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,161 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,161 - root - INFO - Step 
10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,162 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:05:32,162 - root - INFO - Step 10120: lr=2.77E-06, loss= 1.0694 (max= 1.5633), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,028 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:06:04,029 - root - INFO - Step 10130: lr=2.77E-06, loss= 1.0549 (max= 1.6072), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:06:22,312 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:3587049
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,893 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:06:35,894 - root - INFO - Step 10140: lr=2.77E-06, loss= 1.0567 (max= 1.6277), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,893 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,893 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,893 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,893 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,894 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,894 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,894 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:07,894 - root - INFO - Step 10150: lr=2.76E-06, loss= 1.0676 (max= 1.5151), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,837 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:07:39,838 - root - INFO - Step 10160: lr=2.76E-06, loss= 1.0925 (max= 1.5217), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,652 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:11,653 - root - INFO - Step 10170: lr=2.76E-06, loss= 1.0723 (max= 1.5636), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,508 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:08:43,509 - root - INFO - Step 10180: lr=2.75E-06, loss= 1.0965 (max= 1.4316), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,347 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:15,348 - root - INFO - Step 10190: lr=2.75E-06, loss= 1.0812 (max= 1.5861), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,181 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:09:47,182 - root - INFO - Step 10200: lr=2.74E-06, loss= 1.0766 (max= 1.6659), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,100 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:19,101 - root - INFO - Step 10210: lr=2.74E-06, loss= 1.0589 (max= 1.5556), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,959 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,959 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,959 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,959 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,959 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,959 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,960 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:10:50,960 - root - INFO - Step 10220: lr=2.74E-06, loss= 1.0943 (max= 1.9351), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,832 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:22,833 - root - INFO - Step 10230: lr=2.73E-06, loss= 1.0822 (max= 1.7395), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,817 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:11:54,818 - root - INFO - Step 10240: lr=2.73E-06, loss= 1.0848 (max= 1.5207), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,639 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:26,640 - root - INFO - Step 10250: lr=2.73E-06, loss= 1.0863 (max= 1.7002), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:12:58,444 - root - INFO - Step 10260: lr=2.72E-06, loss= 1.0432 (max= 1.5970), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:13:30,330 - root - INFO - Step 10270: lr=2.72E-06, loss= 1.0571 (max= 1.5262), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,397 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:02,398 - root - INFO - Step 10280: lr=2.71E-06, loss= 1.0644 (max= 1.4827), tps=20439, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:14:34,233 - root - INFO - Step 10290: lr=2.71E-06, loss= 1.0729 (max= 1.4802), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:06,126 - root - INFO - Step 10300: lr=2.71E-06, loss= 1.0722 (max= 1.6036), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,946 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:15:37,947 - root - INFO - Step 10310: lr=2.70E-06, loss= 1.0440 (max= 1.5316), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,778 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,778 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,778 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,779 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,779 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,779 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,779 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:09,779 - root - INFO - Step 10320: lr=2.70E-06, loss= 1.0769 (max= 1.6607), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,544 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:16:41,545 - root - INFO - Step 10330: lr=2.70E-06, loss= 1.0467 (max= 1.6250), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:17:13,399 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:13,400 - root - INFO - Step 10340: lr=2.69E-06, loss= 1.0678 (max= 1.5397), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 
1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,276 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:17:45,277 - root - INFO - Step 10350: lr=2.69E-06, loss= 1.0890 (max= 1.7252), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,130 - root - INFO - Step 
10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:17,131 - root - INFO - Step 10360: lr=2.68E-06, loss= 1.0440 (max= 1.4752), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,939 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,939 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,939 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,939 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,939 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,939 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,940 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:18:48,941 - root - INFO - Step 10370: lr=2.68E-06, loss= 1.0633 (max= 1.4639), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,770 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:19:20,770 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,770 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,770 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,771 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,771 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,771 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:20,772 - root - INFO - Step 10380: lr=2.68E-06, loss= 1.0619 (max= 1.4631), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,607 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,607 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,607 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,607 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,607 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,607 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,608 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:19:52,608 - root - INFO - Step 10390: lr=2.67E-06, loss= 1.1010 (max= 1.5883), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,485 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:24,486 - root - INFO - Step 10400: lr=2.67E-06, loss= 1.0706 (max= 1.6272), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:20:56,378 - root - INFO - Step 10410: lr=2.67E-06, loss= 1.0445 (max= 1.5848), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,316 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,316 - root - INFO - Step 10420: lr=2.66E-06, 
loss= 1.0700 (max= 1.7286), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,316 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,316 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,316 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,316 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,317 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:21:28,317 - root - INFO - Step 10420: lr=2.66E-06, loss= 1.0700 (max= 1.7286), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,233 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,233 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,233 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,234 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,234 - 
root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,234 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,234 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:00,234 - root - INFO - Step 10430: lr=2.66E-06, loss= 1.0417 (max= 1.5909), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,219 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,219 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,219 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,219 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,220 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,220 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:22:32,220 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 17:22:32,220 - root - INFO - Step 10440: lr=2.66E-06, loss= 1.0742 (max= 1.5875), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,976 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:03,977 - root - INFO - Step 10450: lr=2.65E-06, loss= 1.0774 (max= 1.5869), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,879 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:23:35,880 - root - INFO - Step 10460: lr=2.65E-06, loss= 1.0552 (max= 1.7408), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 
1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:07,729 - root - INFO - Step 10470: lr=2.64E-06, loss= 1.0919 (max= 1.7526), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,129 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,129 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,129 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,129 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,129 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,129 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20229, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,130 - root - INFO - Step 10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20229, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:24:40,130 - root - INFO - Step 
10480: lr=2.64E-06, loss= 1.0640 (max= 1.5140), tps=20229, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,964 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:11,965 - root - INFO - Step 10490: lr=2.64E-06, loss= 1.0580 (max= 1.6310), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,752 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:25:43,753 - root - INFO - Step 10500: lr=2.63E-06, loss= 1.0486 (max= 1.5634), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,578 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:15,579 - root - INFO - Step 10510: lr=2.63E-06, loss= 1.0977 (max= 1.6421), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:35,197 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2656800 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,437 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:26:47,438 - root - INFO - Step 10520: lr=2.63E-06, loss= 1.0756 (max= 1.7010), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,351 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:19,352 - root - INFO - Step 10530: lr=2.62E-06, loss= 1.0531 (max= 1.6264), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, 
loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,263 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:27:51,264 - root - INFO - Step 10540: lr=2.62E-06, loss= 1.0914 (max= 1.5285), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - 
root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,351 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20426, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:23,352 - root - INFO - Step 10550: lr=2.61E-06, loss= 1.0544 (max= 1.5582), tps=20427, mfu=42.56%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:28:55,182 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 17:28:55,183 - root - INFO - Step 10560: lr=2.61E-06, loss= 1.0574 (max= 1.4433), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,067 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:27,068 - root - INFO - Step 10570: lr=2.61E-06, loss= 1.0647 (max= 1.6874), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,845 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:29:58,846 - root - INFO - Step 10580: lr=2.60E-06, loss= 1.0663 (max= 1.5260), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 
1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,660 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:30:30,661 - root - INFO - Step 10590: lr=2.60E-06, loss= 1.0515 (max= 1.4845), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:02,508 - root - INFO - Step 
10600: lr=2.60E-06, loss= 1.0684 (max= 1.7196), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,360 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,360 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,361 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,361 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,361 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,361 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,361 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:31:34,361 - root - INFO - Step 10610: lr=2.59E-06, loss= 1.0601 (max= 1.8330), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,162 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:06,163 - root - INFO - Step 10620: lr=2.59E-06, loss= 1.0562 (max= 1.8868), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,957 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:32:37,958 - root - INFO - Step 10630: lr=2.59E-06, loss= 1.0741 (max= 1.6910), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,758 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:09,759 - root - INFO - Step 10640: lr=2.58E-06, loss= 1.0792 (max= 1.5190), tps=20611, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,568 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:33:41,569 - root - INFO - Step 10650: lr=2.58E-06, loss= 1.0757 (max= 1.5291), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, 
loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,382 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:13,383 - root - INFO - Step 10660: lr=2.57E-06, loss= 1.0776 (max= 1.5277), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - 
root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,212 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:34:45,213 - root - INFO - Step 10670: lr=2.57E-06, loss= 1.0850 (max= 1.4599), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,049 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:17,050 - root - INFO - Step 10680: lr=2.57E-06, loss= 1.0576 (max= 1.5525), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 17:35:48,845 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,845 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,845 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,846 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,846 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,846 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,846 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:35:48,846 - root - INFO - Step 10690: lr=2.56E-06, loss= 1.0566 (max= 1.5049), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,669 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:20,670 - root - INFO - Step 10700: lr=2.56E-06, loss= 1.0742 (max= 1.4735), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 
1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:36:52,485 - root - INFO - Step 10710: lr=2.56E-06, loss= 1.0830 (max= 1.4664), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,314 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:24,315 - root - INFO - Step 10720: lr=2.55E-06, loss= 1.0695 (max= 1.6405), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 
10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,141 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:37:56,142 - root - INFO - Step 10730: lr=2.55E-06, loss= 1.0845 (max= 1.5085), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,944 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:27,945 - root - INFO - Step 10740: lr=2.55E-06, loss= 1.0750 (max= 1.7128), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:46,147 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:386757 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,721 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:38:59,722 - root - INFO - Step 10750: lr=2.54E-06, loss= 1.0986 (max= 1.6501), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,698 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:39:31,699 - root - INFO - Step 10760: lr=2.54E-06, loss= 1.0741 (max= 1.4609), tps=20497, 
mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:03,733 - root - INFO - Step 10770: lr=2.53E-06, loss= 1.0690 (max= 1.5744), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,545 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, 
loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:40:35,546 - root - INFO - Step 10780: lr=2.53E-06, loss= 1.0657 (max= 1.5470), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - 
root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,347 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:07,348 - root - INFO - Step 10790: lr=2.53E-06, loss= 1.0751 (max= 1.5350), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,208 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,208 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,208 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,209 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,209 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,209 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,209 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:41:39,209 - root - INFO - Step 10800: lr=2.52E-06, loss= 1.0738 (max= 1.6343), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 17:41:55,652 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:3425330 2025-10-26 17:42:11,175 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,175 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,175 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,175 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,175 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,176 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,176 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:11,176 - root - INFO - Step 10810: lr=2.52E-06, loss= 1.0761 (max= 1.4574), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,992 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,992 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,992 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,992 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,992 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,993 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,993 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:42:42,993 - root - INFO - Step 10820: lr=2.52E-06, loss= 1.0753 (max= 1.4917), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 
1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,820 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:14,821 - root - INFO - Step 10830: lr=2.51E-06, loss= 1.0558 (max= 1.7267), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:40,851 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:7462973 2025-10-26 17:43:46,633 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,633 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,633 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,633 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,633 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,633 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,633 - root - INFO - Step 
10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:43:46,634 - root - INFO - Step 10840: lr=2.51E-06, loss= 1.0666 (max= 1.4826), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:18,459 - root - INFO - Step 10850: lr=2.51E-06, loss= 1.0586 (max= 1.5135), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,238 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:44:50,239 - root - INFO - Step 10860: lr=2.50E-06, loss= 1.0563 (max= 1.5421), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,085 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:22,086 - root - INFO - Step 10870: lr=2.50E-06, loss= 1.0663 (max= 1.6417), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,019 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,019 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,019 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,019 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,020 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,020 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,020 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, 
mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:45:54,020 - root - INFO - Step 10880: lr=2.49E-06, loss= 1.0743 (max= 1.6382), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,787 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,787 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,787 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,787 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,788 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,788 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,788 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:25,788 - root - INFO - Step 10890: lr=2.49E-06, loss= 1.0465 (max= 1.5738), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, 
loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,645 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:46:57,646 - root - INFO - Step 10900: lr=2.49E-06, loss= 1.0618 (max= 1.6228), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - 
root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,439 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:47:29,440 - root - INFO - Step 10910: lr=2.48E-06, loss= 1.0697 (max= 1.7161), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:01,284 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 17:48:01,285 - root - INFO - Step 10920: lr=2.48E-06, loss= 1.0956 (max= 1.4569), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,212 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,212 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,212 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,212 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,212 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,212 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,213 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:48:33,213 - root - INFO - Step 10930: lr=2.48E-06, loss= 1.0661 (max= 1.5093), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,017 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:05,018 - root - INFO - Step 10940: lr=2.47E-06, loss= 1.0851 (max= 1.5271), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,838 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 
1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:49:36,839 - root - INFO - Step 10950: lr=2.47E-06, loss= 1.0859 (max= 1.8176), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,735 - root - INFO - Step 10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:08,736 - root - INFO - Step 
10960: lr=2.47E-06, loss= 1.0805 (max= 1.7031), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,531 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,532 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,532 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,532 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,532 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,532 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,532 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:50:40,533 - root - INFO - Step 10970: lr=2.46E-06, loss= 1.0448 (max= 1.5142), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:51:12,646 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 17:51:12,646 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 17:51:12,647 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:12,647 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:12,647 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:12,647 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:12,647 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:12,647 - root - INFO - Step 10980: lr=2.46E-06, loss= 1.0733 (max= 1.5529), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,501 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:51:44,502 - root - INFO - Step 10990: lr=2.45E-06, loss= 1.0612 (max= 1.6980), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-11000
Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-11000!
Save time: 4.3940980434417725
2025-10-26 17:52:16,379 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,379 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,379 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,379 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,379 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,379 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,379 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,380 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,380 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,380 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,380 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,380 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,380 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:16,380 - root - INFO - Step 11000: lr=2.45E-06, loss= 1.0566 (max= 1.7188), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:52:16,380 - root - INFO - Saving a full checkpoint at step 11000
2025-10-26 17:52:16,380 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 17:52:30,921 - root - INFO - Finished saving the checkpoint in 14.54 seconds
2025-10-26 17:52:30,928 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:52:30,929 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:52:30,929 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:52:30,929 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:52:30,930 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:52:30,930 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:52:30,931 - root - INFO - Finished saving the checkpoint in 14.55 seconds
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,742 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:02,743 - root - INFO - Step 11010: lr=2.45E-06, loss= 1.0548 (max= 1.4022), tps=14137, mfu=29.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,645 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,645 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,646 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,646 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,646 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,646 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,646 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:53:34,646 - root - INFO - Step 11020: lr=2.44E-06, loss= 1.0611 (max= 1.6357), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,627 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:06,628 - root - INFO - Step 11030: lr=2.44E-06, loss= 1.0627 (max= 1.4922), tps=20494, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,418 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:54:38,419 - root - INFO - Step 11040: lr=2.44E-06, loss= 1.0505 (max= 1.5263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,230 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:10,231 - root - INFO - Step 11050: lr=2.43E-06, loss= 1.0725 (max= 1.4835), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,047 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,047 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,047 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,048 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,048 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,048 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,048 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:55:42,048 - root - INFO - Step 11060: lr=2.43E-06, loss= 1.0548 (max= 1.8279), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,851 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,851 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,852 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,852 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,852 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,852 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,852 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:13,852 - root - INFO - Step 11070: lr=2.43E-06, loss= 1.0351 (max= 1.3612), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:56:45,826 - root - INFO - Step 11080: lr=2.42E-06, loss= 1.0642 (max= 1.5046), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,244 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,244 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,245 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,245 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,245 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,245 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,245 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:18,245 - root - INFO - Step 11090: lr=2.42E-06, loss= 1.0584 (max= 1.7649), tps=20218, mfu=42.12%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,098 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:57:50,099 - root - INFO - Step 11100: lr=2.42E-06, loss= 1.0398 (max= 1.5071), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,995 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,995 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,995 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,995 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,995 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,995 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,996 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:21,996 - root - INFO - Step 11110: lr=2.41E-06, loss= 1.0424 (max= 1.5177), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,772 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,772 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,773 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,773 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,773 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,773 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,773 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:58:53,773 - root - INFO - Step 11120: lr=2.41E-06, loss= 1.0575 (max= 1.4876), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,682 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,682 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,683 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,683 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,683 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,683 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,683 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:25,683 - root - INFO - Step 11130: lr=2.40E-06, loss= 1.0511 (max= 1.6163), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,450 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,450 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,450 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,450 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,450 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,450 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,451 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 17:59:57,451 - root - INFO - Step 11140: lr=2.40E-06, loss= 1.0599 (max= 1.5347), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,255 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,255 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,255 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,256 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,256 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,256 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,256 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:00:29,256 - root - INFO - Step 11150: lr=2.40E-06, loss= 1.0612 (max= 1.5552), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,061 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:01,062 - root - INFO - Step 11160: lr=2.39E-06, loss= 1.0549 (max= 1.5947), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,946 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,946 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,946 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,947 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,947 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,947 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,947 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:01:32,947 - root - INFO - Step 11170: lr=2.39E-06, loss= 1.0503 (max= 1.4879), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20454, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20454, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20454, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20454, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20454, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,990 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20454, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:04,991 - root - INFO - Step 11180: lr=2.39E-06, loss= 1.0610 (max= 1.6208), tps=20455, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,859 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:02:36,860 - root - INFO - Step 11190: lr=2.38E-06, loss= 1.0737 (max= 1.7652), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,656 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:08,657 - root - INFO - Step 11200: lr=2.38E-06, loss= 1.0271 (max= 1.4030), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,546 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,546 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,547 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,547 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,547 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,547 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,547 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:03:40,547 - root - INFO - Step 11210: lr=2.38E-06, loss= 1.0346 (max= 1.5376), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,453 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:12,454 - root - INFO - Step 11220: lr=2.37E-06, loss= 1.0473 (max= 1.4596), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:44,308 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:44,308 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:44,308 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:04:44,308 - 
root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:04:44,308 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:04:44,308 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:04:44,308 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:04:44,309 - root - INFO - Step 11230: lr=2.37E-06, loss= 1.0406 (max= 1.5363), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:12,094 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:4658430 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:16,168 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:16,169 - root - INFO - Step 11240: lr=2.37E-06, loss= 1.0566 (max= 1.5726), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,086 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:05:48,087 - root - INFO - Step 11250: lr=2.36E-06, loss= 1.0477 (max= 1.4013), tps=20535, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,891 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,891 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,892 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,892 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,892 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,892 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,892 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:19,892 - root - INFO - Step 11260: lr=2.36E-06, loss= 1.0464 (max= 1.6817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,744 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,744 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,745 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 
1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,745 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,745 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,745 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,745 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:06:51,745 - root - INFO - Step 11270: lr=2.35E-06, loss= 1.0413 (max= 1.4893), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,538 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,538 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,538 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,538 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,538 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,538 - root - INFO - Step 
11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,539 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:23,539 - root - INFO - Step 11280: lr=2.35E-06, loss= 1.0415 (max= 1.5832), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,376 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,376 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,377 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,377 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,377 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,377 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,377 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:07:55,377 - root - INFO - Step 11290: lr=2.35E-06, loss= 1.0467 (max= 1.5139), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 18:08:27,198 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,198 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,198 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,199 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,199 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,199 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,199 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:27,199 - root - INFO - Step 11300: lr=2.34E-06, loss= 1.0369 (max= 1.5289), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:08:58,918 - root - INFO - Step 11310: lr=2.34E-06, loss= 1.0528 (max= 1.5212), tps=20663, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,809 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,809 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,810 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,810 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,810 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,810 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, 
mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,810 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:09:30,810 - root - INFO - Step 11320: lr=2.34E-06, loss= 1.0314 (max= 1.5313), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,558 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,558 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,558 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,558 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,558 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,558 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,559 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:02,559 - root - INFO - Step 11330: lr=2.33E-06, loss= 1.0705 (max= 1.5918), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, 
loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:10:34,501 - root - INFO - Step 11340: lr=2.33E-06, loss= 1.0635 (max= 1.5307), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - 
root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:06,329 - root - INFO - Step 11350: lr=2.33E-06, loss= 1.0599 (max= 1.4882), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:11:38,356 - root - INFO - Step 11360: lr=2.32E-06, loss= 1.0298 (max= 1.4486), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,712 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20256, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,712 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20256, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,713 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20256, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,713 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20257, mfu=42.21%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,713 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20257, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,713 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20257, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,713 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20256, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:10,713 - root - INFO - Step 11370: lr=2.32E-06, loss= 1.0646 (max= 1.5488), tps=20256, mfu=42.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,753 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:12:42,754 - root - INFO - Step 11380: lr=2.32E-06, loss= 1.0362 (max= 1.5412), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 
1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:14,631 - root - INFO - Step 11390: lr=2.31E-06, loss= 1.0565 (max= 1.4203), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,518 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 
11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:13:46,519 - root - INFO - Step 11400: lr=2.31E-06, loss= 1.0306 (max= 1.4510), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,347 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,347 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,347 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,347 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,348 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,348 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,348 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:18,348 - root - INFO - Step 11410: lr=2.31E-06, loss= 1.0782 (max= 1.4909), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:14:50,301 - root - INFO - Step 11420: lr=2.30E-06, loss= 1.0693 (max= 1.5731), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:22,168 - root - INFO - Step 11430: lr=2.30E-06, loss= 1.0436 (max= 1.5022), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:15:53,979 - root - INFO - Step 11440: lr=2.29E-06, loss= 1.0653 (max= 1.4683), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:25,790 - root - INFO - Step 11450: lr=2.29E-06, loss= 1.0448 (max= 1.6477), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, 
loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:16:57,818 - root - INFO - Step 11460: lr=2.29E-06, loss= 1.0465 (max= 1.6539), tps=20464, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - 
root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,725 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:17:29,726 - root - INFO - Step 11470: lr=2.28E-06, loss= 1.0485 (max= 1.4567), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,474 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 18:18:01,475 - root - INFO - Step 11480: lr=2.28E-06, loss= 1.0288 (max= 1.4272), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:18:33,283 - root - INFO - Step 11490: lr=2.28E-06, loss= 1.0781 (max= 1.5052), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,207 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,207 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20530, mfu=42.78%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,207 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,207 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,207 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,208 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,208 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:05,208 - root - INFO - Step 11500: lr=2.27E-06, loss= 1.0650 (max= 1.7728), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 
1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,144 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:19:37,145 - root - INFO - Step 11510: lr=2.27E-06, loss= 1.0430 (max= 1.5310), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:08,943 - root - INFO - Step 
11520: lr=2.27E-06, loss= 1.0487 (max= 1.4392), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:20:40,832 - root - INFO - Step 11530: lr=2.26E-06, loss= 1.0269 (max= 1.4257), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:12,806 - root - INFO - Step 11540: lr=2.26E-06, loss= 1.0304 (max= 1.5809), tps=20499, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:21:44,789 - root - INFO - Step 11550: lr=2.26E-06, loss= 1.0696 (max= 1.5935), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:16,635 - root - INFO - Step 11560: lr=2.25E-06, loss= 1.0596 (max= 1.4664), tps=20581, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:22:48,445 - root - INFO - Step 11570: lr=2.25E-06, loss= 1.0513 (max= 1.5269), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, 
loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:20,590 - root - INFO - Step 11580: lr=2.25E-06, loss= 1.0462 (max= 1.4272), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - 
root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:23:52,373 - root - INFO - Step 11590: lr=2.24E-06, loss= 1.0371 (max= 1.4206), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:24,127 - root - INFO - Step 11600: lr=2.24E-06, loss= 1.0488 (max= 1.4874), tps=20641, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:24:56,111 - root - INFO - Step 11610: lr=2.23E-06, loss= 1.0553 (max= 1.4477), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:27,950 - root - INFO - Step 11620: lr=2.23E-06, loss= 1.0598 (max= 1.7186), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 
1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:25:59,769 - root - INFO - Step 11630: lr=2.23E-06, loss= 1.0286 (max= 1.4558), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:26:31,732 - root - INFO - Step 11640: lr=2.22E-06, loss= 1.0591 (max= 1.4858), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,551 - root - INFO - Step 
11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,551 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,552 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,552 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,552 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,552 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,552 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:03,552 - root - INFO - Step 11650: lr=2.22E-06, loss= 1.0685 (max= 1.7248), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:27:35,333 - root - INFO - Step 11660: lr=2.22E-06, loss= 1.0527 (max= 1.5269), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:07,207 - root - INFO - Step 11670: lr=2.21E-06, loss= 1.0268 (max= 1.4180), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,057 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,057 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,057 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,058 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,058 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,058 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,058 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:28:39,058 - root - INFO - Step 11680: lr=2.21E-06, loss= 1.0567 (max= 1.4342), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, 
mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:11,030 - root - INFO - Step 11690: lr=2.21E-06, loss= 1.0519 (max= 1.4006), tps=20500, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, 
loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:29:42,797 - root - INFO - Step 11700: lr=2.20E-06, loss= 1.0509 (max= 1.4968), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,606 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,606 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,607 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,607 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,607 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,607 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,607 - 
root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:14,607 - root - INFO - Step 11710: lr=2.20E-06, loss= 1.0580 (max= 1.4724), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:46,483 - root - INFO - Step 11720: lr=2.20E-06, loss= 1.0598 (max= 1.4999), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:30:47,052 - root - WARNING - Empty document detected at 
/work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2563056 2025-10-26 18:31:18,310 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:18,311 - root - INFO - Step 11730: lr=2.19E-06, loss= 1.0655 (max= 1.6047), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,166 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - 
INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:31:50,167 - root - INFO - Step 11740: lr=2.19E-06, loss= 1.0510 (max= 1.7889), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:21,980 - root - INFO - Step 11750: lr=2.19E-06, loss= 1.0559 (max= 1.6334), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,858 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,858 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,858 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,858 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,858 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,859 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,859 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:32:53,859 - root - INFO - Step 11760: lr=2.18E-06, loss= 1.0409 (max= 1.4807), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,730 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,730 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,731 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,731 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,731 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,731 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,731 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:25,731 - root - INFO - Step 11770: lr=2.18E-06, loss= 1.0751 (max= 1.5902), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, 
mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:33:57,630 - root - INFO - Step 11780: lr=2.18E-06, loss= 1.0647 (max= 1.5272), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, 
loss= 1.0467 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:29,508 - root - INFO - Step 11790: lr=2.17E-06, loss= 1.0467 (max= 1.4685), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:34:54,069 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5508300 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root - INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:01,319 - root 
- INFO - Step 11800: lr=2.17E-06, loss= 1.0698 (max= 1.5157), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,176 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:35:33,177 - root - INFO - Step 11810: lr=2.17E-06, loss= 1.0521 (max= 1.5835), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,203 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:05,204 - root - INFO - Step 11820: lr=2.16E-06, loss= 1.0712 (max= 1.5641), tps=20465, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,037 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:36:37,038 - root - INFO - Step 11830: lr=2.16E-06, loss= 1.0748 (max= 1.5749), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,913 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:08,914 - root - INFO - Step 11840: lr=2.16E-06, loss= 1.0616 (max= 1.4472), tps=20562, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:37:40,708 - root - INFO - Step 11850: lr=2.15E-06, loss= 1.0597 (max= 1.6408), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:38:12,543 - root - INFO - Step 11860: lr=2.15E-06, loss= 1.0897 (max= 1.5275), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:38:12,543 - root - INFO - Step 11860: lr=2.15E-06, loss= 1.0897 (max= 1.5275), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:38:12,543 - root - INFO - Step 11860: lr=2.15E-06, 
loss= 1.0897 (max= 1.5275), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:38:12,543 - root - INFO - Step 11860: lr=2.15E-06, loss= 1.0897 (max= 1.5275), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:38:44,391 - root - INFO - Step 11870: lr=2.14E-06, loss= 1.0571 (max= 1.5250), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:39:16,325 - root - INFO - Step 11880: lr=2.14E-06, loss= 1.0706 (max= 1.5046), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:39:16,326 - root - INFO - Step 11880: lr=2.14E-06, loss= 1.0706 (max= 1.5046), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:39:48,346 - root - INFO - Step 11890: lr=2.14E-06, loss= 1.0682 (max= 1.5042), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:40:20,165 - root - INFO - Step 11900: lr=2.13E-06, loss= 1.0600 (max= 1.5215), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:40:20,166 - root - INFO - Step 11900: lr=2.13E-06, loss= 1.0600 (max= 1.5215), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:40:52,038 - root - INFO - Step 11910: lr=2.13E-06, loss= 1.0540 (max= 1.4797), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:41:23,892 - root - INFO - Step 11920: lr=2.13E-06, loss= 1.0621 (max= 1.5515), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:41:23,893 - root - INFO - Step 11920: lr=2.13E-06, loss= 1.0621 (max= 1.5515), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:41:55,709 - root - INFO - Step 11930: lr=2.12E-06, loss= 1.0555 (max= 1.5460), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:42:27,701 - root - INFO - Step 11940: lr=2.12E-06, loss= 1.0818 (max= 1.4368), tps=20487, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:42:59,605 - root - INFO - Step 11950: lr=2.12E-06, loss= 1.0494 (max= 1.4906), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:42:59,606 - root - INFO - Step 11950: lr=2.12E-06, loss= 1.0494 (max= 1.4906), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:42:59,606 - root - INFO - Step 11950: lr=2.12E-06, loss= 1.0494 (max= 1.4906), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:43:31,537 - root - INFO - Step 11960: lr=2.11E-06, loss= 1.0613 (max= 1.5480), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:43:31,538 - root - INFO - Step 11960: lr=2.11E-06, loss= 1.0613 (max= 1.5480), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:44:03,365 - root - INFO - Step 11970: lr=2.11E-06, loss= 1.0642 (max= 1.5438), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:44:35,264 - root - INFO - Step 11980: lr=2.11E-06, loss= 1.0732 (max= 1.4645), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:44:35,265 - root - INFO - Step 11980: lr=2.11E-06, loss= 1.0732 (max= 1.4645), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:45:07,127 - root - INFO - Step 11990: lr=2.10E-06, loss= 1.0638 (max= 1.4507), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:45:07,128 - root - INFO - Step 11990: lr=2.10E-06, loss= 1.0638 (max= 1.4507), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-12000
Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-12000!
Save time: 4.403936862945557
2025-10-26 18:45:39,120 - root - INFO - Step 12000: lr=2.10E-06, loss= 1.0242 (max= 1.6088), tps=20487, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:45:39,120 - root - INFO - Saving a full checkpoint at step 12000
2025-10-26 18:45:39,120 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 18:45:39,120 - root - INFO - Step 12000: lr=2.10E-06, loss= 1.0242 (max= 1.6088), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:45:54,237 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 18:45:54,244 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 18:45:54,245 - root - INFO - Finished saving the checkpoint in 15.13 seconds
2025-10-26 18:46:26,186 - root - INFO - Step 12010: lr=2.10E-06, loss= 1.0258 (max= 1.6753), tps=13925, mfu=29.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:46:58,055 - root - INFO - Step 12020: lr=2.09E-06, loss= 1.0515 (max= 1.6586), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:46:58,056 - root - INFO - Step 12020: lr=2.09E-06, loss= 1.0515 (max= 1.6586), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:47:30,006 - root - INFO - Step 12030: lr=2.09E-06, loss= 1.0789 (max= 1.5813), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:47:30,007 - root - INFO - Step 12030: lr=2.09E-06, loss= 1.0789 (max= 1.5813), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:48:02,405 - root - INFO - Step 12040: lr=2.09E-06, loss= 1.0508 (max= 1.6440), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:48:02,406 - root - INFO - Step 12040: lr=2.09E-06, loss= 1.0508 (max= 1.6440), tps=20230, mfu=42.15%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:48:34,252 - root - INFO - Step 12050: lr=2.08E-06, loss= 1.0724 (max= 1.4748), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:49:06,187 - root - INFO - Step 12060: lr=2.08E-06, loss= 1.0545 (max= 1.4383), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:49:06,187 - root - INFO - Step 12060: lr=2.08E-06, loss= 1.0545 (max= 1.4383), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:49:38,072 - root - INFO - Step 12070: lr=2.08E-06, loss= 1.0330 (max= 1.5113), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:49:50,043 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5063142
2025-10-26 18:50:09,967 - root - INFO - Step 12080: lr=2.07E-06, loss= 1.0653 (max= 1.5785), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:50:41,801 - root - INFO - Step 12090: lr=2.07E-06, loss= 1.0646 (max= 1.4843), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:51:13,659 - root - INFO - Step 12100: lr=2.07E-06, loss= 1.0520 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:51:13,659 - root - INFO - Step 12100: lr=2.07E-06, loss= 1.0520 (max= 1.8215), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:51:13,660 - root - INFO - Step 12100: lr=2.07E-06, loss= 1.0520 (max= 1.8215), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:51:13,660 - root - INFO - Step 12100: lr=2.07E-06, loss= 1.0520 (max= 1.8215), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:51:45,417 - root - INFO - Step 12110: lr=2.06E-06, loss= 1.0522 (max= 1.5561), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:52:17,331 - root - INFO - Step 12120: lr=2.06E-06, loss= 1.0719 (max= 1.5214), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 18:52:17,331 - root - INFO - Step 12120: lr=2.06E-06,
loss= 1.0719 (max= 1.5214), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:17,331 - root - INFO - Step 12120: lr=2.06E-06, loss= 1.0719 (max= 1.5214), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:17,332 - root - INFO - Step 12120: lr=2.06E-06, loss= 1.0719 (max= 1.5214), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,135 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,135 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,136 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,136 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,136 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,136 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,136 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:52:49,136 - root - INFO - Step 12130: lr=2.06E-06, loss= 1.0708 (max= 1.7575), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - 
root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,972 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:20,973 - root - INFO - Step 12140: lr=2.05E-06, loss= 1.0793 (max= 1.5170), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:53:52,804 - root - INFO - Step 12150: lr=2.05E-06, loss= 1.0399 (max= 1.5400), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,586 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,586 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,586 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,587 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,587 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,587 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,587 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:24,587 - root - INFO - Step 12160: lr=2.05E-06, loss= 1.0351 (max= 1.4326), tps=20622, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:54:56,381 - root - INFO - Step 12170: lr=2.04E-06, loss= 1.0623 (max= 1.4244), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,226 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 
1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,226 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,227 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,227 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,227 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,227 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,227 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:55:28,227 - root - INFO - Step 12180: lr=2.04E-06, loss= 1.0343 (max= 1.5462), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 
12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:00,151 - root - INFO - Step 12190: lr=2.03E-06, loss= 1.0608 (max= 1.4545), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:56:31,968 - root - INFO - Step 12200: lr=2.03E-06, loss= 1.0553 (max= 1.4761), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:03,824 - root - INFO - Step 12210: lr=2.03E-06, loss= 1.0474 (max= 1.4128), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:57:35,666 - root - INFO - Step 12220: lr=2.02E-06, loss= 1.0523 (max= 1.7749), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, 
mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:07,471 - root - INFO - Step 12230: lr=2.02E-06, loss= 1.0771 (max= 1.6708), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, 
loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:58:39,306 - root - INFO - Step 12240: lr=2.02E-06, loss= 1.0562 (max= 1.5550), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:11,131 - root - INFO - Step 12250: lr=2.01E-06, loss= 1.0323 (max= 1.4423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - 
root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 18:59:43,020 - root - INFO - Step 12260: lr=2.01E-06, loss= 1.0570 (max= 1.5522), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:14,788 - root - INFO - Step 12270: lr=2.01E-06, loss= 1.0789 (max= 1.4991), tps=20632, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20605, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:00:46,598 - root - INFO - Step 12280: lr=2.00E-06, loss= 1.0375 (max= 2.2465), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,432 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:18,433 - root - INFO - Step 12290: lr=2.00E-06, loss= 1.0501 (max= 1.5283), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:50,403 - root - INFO - Step 12300: lr=2.00E-06, loss= 1.0553 (max= 1.5011), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:01:50,403 - root - INFO - Step 12300: lr=2.00E-06, loss= 1.0553 (max= 
1.5011), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:01:50,403 - root - INFO - Step 12300: lr=2.00E-06, loss= 1.0553 (max= 1.5011), tps=20501, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:02:22,324 - root - INFO - Step 12310: lr=1.99E-06, loss= 1.0450 (max= 1.5460), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:02:22,325 - root - INFO - Step 12310: lr=1.99E-06, loss= 1.0450 (max= 1.5460), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:02:22,325 - root - INFO - Step 12310: lr=1.99E-06, loss= 1.0450 (max= 1.5460), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:02:54,136 - root - INFO - Step 12320: lr=1.99E-06, loss= 1.0688 (max= 1.3923), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:02:54,137 - root - INFO - Step 12320: lr=1.99E-06, loss= 1.0688 (max= 1.3923), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:03:26,000 - root - INFO - Step 12330: lr=1.99E-06, loss= 1.0730 (max= 1.6991), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:03:57,830 - root - INFO - Step 12340: lr=1.98E-06, loss= 1.0306 (max= 1.4456), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:03:57,831 - root - INFO - Step 12340: lr=1.98E-06, loss= 1.0306 (max= 1.4456), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:03:57,831 - root - INFO - Step 12340: lr=1.98E-06, loss= 1.0306 (max= 1.4456), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:04:29,748 - root - INFO - Step 12350: lr=1.98E-06, loss= 1.0117 (max= 1.4473), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:05:01,590 - root - INFO - Step 12360: lr=1.98E-06, loss= 1.0580 (max= 1.5918), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:05:33,415 - root - INFO - Step 12370: lr=1.97E-06, loss= 1.0492 (max= 1.4655), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:05:33,416 - root - INFO - Step 12370: lr=1.97E-06, loss= 1.0492 (max= 1.4655), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:05:33,416 - root - INFO - Step 12370: lr=1.97E-06, loss= 1.0492 (max= 1.4655), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:06:05,209 - root - INFO - Step 12380: lr=1.97E-06, loss= 1.0558 (max= 1.8548), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:06:37,085 - root - INFO - Step 12390: lr=1.97E-06, loss= 1.0440 (max= 1.4300), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:06:37,086 - root - INFO - Step 12390: lr=1.97E-06, loss= 1.0440 (max= 1.4300), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:08,921 - root - INFO - Step 12400: lr=1.96E-06, loss= 1.0404 (max= 1.5070), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:08,921 - root - INFO - Step 12400: lr=1.96E-06, loss= 1.0404 (max= 1.5070), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:08,922 - root - INFO - Step 12400: lr=1.96E-06, loss= 1.0404 (max= 1.5070), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:08,922 - root - INFO - Step 12400: lr=1.96E-06, loss= 1.0404 (max= 1.5070), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:08,922 - root - INFO - Step 12400: lr=1.96E-06, loss= 1.0404 (max= 1.5070), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:40,712 - root - INFO - Step 12410: lr=1.96E-06, loss= 1.0299 (max= 1.4161), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:07:40,712 - root - INFO - Step 12410: lr=1.96E-06, loss= 1.0299 (max= 1.4161), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:08:12,492 - root - INFO - Step 12420: lr=1.96E-06, loss= 1.0235 (max= 1.5370), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:08:12,492 - root - INFO - Step 12420: lr=1.96E-06, loss= 1.0235 (max= 1.5370), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:08:12,493 - root - INFO - Step 12420: lr=1.96E-06, loss= 1.0235 (max= 1.5370), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:08:12,493 - root - INFO - Step 12420: lr=1.96E-06, loss= 1.0235 (max= 1.5370), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:08:44,319 - root - INFO - Step 12430: lr=1.95E-06, loss= 1.0083 (max= 1.5659), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:08:44,319 - root - INFO - Step 12430: lr=1.95E-06, loss= 1.0083 (max= 1.5659), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:09:16,117 - root - INFO - Step 12440: lr=1.95E-06, loss= 1.0663 (max= 1.9145), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:09:48,106 - root - INFO - Step 12450: lr=1.95E-06, loss= 1.0452 (max= 1.4913), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:10:19,863 - root - INFO - Step 12460: lr=1.94E-06, loss= 1.0431 (max= 1.6330), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:10:19,864 - root - INFO - Step 12460: lr=1.94E-06, loss= 1.0431 (max= 1.6330), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:10:51,723 - root - INFO - Step 12470: lr=1.94E-06, loss= 1.0512 (max= 1.5158), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:10:51,724 - root - INFO - Step 12470: lr=1.94E-06, loss= 1.0512 (max= 1.5158), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:11:23,527 - root - INFO - Step 12480: lr=1.94E-06, loss= 1.0520 (max= 1.4605), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:11:23,528 - root - INFO - Step 12480: lr=1.94E-06, loss= 1.0520 (max= 1.4605), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:11:55,323 - root - INFO - Step 12490: lr=1.93E-06, loss= 1.0507 (max= 1.4238), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:12:27,291 - root - INFO - Step 12500: lr=1.93E-06, loss= 1.0366 (max= 2.0008), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:12:27,291 - root - INFO - Step 12500: lr=1.93E-06, loss= 1.0366 (max= 2.0008), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:12:27,292 - root - INFO - Step 12500: lr=1.93E-06, loss= 1.0366 (max= 2.0008), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:12:59,249 - root - INFO - Step 12510: lr=1.93E-06, loss= 1.0546 (max= 1.5464), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:13:31,094 - root - INFO - Step 12520: lr=1.92E-06, loss= 1.0806 (max= 1.8554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:13:31,095 - root - INFO - Step 12520: lr=1.92E-06, loss= 1.0806 (max= 1.8554), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:14:02,971 - root - INFO - Step 12530: lr=1.92E-06, loss= 1.0454 (max= 1.5296), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:14:34,831 - root - INFO - Step 12540: lr=1.92E-06, loss= 1.0477 (max= 1.5049), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:15:06,710 - root - INFO - Step 12550: lr=1.91E-06, loss= 1.0696 (max= 1.5538), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:15:06,711 - root - INFO - Step 12550: lr=1.91E-06, loss= 1.0696 (max= 1.5538), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:15:38,543 - root - INFO - Step 12560: lr=1.91E-06, loss= 1.0685 (max= 1.4393), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:16:10,639 - root - INFO - Step 12570: lr=1.91E-06, loss= 1.0341 (max= 1.5272), tps=20421, mfu=42.55%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:16:42,485 - root - INFO - Step 12580: lr=1.90E-06, loss= 1.0858 (max= 1.7916), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:16:42,485 - root - INFO - Step 12580: lr=1.90E-06, loss= 1.0858 (max= 1.7916), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:16:42,485 - root - INFO - Step 12580: lr=1.90E-06, loss= 1.0858 (max= 1.7916), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:16:42,485 - root - INFO - Step 12580: lr=1.90E-06, loss= 1.0858 (max= 1.7916), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:16:42,485 - root - INFO - Step 12580: lr=1.90E-06, loss= 1.0858 (max= 1.7916), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:16:42,485 - root - INFO - Step 12580: lr=1.90E-06, loss= 1.0858 (max= 1.7916), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:14,292 - root - INFO - Step 12590: lr=1.90E-06, loss= 1.0748 (max= 1.5559), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:17:46,099 - root - INFO - Step 12600: lr=1.90E-06, loss= 1.0493 (max= 1.4517), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, 
loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:17,879 - root - INFO - Step 12610: lr=1.89E-06, loss= 1.0500 (max= 1.5251), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - 
root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:18:49,675 - root - INFO - Step 12620: lr=1.89E-06, loss= 1.0549 (max= 1.4841), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 19:19:21,465 - root - INFO - Step 12630: lr=1.89E-06, loss= 1.0521 (max= 1.5660), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:19:53,297 - root - INFO - Step 12640: lr=1.88E-06, loss= 1.0519 (max= 1.5266), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20580, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:25,144 - root - INFO - Step 12650: lr=1.88E-06, loss= 1.0561 (max= 1.6452), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:46,630 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6683379 2025-10-26 19:20:57,024 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,024 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,025 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,025 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 
1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,025 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,025 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,025 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:20:57,025 - root - INFO - Step 12660: lr=1.88E-06, loss= 1.0451 (max= 1.4239), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 
12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:21:28,940 - root - INFO - Step 12670: lr=1.87E-06, loss= 1.0882 (max= 1.6527), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,795 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:00,796 - root - INFO - Step 12680: lr=1.87E-06, loss= 1.0247 (max= 1.5337), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:22:32,720 - root - INFO - Step 12690: lr=1.87E-06, loss= 1.0359 (max= 1.4901), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:04,587 - root - INFO - Step 12700: lr=1.86E-06, loss= 1.0920 (max= 1.6338), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:23:36,395 - root - INFO - Step 12710: lr=1.86E-06, loss= 1.0295 (max= 1.4966), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:08,205 - root - INFO - Step 12720: lr=1.86E-06, loss= 1.0380 (max= 1.5744), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, 
loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:24:40,280 - root - INFO - Step 12730: lr=1.85E-06, loss= 1.0530 (max= 1.7026), tps=20434, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - 
root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:12,081 - root - INFO - Step 12740: lr=1.85E-06, loss= 1.0348 (max= 1.4287), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 19:25:43,890 - root - INFO - Step 12750: lr=1.85E-06, loss= 1.0571 (max= 1.4733), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:15,697 - root - INFO - Step 12760: lr=1.84E-06, loss= 1.0375 (max= 1.5129), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20609, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,499 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:26:47,500 - root - INFO - Step 12770: lr=1.84E-06, loss= 1.0762 (max= 1.6088), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 
1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:19,418 - root - INFO - Step 12780: lr=1.84E-06, loss= 1.0361 (max= 1.5165), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:27:51,340 - root - INFO - Step 
12790: lr=1.83E-06, loss= 1.0466 (max= 1.4694), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:23,303 - root - INFO - Step 12800: lr=1.83E-06, loss= 1.0400 (max= 1.6206), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:28:55,191 - root - INFO - Step 12810: lr=1.83E-06, loss= 1.0573 (max= 1.3784), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:26,953 - root - INFO - Step 12820: lr=1.82E-06, loss= 1.0312 (max= 1.4599), tps=20636, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:29:58,830 - root - INFO - Step 12830: lr=1.82E-06, loss= 1.0477 (max= 1.4293), tps=20561, 
mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:30:30,689 - root - INFO - Step 12840: lr=1.82E-06, loss= 1.0361 (max= 1.5223), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, 
loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,529 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:02,530 - root - INFO - Step 12850: lr=1.81E-06, loss= 1.0540 (max= 1.4796), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - 
root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:31:34,554 - root - INFO - Step 12860: lr=1.81E-06, loss= 1.0469 (max= 1.5339), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:06,339 - root - INFO - Step 12870: lr=1.81E-06, loss= 1.0327 (max= 1.6458), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 19:32:38,119 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,119 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,120 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,120 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,120 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,120 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,120 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:32:38,120 - root - INFO - Step 12880: lr=1.80E-06, loss= 1.0391 (max= 1.7223), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:09,957 - root - INFO - Step 12890: lr=1.80E-06, loss= 1.0635 (max= 1.4466), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 
1.4616), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:33:41,838 - root - INFO - Step 12900: lr=1.80E-06, loss= 1.0437 (max= 1.4616), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,721 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,721 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,721 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,721 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,721 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,721 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,722 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:13,722 - root - INFO - Step 12910: lr=1.79E-06, loss= 1.0432 (max= 1.4070), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,589 - root - INFO - Step 
12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,589 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,590 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,590 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,590 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,590 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,590 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:34:45,590 - root - INFO - Step 12920: lr=1.79E-06, loss= 1.0628 (max= 1.4850), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:17,448 - root - INFO - Step 12930: lr=1.79E-06, loss= 1.0535 (max= 1.9531), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:35:49,296 - root - INFO - Step 12940: lr=1.78E-06, loss= 1.0480 (max= 1.4769), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:21,219 - root - INFO - Step 12950: lr=1.78E-06, loss= 1.0455 (max= 1.5589), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:36:53,048 - root - INFO - Step 12960: lr=1.78E-06, loss= 1.0557 (max= 1.4722), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, 
loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:24,907 - root - INFO - Step 12970: lr=1.77E-06, loss= 1.0424 (max= 1.5548), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - 
root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:37:56,681 - root - INFO - Step 12980: lr=1.77E-06, loss= 1.0377 (max= 1.4074), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,737 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:38:28,738 - root - INFO - Step 12990: lr=1.77E-06, loss= 1.0560 (max= 1.5116), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-13000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-13000! 
Save time: 4.344569683074951
2025-10-26 19:39:00,608 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,608 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,608 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,608 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Step 13000: lr=1.76E-06, loss= 1.0436 (max= 1.5258), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,609 - root - INFO - Saving a full checkpoint at step 13000
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:00,609 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 19:39:15,416 - root - INFO - Finished saving the checkpoint in 14.81 seconds
2025-10-26 19:39:15,424 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:15,424 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:15,424 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:15,424 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:15,425 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:15,425 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:15,426 - root - INFO - Finished saving the checkpoint in 14.82 seconds
2025-10-26 19:39:47,202 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:39:47,203 - root - INFO - Step 13010: lr=1.76E-06, loss= 1.0469 (max= 1.4873), tps=14066, mfu=29.31%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:19,054 - root - INFO - Step 13020: lr=1.76E-06, loss= 1.0859 (max= 1.6647), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:45,115 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5120280
2025-10-26 19:40:50,899 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:40:50,900 - root - INFO - Step 13030: lr=1.75E-06, loss= 1.0520 (max= 1.7753), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,723 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:22,724 - root - INFO - Step 13040: lr=1.75E-06, loss= 1.0826 (max= 1.5022), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:41:54,517 - root - INFO - Step 13050: lr=1.75E-06, loss= 1.0495 (max= 1.5145), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:26,315 - root - INFO - Step 13060: lr=1.74E-06, loss= 1.0653 (max= 1.5486), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:42:58,171 - root - INFO - Step 13070: lr=1.74E-06, loss= 1.0535 (max= 1.5076), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,049 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,049 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,049 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,050 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,050 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,050 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,050 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:43:30,050 - root - INFO - Step 13080: lr=1.74E-06, loss= 1.0340 (max= 1.5056), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:01,900 - root - INFO - Step 13090: lr=1.73E-06, loss= 1.0231 (max= 1.6766), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,709 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,709 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,709 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,710 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,710 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,710 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,710 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:44:33,710 - root - INFO - Step 13100: lr=1.73E-06, loss= 1.0443 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,533 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:05,534 - root - INFO - Step 13110: lr=1.73E-06, loss= 1.0411 (max= 1.4843), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:45:37,513 - root - INFO - Step 13120: lr=1.72E-06, loss= 1.0752 (max= 1.5398), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:09,450 - root - INFO - Step 13130: lr=1.72E-06, loss= 1.0622 (max= 1.6576), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,260 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,260 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,260 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,260 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,261 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,261 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,261 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:46:41,261 - root - INFO - Step 13140: lr=1.72E-06, loss= 1.0472 (max= 1.5033), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:13,088 - root - INFO - Step 13150: lr=1.71E-06, loss= 1.0685 (max= 1.7205), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,951 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:47:44,952 - root - INFO - Step 13160: lr=1.71E-06, loss= 1.0709 (max= 1.6780), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,858 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:16,859 - root - INFO - Step 13170: lr=1.71E-06, loss= 1.0687 (max= 1.6604), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:48:48,686 - root - INFO - Step 13180: lr=1.70E-06, loss= 1.0702 (max= 1.5661), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:20,528 - root - INFO - Step 13190: lr=1.70E-06, loss= 1.0695 (max= 1.5524), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,368 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:49:52,369 - root - INFO - Step 13200: lr=1.70E-06, loss= 1.0579 (max= 1.4736), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:24,355 - root - INFO - Step 13210: lr=1.69E-06, loss= 1.0550 (max= 1.4672), tps=20490, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:50:56,216 - root - INFO - Step 13220: lr=1.69E-06, loss= 1.0544 (max= 1.4893), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,168 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,168 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,168 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,168 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,168 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,168 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,169 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20513, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:51:28,169 - root - INFO - Step 13230: lr=1.69E-06, loss= 1.0416 (max= 1.5433), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,207 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:00,208 - root - INFO - Step 13240: lr=1.68E-06, loss= 1.0465 (max= 1.5018), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 19:52:32,004 - 
root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:52:32,004 - root - INFO - Step 13250: lr=1.68E-06, loss= 1.0533 (max= 1.5898), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:03,863 - root - INFO - Step 13260: lr=1.68E-06, loss= 1.0757 (max= 1.4350), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20600, mfu=42.92%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:53:35,681 - root - INFO - Step 13270: lr=1.68E-06, loss= 1.0652 (max= 1.8725), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:07,454 - root - INFO - Step 13280: lr=1.67E-06, loss= 1.0503 (max= 1.4332), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,271 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 
1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:54:39,272 - root - INFO - Step 13290: lr=1.67E-06, loss= 1.0579 (max= 1.4340), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 
13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:11,100 - root - INFO - Step 13300: lr=1.67E-06, loss= 1.0226 (max= 1.4678), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 19:55:42,996 - root - INFO - Step 13310: lr=1.66E-06, loss= 1.0852 (max= 1.5453), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:14,861 - root - INFO - Step 13320: lr=1.66E-06, loss= 1.0585 (max= 1.5423), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:56:46,765 - root - INFO - Step 13330: lr=1.66E-06, loss= 1.0664 (max= 1.5850), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, 
mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:18,878 - root - INFO - Step 13340: lr=1.65E-06, loss= 1.0702 (max= 1.4961), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,777 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:57:50,778 - root - INFO - Step 13350: lr=1.65E-06, 
loss= 1.0302 (max= 1.5037), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:22,665 - root - INFO - Step 13360: lr=1.65E-06, loss= 1.0578 (max= 1.6349), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,481 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - 
root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:58:54,482 - root - INFO - Step 13370: lr=1.64E-06, loss= 1.0498 (max= 1.5183), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:26,339 - root - INFO - Step 13380: lr=1.64E-06, loss= 1.0550 (max= 1.4707), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 19:59:58,177 - root - INFO - Step 13390: lr=1.64E-06, loss= 1.0563 (max= 1.6156), tps=20586, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,136 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:00:30,137 - root - INFO - Step 13400: lr=1.63E-06, loss= 1.0456 (max= 1.6190), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,027 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 
1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:02,028 - root - INFO - Step 13410: lr=1.63E-06, loss= 1.0612 (max= 1.7292), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 
13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:01:33,974 - root - INFO - Step 13420: lr=1.63E-06, loss= 1.0675 (max= 1.5729), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:05,879 - root - INFO - Step 13430: lr=1.62E-06, loss= 1.0607 (max= 1.4667), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:02:38,025 - root - INFO - Step 13440: lr=1.62E-06, loss= 1.0448 (max= 1.5110), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,826 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:09,827 - root - INFO - Step 13450: lr=1.62E-06, loss= 1.0690 (max= 1.5058), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, 
mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:03:41,589 - root - INFO - Step 13460: lr=1.61E-06, loss= 1.0564 (max= 1.5519), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:13,611 - root - INFO - Step 13470: lr=1.61E-06, loss= 1.0525 (max= 1.6604), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:38,222 - root - WARNING - Empty document detected 
at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2750265 2025-10-26 20:04:45,471 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,471 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,471 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,471 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,472 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,472 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,472 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:04:45,472 - root - INFO - Step 13480: lr=1.61E-06, loss= 1.0516 (max= 1.5088), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - 
INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:17,259 - root - INFO - Step 13490: lr=1.60E-06, loss= 1.0513 (max= 1.4283), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,228 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:05:49,229 - root - INFO - Step 13500: lr=1.60E-06, loss= 1.0439 (max= 1.5416), tps=20502, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:21,101 - root - INFO - Step 13510: lr=1.60E-06, loss= 1.0470 (max= 1.4992), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,122 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20469, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:06:53,123 - root - INFO - Step 13520: lr=1.59E-06, loss= 1.0460 (max= 1.4866), tps=20468, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,919 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,919 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,919 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, 
mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,919 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,919 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,920 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,920 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:24,920 - root - INFO - Step 13530: lr=1.59E-06, loss= 1.0325 (max= 1.5380), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, 
loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:07:56,984 - root - INFO - Step 13540: lr=1.59E-06, loss= 1.0707 (max= 1.8450), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:08:28,842 - root - INFO - Step 13550: lr=1.58E-06, loss= 1.0723 (max= 1.5004), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - 
root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:00,807 - root - INFO - Step 13560: lr=1.58E-06, loss= 1.0527 (max= 1.5107), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:09:32,609 - root - INFO - Step 13570: lr=1.58E-06, loss= 1.0546 (max= 1.4379), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,449 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:04,450 - root - INFO - Step 13580: lr=1.57E-06, loss= 1.0293 (max= 1.5213), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:10:36,275 - root - INFO - Step 13590: lr=1.57E-06, loss= 1.0820 (max= 1.5018), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,308 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 
1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20461, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:08,309 - root - INFO - Step 13600: lr=1.57E-06, loss= 1.0696 (max= 1.7758), tps=20460, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 
13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:11:40,214 - root - INFO - Step 13610: lr=1.56E-06, loss= 1.0623 (max= 1.5058), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:12,099 - root - INFO - Step 13620: lr=1.56E-06, loss= 1.0698 (max= 1.5717), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:12:43,890 - root - INFO - Step 13630: lr=1.56E-06, loss= 1.0605 (max= 1.4665), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:16,013 - root - INFO - Step 13640: lr=1.56E-06, loss= 1.0588 (max= 1.4957), tps=20404, mfu=42.51%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:13:47,868 - root - INFO - Step 13650: lr=1.55E-06, loss= 1.0550 (max= 1.4389), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, 
loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:19,750 - root - INFO - Step 13660: lr=1.55E-06, loss= 1.0540 (max= 1.8414), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20410, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:14:51,864 - root - INFO - Step 13670: lr=1.55E-06, loss= 1.0464 (max= 1.6439), tps=20409, mfu=42.52%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - 
root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:23,775 - root - INFO - Step 13680: lr=1.54E-06, loss= 1.0607 (max= 1.5788), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:15:55,618 - root - INFO - Step 13690: lr=1.54E-06, loss= 1.0715 (max= 1.6253), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:27,444 - root - INFO - Step 13700: lr=1.54E-06, loss= 1.0572 (max= 1.3863), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:16:59,241 - root - INFO - Step 13710: lr=1.53E-06, loss= 1.0543 (max= 1.4507), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,306 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,306 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 
1.5110), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,306 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20441, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,306 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,306 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,306 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,307 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:17:31,307 - root - INFO - Step 13720: lr=1.53E-06, loss= 1.0492 (max= 1.5110), tps=20440, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 
13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:03,169 - root - INFO - Step 13730: lr=1.53E-06, loss= 1.0580 (max= 1.5472), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 20:18:35,077 - root - INFO - Step 13740: lr=1.52E-06, loss= 1.0663 (max= 1.4073), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:18:48,338 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:990646 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:06,846 - root - INFO - Step 13750: lr=1.52E-06, loss= 1.0598 (max= 1.5708), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,646 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:19:38,647 - root - INFO - Step 13760: lr=1.52E-06, loss= 1.0779 (max= 1.4886), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, 
mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:10,558 - root - INFO - Step 13770: lr=1.51E-06, loss= 1.0702 (max= 1.4611), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, 
loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:20:42,513 - root - INFO - Step 13780: lr=1.51E-06, loss= 1.0544 (max= 1.4904), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,365 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,366 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,366 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,365 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,366 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,366 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,366 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:14,366 - root - INFO - Step 13790: lr=1.51E-06, loss= 1.0813 (max= 1.4335), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - 
root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:21:46,246 - root - INFO - Step 13800: lr=1.50E-06, loss= 1.0511 (max= 1.4923), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,093 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,093 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,094 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,094 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 20:22:18,094 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,094 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,094 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:18,094 - root - INFO - Step 13810: lr=1.50E-06, loss= 1.0742 (max= 1.4914), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:22:50,014 - root - INFO - Step 13820: lr=1.50E-06, loss= 1.0629 (max= 1.7215), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:21,818 - root - INFO - Step 13830: lr=1.49E-06, loss= 1.0612 (max= 1.7365), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 
1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:23:53,623 - root - INFO - Step 13840: lr=1.49E-06, loss= 1.0498 (max= 1.5343), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 
13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:25,441 - root - INFO - Step 13850: lr=1.49E-06, loss= 1.0713 (max= 1.5002), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 20:24:57,235 - root - INFO - Step 13860: lr=1.48E-06, loss= 1.0772 (max= 1.5363), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:25:29,246 - root - INFO - Step 13870: lr=1.48E-06, loss= 1.0883 (max= 1.5382), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,994 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,995 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,995 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,995 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,995 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,995 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,995 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:00,996 - root - INFO - Step 13880: lr=1.48E-06, loss= 1.0698 (max= 1.6098), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, 
mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:26:32,809 - root - INFO - Step 13890: lr=1.48E-06, loss= 1.0814 (max= 1.6441), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, loss= 1.0365 (max= 1.5237), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:04,698 - root - INFO - Step 13900: lr=1.47E-06, 
loss= 1.0365 (max= 1.5237), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:27:36,540 - root - INFO - Step 13910: lr=1.47E-06, loss= 1.0586 (max= 1.4397), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,498 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,498 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,499 - 
root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,499 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,499 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,499 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,499 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:08,499 - root - INFO - Step 13920: lr=1.47E-06, loss= 1.0611 (max= 1.6663), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,384 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,384 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,384 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,385 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,385 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 20:28:40,385 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,385 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:28:40,385 - root - INFO - Step 13930: lr=1.46E-06, loss= 1.0642 (max= 1.6357), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:12,189 - root - INFO - Step 13940: lr=1.46E-06, loss= 1.0390 (max= 1.4817), tps=20608, mfu=42.94%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:29:44,047 - root - INFO - Step 13950: lr=1.46E-06, loss= 1.0772 (max= 1.4959), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 
1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,937 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:15,938 - root - INFO - Step 13960: lr=1.45E-06, loss= 1.0734 (max= 1.6266), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 
13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:30:47,787 - root - INFO - Step 13970: lr=1.45E-06, loss= 1.0478 (max= 1.7570), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,643 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,643 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,643 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,643 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,643 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,643 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,644 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:19,644 - root - INFO - Step 13980: lr=1.45E-06, loss= 1.0666 (max= 1.5497), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:31:51,582 - root - INFO - Step 13990: lr=1.44E-06, loss= 1.0642 (max= 1.8233), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-14000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-14000! 
Save time: 4.3417439460754395 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Step 14000: lr=1.44E-06, loss= 1.0508 (max= 1.5093), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - Saving a full checkpoint at step 14000 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:23,381 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 20:32:36,963 - root - INFO - Finished saving the checkpoint in 13.58 seconds 2025-10-26 20:32:36,971 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 20:32:36,971 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 20:32:36,971 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 20:32:36,971 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 20:32:36,971 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 20:32:36,972 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 20:32:36,973 - root - INFO - Finished saving the checkpoint in 13.59 seconds 2025-10-26 
20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14442, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14443, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14442, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14442, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:08,762 - root - INFO - Step 14010: lr=1.44E-06, loss= 1.0835 (max= 1.6911), tps=14442, mfu=30.09%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:33:40,755 - root - INFO - Step 14020: lr=1.43E-06, loss= 1.0675 (max= 1.8552), tps=20487, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,601 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:12,602 - root - INFO - Step 14030: lr=1.43E-06, loss= 1.0776 (max= 1.6897), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,651 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,651 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,651 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,652 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,652 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,652 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,652 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:34:44,652 - root - INFO - Step 14040: lr=1.43E-06, loss= 1.0900 (max= 1.5711), tps=20450, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, 
loss= 1.0974 (max= 1.7683), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:16,561 - root - INFO - Step 14050: lr=1.42E-06, loss= 1.0974 (max= 1.7683), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - 
root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:35:48,428 - root - INFO - Step 14060: lr=1.42E-06, loss= 1.1003 (max= 1.7985), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:20,254 - root - INFO - Step 14070: lr=1.42E-06, loss= 1.0925 (max= 1.7839), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:36:52,119 - root - INFO - Step 14080: lr=1.42E-06, loss= 1.0833 (max= 1.5719), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:37:24,026 - root - INFO - Step 14090: lr=1.41E-06, loss= 1.1129 (max= 1.7922), tps=20542, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:37:24,026 - root - INFO - Step 14090: lr=1.41E-06, loss= 1.1129 (max= 1.7922), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:37:55,848 - root - INFO - Step 14100: lr=1.41E-06, loss= 1.1187 (max= 2.0004), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:38:27,661 - root - INFO - Step 14110: lr=1.41E-06, loss= 1.1220 (max= 2.0399), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:38:59,439 - root - INFO - Step 14120: lr=1.40E-06, loss= 1.1079 (max= 1.7779), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:38:59,439 - root - INFO - Step 14120: lr=1.40E-06, loss= 1.1079 (max= 1.7779), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:39:31,313 - root - INFO - Step 14130: lr=1.40E-06, loss= 1.1123 (max= 1.9849), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:40:03,228 - root - INFO - Step 14140: lr=1.40E-06, loss= 1.1038 (max= 1.9657), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:40:03,228 - root - INFO - Step 14140: lr=1.40E-06, loss= 1.1038 (max= 1.9657), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:40:35,090 - root - INFO - Step 14150: lr=1.39E-06, loss= 1.1022 (max= 1.8610), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:41:06,941 - root - INFO - Step 14160: lr=1.39E-06, loss= 1.0993 (max= 1.8797), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:41:38,842 - root - INFO - Step 14170: lr=1.39E-06, loss= 1.1002 (max= 1.8527), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:42:10,734 - root - INFO - Step 14180: lr=1.38E-06, loss= 1.0670 (max= 1.4761), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:42:42,542 - root - INFO - Step 14190: lr=1.38E-06, loss= 1.0828 (max= 1.6801), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.02s, 1.10%)
2025-10-26 20:43:14,388 - root - INFO - Step 14200: lr=1.38E-06, loss= 1.0871 (max= 1.6397), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:43:46,220 - root - INFO - Step 14210: lr=1.37E-06, loss= 1.0772 (max= 2.1196), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:43:46,221 - root - INFO - Step 14210: lr=1.37E-06, loss= 1.0772 (max= 2.1196), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:44:18,088 - root - INFO - Step 14220: lr=1.37E-06, loss= 1.0917 (max= 1.8600), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:44:18,088 - root - INFO - Step 14220: lr=1.37E-06, loss= 1.0917 (max= 1.8600), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:44:18,089 - root - INFO - Step 14220: lr=1.37E-06, loss= 1.0917 (max= 1.8600), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:44:49,949 - root - INFO - Step 14230: lr=1.37E-06, loss= 1.1155 (max= 1.8973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:44:49,950 - root - INFO - Step 14230: lr=1.37E-06, loss= 1.1155 (max= 1.8973), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:44:49,950 - root - INFO - Step 14230: lr=1.37E-06, loss= 1.1155 (max= 1.8973), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:45:21,767 - root - INFO - Step 14240: lr=1.36E-06, loss= 1.0874 (max= 2.0280), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:45:21,768 - root - INFO - Step 14240: lr=1.36E-06, loss= 1.0874 (max= 2.0280), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:45:53,578 - root - INFO - Step 14250: lr=1.36E-06, loss= 1.0968 (max= 1.8020), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:45:53,578 - root - INFO - Step 14250: lr=1.36E-06, loss= 1.0968 (max= 1.8020), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:46:25,359 - root - INFO - Step 14260: lr=1.36E-06, loss= 1.0893 (max= 2.2926), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:46:57,536 - root - INFO - Step 14270: lr=1.36E-06, loss= 1.0767 (max= 1.9503), tps=20369, mfu=42.44%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:47:29,452 - root - INFO - Step 14280: lr=1.35E-06, loss= 1.1028 (max= 1.7553), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:48:01,292 - root - INFO - Step 14290: lr=1.35E-06, loss= 1.1060 (max= 2.0163), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:48:33,129 - root - INFO - Step 14300: lr=1.35E-06, loss= 1.1032 (max= 1.6809), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:49:04,914 - root - INFO - Step 14310: lr=1.34E-06, loss= 1.0794 (max= 1.6938), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:49:04,915 - root - INFO - Step 14310: lr=1.34E-06, loss= 1.0794 (max= 1.6938), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:49:36,776 - root - INFO - Step 14320: lr=1.34E-06, loss= 1.1038 (max= 1.8234), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:49:36,777 - root - INFO - Step 14320: lr=1.34E-06, loss= 1.1038 (max= 1.8234), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:50:08,562 - root - INFO - Step 14330: lr=1.34E-06, loss= 1.0828 (max= 1.7941), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:50:08,562 - root - INFO - Step 14330: lr=1.34E-06, loss= 1.0828 (max= 1.7941), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:50:40,338 - root - INFO - Step 14340: lr=1.33E-06, loss= 1.0648 (max= 1.8533), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:51:12,100 - root - INFO - Step 14350: lr=1.33E-06, loss= 1.0893 (max= 1.9795), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:51:12,101 - root - INFO - Step 14350: lr=1.33E-06, loss= 1.0893 (max= 1.9795), tps=20635, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:51:43,948 - root - INFO - Step 14360: lr=1.33E-06, loss= 1.1029 (max= 1.9655), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:51:43,948 - root - INFO - Step 14360: lr=1.33E-06, loss= 1.1029 (max= 1.9655), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:52:15,761 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:52:15,762 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 20:52:15,762 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:15,762 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:15,762 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:15,762 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:15,762 - root - INFO - Step 14370: lr=1.32E-06, loss= 1.1093 (max= 2.0107), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:52:47,707 - root - INFO - Step 14380: lr=1.32E-06, loss= 1.0940 (max= 1.6491), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:19,543 - root - INFO - Step 14390: lr=1.32E-06, loss= 1.0690 (max= 1.8316), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, 
mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:53:51,567 - root - INFO - Step 14400: lr=1.31E-06, loss= 1.1056 (max= 1.8033), tps=20467, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, 
loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:23,550 - root - INFO - Step 14410: lr=1.31E-06, loss= 1.0675 (max= 1.5751), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,761 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,761 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,762 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,762 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,762 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,762 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,762 - 
root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:54:55,762 - root - INFO - Step 14420: lr=1.31E-06, loss= 1.0826 (max= 2.1718), tps=20348, mfu=42.39%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,598 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:27,599 - root - INFO - Step 14430: lr=1.31E-06, loss= 1.1047 (max= 1.8749), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,483 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 20:55:59,483 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,483 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,484 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,484 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,484 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,484 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:55:59,484 - root - INFO - Step 14440: lr=1.30E-06, loss= 1.0870 (max= 1.8126), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:56:31,316 - root - INFO - Step 14450: lr=1.30E-06, loss= 1.0718 (max= 1.5257), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 
1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:03,127 - root - INFO - Step 14460: lr=1.30E-06, loss= 1.0647 (max= 1.7216), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:19,599 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:146141 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:57:34,995 - root - INFO - Step 14470: lr=1.29E-06, loss= 1.0770 (max= 1.5792), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 
14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:06,784 - root - INFO - Step 14480: lr=1.29E-06, loss= 1.0743 (max= 1.7750), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:58:38,692 - root - INFO - Step 14490: lr=1.29E-06, loss= 1.0904 (max= 2.0771), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,479 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,479 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,479 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,479 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,479 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,479 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,480 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:10,480 - root - INFO - Step 14500: lr=1.28E-06, loss= 1.0992 (max= 1.8471), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 20:59:42,364 - root - INFO - Step 14510: lr=1.28E-06, loss= 1.0723 (max= 1.7498), tps=20556, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, 
mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:14,223 - root - INFO - Step 14520: lr=1.28E-06, loss= 1.0808 (max= 1.5271), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, 
loss= 1.0694 (max= 1.5173), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,160 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:00:46,161 - root - INFO - Step 14530: lr=1.27E-06, loss= 1.0694 (max= 1.5173), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:01:18,235 - 
root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:18,235 - root - INFO - Step 14540: lr=1.27E-06, loss= 1.1007 (max= 1.7022), tps=20434, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:01:50,090 - root - INFO - Step 14550: lr=1.27E-06, loss= 1.0667 (max= 1.5748), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:21,931 - root - INFO - Step 14560: lr=1.26E-06, loss= 1.0734 (max= 1.6008), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:02:53,768 - root - INFO - Step 14570: lr=1.26E-06, loss= 1.0843 (max= 1.5803), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:25,589 - root - INFO - Step 14580: lr=1.26E-06, loss= 1.0765 (max= 1.5988), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:03:57,412 - root - INFO - Step 14590: lr=1.26E-06, loss= 1.0410 (max= 1.5225), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:04:29,522 - root - INFO - Step 14600: lr=1.25E-06, loss= 1.0578 (max= 1.5384), tps=20412, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,309 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:01,310 - root - INFO - Step 14610: lr=1.25E-06, loss= 1.0846 (max= 1.5233), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:05:33,141 - root - INFO - Step 14620: lr=1.25E-06, loss= 1.0545 (max= 1.6495), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:04,966 - root - INFO - Step 14630: lr=1.24E-06, loss= 1.0781 (max= 1.7093), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:06:36,971 - root - INFO - Step 14640: lr=1.24E-06, loss= 1.0666 (max= 1.6014), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:08,798 - root - INFO - Step 14650: lr=1.24E-06, loss= 1.0690 (max= 1.6307), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,612 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,612 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,612 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,612 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,613 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,613 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,613 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:07:40,613 - root - INFO - Step 14660: lr=1.23E-06, loss= 1.0886 (max= 1.6553), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:12,485 - root - INFO - Step 14670: lr=1.23E-06, loss= 1.0691 (max= 1.5196), tps=20564, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,259 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:08:44,260 - root - INFO - Step 14680: lr=1.23E-06, loss= 1.0737 (max= 1.5636), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:16,064 - root - INFO - Step 14690: lr=1.22E-06, loss= 1.0526 (max= 1.4768), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,878 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:09:47,879 - root - INFO - Step 14700: lr=1.22E-06, loss= 1.0553 (max= 1.6699), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,838 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:19,839 - root - INFO - Step 14710: lr=1.22E-06, loss= 1.0724 (max= 1.5073), tps=20508, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:10:51,685 - root - INFO - Step 14720: lr=1.22E-06, loss= 1.0711 (max= 1.6553), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:23,630 - root - INFO - Step 14730: lr=1.21E-06, loss= 1.0685 (max= 1.5604), tps=20517, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:11:55,476 - root - INFO - Step 14740: lr=1.21E-06, loss= 1.0703 (max= 1.5197), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:27,333 - root - INFO - Step 14750: lr=1.21E-06, loss= 1.0570 (max= 1.5149), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:12:59,316 - root - INFO - Step 14760: lr=1.20E-06, loss= 1.0734 (max= 1.5210), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:13:31,241 - root - INFO - Step 14770: lr=1.20E-06, loss= 1.0665 (max= 1.8227), tps=20530, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:03,135 - root - INFO - Step 14780: lr=1.20E-06, loss= 1.0642 (max= 1.7363), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:14:35,058 - root - INFO - Step 14790: lr=1.19E-06, loss= 1.0874 (max= 1.6397), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:06,893 - root - INFO - Step 14800: lr=1.19E-06, loss= 1.0500 (max= 1.5615), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:15:38,734 - root - INFO - Step 14810: lr=1.19E-06, loss= 1.0526 (max= 1.4458), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:10,577 - root - INFO - Step 14820: lr=1.18E-06, loss= 1.0905 (max= 1.5763), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max=
1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:16:42,447 - root - INFO - Step 14830: lr=1.18E-06, loss= 1.1080 (max= 1.7163), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 
14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:14,363 - root - INFO - Step 14840: lr=1.18E-06, loss= 1.0716 (max= 1.6955), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:17:46,194 - root - INFO - Step 14850: lr=1.18E-06, loss= 1.0361 (max= 1.5057), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:13,975 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:665826 2025-10-26 21:18:18,022 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:18,023 - root - INFO - Step 14860: lr=1.17E-06, loss= 1.0933 (max= 2.2127), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:18:49,893 - root - INFO - Step 14870: lr=1.17E-06, loss= 1.0667 (max= 1.5673), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,820 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, 
mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:21,821 - root - INFO - Step 14880: lr=1.17E-06, loss= 1.0860 (max= 1.8202), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,782 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, 
loss= 1.1107 (max= 1.6364), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20506, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:19:53,783 - root - INFO - Step 14890: lr=1.16E-06, loss= 1.1107 (max= 1.6364), tps=20507, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:25,622 - root - INFO - Step 14900: lr=1.16E-06, loss= 1.0930 (max= 1.5534), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - 
root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:20:57,547 - root - INFO - Step 14910: lr=1.16E-06, loss= 1.0653 (max= 1.5057), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:21:29,368 - root - INFO - Step 14920: lr=1.15E-06, loss= 1.0834 (max= 1.4406), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:01,227 - root - INFO - Step 14930: lr=1.15E-06, loss= 1.0647 (max= 1.7079), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:22:33,056 - root - INFO - Step 14940: lr=1.15E-06, loss= 1.0792 (max= 1.4401), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 
1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:04,959 - root - INFO - Step 14950: lr=1.14E-06, loss= 1.0967 (max= 1.6178), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,803 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,803 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,803 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,803 - root - INFO - Step 
14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,803 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,803 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,804 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:23:36,804 - root - INFO - Step 14960: lr=1.14E-06, loss= 1.0872 (max= 1.5832), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:08,626 - root - INFO - Step 14970: lr=1.14E-06, loss= 1.0775 (max= 1.6121), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:24:40,490 - root - INFO - Step 14980: lr=1.14E-06, loss= 1.0759 (max= 1.6110), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,474 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:12,475 - root - INFO - Step 14990: lr=1.13E-06, loss= 1.0644 (max= 1.4410), tps=20492, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-15000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-15000! 
Save time: 4.529843330383301 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000 2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000 2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000 2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000
2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 21:25:44,199 - root - INFO - Step 15000: lr=1.13E-06, loss= 1.0699 (max= 1.6705), tps=20660, mfu=43.05%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000
2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000
2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 21:25:44,199 - root - INFO - Saving a full checkpoint at step 15000
2025-10-26 21:25:44,199 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 21:25:59,309 - root - INFO - Finished saving the checkpoint in 15.11 seconds
2025-10-26 21:25:59,316 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:25:59,317 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:25:59,317 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:25:59,317 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:25:59,317 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:25:59,317 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:25:59,318 - root - INFO - Finished saving the checkpoint in 15.12 seconds
2025-10-26 21:26:04,807 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:118001
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13969, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13970, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13969, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13969, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13969, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13970, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13969, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:26:31,116 - root - INFO - Step 15010: lr=1.13E-06, loss= 1.0773 (max= 1.5144), tps=13970, mfu=29.11%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:02,978 - root - INFO - Step 15020: lr=1.12E-06, loss= 1.0673 (max= 1.5721), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:27:34,837 - root - INFO - Step 15030: lr=1.12E-06, loss= 1.0594 (max= 1.5337), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,778 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,778 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,778 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,779 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,779 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,779 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,779 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:06,779 - root - INFO - Step 15040: lr=1.12E-06, loss= 1.0617 (max= 1.5018), tps=20520, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,581 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,581 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,581 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,581 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,581 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,582 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,582 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:28:38,582 - root - INFO - Step 15050: lr=1.11E-06, loss= 1.0596 (max= 1.4550), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:10,524 - root - INFO - Step 15060: lr=1.11E-06, loss= 1.0733 (max= 1.6186), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,393 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,393 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,393 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,393 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,393 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,394 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,394 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:29:42,394 - root - INFO - Step 15070: lr=1.11E-06, loss= 1.0869 (max= 1.5084), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:14,228 - root - INFO - Step 15080: lr=1.10E-06, loss= 1.0928 (max= 1.6198), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,992 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:30:45,993 - root - INFO - Step 15090: lr=1.10E-06, loss= 1.0838 (max= 1.5439), tps=20634, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,784 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,784 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,784 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,785 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,785 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,785 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,785 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:17,785 - root - INFO - Step 15100: lr=1.10E-06, loss= 1.0688 (max= 1.5475), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:31:49,587 - root - INFO - Step 15110: lr=1.10E-06, loss= 1.0858 (max= 1.6556), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:21,429 - root - INFO - Step 15120: lr=1.09E-06, loss= 1.0408 (max= 1.4633), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:32:53,254 - root - INFO - Step 15130: lr=1.09E-06, loss= 1.0882 (max= 1.5423), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:25,149 - root - INFO - Step 15140: lr=1.09E-06, loss= 1.0416 (max= 1.5645), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:33:57,027 - root - INFO - Step 15150: lr=1.08E-06, loss= 1.0713 (max= 1.5856), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:13,490 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5340366
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:34:28,806 - root - INFO - Step 15160: lr=1.08E-06, loss= 1.0492 (max= 1.6002), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:00,656 - root - INFO - Step 15170: lr=1.08E-06, loss= 1.0619 (max= 1.6147), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:35:32,492 - root - INFO - Step 15180: lr=1.07E-06, loss= 1.0628 (max= 1.4806), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:04,418 - root - INFO - Step 15190: lr=1.07E-06, loss= 1.0529 (max= 1.4556), tps=20530, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,430 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:36:36,431 - root - INFO - Step 15200: lr=1.07E-06, loss= 1.0517 (max= 1.6279), tps=20474, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:08,258 - root - INFO - Step 15210: lr=1.07E-06, loss= 1.0606 (max= 1.5888), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,072 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:37:40,073 - root - INFO - Step 15220: lr=1.06E-06, loss= 1.0752 (max= 1.7200), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:11,911 - root - INFO - Step 15230: lr=1.06E-06, loss= 1.0623 (max= 1.6633), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:38:43,724 - root - INFO - Step 15240: lr=1.06E-06, loss= 1.0591 (max= 1.5217), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:15,666 - root - INFO - Step 15250: lr=1.05E-06, loss= 1.0745 (max= 1.5615), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:39:47,518 - root - INFO - Step 15260: lr=1.05E-06, loss= 1.0383 (max= 1.4690), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,438 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,438 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,439 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,439 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,439 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,439 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,439 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:19,439 - root - INFO - Step 15270: lr=1.05E-06, loss= 1.0690 (max= 1.5369), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,220 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:40:51,221 - root - INFO - Step 15280: lr=1.04E-06, loss= 1.0733 (max= 1.4729), tps=20623, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 
1.4433), tps=20653, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20653, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20653, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20653, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20652, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:22,956 - root - INFO - Step 15290: lr=1.04E-06, loss= 1.0270 (max= 1.4433), tps=20653, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,803 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,803 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,803 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,803 - root - INFO - Step 
15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,803 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,804 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,804 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:41:54,804 - root - INFO - Step 15300: lr=1.04E-06, loss= 1.0618 (max= 1.4569), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,637 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,637 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,637 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,637 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,637 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,638 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 21:42:26,638 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:26,638 - root - INFO - Step 15310: lr=1.03E-06, loss= 1.0620 (max= 1.4646), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:42:58,643 - root - INFO - Step 15320: lr=1.03E-06, loss= 1.0566 (max= 1.6161), tps=20479, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,640 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,640 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,640 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20483, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,641 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,641 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,641 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,641 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:43:30,641 - root - INFO - Step 15330: lr=1.03E-06, loss= 1.0705 (max= 1.7987), tps=20484, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, 
mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20643, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:02,390 - root - INFO - Step 15340: lr=1.03E-06, loss= 1.0363 (max= 1.4392), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, 
loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:34,447 - root - INFO - Step 15350: lr=1.02E-06, loss= 1.0323 (max= 1.5019), tps=20446, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:44:55,824 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:1830982 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:06,218 - root - INFO - Step 15360: lr=1.02E-06, loss= 1.0817 (max= 1.4902), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root 
- INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:45:37,975 - root - INFO - Step 15370: lr=1.02E-06, loss= 1.0620 (max= 1.8730), tps=20639, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:09,796 - root - INFO - Step 15380: lr=1.01E-06, loss= 1.0552 (max= 1.5410), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,632 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,632 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,633 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,633 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,633 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,633 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,633 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:46:41,633 - root - INFO - Step 15390: lr=1.01E-06, loss= 1.0508 (max= 1.5667), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:13,693 - root - INFO - Step 15400: lr=1.01E-06, loss= 1.0478 (max= 1.6186), tps=20444, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,678 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, 
mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,678 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,678 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,678 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,679 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,679 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,679 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:47:45,679 - root - INFO - Step 15410: lr=1.00E-06, loss= 1.0619 (max= 1.5022), tps=20491, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,497 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,497 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,497 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,498 - root - INFO - Step 15420: lr=1.00E-06, 
loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,498 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,498 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,498 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:17,498 - root - INFO - Step 15420: lr=1.00E-06, loss= 1.0414 (max= 1.3690), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,327 - 
root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:48:49,328 - root - INFO - Step 15430: lr=9.99E-07, loss= 1.0377 (max= 1.3867), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,318 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,318 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,318 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,318 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,319 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,319 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,319 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20489, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:21,319 - root - INFO - Step 15440: lr=9.95E-07, loss= 1.0599 (max= 1.4371), tps=20488, mfu=42.69%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:49:53,120 - root - INFO - Step 15450: lr=9.92E-07, loss= 1.0635 (max= 1.5227), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:24,970 - root - INFO - Step 15460: lr=9.89E-07, loss= 1.0685 (max= 1.4127), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 
1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:50:56,729 - root - INFO - Step 15470: lr=9.86E-07, loss= 1.0491 (max= 1.6524), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:51:28,636 - root - INFO - Step 15480: lr=9.83E-07, loss= 1.0711 (max= 1.4845), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 
15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:00,472 - root - INFO - Step 15490: lr=9.80E-07, loss= 1.0429 (max= 1.6556), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,301 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,301 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,302 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,302 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 21:52:32,302 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,302 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,302 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:52:32,302 - root - INFO - Step 15500: lr=9.77E-07, loss= 1.0628 (max= 1.8055), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:04,189 - root - INFO - Step 15510: lr=9.74E-07, loss= 1.0441 (max= 1.7694), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:53:35,997 - root - INFO - Step 15520: lr=9.71E-07, loss= 1.0608 (max= 1.4969), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, 
mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:07,953 - root - INFO - Step 15530: lr=9.68E-07, loss= 1.0659 (max= 1.4896), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:26,323 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2028478 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, 
loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,916 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:54:39,917 - root - INFO - Step 15540: lr=9.65E-07, loss= 1.0513 (max= 1.4094), tps=20506, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - 
root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:11,715 - root - INFO - Step 15550: lr=9.62E-07, loss= 1.0643 (max= 1.4594), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:55:43,542 - root - INFO - Step 15560: lr=9.59E-07, loss= 1.0602 (max= 1.6377), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,353 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 21:56:15,353 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,353 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,354 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,354 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,354 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,354 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:15,354 - root - INFO - Step 15570: lr=9.56E-07, loss= 1.0505 (max= 1.5107), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:35,073 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2675332 2025-10-26 21:56:47,291 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,291 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,291 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,292 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,292 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,292 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,292 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:56:47,292 - root - INFO - Step 15580: lr=9.53E-07, loss= 1.0429 (max= 1.4511), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 
1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:19,136 - root - INFO - Step 15590: lr=9.50E-07, loss= 1.0565 (max= 1.6420), tps=20582, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:57:51,016 - root - INFO - Step 15600: lr=9.47E-07, loss= 1.0491 (max= 1.6350), tps=20559, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 
15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:22,817 - root - INFO - Step 15610: lr=9.44E-07, loss= 1.0644 (max= 1.6770), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,796 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20496, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:58:54,797 - root - INFO - Step 15620: lr=9.41E-07, loss= 1.0353 (max= 1.4635), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,598 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,598 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,598 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,598 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,598 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,599 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,599 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:26,599 - root - INFO - Step 15630: lr=9.38E-07, loss= 1.0509 (max= 1.4915), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 21:59:58,444 - root - INFO - Step 15640: lr=9.35E-07, loss= 1.0488 (max= 1.7177), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, 
mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,286 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:30,287 - root - INFO - Step 15650: lr=9.32E-07, loss= 1.0220 (max= 1.5846), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:00:43,619 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5930909 2025-10-26 22:01:02,128 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, 
loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:02,129 - root - INFO - Step 15660: lr=9.29E-07, loss= 1.0383 (max= 1.4447), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - 
root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:01:33,965 - root - INFO - Step 15670: lr=9.26E-07, loss= 1.0235 (max= 1.7874), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,781 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,781 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,782 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,782 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,782 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,782 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,782 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:05,782 - root - INFO - Step 15680: lr=9.23E-07, loss= 1.0329 (max= 1.6627), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:02:37,601 - root - INFO - Step 15690: lr=9.20E-07, loss= 1.0262 (max= 1.5238), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.68%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:09,601 - root - INFO - Step 15700: lr=9.17E-07, loss= 1.0253 (max= 1.4282), tps=20482, mfu=42.67%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,451 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,451 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,452 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,452 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,452 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,452 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 
1.5888), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,452 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:03:41,452 - root - INFO - Step 15710: lr=9.14E-07, loss= 1.0486 (max= 1.5888), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:13,395 - root - INFO - Step 15720: lr=9.12E-07, loss= 1.0235 (max= 1.4752), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 
15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:04:45,200 - root - INFO - Step 15730: lr=9.09E-07, loss= 1.0557 (max= 1.7352), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,030 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,030 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,031 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 22:05:17,031 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,031 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,031 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,031 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:17,031 - root - INFO - Step 15740: lr=9.06E-07, loss= 1.0372 (max= 1.5197), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:36,639 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:7112796 2025-10-26 22:05:48,883 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,883 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,884 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,884 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,884 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,884 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,884 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:05:48,884 - root - INFO - Step 15750: lr=9.03E-07, loss= 1.0167 (max= 1.4911), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:20,805 - root - INFO - Step 15760: lr=9.00E-07, loss= 1.0736 (max= 1.4908), tps=20532, 
mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:06:52,682 - root - INFO - Step 15770: lr=8.97E-07, loss= 1.0559 (max= 1.6199), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,480 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, 
loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:24,481 - root - INFO - Step 15780: lr=8.94E-07, loss= 1.0430 (max= 1.5088), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - 
root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:07:56,333 - root - INFO - Step 15790: lr=8.91E-07, loss= 1.0336 (max= 1.4741), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:08:28,125 - root - INFO - Step 15800: lr=8.88E-07, loss= 1.0605 (max= 1.7395), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:00,080 - root - INFO - Step 15810: lr=8.85E-07, loss= 1.0439 (max= 1.8581), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20626, mfu=42.97%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20626, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:09:31,858 - root - INFO - Step 15820: lr=8.82E-07, loss= 1.0536 (max= 1.7219), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,719 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,719 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,720 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,720 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,720 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,720 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 
1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,720 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:03,720 - root - INFO - Step 15830: lr=8.79E-07, loss= 1.0400 (max= 1.6269), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:10:35,627 - root - INFO - Step 15840: lr=8.76E-07, loss= 1.0429 (max= 1.5203), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 
15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20626, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20626, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:07,403 - root - INFO - Step 15850: lr=8.73E-07, loss= 1.0660 (max= 2.0259), tps=20626, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:11:39,286 - root - INFO - Step 15860: lr=8.70E-07, loss= 1.0711 (max= 1.5005), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:11,139 - root - INFO - Step 15870: lr=8.67E-07, loss= 1.0253 (max= 1.5034), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:12:43,026 - root - INFO - Step 15880: lr=8.64E-07, loss= 1.0568 (max= 1.5408), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, 
mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:14,942 - root - INFO - Step 15890: lr=8.61E-07, loss= 1.0657 (max= 1.5088), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, 
loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:13:46,790 - root - INFO - Step 15900: lr=8.58E-07, loss= 1.0118 (max= 1.6798), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,852 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,852 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,852 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,853 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,853 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,853 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,853 - 
root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:18,853 - root - INFO - Step 15910: lr=8.55E-07, loss= 1.0322 (max= 1.4492), tps=20442, mfu=42.59%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:14:50,704 - root - INFO - Step 15920: lr=8.52E-07, loss= 1.0574 (max= 1.7506), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:22,490 - root - INFO - Step 15930: lr=8.49E-07, loss= 1.0421 (max= 1.6446), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:15:54,261 - root - INFO - Step 15940: lr=8.46E-07, loss= 1.0695 (max= 1.5937), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 
1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:26,096 - root - INFO - Step 15950: lr=8.43E-07, loss= 1.0752 (max= 1.7829), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:16:58,034 - root - INFO - Step 15960: lr=8.40E-07, loss= 1.0516 (max= 1.4929), tps=20522, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,873 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 
15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:17:29,874 - root - INFO - Step 15970: lr=8.37E-07, loss= 1.0628 (max= 1.4984), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,813 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:01,814 - root - INFO - Step 15980: lr=8.34E-07, loss= 1.0259 (max= 1.7047), tps=20521, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:18:33,679 - root - INFO - Step 15990: lr=8.31E-07, loss= 1.0148 (max= 1.4438), tps=20569, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-16000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-16000! Save time: 4.37660026550293 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - Step 16000: lr=8.28E-07, loss= 1.0304 (max= 1.6035), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:05,505 - root - INFO - Saving a full checkpoint at step 16000 2025-10-26 22:19:05,505 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-26 22:19:20,523 - root - INFO - Finished saving the checkpoint in 15.02 seconds 2025-10-26 22:19:20,531 - root - INFO - Finished saving the checkpoint in 15.03 seconds 2025-10-26 22:19:20,531 - root - INFO - Finished saving the checkpoint in 15.03 seconds 2025-10-26 22:19:20,531 - root - INFO - Finished saving the checkpoint 
in 15.03 seconds 2025-10-26 22:19:20,531 - root - INFO - Finished saving the checkpoint in 15.03 seconds 2025-10-26 22:19:20,532 - root - INFO - Finished saving the checkpoint in 15.03 seconds 2025-10-26 22:19:20,532 - root - INFO - Finished saving the checkpoint in 15.03 seconds 2025-10-26 22:19:20,533 - root - INFO - Finished saving the checkpoint in 15.03 seconds 2025-10-26 22:19:52,274 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:19:52,275 - root - INFO - Step 16010: lr=8.25E-07, loss= 1.0381 (max= 1.5939), tps=14014, mfu=29.20%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:24,181 - root - INFO - Step 16020: lr=8.22E-07, loss= 1.0372 (max= 1.5512), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,973 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, 
mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:20:55,974 - root - INFO - Step 16030: lr=8.19E-07, loss= 1.0467 (max= 1.5371), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,814 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, 
loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:27,815 - root - INFO - Step 16040: lr=8.16E-07, loss= 1.0213 (max= 1.5226), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:21:59,608 - root - INFO - Step 16050: lr=8.14E-07, loss= 1.0495 (max= 1.5251), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - 
root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:22:31,476 - root - INFO - Step 16060: lr=8.11E-07, loss= 1.0419 (max= 1.5550), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,773 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:03,774 - root - INFO - Step 16070: lr=8.08E-07, loss= 1.0318 (max= 1.4934), tps=20294, mfu=42.28%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20510, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.73%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:23:35,728 - root - INFO - Step 16080: lr=8.05E-07, loss= 1.0337 (max= 1.6574), tps=20511, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:07,577 - root - INFO - Step 16090: lr=8.02E-07, loss= 1.0245 (max= 1.3953), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,591 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,591 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 
1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,591 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,591 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,592 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,592 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,592 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:24:39,592 - root - INFO - Step 16100: lr=7.99E-07, loss= 1.0517 (max= 1.5394), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 
16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:11,425 - root - INFO - Step 16110: lr=7.96E-07, loss= 1.0451 (max= 1.6369), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:34,323 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:3883965 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:25:43,315 - root - INFO - Step 16120: lr=7.93E-07, loss= 1.0413 (max= 1.5164), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20555, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:15,203 - root - INFO - Step 16130: lr=7.90E-07, loss= 1.0567 (max= 1.6638), tps=20554, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:26:47,069 - root - INFO - Step 16140: lr=7.87E-07, loss= 1.0474 (max= 1.7743), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, 
mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:18,903 - root - INFO - Step 16150: lr=7.84E-07, loss= 1.0318 (max= 1.4899), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, 
loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:27:50,847 - root - INFO - Step 16160: lr=7.81E-07, loss= 1.0503 (max= 1.5056), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:22,768 - root - INFO - Step 16170: lr=7.78E-07, loss= 1.0267 (max= 1.4325), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,537 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,537 - 
root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,537 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,537 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,538 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,538 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,538 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:28:54,538 - root - INFO - Step 16180: lr=7.75E-07, loss= 1.0193 (max= 1.5095), tps=20631, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:26,375 - root - INFO - Step 16190: lr=7.72E-07, loss= 1.0347 (max= 1.4659), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:29:58,427 - root - INFO - Step 16200: lr=7.69E-07, loss= 1.0528 (max= 1.4719), tps=20449, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,299 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:30:30,300 - root - INFO - Step 16210: lr=7.66E-07, loss= 1.0374 (max= 1.4408), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 
1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:02,129 - root - INFO - Step 16220: lr=7.63E-07, loss= 1.0341 (max= 1.5657), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 
16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:31:33,915 - root - INFO - Step 16230: lr=7.60E-07, loss= 1.0454 (max= 1.4639), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 22:32:05,808 - root - INFO - Step 16240: lr=7.58E-07, loss= 1.0300 (max= 1.5888), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,696 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:32:37,697 - root - INFO - Step 16250: lr=7.55E-07, loss= 1.0612 (max= 1.6742), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:09,497 - root - INFO - Step 16260: lr=7.52E-07, loss= 1.0571 (max= 1.4731), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, 
mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:33:41,404 - root - INFO - Step 16270: lr=7.49E-07, loss= 1.0652 (max= 2.0465), tps=20542, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:13,314 - root - INFO - Step 16280: lr=7.46E-07, 
loss= 1.0433 (max= 1.4862), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:34:45,165 - root - INFO - Step 16290: lr=7.43E-07, loss= 1.0586 (max= 1.5500), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - 
root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:17,000 - root - INFO - Step 16300: lr=7.40E-07, loss= 1.0518 (max= 1.5241), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:35:48,847 - root - INFO - Step 16310: lr=7.37E-07, loss= 1.0595 (max= 1.7471), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:36:20,680 - root - INFO - Step 16320: lr=7.34E-07, loss= 1.0655 (max= 1.4069), tps=20589, mfu=42.90%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:36:52,510 - root - INFO - Step 16330: lr=7.31E-07, loss= 1.0419 (max= 1.4545), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:37:24,392 - root - INFO - Step 16340: lr=7.28E-07, loss= 1.0650 (max= 1.7572), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:37:56,287 - root - INFO - Step 16350: lr=7.25E-07, loss= 1.0648 (max= 1.5138), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:37:56,287 - root - INFO - Step 16350: lr=7.25E-07, loss= 1.0648 (max= 1.5138), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:38:28,189 - root - INFO - Step 16360: lr=7.22E-07, loss= 1.0437 (max= 1.4288), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:39:00,084 - root - INFO - Step 16370: lr=7.19E-07, loss= 1.0453 (max= 1.5516), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:39:00,085 - root - INFO - Step 16370: lr=7.19E-07, loss= 1.0453 (max= 1.5516), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:39:00,085 - root - INFO - Step 16370: lr=7.19E-07, loss= 1.0453 (max= 1.5516), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:39:31,936 - root - INFO - Step 16380: lr=7.17E-07, loss= 1.0508 (max= 1.4744), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:40:03,735 - root - INFO - Step 16390: lr=7.14E-07, loss= 1.0674 (max= 1.4907), tps=20611, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:40:03,735 - root - INFO - Step 16390: lr=7.14E-07, loss= 1.0674 (max= 1.4907), tps=20612, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:40:35,590 - root - INFO - Step 16400: lr=7.11E-07, loss= 1.0501 (max= 1.5582), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:40:35,591 - root - INFO - Step 16400: lr=7.11E-07, loss= 1.0501 (max= 1.5582), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:41:07,444 - root - INFO - Step 16410: lr=7.08E-07, loss= 1.0742 (max= 1.7041), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:41:07,444 - root - INFO - Step 16410: lr=7.08E-07, loss= 1.0742 (max= 1.7041), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:41:39,345 - root - INFO - Step 16420: lr=7.05E-07, loss= 1.0658 (max= 1.4615), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:41:39,345 - root - INFO - Step 16420: lr=7.05E-07, loss= 1.0658 (max= 1.4615), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:42:11,153 - root - INFO - Step 16430: lr=7.02E-07, loss= 1.0582 (max= 1.4131), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:42:43,023 - root - INFO - Step 16440: lr=6.99E-07, loss= 1.0664 (max= 1.5760), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:42:43,024 - root - INFO - Step 16440: lr=6.99E-07, loss= 1.0664 (max= 1.5760), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:43:14,851 - root - INFO - Step 16450: lr=6.96E-07, loss= 1.0451 (max= 1.4928), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:43:14,852 - root - INFO - Step 16450: lr=6.96E-07, loss= 1.0451 (max= 1.4928), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:43:46,699 - root - INFO - Step 16460: lr=6.93E-07, loss= 1.0527 (max= 1.4517), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:43:46,700 - root - INFO - Step 16460: lr=6.93E-07, loss= 1.0527 (max= 1.4517), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:44:18,459 - root - INFO - Step 16470: lr=6.90E-07, loss= 1.0635 (max= 1.7234), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:44:18,460 - root - INFO - Step 16470: lr=6.90E-07, loss= 1.0635 (max= 1.7234), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:44:50,268 - root - INFO - Step 16480: lr=6.87E-07, loss= 1.0512 (max= 1.4481), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:44:50,268 - root - INFO - Step 16480: lr=6.87E-07, loss= 1.0512 (max= 1.4481), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:45:22,106 - root - INFO - Step 16490: lr=6.84E-07, loss= 1.0814 (max= 1.4712), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:45:22,106 - root - INFO - Step 16490: lr=6.84E-07, loss= 1.0814 (max= 1.4712), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:45:54,163 - root - INFO - Step 16500: lr=6.82E-07, loss= 1.0632 (max= 1.4548), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:45:54,164 - root - INFO - Step 16500: lr=6.82E-07, loss= 1.0632 (max= 1.4548), tps=20445, mfu=42.60%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:46:26,039 - root - INFO - Step 16510: lr=6.79E-07, loss= 1.0817 (max= 1.5195), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:46:57,841 - root - INFO - Step 16520: lr=6.76E-07, loss= 1.0533 (max= 1.6619), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:46:57,841 - root - INFO - Step 16520: lr=6.76E-07, loss= 1.0533 (max= 1.6619), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:46:57,842 - root - INFO - Step 16520: lr=6.76E-07, loss= 1.0533 (max= 1.6619), tps=20609, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:47:29,712 - root - INFO - Step 16530: lr=6.73E-07, loss= 1.0501 (max= 1.4632), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:47:29,712 - root - INFO - Step 16530: lr=6.73E-07, loss= 1.0501 (max= 1.4632), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:48:01,482 - root - INFO - Step 16540: lr=6.70E-07, loss= 1.0680 (max= 1.5699), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:48:01,483 - root - INFO - Step 16540: lr=6.70E-07, loss= 1.0680 (max= 1.5699), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:48:33,457 - root - INFO - Step 16550: lr=6.67E-07, loss= 1.0703 (max= 1.5063), tps=20498, mfu=42.71%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:49:05,350 - root - INFO - Step 16560: lr=6.64E-07, loss= 1.0579 (max= 1.4269), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:49:37,279 - root - INFO - Step 16570: lr=6.61E-07, loss= 1.0564 (max= 1.4758), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:50:09,100 - root - INFO - Step 16580: lr=6.58E-07, loss= 1.0694 (max= 1.5409), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:50:09,100 - root - INFO - Step 16580: lr=6.58E-07, loss= 1.0694 (max= 1.5409), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:50:40,890 - root - INFO - Step 16590: lr=6.55E-07, loss= 1.0667 (max= 1.6430), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:51:12,655 - root - INFO - Step 16600: lr=6.52E-07, loss= 1.0742 (max= 1.5821), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:51:12,656 - root - INFO - Step 16600: lr=6.52E-07, loss= 1.0742 (max= 1.5821), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20528, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:51:44,585 - root - INFO - Step 16610: lr=6.50E-07, loss= 1.0534 (max= 1.5452), tps=20527, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:16,401 - root - INFO - Step 16620: lr=6.47E-07, loss= 1.0543 (max= 1.5345), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,213 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,213 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,214 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,214 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,214 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,214 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,214 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:52:48,214 - root - INFO - Step 16630: lr=6.44E-07, loss= 1.0539 (max= 1.4059), tps=20603, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:20,046 - root - INFO - Step 16640: lr=6.41E-07, loss= 1.0402 (max= 1.5133), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,915 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,915 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,916 - root - INFO - Step 16650: lr=6.38E-07, 
loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,916 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,916 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,916 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,916 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:53:51,916 - root - INFO - Step 16650: lr=6.38E-07, loss= 1.0672 (max= 1.4818), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - 
root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:23,762 - root - INFO - Step 16660: lr=6.35E-07, loss= 1.0577 (max= 1.4668), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:54:55,658 - root - INFO - Step 16670: lr=6.32E-07, loss= 1.0715 (max= 1.4631), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 22:55:27,509 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,509 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,509 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,509 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,509 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,509 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,510 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:27,510 - root - INFO - Step 16680: lr=6.29E-07, loss= 1.0660 (max= 1.4975), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,306 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,306 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,306 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,306 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,306 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,306 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,307 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:55:59,307 - root - INFO - Step 16690: lr=6.26E-07, loss= 1.0876 (max= 1.5213), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 
1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:56:31,210 - root - INFO - Step 16700: lr=6.23E-07, loss= 1.0692 (max= 1.5958), tps=20544, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,118 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:03,119 - root - INFO - Step 16710: lr=6.21E-07, loss= 1.0770 (max= 1.5759), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 
16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:57:34,906 - root - INFO - Step 16720: lr=6.18E-07, loss= 1.0780 (max= 1.5322), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,744 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,744 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,745 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 22:58:06,745 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,745 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,745 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,745 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:06,745 - root - INFO - Step 16730: lr=6.15E-07, loss= 1.0736 (max= 1.6899), tps=20586, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,572 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,572 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,572 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,572 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,572 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,572 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,573 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:58:38,573 - root - INFO - Step 16740: lr=6.12E-07, loss= 1.0670 (max= 1.5853), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:10,552 - root - INFO - Step 16750: lr=6.09E-07, loss= 1.0651 (max= 1.5363), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, 
mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 22:59:42,401 - root - INFO - Step 16760: lr=6.06E-07, loss= 1.0601 (max= 1.4696), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, 
loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:14,198 - root - INFO - Step 16770: lr=6.03E-07, loss= 1.0564 (max= 1.5972), tps=20613, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:00:46,006 - 
root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:00:46,006 - root - INFO - Step 16780: lr=6.00E-07, loss= 1.0763 (max= 1.5408), tps=20606, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:01:17,908 - root - INFO - Step 16790: lr=5.97E-07, loss= 1.0510 (max= 1.5646), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:01:17,909 - root - INFO - Step 16790: lr=5.97E-07, loss= 1.0510 (max= 1.5646), tps=20545, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:01:49,760 - root - INFO - Step 16800: lr=5.95E-07, loss= 1.0512 (max= 1.4895), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:01:49,761 - root - INFO - Step 16800: lr=5.95E-07, loss= 1.0512 (max= 1.4895), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:02:21,586 - root - INFO - Step 16810: lr=5.92E-07, loss= 1.0598 (max= 1.5758), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:02:21,586 - root - INFO - Step 16810: lr=5.92E-07, loss= 1.0598 (max= 1.5758), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:02:53,451 - root - INFO - Step 16820: lr=5.89E-07, loss= 1.0504 (max= 1.6580), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:02:53,452 - root - INFO - Step 16820: lr=5.89E-07, loss= 1.0504 (max= 1.6580), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:03:25,378 - root - INFO - Step 16830: lr=5.86E-07, loss= 1.0348 (max= 1.5120), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:03:57,229 - root - INFO - Step 16840: lr=5.83E-07, loss= 1.0692 (max= 1.4900), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:03:57,230 - root - INFO - Step 16840: lr=5.83E-07, loss= 1.0692 (max= 1.4900), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:04:29,021 - root - INFO - Step 16850: lr=5.80E-07, loss= 1.0700 (max= 1.5387), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:04:29,022 - root - INFO - Step 16850: lr=5.80E-07, loss= 1.0700 (max= 1.5387), tps=20616, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:05:00,915 - root - INFO - Step 16860: lr=5.77E-07, loss= 1.0765 (max= 1.8124), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:05:32,691 - root - INFO - Step 16870: lr=5.74E-07, loss= 1.0645 (max= 1.3760), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:06:04,608 - root - INFO - Step 16880: lr=5.72E-07, loss= 1.0789 (max= 1.6888), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:06:04,608 - root - INFO - Step 16880: lr=5.72E-07, loss= 1.0789 (max= 1.6888), tps=20535, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:06:36,460 - root - INFO - Step 16890: lr=5.69E-07, loss= 1.0657 (max= 1.6560), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:07:08,278 - root - INFO - Step 16900: lr=5.66E-07, loss= 1.0533 (max= 1.8297), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:07:40,174 - root - INFO - Step 16910: lr=5.63E-07, loss= 1.0668 (max= 1.4782), tps=20549, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:07:40,174 - root - INFO - Step 16910: lr=5.63E-07, loss= 1.0668 (max= 1.4782), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:08:12,200 - root - INFO - Step 16920: lr=5.60E-07, loss= 1.0651 (max= 1.4891), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:08:44,075 - root - INFO - Step 16930: lr=5.57E-07, loss= 1.0648 (max= 1.5500), tps=20562, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:08:44,075 - root - INFO - Step 16930: lr=5.57E-07, loss= 1.0648 (max= 1.5500), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:09:15,859 - root - INFO - Step 16940: lr=5.54E-07, loss= 1.0811 (max= 2.0021), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:09:15,860 - root - INFO - Step 16940: lr=5.54E-07, loss= 1.0811 (max= 2.0021), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:09:47,721 - root - INFO - Step 16950: lr=5.51E-07, loss= 1.0546 (max= 1.6404), tps=20571, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:10:19,550 - root - INFO - Step 16960: lr=5.49E-07, loss= 1.0585 (max= 1.5776), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:10:19,551 - root - INFO - Step 16960: lr=5.49E-07, loss= 1.0585 (max= 1.5776), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:10:51,471 - root - INFO - Step 16970: lr=5.46E-07, loss= 1.0539 (max= 1.4934), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:11:23,545 - root - INFO - Step 16980: lr=5.43E-07, loss= 1.0661 (max= 1.8220), tps=20435, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:11:55,412 - root - INFO - Step 16990: lr=5.40E-07, loss= 1.0509 (max= 1.7036), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:11:55,412 - root - INFO - Step 16990: lr=5.40E-07, loss= 1.0509 (max= 1.7036), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-17000
Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-17000!
Save time: 4.3681159019470215
2025-10-26 23:12:27,238 - root - INFO - Step 17000: lr=5.37E-07, loss= 1.0641 (max= 1.4713), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:12:27,238 - root - INFO - Saving a full checkpoint at step 17000
2025-10-26 23:12:27,238 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 23:12:27,238 - root - INFO - Step 17000: lr=5.37E-07, loss= 1.0641 (max= 1.4713), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:12:27,239 - root - INFO - Saving a full checkpoint at step 17000
2025-10-26 23:12:27,239 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler'])
2025-10-26 23:12:27,239 - root - INFO - Step 17000: lr=5.37E-07, loss= 1.0641 (max= 1.4713), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:12:42,034 - root - INFO - Finished saving the checkpoint in 14.80 seconds
2025-10-26 23:12:42,041 - root - INFO - Finished saving the checkpoint in 14.80 seconds
2025-10-26 23:12:42,042 - root - INFO - Finished saving the checkpoint in 14.80 seconds
2025-10-26 23:12:42,043 - root - INFO - Finished saving the checkpoint in 14.80 seconds
2025-10-26 23:12:42,043 - root - INFO - Finished saving the checkpoint in 14.81 seconds
2025-10-26 23:13:13,843 - root - INFO - Step 17010: lr=5.34E-07, loss= 1.0695 (max= 1.6721), tps=14063, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:13:45,695 - root - INFO - Step 17020: lr=5.31E-07, loss= 1.0809 (max= 1.5624), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:14:17,522 - root - INFO - Step 17030: lr=5.28E-07, loss= 1.0429 (max= 1.4697), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:14:49,386 - root - INFO - Step 17040: lr=5.26E-07, loss= 1.0758 (max= 1.5112), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:14:49,387 - root - INFO - Step 17040: lr=5.26E-07, loss= 1.0758 (max= 1.5112), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07,
loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:21,427 - root - INFO - Step 17050: lr=5.23E-07, loss= 1.0597 (max= 1.5361), tps=20456, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,270 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,270 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,270 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,271 - 
root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,271 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,271 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,271 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:15:53,271 - root - INFO - Step 17060: lr=5.20E-07, loss= 1.0671 (max= 1.6583), tps=20583, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:19,325 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2145936 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:25,181 - root - INFO - Step 17070: lr=5.17E-07, loss= 1.0342 (max= 1.6502), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:16:56,994 - root - INFO - Step 17080: lr=5.14E-07, loss= 1.0813 (max= 1.5034), tps=20602, mfu=42.93%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,916 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,916 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,916 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,916 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,916 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,916 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,917 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:17:28,917 - root - INFO - Step 17090: lr=5.11E-07, loss= 1.0707 (max= 1.5007), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 
1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:00,763 - root - INFO - Step 17100: lr=5.08E-07, loss= 1.0633 (max= 1.4401), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 
17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:18:32,705 - root - INFO - Step 17110: lr=5.06E-07, loss= 1.0623 (max= 1.4533), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,648 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,648 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,649 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,649 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,649 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,649 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,649 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:19:04,649 - root - INFO - Step 17120: lr=5.03E-07, loss= 1.0597 (max= 1.5108), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 23:19:36,522 - root - INFO - Step 17130: lr=5.00E-07, loss= 1.0440 (max= 1.5276), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:20:08,381 - root - INFO - Step 17140: lr=4.97E-07, loss= 1.0637 (max= 1.5149), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:20:40,300 - root - INFO - Step 17150: lr=4.94E-07, loss= 1.0583 (max= 1.5018), tps=20535, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:21:12,154 - root - INFO - Step 17160: lr=4.91E-07, loss= 1.0574 (max= 1.4326), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:21:43,902 - root - INFO - Step 17170: lr=4.88E-07, loss= 1.0626 (max= 1.4641), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:22:15,742 - root - INFO - Step 17180: lr=4.86E-07, loss= 1.0471 (max= 1.5018), tps=20585, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:22:47,772 - root - INFO - Step 17190: lr=4.83E-07, loss= 1.0593 (max= 1.6374), tps=20463, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:23:19,588 - root - INFO - Step 17200: lr=4.80E-07, loss= 1.0639 (max= 1.5519), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:23:51,373 - root - INFO - Step 17210: lr=4.77E-07, loss= 1.0569 (max= 1.4865), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:24:23,192 - root - INFO - Step 17220: lr=4.74E-07, loss= 1.0278 (max= 1.4932), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:24:55,115 - root - INFO - Step 17230: lr=4.71E-07, loss= 1.0426 (max= 1.5101), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:25:26,994 - root - INFO - Step 17240: lr=4.68E-07, loss= 1.0089 (max= 1.4193), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:25:58,819 - root - INFO - Step 17250: lr=4.66E-07, loss= 1.0448 (max= 1.6674), tps=20595, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,629 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:26:30,630 - root - INFO - Step 17260: lr=4.63E-07, loss= 1.0541 (max= 1.4788), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, 
mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:02,487 - root - INFO - Step 17270: lr=4.60E-07, loss= 1.0197 (max= 1.4118), tps=20573, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:27:34,369 - root - INFO - Step 17280: lr=4.57E-07, loss= 1.0285 (max= 1.4524), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, 
loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:06,183 - root - INFO - Step 17290: lr=4.54E-07, loss= 1.0260 (max= 1.4762), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - 
root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:28:38,035 - root - INFO - Step 17300: lr=4.51E-07, loss= 1.0388 (max= 1.5373), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 23:29:09,959 - root - INFO - Step 17310: lr=4.49E-07, loss= 1.0702 (max= 1.4962), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:29:41,809 - root - INFO - Step 17320: lr=4.46E-07, loss= 1.0354 (max= 1.4416), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,718 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,718 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,719 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,719 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,719 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,719 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,719 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:13,719 - root - INFO - Step 17330: lr=4.43E-07, loss= 1.0211 (max= 1.5770), tps=20540, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 
1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:30:45,676 - root - INFO - Step 17340: lr=4.40E-07, loss= 1.0391 (max= 1.3726), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:17,529 - root - INFO - Step 
17350: lr=4.37E-07, loss= 1.0449 (max= 1.6644), tps=20577, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,417 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,417 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,417 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,417 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,417 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,417 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,418 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:31:49,418 - root - INFO - Step 17360: lr=4.34E-07, loss= 1.0007 (max= 1.4604), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:21,286 - root - INFO - Step 17370: lr=4.32E-07, loss= 1.0353 (max= 1.4662), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,398 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,398 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,398 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,399 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20410, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,399 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20411, mfu=42.53%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,399 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20411, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,399 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20411, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:32:53,399 - root - INFO - Step 17380: lr=4.29E-07, loss= 1.0362 (max= 1.5131), tps=20411, mfu=42.53%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,381 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,381 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,381 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,382 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,382 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,382 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,382 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:25,382 - root - INFO - Step 17390: lr=4.26E-07, loss= 1.0240 (max= 1.6616), tps=20493, 
mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:33:57,292 - root - INFO - Step 17400: lr=4.23E-07, loss= 1.0687 (max= 1.4288), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,082 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,082 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,082 - root - INFO - Step 17410: lr=4.20E-07, 
loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,082 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,082 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,082 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,083 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:34:29,083 - root - INFO - Step 17410: lr=4.20E-07, loss= 1.0263 (max= 1.3712), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - 
root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:01,236 - root - INFO - Step 17420: lr=4.17E-07, loss= 1.0407 (max= 1.4393), tps=20384, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:35:33,184 - root - INFO - Step 17430: lr=4.15E-07, loss= 1.0117 (max= 1.4981), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 23:36:05,060 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:05,061 - root - INFO - Step 17440: lr=4.12E-07, loss= 1.0464 (max= 1.7729), tps=20561, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:36:36,888 - root - INFO - Step 17450: lr=4.09E-07, loss= 1.0174 (max= 1.5551), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,647 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,647 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,648 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,648 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,648 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,648 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 
1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,648 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20638, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:08,648 - root - INFO - Step 17460: lr=4.06E-07, loss= 1.0384 (max= 1.5085), tps=20637, mfu=43.00%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,478 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,478 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,478 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,478 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,478 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,479 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,479 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:37:40,479 - root - INFO - Step 17470: lr=4.03E-07, loss= 1.0554 (max= 1.3950), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 
17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:12,328 - root - INFO - Step 17480: lr=4.00E-07, loss= 1.0311 (max= 1.5579), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,199 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20564, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:38:44,200 - root - INFO - Step 17490: lr=3.98E-07, loss= 1.0609 (max= 1.5530), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:16,034 - root - INFO - Step 17500: lr=3.95E-07, loss= 1.0425 (max= 1.4655), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:39:48,079 - root - INFO - Step 17510: lr=3.92E-07, loss= 1.0409 (max= 1.7352), tps=20453, mfu=42.61%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,818 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, 
mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:19,819 - root - INFO - Step 17520: lr=3.89E-07, loss= 1.0489 (max= 1.7550), tps=20650, mfu=43.03%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, 
loss= 1.0298 (max= 1.5763), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20621, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:40:51,604 - root - INFO - Step 17530: lr=3.86E-07, loss= 1.0298 (max= 1.5763), tps=20620, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:13,136 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2869311 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20505, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root 
- INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:23,569 - root - INFO - Step 17540: lr=3.83E-07, loss= 1.0568 (max= 1.5304), tps=20504, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:41:55,523 - root - INFO - Step 17550: lr=3.81E-07, loss= 1.0365 (max= 1.4922), tps=20512, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 
0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:27,419 - root - INFO - Step 17560: lr=3.78E-07, loss= 1.0401 (max= 1.5001), tps=20549, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:42:59,269 - root - INFO - Step 17570: lr=3.75E-07, loss= 1.0605 (max= 1.7320), tps=20578, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20534, 
mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20534, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:43:31,190 - root - INFO - Step 17580: lr=3.72E-07, loss= 1.0433 (max= 1.4607), tps=20533, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:03,132 - root - INFO - Step 17590: lr=3.69E-07, loss= 1.0751 (max= 1.5944), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, 
loss= 1.0242 (max= 1.5135), tps=20629, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:44:34,903 - root - INFO - Step 17600: lr=3.67E-07, loss= 1.0242 (max= 1.5135), tps=20630, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - 
root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:06,773 - root - INFO - Step 17610: lr=3.64E-07, loss= 1.0307 (max= 1.5141), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:45:38,623 - root - INFO - Step 17620: lr=3.61E-07, loss= 1.0234 (max= 1.5168), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:10,524 - root - INFO - Step 17630: lr=3.58E-07, loss= 1.0420 (max= 1.4272), tps=20546, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,341 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20599, mfu=42.92%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,341 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,342 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,342 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,342 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,342 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,342 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:46:42,342 - root - INFO - Step 17640: lr=3.55E-07, loss= 1.0511 (max= 1.6079), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 
1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:14,152 - root - INFO - Step 17650: lr=3.52E-07, loss= 1.0452 (max= 1.4466), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 
17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:47:46,006 - root - INFO - Step 17660: lr=3.50E-07, loss= 1.0372 (max= 1.4738), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:08,860 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:6066291 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 23:48:17,810 - root - INFO - Step 17670: lr=3.47E-07, loss= 1.0568 (max= 1.6448), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-26 
23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:48:49,664 - root - INFO - Step 17680: lr=3.44E-07, loss= 1.0664 (max= 1.9726), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,810 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20390, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:21,811 - root - INFO - Step 17690: lr=3.41E-07, loss= 1.0577 (max= 1.5291), tps=20389, mfu=42.48%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:49:53,621 - root - INFO - Step 17700: lr=3.38E-07, loss= 1.0488 (max= 1.4627), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:25,400 - root - INFO - Step 17710: lr=3.36E-07, loss= 1.0469 (max= 1.6594), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:50:57,233 - root - INFO - Step 17720: lr=3.33E-07, loss= 1.0542 (max= 1.4243), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:51:29,020 - root - INFO - Step 17730: lr=3.30E-07, loss= 1.0483 (max= 1.4569), tps=20619, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,932 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:00,933 - root - INFO - Step 17740: lr=3.27E-07, loss= 1.0619 (max= 1.6629), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:52:32,752 - root - INFO - Step 17750: lr=3.24E-07, loss= 1.0759 (max= 1.8262), tps=20598, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:04,650 - root - INFO - Step 17760: lr=3.22E-07, loss= 1.0356 (max= 1.4773), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:53:36,506 - root - INFO - Step 17770: lr=3.19E-07, loss= 1.0484 (max= 1.5820), tps=20575, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:08,354 - root - INFO - Step 17780: lr=3.16E-07, loss= 1.0588 (max= 1.5726), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:54:40,171 - root - INFO - Step 17790: lr=3.13E-07, loss= 1.0472 (max= 1.5216), tps=20600, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:12,068 - root - INFO - Step 17800: lr=3.10E-07, loss= 1.0608 (max= 1.5329), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:55:44,079 - root - INFO - Step 17810: lr=3.08E-07, loss= 1.0440 (max= 1.4786), tps=20475, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20552, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:15,970 - root - INFO - Step 17820: lr=3.05E-07, loss= 1.0540 (max= 1.4701), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:56:47,764 - root - INFO - Step 17830: lr=3.02E-07, loss= 1.0365 (max= 1.5843), tps=20615, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:19,621 - root - INFO - Step 17840: lr=2.99E-07, loss= 1.0582 (max= 1.4934), tps=20574, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20645, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:57:51,370 - root - INFO - Step 17850: lr=2.96E-07, loss= 1.0405 (max= 1.4907), tps=20644, mfu=43.01%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:23,386 - root - INFO - Step 17860: lr=2.94E-07, loss= 1.0292 (max= 1.5202), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:58:55,227 - root - INFO - Step 17870: lr=2.91E-07, loss= 1.0611 (max= 1.4592), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20532, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:27,150 - root - INFO - Step 17880: lr=2.88E-07, loss= 1.0383 (max= 1.5539), tps=20531, mfu=42.78%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-26 23:59:58,985 - root - INFO - Step 17890: lr=2.85E-07, loss= 1.0538 (max= 1.9022), tps=20588, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:00:30,978 - root - INFO - Step 17900: lr=2.82E-07, loss= 1.0251 (max= 1.9809), tps=20486, mfu=42.68%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:02,820 - root - INFO - Step 17910: lr=2.80E-07, loss= 1.0161 (max= 1.5229), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:01:34,719 - root - INFO - Step 17920: lr=2.77E-07, loss= 1.0473 (max= 1.4630), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:06,532 - root - INFO - Step 17930: lr=2.74E-07, loss= 1.0567 (max= 1.6543), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:02:38,438 - root - INFO - Step 17940: lr=2.71E-07, loss= 1.0469 (max= 1.6998), tps=20543, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:10,317 - root - INFO - Step 17950: lr=2.68E-07, loss= 1.0493 (max= 1.6892), tps=20560, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07,
loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:03:42,118 - root - INFO - Step 17960: lr=2.66E-07, loss= 1.0248 (max= 1.5837), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - 
root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,001 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:14,002 - root - INFO - Step 17970: lr=2.63E-07, loss= 1.0322 (max= 1.4641), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:04:45,929 - root - INFO - Step 17980: lr=2.60E-07, loss= 1.0308 (max= 1.5285), tps=20529, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:17,841 - root - INFO - Step 17990: lr=2.57E-07, loss= 1.0559 (max= 1.7124), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) Saving dataset to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-18000 Dataset successfully saved to jobs/munin-7b-open-stage3/checkpoints/dataloader/step-18000! 
Save time: 4.411734104156494 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - Step 18000: lr=2.55E-07, loss= 1.0489 (max= 1.4791), tps=20495, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - Saving a full checkpoint at step 18000 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:05:49,821 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 2025-10-27 00:06:04,645 - root - INFO - Finished saving the checkpoint in 14.82 seconds 2025-10-27 00:06:04,652 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 00:06:04,652 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 00:06:04,652 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 00:06:04,652 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 00:06:04,653 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 00:06:04,653 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 00:06:04,653 - root - INFO - Finished saving the checkpoint in 14.83 seconds 2025-10-27 
00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:06:36,421 - root - INFO - Step 18010: lr=2.52E-07, loss= 1.0371 (max= 1.5425), tps=14065, mfu=29.30%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20540, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:08,332 - root - INFO - Step 18020: lr=2.49E-07, loss= 1.0498 (max= 1.5867), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, 
mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:07:40,200 - root - INFO - Step 18030: lr=2.46E-07, loss= 1.0557 (max= 1.5456), tps=20567, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:12,067 - root - INFO - Step 18040: lr=2.43E-07, loss= 1.0166 (max= 1.4390), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,980 - root - INFO - Step 18050: lr=2.41E-07, 
loss= 1.0591 (max= 1.5866), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20538, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:08:43,981 - root - INFO - Step 18050: lr=2.41E-07, loss= 1.0591 (max= 1.5866), tps=20537, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - 
root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:15,755 - root - INFO - Step 18060: lr=2.38E-07, loss= 1.0678 (max= 1.7893), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:09:47,601 - root - INFO - Step 18070: lr=2.35E-07, loss= 1.0617 (max= 1.4638), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,442 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,442 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,442 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,442 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,443 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,443 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,443 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:19,443 - root - INFO - Step 18080: lr=2.32E-07, loss= 1.0359 (max= 1.8006), tps=20584, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:10:51,409 - root - INFO - Step 18090: lr=2.29E-07, loss= 1.0373 (max= 1.4183), tps=20503, mfu=42.72%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 
1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:23,239 - root - INFO - Step 18100: lr=2.27E-07, loss= 1.0496 (max= 1.4812), tps=20592, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20457, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:11:55,277 - root - INFO - Step 
18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:11:55,277 - root - INFO - Step 18110: lr=2.24E-07, loss= 1.0528 (max= 1.5235), tps=20458, mfu=42.62%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:12:27,068 - root - INFO - Step 18120: lr=2.21E-07, loss= 1.0381 (max= 1.4263), tps=20617, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:12:58,965 - root - INFO - Step 18130: lr=2.18E-07, loss= 1.0721 (max= 1.4852), tps=20548, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:13:30,855 - root - INFO - Step 18140: lr=2.16E-07, loss= 1.0386 (max= 1.4805), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:14:02,661 - root - INFO - Step 18150: lr=2.13E-07, loss= 1.0474 (max= 1.6771), tps=20607, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:14:34,676 - root - INFO - Step 18160: lr=2.10E-07, loss= 1.0467 (max= 1.5562), tps=20473, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:14:34,676 - root - INFO - Step 18160: lr=2.10E-07, loss= 1.0467 (max= 1.5562), tps=20473, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:14:34,676 - root - INFO - Step 18160: lr=2.10E-07, loss= 1.0467 (max= 1.5562), tps=20472, mfu=42.65%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:15:06,502 - root - INFO - Step 18170: lr=2.07E-07, loss= 1.0414 (max= 1.5388), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:15:38,360 - root - INFO - Step 18180: lr=2.05E-07, loss= 1.0817 (max= 1.4804), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:16:10,343 - root - INFO - Step 18190: lr=2.02E-07, loss= 1.0509 (max= 1.6748), tps=20493, mfu=42.70%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:16:42,241 - root - INFO - Step 18200: lr=1.99E-07, loss= 1.0491 (max= 1.4232), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:17:14,043 - root - INFO - Step 18210: lr=1.96E-07, loss= 1.0393 (max= 1.5184), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:17:45,932 - root - INFO - Step 18220: lr=1.93E-07, loss= 1.0329 (max= 1.4816), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:17:45,932 - root - INFO - Step 18220: lr=1.93E-07, loss= 1.0329 (max= 1.4816), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:18:17,726 - root - INFO - Step 18230: lr=1.91E-07, loss= 1.0527 (max= 1.7317), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:18:17,727 - root - INFO - Step 18230: lr=1.91E-07, loss= 1.0527 (max= 1.7317), tps=20614, mfu=42.95%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:18:49,736 - root - INFO - Step 18240: lr=1.88E-07, loss= 1.0450 (max= 1.6832), tps=20476, mfu=42.66%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:19:21,595 - root - INFO - Step 18250: lr=1.85E-07, loss= 1.0439 (max= 1.5195), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:19:21,595 - root - INFO - Step 18250: lr=1.85E-07, loss= 1.0439 (max= 1.5195), tps=20573, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:19:53,539 - root - INFO - Step 18260: lr=1.82E-07, loss= 1.0429 (max= 1.4760), tps=20518, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:19:57,278 - root - INFO - ParquetDataset: entering epoch 1
2025-10-27 00:20:25,313 - root - INFO - Step 18270: lr=1.80E-07, loss= 1.0566 (max= 1.5466), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:20:57,147 - root - INFO - Step 18280: lr=1.77E-07, loss= 1.0428 (max= 1.4580), tps=20589, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:21:28,980 - root - INFO - Step 18290: lr=1.74E-07, loss= 1.0224 (max= 1.4532), tps=20590, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:22:00,799 - root - INFO - Step 18300: lr=1.71E-07, loss= 1.0477 (max= 1.4012), tps=20599, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:22:32,747 - root - INFO - Step 18310: lr=1.69E-07, loss= 1.0429 (max= 1.4751), tps=20515, mfu=42.74%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:23:04,617 - root - INFO - Step 18320: lr=1.66E-07, loss= 1.0674 (max= 1.5050), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:23:36,553 - root - INFO - Step 18330: lr=1.63E-07, loss= 1.0517 (max= 1.4761), tps=20523, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:24:08,354 - root - INFO - Step 18340: lr=1.60E-07, loss= 1.0481 (max= 1.5238), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:24:17,055 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:1038922
2025-10-27 00:24:40,237 - root - INFO - Step 18350: lr=1.58E-07, loss= 1.0506 (max= 1.6142), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:24:40,238 - root - INFO - Step 18350: lr=1.58E-07, loss= 1.0506 (max= 1.6142), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:24:56,913 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:5546760
2025-10-27 00:25:12,403 - root - INFO - Step 18360: lr=1.55E-07, loss= 1.0157 (max= 1.5450), tps=20377, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:25:12,403 - root - INFO - Step 18360: lr=1.55E-07, loss= 1.0157 (max= 1.5450), tps=20377, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:25:12,404 - root - INFO - Step 18360: lr=1.55E-07, loss= 1.0157 (max= 1.5450), tps=20377, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:25:12,404 - root - INFO - Step 18360: lr=1.55E-07, loss= 1.0157 (max= 1.5450), tps=20376, mfu=42.45%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:25:12,404 - root - INFO - Step 18360: lr=1.55E-07, loss= 1.0157 (max= 1.5450), tps=20377, mfu=42.46%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:25:44,350 - root - INFO - Step 18370: lr=1.52E-07, loss= 1.0647 (max= 1.6692), tps=20516, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:26:16,197 - root - INFO - Step 18380: lr=1.49E-07, loss= 1.0344 (max= 1.4252), tps=20581, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:26:48,105 - root - INFO - Step 18390: lr=1.47E-07, loss= 1.0699 (max= 1.5988), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:26:48,105 - root - INFO - Step
18390: lr=1.47E-07, loss= 1.0699 (max= 1.5988), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:26:48,105 - root - INFO - Step 18390: lr=1.47E-07, loss= 1.0699 (max= 1.5988), tps=20541, mfu=42.80%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,994 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,994 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,994 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,994 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,995 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20554, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,995 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,995 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:19,995 - root - INFO - Step 18400: lr=1.44E-07, loss= 1.0238 (max= 1.4955), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:27:51,774 - root - INFO - Step 18410: lr=1.41E-07, loss= 1.0447 (max= 1.5246), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:23,717 - root - INFO - Step 18420: lr=1.38E-07, loss= 1.0658 (max= 1.5438), tps=20519, mfu=42.75%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, 
mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:28:55,526 - root - INFO - Step 18430: lr=1.36E-07, loss= 1.0350 (max= 1.5077), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:27,378 - root - INFO - Step 18440: lr=1.33E-07, loss= 1.0497 (max= 1.5349), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, 
loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:29:59,189 - root - INFO - Step 18450: lr=1.30E-07, loss= 1.0679 (max= 1.5404), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - 
root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:30:31,055 - root - INFO - Step 18460: lr=1.27E-07, loss= 1.0779 (max= 1.7701), tps=20568, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,821 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,821 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,822 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,821 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,822 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,822 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:02,822 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s 
(max=0.00s, 0.01%) 2025-10-27 00:31:02,822 - root - INFO - Step 18470: lr=1.25E-07, loss= 1.0249 (max= 1.4601), tps=20633, mfu=42.99%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:31:34,636 - root - INFO - Step 18480: lr=1.22E-07, loss= 1.0555 (max= 1.5174), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:06,450 - root - INFO - Step 18490: lr=1.19E-07, loss= 1.0686 (max= 1.6462), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 
1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:32:38,265 - root - INFO - Step 18500: lr=1.16E-07, loss= 1.0691 (max= 1.4964), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:10,070 - root - INFO - Step 
18510: lr=1.14E-07, loss= 1.0515 (max= 1.4732), tps=20608, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:33:41,952 - root - INFO - Step 18520: lr=1.11E-07, loss= 1.0575 (max= 1.6809), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,803 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,803 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-27 00:34:13,803 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,804 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,804 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,804 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,804 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:13,804 - root - INFO - Step 18530: lr=1.08E-07, loss= 1.0345 (max= 1.4849), tps=20578, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:34:45,583 - root - INFO - Step 18540: lr=1.05E-07, loss= 1.0483 (max= 1.6502), tps=20624, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:17,408 - root - INFO - Step 18550: lr=1.03E-07, loss= 1.0513 (max= 1.6167), tps=20594, 
mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,272 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:35:49,273 - root - INFO - Step 18560: lr=9.98E-08, loss= 1.0444 (max= 1.4978), tps=20569, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:36:21,154 - root - INFO - Step 18570: lr=9.71E-08, loss= 1.0475 (max= 1.7637), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:36:21,154 - root - INFO - Step 18570: lr=9.71E-08, loss= 1.0475 (max= 1.7637), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:36:21,154 - root - INFO - Step 18570: lr=9.71E-08, 
loss= 1.0475 (max= 1.7637), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:36:21,154 - root - INFO - Step 18570: lr=9.71E-08, loss= 1.0475 (max= 1.7637), tps=20558, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:36:53,025 - root - INFO - Step 18580: lr=9.43E-08, loss= 1.0487 (max= 1.6753), tps=20565, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:37:24,873 - root - INFO - Step 18590: lr=9.16E-08, loss= 1.0663 (max= 1.5268), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:37:56,734 - root - INFO - Step 18600: lr=8.89E-08, loss= 1.0339 (max= 1.6432), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:38:28,588 - root - INFO - Step 18610: lr=8.61E-08, loss= 1.0597 (max= 1.5158), tps=20576, mfu=42.87%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:39:00,521 - root - INFO - Step 18620: lr=8.34E-08, loss= 1.0426 (max= 1.5097), tps=20526, mfu=42.77%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:39:00,521 - root - INFO - Step 18620: lr=8.34E-08, loss= 1.0426 (max= 1.5097), tps=20525, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:39:32,369 - root - INFO - Step 18630: lr=8.06E-08, loss= 1.0703 (max= 1.8103), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:39:32,370 - root - INFO - Step 18630: lr=8.06E-08, loss= 1.0703 (max= 1.8103), tps=20579, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:39:32,370 - root - INFO - Step 18630: lr=8.06E-08, loss= 1.0703 (max= 1.8103), tps=20580, mfu=42.88%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:40:04,286 - root - INFO - Step 18640: lr=7.79E-08, loss= 1.0537 (max= 1.4831), tps=20536, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:40:36,176 - root - INFO - Step 18650: lr=7.52E-08, loss= 1.0517 (max= 1.6315), tps=20553, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:41:07,999 - root - INFO - Step 18660: lr=7.24E-08, loss= 1.0509 (max= 1.4679), tps=20596, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:41:39,810 - root - INFO - Step 18670: lr=6.97E-08, loss= 1.0633 (max= 1.5119), tps=20604, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:42:11,880 - root - INFO - Step 18680: lr=6.70E-08, loss= 1.0668 (max= 1.6498), tps=20437, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:42:11,881 - root - INFO - Step 18680: lr=6.70E-08, loss= 1.0668 (max= 1.6498), tps=20437, mfu=42.58%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:42:44,037 - root - INFO - Step 18690: lr=6.42E-08, loss= 1.0698 (max= 1.4771), tps=20382, mfu=42.47%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:43:15,931 - root - INFO - Step 18700: lr=6.15E-08, loss= 1.0516 (max= 1.5000), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:43:15,931 - root - INFO - Step 18700: lr=6.15E-08, loss= 1.0516 (max= 1.5000), tps=20551, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:43:15,932 - root - INFO - Step 18700: lr=6.15E-08, loss= 1.0516 (max= 1.5000), tps=20550, mfu=42.82%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:43:47,710 - root - INFO - Step 18710: lr=5.88E-08, loss= 1.0931 (max= 1.6914), tps=20625, mfu=42.97%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:44:19,593 - root - INFO - Step 18720: lr=5.61E-08, loss= 1.0685 (max= 1.4848), tps=20557, mfu=42.83%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:44:51,528 - root - INFO - Step 18730: lr=5.33E-08, loss= 1.0723 (max= 1.4971), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:44:51,529 - root - INFO - Step 18730: lr=5.33E-08, loss= 1.0723 (max= 1.4971), tps=20524, mfu=42.76%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:45:23,359 - root - INFO - Step 18740: lr=5.06E-08, loss= 1.0787 (max= 1.5744), tps=20591, mfu=42.90%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:45:55,384 - root - INFO - Step 18750: lr=4.79E-08, loss= 1.0495 (max= 1.7203), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:45:55,385 - root - INFO - Step 18750: lr=4.79E-08, loss= 1.0495 (max= 1.7203), tps=20466, mfu=42.64%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:46:27,248 - root - INFO - Step 18760: lr=4.51E-08, loss= 1.0642 (max= 1.6339), tps=20570, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:46:59,037 - root - INFO - Step 18770: lr=4.24E-08, loss= 1.0657 (max= 1.6505), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:46:59,038 - root - INFO - Step 18770: lr=4.24E-08, loss= 1.0657 (max= 1.6505), tps=20618, mfu=42.96%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:47:30,852 - root - INFO - Step 18780: lr=3.97E-08, loss= 1.0476 (max= 1.6809), tps=20602, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:47:30,852 - root - INFO - Step 18780: lr=3.97E-08, loss= 1.0476 (max= 1.6809), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:47:30,853 - root - INFO - Step 18780: lr=3.97E-08, loss= 1.0476 (max= 1.6809), tps=20601, mfu=42.92%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:48:02,674 - root - INFO - Step 18790: lr=3.70E-08, loss= 1.0878 (max= 1.7743), tps=20597, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:48:34,631 - root - INFO - Step 18800: lr=3.42E-08, loss= 1.0651 (max= 1.4796), tps=20509, mfu=42.73%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:49:06,492 - root - INFO - Step 18810: lr=3.15E-08, loss= 1.0744 (max= 2.0294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:49:06,493 - root - INFO - Step 18810: lr=3.15E-08, loss= 1.0744 (max= 2.0294), tps=20572, mfu=42.86%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:49:38,301 - root - INFO - Step 18820: lr=2.88E-08, loss= 1.0516 (max= 1.4667), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:49:38,302 - root - INFO - Step 18820: lr=2.88E-08, loss= 1.0516 (max= 1.4667), tps=20605, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:50:10,170 - root - INFO - Step 18830: lr=2.61E-08, loss= 1.0361 (max= 1.5331), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:50:10,171 - root - INFO - Step 18830: lr=2.61E-08, loss= 1.0361 (max= 1.5331), tps=20566, mfu=42.85%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:50:28,398 - root - WARNING - Empty document detected at /work/production/data/dsk-open-dyna-0-of-1-cp-2-of-16-train/dsk-open-dyna-0-of-1-cp-2-of-16-train.parquet:2675277
2025-10-27 00:50:42,007 - root - INFO - Step 18840: lr=2.34E-08, loss= 1.0485 (max= 1.4052), tps=20587, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:50:42,007 - root - INFO - Step 18840: lr=2.34E-08, loss= 1.0485 (max= 1.4052), tps=20588, mfu=42.89%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%)
2025-10-27 00:51:13,919 - root - INFO - Step 18850: lr=2.06E-08, loss= 1.0449 (max= 1.8051), tps=20539, mfu=42.79%, memory: 
154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:13,919 - root - INFO - Step 18850: lr=2.06E-08, loss= 1.0449 (max= 1.8051), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:13,919 - root - INFO - Step 18850: lr=2.06E-08, loss= 1.0449 (max= 1.8051), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:13,919 - root - INFO - Step 18850: lr=2.06E-08, loss= 1.0449 (max= 1.8051), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:13,919 - root - INFO - Step 18850: lr=2.06E-08, loss= 1.0449 (max= 1.8051), tps=20539, mfu=42.79%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,720 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,720 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,720 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,721 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,721 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,721 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,721 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 
1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:51:45,721 - root - INFO - Step 18860: lr=1.79E-08, loss= 1.0356 (max= 1.4192), tps=20610, mfu=42.94%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:17,757 - root - INFO - Step 18870: lr=1.52E-08, loss= 1.0371 (max= 1.4147), tps=20459, mfu=42.63%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 
18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:52:49,569 - root - INFO - Step 18880: lr=1.25E-08, loss= 1.0477 (max= 1.4405), tps=20603, mfu=42.93%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 
2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:21,444 - root - INFO - Step 18890: lr=9.77E-09, loss= 1.0789 (max= 1.6922), tps=20563, mfu=42.84%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20627, mfu=42.98%, memory: 154.31GiB(86.51%) 
time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:53:53,218 - root - INFO - Step 18900: lr=7.06E-09, loss= 1.0546 (max= 1.4406), tps=20628, mfu=42.98%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,116 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,116 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,117 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,117 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,117 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,117 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,117 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:25,117 - root - INFO - Step 18910: lr=4.34E-09, loss= 1.0645 (max= 1.5130), tps=20547, mfu=42.81%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,943 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, 
mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:54:56,944 - root - INFO - Step 18920: lr=1.63E-09, loss= 1.0503 (max= 1.6983), tps=20593, mfu=42.91%, memory: 154.31GiB(86.51%) time/data_loading=0.00s (max=0.00s, 0.01%) 2025-10-27 00:55:15,559 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,559 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:15,562 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,562 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:15,563 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,563 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:15,563 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,563 - root - INFO - CheckpointManager: State dict 
keys: dict_keys(['model']) 2025-10-27 00:55:15,564 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,564 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:15,567 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,567 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:15,574 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,574 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:15,575 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 18926 2025-10-27 00:55:15,575 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) 2025-10-27 00:55:23,968 - root - INFO - Finished saving the checkpoint in 8.40 seconds 2025-10-27 00:55:23,969 - root - INFO - Sleeping 2 seconds for other ranks to complete 2025-10-27 00:55:23,971 - root - INFO - Finished saving the checkpoint in 8.41 seconds 2025-10-27 00:55:23,971 - root - INFO - Finished saving the checkpoint in 8.40 seconds 2025-10-27 00:55:23,971 - root - INFO - Finished saving the checkpoint in 8.40 seconds 2025-10-27 00:55:23,972 - root - INFO - Finished saving the checkpoint in 8.41 seconds 2025-10-27 00:55:23,972 - root - INFO - Training successfully completed! 2025-10-27 00:55:23,972 - root - INFO - Training successfully completed! 2025-10-27 00:55:23,972 - root - INFO - Finished saving the checkpoint in 8.40 seconds 2025-10-27 00:55:23,972 - root - INFO - Finished saving the checkpoint in 8.41 seconds 2025-10-27 00:55:23,972 - root - INFO - Finished saving the checkpoint in 8.41 seconds 2025-10-27 00:55:23,972 - root - INFO - Training successfully completed! 2025-10-27 00:55:23,972 - root - INFO - Training successfully completed! 
2025-10-27 00:55:23,972 - root - INFO - Training successfully completed! 2025-10-27 00:55:23,972 - root - INFO - Training successfully completed! 2025-10-27 00:55:23,973 - root - INFO - Training successfully completed! 2025-10-27 00:55:25,969 - root - INFO - Training successfully completed!