[2025-02-13 03:39:37,264] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-02-13 03:39:37,265] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-02-13 03:39:37,267] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-02-13 03:39:37,278] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-02-13 03:39:37,279] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-02-13 03:39:37,280] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-02-13 03:39:37,280] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
INFO 02-13 03:39:43 __init__.py:190] Automatically detected platform cuda.
[2025-02-13 03:39:49,014] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,015] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,015] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,017] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,020] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,021] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,029] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-13 03:39:49,029] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-02-13 03:39:49,702] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument.
Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1059767 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1059767 [0] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1059767 [0] NCCL INFO cudaDriverVersion 12040 NCCL version 2.21.5+cuda12.4 [2025-02-13 03:39:49,966] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7 You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [2025-02-13 03:39:49,976] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7 [2025-02-13 03:39:49,977] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7 Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 without specifying a torch dtype. 
This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [2025-02-13 03:39:49,984] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7 You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` [2025-02-13 03:39:49,996] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7 You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1059772 [2] NCCL INFO cudaDriverVersion 12040 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1059772 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1059772 [2] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. 
Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1059779 [5] NCCL INFO cudaDriverVersion 12040 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1059779 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1059779 [5] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1059775 [3] NCCL INFO cudaDriverVersion 12040 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1059775 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1059775 [3] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1059770 [1] NCCL INFO cudaDriverVersion 12040 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1059770 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1059770 [1] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> [2025-02-13 03:39:50,031] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7 You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1059777 [4] NCCL INFO cudaDriverVersion 12040 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1059777 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1059777 [4] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2VisionTransformerPretrainedModel is torch.float32. 
You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1059781 [6] NCCL INFO cudaDriverVersion 12040 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1059781 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1059781 [6] NCCL INFO Bootstrap : Using bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Using non-device net plugin version 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Using non-device net plugin version 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Using non-device net plugin version 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Using non-device net plugin version 0 
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Using non-device net plugin version 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO P2P plugin IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Using non-device net plugin version 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [RO]; OOB bond0:10.9.200.89<0> p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Using non-device net plugin version 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Using network IBext_v8 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO ncclCommInitRank comm 0x563281849c20 rank 2 nranks 7 cudaDev 2 nvmlDev 2 busId 49000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO ncclCommInitRank comm 0x55c9426243d0 rank 1 nranks 7 cudaDev 1 nvmlDev 1 busId 16000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO ncclCommInitRank comm 0x559dab3953c0 rank 3 nranks 7 cudaDev 3 nvmlDev 3 
busId 4d000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO ncclCommInitRank comm 0x56018a90f170 rank 6 nranks 7 cudaDev 6 nvmlDev 6 busId c6000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO ncclCommInitRank comm 0x5651c04d7280 rank 5 nranks 7 cudaDev 5 nvmlDev 5 busId 8f000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO ncclCommInitRank comm 0x55fd7282b2b0 rank 4 nranks 7 cudaDev 4 nvmlDev 4 busId 8a000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO ncclCommInitRank comm 0x564c8acfefa0 rank 0 nranks 7 cudaDev 0 nvmlDev 0 busId 10000 commId 0x510c77ca277af1c1 - Init START p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,00000000,ffffffff p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO NVLS multicast support is not available on dev 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,00000000,ffffffff,00000000 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO NVLS multicast support is not available on dev 5 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. 
p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO NVLS multicast support is not available on dev 2 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,00000000,ffffffff,00000000 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO NVLS multicast support is not available on dev 4 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,00000000,ffffffff p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO NVLS multicast support is not available on dev 1 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,00000000,ffffffff p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0. 
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO NVLS multicast support is not available on dev 3 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,00000000,ffffffff,00000000 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO NVLS multicast support is not available on dev 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO comm 0x5651c04d7280 rank 5 nRanks 7 nNodes 1 localRanks 7 localRank 5 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO comm 0x55fd7282b2b0 rank 4 nRanks 7 nNodes 1 localRanks 7 localRank 4 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO comm 0x559dab3953c0 rank 3 nRanks 7 nNodes 1 localRanks 7 localRank 3 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO comm 0x563281849c20 rank 2 nRanks 7 nNodes 1 localRanks 7 localRank 2 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO comm 0x55c9426243d0 rank 1 nRanks 7 nNodes 1 localRanks 7 
localRank 1 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO comm 0x56018a90f170 rank 6 nRanks 7 nNodes 1 localRanks 7 localRank 6 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO comm 0x564c8acfefa0 rank 0 nRanks 7 nNodes 1 localRanks 7 localRank 0 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Trees [0] -1/-1/-1->6->5 [1] -1/-1/-1->6->5 [2] -1/-1/-1->6->5 [3] -1/-1/-1->6->5 [4] -1/-1/-1->6->5 [5] -1/-1/-1->6->5 [6] -1/-1/-1->6->5 [7] -1/-1/-1->6->5 [8] -1/-1/-1->6->5 [9] -1/-1/-1->6->5 [10] -1/-1/-1->6->5 [11] -1/-1/-1->6->5 
[12] -1/-1/-1->6->5 [13] -1/-1/-1->6->5 [14] -1/-1/-1->6->5 [15] -1/-1/-1->6->5 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 00/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 01/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 02/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 03/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 04/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 05/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 06/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 07/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 08/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 09/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 10/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 11/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 12/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 13/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 14/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 15/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 
1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via 
P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 06/0 : 
4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL 
INFO Channel 09/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 00/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 01/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 02/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 03/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 04/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] 
via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 05/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 06/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 07/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 
08/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 09/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 10/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 11/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 12/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 13/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 14/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 15/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] 
via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Channel 
15/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] 
NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] 
via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO 16 coll channels, 16 collnet 
channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead. p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1078328 [4] NCCL INFO ncclCommInitRank comm 0x55fd7282b2b0 rank 4 nranks 7 cudaDev 4 nvmlDev 4 busId 8a000 commId 0x510c77ca277af1c1 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead. p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead. p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1078326 [1] NCCL INFO ncclCommInitRank comm 0x55c9426243d0 rank 1 nranks 7 cudaDev 1 nvmlDev 1 busId 16000 commId 0x510c77ca277af1c1 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1078320 [2] NCCL INFO ncclCommInitRank comm 0x563281849c20 rank 2 nranks 7 cudaDev 2 nvmlDev 2 busId 49000 commId 0x510c77ca277af1c1 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead. 
p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1078307 [0] NCCL INFO ncclCommInitRank comm 0x564c8acfefa0 rank 0 nranks 7 cudaDev 0 nvmlDev 0 busId 10000 commId 0x510c77ca277af1c1 - Init COMPLETE
p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead.
p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1078330 [6] NCCL INFO ncclCommInitRank comm 0x56018a90f170 rank 6 nranks 7 cudaDev 6 nvmlDev 6 busId c6000 commId 0x510c77ca277af1c1 - Init COMPLETE
p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead.
p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1078322 [5] NCCL INFO ncclCommInitRank comm 0x5651c04d7280 rank 5 nranks 7 cudaDev 5 nvmlDev 5 busId 8f000 commId 0x510c77ca277af1c1 - Init COMPLETE
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2, using internal tuner instead.
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1078324 [3] NCCL INFO ncclCommInitRank comm 0x559dab3953c0 rank 3 nranks 7 cudaDev 3 nvmlDev 3 busId 4d000 commId 0x510c77ca277af1c1 - Init COMPLETE
[2025-02-13 03:39:51,634] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 730, num_elems = 8.29B
Loading checkpoint shards: 0%| | 0/5 [00:00
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] communication_data_type ...... None
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] curriculum_enabled_legacy .... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] curriculum_params_legacy ..... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] data_efficiency_enabled ...... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] dataloader_drop_last ......... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] disable_allgather ............ False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] dump_state ................... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] dynamic_loss_scale_args ...... None
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_enabled ........... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_gas_boundary_resolution 1
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_layer_name ........ bert.encoder.layer
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_layer_num ......... 0
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_max_iter .......... 100
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_stability ......... 1e-06
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_tol ............... 0.01
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] eigenvalue_verbose ........... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] elasticity_enabled ........... False
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2025-02-13 03:40:15,571] [INFO] [config.py:1003:print] fp16_auto_cast ............... None
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] fp16_enabled ................. False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] fp16_master_weights_and_gradients False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] global_rank .................. 0
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] grad_accum_dtype ............. None
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] gradient_accumulation_steps .. 2
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] gradient_clipping ............ 1.0
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] gradient_predivide_factor .... 1.0
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] graph_harvesting ............. False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] initial_dynamic_scale ........ 1
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] load_universal_checkpoint .... False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] loss_scale ................... 1.0
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] memory_breakdown ............. False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] mics_hierarchial_params_gather False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] mics_shard_size .............. -1
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] optimizer_legacy_fusion ...... False
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] optimizer_name ............... None
[2025-02-13 03:40:15,572] [INFO] [config.py:1003:print] optimizer_params ............. None
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] pld_enabled .................. False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] pld_params ................... False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] prescale_gradients ........... False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] scheduler_name ............... None
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] scheduler_params ............. None
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] seq_parallel_communication_data_type torch.float32
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] sparse_attention ............. None
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] sparse_gradients_enabled ..... False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] steps_per_print .............. inf
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] timers_config ................ enabled=True synchronized=True
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] train_batch_size ............. 14
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] train_micro_batch_size_per_gpu 1
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] use_data_before_expert_parallel_ False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] use_node_local_storage ....... False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] wall_clock_breakdown ......... False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] weight_quantization_config ... None
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] world_size ................... 7
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] zero_allow_untested_optimizer False
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100000000, max_in_cpu=1000000000, pin_memory=True) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=True, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] zero_enabled ................. True
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] zero_force_ds_cpu_optimizer .. True
[2025-02-13 03:40:15,573] [INFO] [config.py:1003:print] zero_optimization_stage ...... 3
[2025-02-13 03:40:15,574] [INFO] [config.py:989:print_user_config] json = {
    "fp16": {
        "enabled": false,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "none",
            "pin_memory": true
        },
        "offload_param": {
            "device": "none",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1.000000e+09,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1.000000e+09,
        "stage3_max_reuse_distance": 1.000000e+09,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,
    "steps_per_print": inf,
    "train_batch_size": 14,
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": false,
    "zero_optimization.reduce_bucket_size": 1.284506e+07,
    "zero_optimization.stage3_param_persistence_threshold": 3.584000e+04,
    "zero_optimization.stage3_prefetch_bucket_size": 1.156055e+07
}
INFO 02-13 03:40:28 config.py:542] This model supports multiple tasks: {'classify', 'generate', 'embed', 'score', 'reward'}. Defaulting to 'generate'.
WARNING 02-13 03:40:28 arg_utils.py:1079] --enable-prefix-caching is currently not supported for multimodal models in v0 and has been disabled.
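Note on the DeepSpeed settings printed above: the logged `train_batch_size` of 14 agrees with the usual DeepSpeed relation between the per-GPU micro-batch size, the number of gradient accumulation steps, and the world size (1 x 2 x 7). A minimal Python sketch of that consistency check follows; the helper name `effective_batch_size` is hypothetical, and the numbers are copied from the log rather than read from a live DeepSpeed engine.

```python
def effective_batch_size(micro_batch_per_gpu: int,
                         grad_accum_steps: int,
                         world_size: int) -> int:
    """Global batch size implied by the three per-run settings."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


# Values copied from the config dump in this log.
logged = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 2,
    "world_size": 7,
    "train_batch_size": 14,
}

computed = effective_batch_size(
    logged["train_micro_batch_size_per_gpu"],
    logged["gradient_accumulation_steps"],
    logged["world_size"],
)

# The value DeepSpeed printed should match the product of the three factors.
assert computed == logged["train_batch_size"]
print(computed)  # prints 14
```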
INFO 02-13 03:40:28 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.2) with config: model='/home/vlm/pretrain_model/Qwen2-VL-7B-Instruct', speculative_config=None, tokenizer='/home/vlm/pretrain_model/Qwen2-VL-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda:7, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/vlm/pretrain_model/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, INFO 02-13 03:40:29 cuda.py:230] Using Flash Attention backend. INFO 02-13 03:40:29 model_runner.py:1110] Starting to load model /home/vlm/pretrain_model/Qwen2-VL-7B-Instruct... 
INFO 02-13 03:40:29 config.py:2992] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248] Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00 32768). Running this sequence through the model will result in indexing errors WARNING 02-13 03:40:45 profiling.py:187] The context length (32768) of the model is too short to hold the multi-modal embeddings in the worst case (49152 tokens in total, out of which {'image': 16384, 'video': 32768} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`. INFO 02-13 03:40:48 worker.py:267] Memory profiling takes 12.59 seconds INFO 02-13 03:40:48 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.70) = 55.53GiB INFO 02-13 03:40:48 worker.py:267] model weights take 0.00GiB; non_torch_memory takes 0.00GiB; PyTorch activation peak memory takes 0.00GiB; the rest of the memory reserved for KV Cache is 55.53GiB. INFO 02-13 03:40:49 executor_base.py:110] # CUDA blocks: 64982, # CPU blocks: 4681 INFO 02-13 03:40:49 executor_base.py:115] Maximum concurrency for 32768 tokens per request: 31.73x INFO 02-13 03:40:51 model_runner.py:1434] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. 
You can also reduce the `max_num_seqs` as needed to decrease memory usage. Capturing CUDA graph shapes: 0%| | 0/35 [00:003->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO comm 0x7fa2e806fe90 rank 5 nRanks 7 nNodes 1 localRanks 7 localRank 5 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Trees [0] -1/-1/-1->6->5 [1] -1/-1/-1->6->5 [2] -1/-1/-1->6->5 [3] -1/-1/-1->6->5 [4] -1/-1/-1->6->5 [5] -1/-1/-1->6->5 [6] -1/-1/-1->6->5 [7] -1/-1/-1->6->5 [8] -1/-1/-1->6->5 [9] -1/-1/-1->6->5 [10] -1/-1/-1->6->5 [11] -1/-1/-1->6->5 [12] -1/-1/-1->6->5 [13] -1/-1/-1->6->5 [14] -1/-1/-1->6->5 [15] -1/-1/-1->6->5 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO comm 0x7f6258070980 rank 4 nRanks 7 nNodes 1 localRanks 7 localRank 4 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO comm 0x7f61c406f9b0 rank 0 nRanks 7 nNodes 1 localRanks 7 localRank 0 MNNVL 0 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 
[3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 00/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 01/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 02/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 03/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 04/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 05/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 06/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 07/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] 
NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 08/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 09/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 10/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 11/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 12/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 13/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 14/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 15/16 : 0 1 2 3 4 5 6 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO P2P Chunksize set to 524288 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] 
via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 
02/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 00/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] 
NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 01/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 02/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 03/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 04/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 05/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 06/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 07/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 08/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] 
via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 09/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 10/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 11/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 12/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 13/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 14/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 15/0 : 6[6] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Connected 
all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Connected all rings p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] 
via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 
05/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] 
NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/IPC/read 
p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/IPC/read p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO 16 coll channels, 16 
collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO Connected all trees p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO threadThresholds 8/8/64 | 56/8/64 | 512 | 512 p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer p-phy-ctyun-gz-a800-node-prod-200-89:1059779:1132010 [5] NCCL INFO ncclCommSplit comm 
0x7fa2e806fe90 rank 5 nranks 7 cudaDev 5 nvmlDev 5 busId 8f000 parent 0x5651c04d7280 color -1326228412 key 5 commId 0xcda3b3a6f96aa498 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059775:1132008 [3] NCCL INFO ncclCommSplit comm 0x7fc44406fb00 rank 3 nranks 7 cudaDev 3 nvmlDev 3 busId 4d000 parent 0x559dab3953c0 color -1326228412 key 3 commId 0xcda3b3a6f96aa498 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059767:1132023 [0] NCCL INFO ncclCommSplit comm 0x7f61c406f9b0 rank 0 nranks 7 cudaDev 0 nvmlDev 0 busId 10000 parent 0x564c8acfefa0 color -1326228412 key 0 commId 0xcda3b3a6f96aa498 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059781:1132019 [6] NCCL INFO ncclCommSplit comm 0x7feafc070c10 rank 6 nranks 7 cudaDev 6 nvmlDev 6 busId c6000 parent 0x56018a90f170 color -1326228412 key 6 commId 0xcda3b3a6f96aa498 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059770:1132017 [1] NCCL INFO ncclCommSplit comm 0x7f2040070860 rank 1 nranks 7 cudaDev 1 nvmlDev 1 busId 16000 parent 0x55c9426243d0 color -1326228412 key 1 commId 0xcda3b3a6f96aa498 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059772:1132016 [2] NCCL INFO ncclCommSplit comm 0x7f23d4070d80 rank 2 nranks 7 cudaDev 2 nvmlDev 2 busId 49000 parent 0x563281849c20 color -1326228412 key 2 commId 0xcda3b3a6f96aa498 - Init COMPLETE p-phy-ctyun-gz-a800-node-prod-200-89:1059777:1132018 [4] NCCL INFO ncclCommSplit comm 0x7f6258070980 rank 4 nranks 7 cudaDev 4 nvmlDev 4 busId 8a000 parent 0x55fd7282b2b0 color -1326228412 key 4 commId 0xcda3b3a6f96aa498 - Init COMPLETE 0%| | 1/2500 [00:15<10:31:45, 15.17s/it] {'loss': -0.0, 'grad_norm': 3.3333390346682115, 'learning_rate': 9.996e-07, 'completion_length': 56.08928680419922, 'rewards/accuracy_reward': 0.7500000298023224, 'rewards/format_reward': 1.0, 'reward': 1.7500000596046448, 'reward_std': 0.26657508313655853, 'kl': 0.0, 'epoch': 0.0} 0%| | 1/2500 [00:15<10:31:45, 15.17s/it] 0%| | 2/2500 [00:24<8:10:25, 11.78s/it] {'loss': 0.0, 
'grad_norm': 3.4051657869925807, 'learning_rate': 9.992e-07, 'completion_length': 60.92857551574707, 'rewards/accuracy_reward': 0.8214286267757416, 'rewards/format_reward': 1.0, 'reward': 1.821428656578064, 'reward_std': 0.2857142984867096, 'kl': 0.00051116943359375, 'epoch': 0.0} 0%| | 2/2500 [00:24<8:10:25, 11.78s/it] 0%| | 3/2500 [00:34<7:34:02, 10.91s/it] {'loss': 0.0001, 'grad_norm': 5.333836284523779, 'learning_rate': 9.988e-07, 'completion_length': 59.25000190734863, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750001192092896, 'reward_std': 0.1896214671432972, 'kl': 0.0031890869140625, 'epoch': 0.0} 0%| | 3/2500 [00:34<7:34:02, 10.91s/it] 0%| | 4/2500 [00:43<6:59:20, 10.08s/it] {'loss': 0.0003, 'grad_norm': 1.9463453965742006, 'learning_rate': 9.983999999999998e-07, 'completion_length': 57.58928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 0.9642857313156128, 'reward': 1.9107143878936768, 'reward_std': 0.14838216826319695, 'kl': 0.007110595703125, 'epoch': 0.0} 0%| | 4/2500 [00:43<6:59:20, 10.08s/it] 0%| | 5/2500 [00:53<6:59:52, 10.10s/it] {'loss': 0.0008, 'grad_norm': 3.1547226928488215, 'learning_rate': 9.98e-07, 'completion_length': 61.17857360839844, 'rewards/accuracy_reward': 0.8571429252624512, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.2253357619047165, 'kl': 0.0198974609375, 'epoch': 0.0} 0%| | 5/2500 [00:53<6:59:52, 10.10s/it] 0%| | 6/2500 [01:05<7:25:23, 10.72s/it] {'loss': 0.0011, 'grad_norm': 3.369775745668218, 'learning_rate': 9.976e-07, 'completion_length': 69.78571701049805, 'rewards/accuracy_reward': 0.7321428954601288, 'rewards/format_reward': 0.9642857313156128, 'reward': 1.696428656578064, 'reward_std': 0.2610500529408455, 'kl': 0.02728271484375, 'epoch': 0.0} 0%| | 6/2500 [01:05<7:25:23, 10.72s/it] 0%| | 7/2500 [01:14<7:01:47, 10.15s/it] {'loss': 0.0022, 'grad_norm': 4.714172277840533, 'learning_rate': 9.972e-07, 
'completion_length': 49.55357360839844, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.1896214708685875, 'kl': 0.0562744140625, 'epoch': 0.0} 0%| | 7/2500 [01:14<7:01:47, 10.15s/it] 0%| | 8/2500 [01:23<6:44:14, 9.73s/it] {'loss': 0.0022, 'grad_norm': 4.404514493267795, 'learning_rate': 9.968e-07, 'completion_length': 55.30357551574707, 'rewards/accuracy_reward': 0.767857164144516, 'rewards/format_reward': 1.0, 'reward': 1.7678572535514832, 'reward_std': 0.23086076974868774, 'kl': 0.055419921875, 'epoch': 0.0} 0%| | 8/2500 [01:23<6:44:14, 9.73s/it] 0%| | 9/2500 [01:32<6:38:43, 9.60s/it] {'loss': 0.0022, 'grad_norm': 4.954634884999213, 'learning_rate': 9.964e-07, 'completion_length': 66.33928871154785, 'rewards/accuracy_reward': 0.803571492433548, 'rewards/format_reward': 1.0, 'reward': 1.8035715222358704, 'reward_std': 0.21981073170900345, 'kl': 0.053955078125, 'epoch': 0.0} 0%| | 9/2500 [01:32<6:38:43, 9.60s/it] 0%| | 10/2500 [01:41<6:33:39, 9.49s/it] {'loss': 0.0041, 'grad_norm': 3.2573138202403125, 'learning_rate': 9.959999999999999e-07, 'completion_length': 49.96428680419922, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.1181928999722004, 'kl': 0.102294921875, 'epoch': 0.0} 0%| | 10/2500 [01:41<6:33:39, 9.49s/it] 0%| | 11/2500 [01:50<6:19:54, 9.16s/it] {'loss': 0.0052, 'grad_norm': 5.071055070146385, 'learning_rate': 9.956e-07, 'completion_length': 43.69643020629883, 'rewards/accuracy_reward': 0.8928571939468384, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.11266787722706795, 'kl': 0.13037109375, 'epoch': 0.0} 0%| | 11/2500 [01:50<6:19:54, 9.16s/it] 0%| | 12/2500 [01:58<6:06:18, 8.83s/it] {'loss': 0.0064, 'grad_norm': 1.6815294348994583, 'learning_rate': 9.952e-07, 'completion_length': 31.16071605682373, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 
'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.159423828125, 'epoch': 0.0} 0%| | 12/2500 [01:58<6:06:18, 8.83s/it] 1%| | 13/2500 [02:05<5:50:18, 8.45s/it] {'loss': 0.0068, 'grad_norm': 5.457201185977461, 'learning_rate': 9.948e-07, 'completion_length': 37.375000953674316, 'rewards/accuracy_reward': 0.8571428954601288, 'rewards/format_reward': 1.0, 'reward': 1.857142984867096, 'reward_std': 0.1539071835577488, 'kl': 0.16943359375, 'epoch': 0.01} 1%| | 13/2500 [02:05<5:50:18, 8.45s/it] 1%| | 14/2500 [02:14<5:53:14, 8.53s/it] {'loss': 0.0057, 'grad_norm': 3.209626263263565, 'learning_rate': 9.944e-07, 'completion_length': 45.767860412597656, 'rewards/accuracy_reward': 0.803571492433548, 'rewards/format_reward': 1.0, 'reward': 1.8035715222358704, 'reward_std': 0.24191083014011383, 'kl': 0.14111328125, 'epoch': 0.01} 1%| | 14/2500 [02:14<5:53:14, 8.53s/it] 1%| | 15/2500 [02:23<5:55:32, 8.58s/it] {'loss': 0.0056, 'grad_norm': 5.403707482147789, 'learning_rate': 9.94e-07, 'completion_length': 43.14285850524902, 'rewards/accuracy_reward': 0.7321428954601288, 'rewards/format_reward': 1.0, 'reward': 1.732142984867096, 'reward_std': 0.2721000760793686, 'kl': 0.140625, 'epoch': 0.01} 1%| | 15/2500 [02:23<5:55:32, 8.58s/it] 1%| | 16/2500 [02:31<5:52:05, 8.50s/it] {'loss': 0.0054, 'grad_norm': 2.108519591076692, 'learning_rate': 9.936e-07, 'completion_length': 48.28571701049805, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.1071428619325161, 'kl': 0.134521484375, 'epoch': 0.01} 1%| | 16/2500 [02:31<5:52:05, 8.50s/it] 1%| | 17/2500 [02:39<5:47:17, 8.39s/it] {'loss': 0.0065, 'grad_norm': 20.821562829116463, 'learning_rate': 9.931999999999999e-07, 'completion_length': 45.53571701049805, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.1896214708685875, 'kl': 0.162353515625, 'epoch': 0.01} 1%| | 17/2500 
[02:39<5:47:17, 8.39s/it] 1%| | 18/2500 [02:48<5:56:24, 8.62s/it] {'loss': 0.0061, 'grad_norm': 4.750596200275119, 'learning_rate': 9.928e-07, 'completion_length': 41.62500190734863, 'rewards/accuracy_reward': 0.8035714626312256, 'rewards/format_reward': 1.0, 'reward': 1.8035715222358704, 'reward_std': 0.1896214671432972, 'kl': 0.1533203125, 'epoch': 0.01} 1%| | 18/2500 [02:48<5:56:24, 8.62s/it] 1%| | 19/2500 [02:56<5:49:49, 8.46s/it] {'loss': 0.0043, 'grad_norm': 5.194124529450528, 'learning_rate': 9.923999999999998e-07, 'completion_length': 45.75000190734863, 'rewards/accuracy_reward': 0.8928571939468384, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.18409645557403564, 'kl': 0.108154296875, 'epoch': 0.01} 1%| | 19/2500 [02:56<5:49:49, 8.46s/it] 1%| | 20/2500 [03:05<5:47:58, 8.42s/it] {'loss': 0.0061, 'grad_norm': 2.9442495453031405, 'learning_rate': 9.92e-07, 'completion_length': 41.05357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.15185546875, 'epoch': 0.01} 1%| | 20/2500 [03:05<5:47:58, 8.42s/it] 1%| | 21/2500 [03:13<5:44:47, 8.35s/it] {'loss': 0.0067, 'grad_norm': 2.27603916539206, 'learning_rate': 9.916e-07, 'completion_length': 39.892860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.16796875, 'epoch': 0.01} 1%| | 21/2500 [03:13<5:44:47, 8.35s/it] 1%| | 22/2500 [03:21<5:48:24, 8.44s/it] {'loss': 0.005, 'grad_norm': 6.3184127860380555, 'learning_rate': 9.912e-07, 'completion_length': 49.96428680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.910714328289032, 'reward_std': 0.14838215708732605, 'kl': 0.12451171875, 'epoch': 0.01} 1%| | 22/2500 [03:22<5:48:24, 8.44s/it] 1%| | 23/2500 [03:30<5:50:34, 8.49s/it] {'loss': 0.0069, 'grad_norm': 
3.143526948302313, 'learning_rate': 9.908e-07, 'completion_length': 39.62500190734863, 'rewards/accuracy_reward': 0.803571492433548, 'rewards/format_reward': 1.0, 'reward': 1.8035715222358704, 'reward_std': 0.2610500380396843, 'kl': 0.173828125, 'epoch': 0.01} 1%| | 23/2500 [03:30<5:50:34, 8.49s/it] 1%| | 24/2500 [03:44<6:51:50, 9.98s/it] {'loss': 0.0052, 'grad_norm': 3.3589947509393214, 'learning_rate': 9.903999999999999e-07, 'completion_length': 49.96428871154785, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.8571429252624512, 'reward_std': 0.2142857238650322, 'kl': 0.129638671875, 'epoch': 0.01} 1%| | 24/2500 [03:44<6:51:50, 9.98s/it] 1%| | 25/2500 [03:51<6:23:07, 9.29s/it] {'loss': 0.0061, 'grad_norm': 1.9962750941149539, 'learning_rate': 9.9e-07, 'completion_length': 41.55357360839844, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.15234375, 'epoch': 0.01} 1%| | 25/2500 [03:51<6:23:07, 9.29s/it] 1%| | 26/2500 [03:58<5:54:38, 8.60s/it] {'loss': 0.004, 'grad_norm': 2.6043934949184107, 'learning_rate': 9.896e-07, 'completion_length': 38.410715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.099609375, 'epoch': 0.01} 1%| | 26/2500 [03:58<5:54:38, 8.60s/it] 1%| | 27/2500 [04:05<5:36:55, 8.17s/it] {'loss': 0.0058, 'grad_norm': 3.5240040914290836, 'learning_rate': 9.892e-07, 'completion_length': 32.17857265472412, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.0714285746216774, 'kl': 0.14404296875, 'epoch': 0.01} 1%| | 27/2500 [04:05<5:36:55, 8.17s/it] 1%| | 28/2500 [04:19<6:37:37, 9.65s/it] {'loss': 0.0035, 'grad_norm': 2.616155659391863, 'learning_rate': 9.888e-07, 'completion_length': 45.67857360839844, 
'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.910714328289032, 'reward_std': 0.14838216453790665, 'kl': 0.08642578125, 'epoch': 0.01} 1%| | 28/2500 [04:19<6:37:37, 9.65s/it] 1%| | 29/2500 [04:26<6:05:56, 8.89s/it] {'loss': 0.0044, 'grad_norm': 4.4895177350078015, 'learning_rate': 9.884e-07, 'completion_length': 35.76785850524902, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.14838216826319695, 'kl': 0.10986328125, 'epoch': 0.01} 1%| | 29/2500 [04:26<6:05:56, 8.89s/it] 1%| | 30/2500 [04:33<5:49:38, 8.49s/it] {'loss': 0.0037, 'grad_norm': 1.7112218630469644, 'learning_rate': 9.88e-07, 'completion_length': 37.50000190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9464285969734192, 'reward_std': 0.1071428656578064, 'kl': 0.092529296875, 'epoch': 0.01} 1%| | 30/2500 [04:33<5:49:38, 8.49s/it] 1%| | 31/2500 [04:41<5:41:16, 8.29s/it] {'loss': 0.0052, 'grad_norm': 2.7760630900354135, 'learning_rate': 9.876e-07, 'completion_length': 37.28571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.12939453125, 'epoch': 0.01} 1%| | 31/2500 [04:41<5:41:16, 8.29s/it] 1%|▏ | 32/2500 [04:49<5:32:26, 8.08s/it] {'loss': 0.0047, 'grad_norm': 3.0579867372236036, 'learning_rate': 9.871999999999998e-07, 'completion_length': 37.10714340209961, 'rewards/accuracy_reward': 0.785714328289032, 'rewards/format_reward': 1.0, 'reward': 1.785714328289032, 'reward_std': 0.1428571492433548, 'kl': 0.117919921875, 'epoch': 0.01} 1%|▏ | 32/2500 [04:49<5:32:26, 8.08s/it] 1%|▏ | 33/2500 [04:56<5:27:35, 7.97s/it] {'loss': 0.0045, 'grad_norm': 4.361113758843598, 'learning_rate': 9.868e-07, 'completion_length': 40.10714530944824, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 
'reward': 1.8750001192092896, 'reward_std': 0.14838216826319695, 'kl': 0.113525390625, 'epoch': 0.01} 1%|▏ | 33/2500 [04:56<5:27:35, 7.97s/it] 1%|▏ | 34/2500 [05:05<5:36:21, 8.18s/it] {'loss': 0.0044, 'grad_norm': 8.181315941913754, 'learning_rate': 9.864e-07, 'completion_length': 41.33928680419922, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.11266788095235825, 'kl': 0.110107421875, 'epoch': 0.01} 1%|▏ | 34/2500 [05:05<5:36:21, 8.18s/it] 1%|▏ | 35/2500 [05:12<5:26:33, 7.95s/it] {'loss': 0.0043, 'grad_norm': 13.93908843267638, 'learning_rate': 9.86e-07, 'completion_length': 38.21428680419922, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.107177734375, 'epoch': 0.01} 1%|▏ | 35/2500 [05:12<5:26:33, 7.95s/it] 1%|▏ | 36/2500 [05:20<5:17:36, 7.73s/it] {'loss': 0.0044, 'grad_norm': 2.579473587615712, 'learning_rate': 9.856e-07, 'completion_length': 39.30357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.108642578125, 'epoch': 0.01} 1%|▏ | 36/2500 [05:20<5:17:36, 7.73s/it] 1%|▏ | 37/2500 [05:27<5:09:14, 7.53s/it] {'loss': 0.0046, 'grad_norm': 8.51987193953086, 'learning_rate': 9.852e-07, 'completion_length': 41.46428871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.115966796875, 'epoch': 0.01} 1%|▏ | 37/2500 [05:27<5:09:14, 7.53s/it] 2%|▏ | 38/2500 [05:34<5:10:50, 7.58s/it] {'loss': 0.0055, 'grad_norm': 25.60077211341876, 'learning_rate': 9.847999999999999e-07, 'completion_length': 44.91071701049805, 'rewards/accuracy_reward': 0.8571428954601288, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.1428571492433548, 'kl': 0.137939453125, 'epoch': 0.02} 
2%|▏ | 38/2500 [05:34<5:10:50, 7.58s/it] 2%|▏ | 39/2500 [05:42<5:06:35, 7.47s/it] {'loss': 0.0061, 'grad_norm': 7.05525094000649, 'learning_rate': 9.844e-07, 'completion_length': 38.55357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.1428571492433548, 'kl': 0.1533203125, 'epoch': 0.02} 2%|▏ | 39/2500 [05:42<5:06:35, 7.47s/it] 2%|▏ | 40/2500 [05:49<5:08:35, 7.53s/it] {'loss': 0.0052, 'grad_norm': 2.885862323106803, 'learning_rate': 9.84e-07, 'completion_length': 42.28571701049805, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.1071428619325161, 'kl': 0.13037109375, 'epoch': 0.02} 2%|▏ | 40/2500 [05:49<5:08:35, 7.53s/it] 2%|▏ | 41/2500 [05:57<5:11:46, 7.61s/it] {'loss': 0.0058, 'grad_norm': 2.9423559114304587, 'learning_rate': 9.836e-07, 'completion_length': 42.05357360839844, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928572535514832, 'reward_std': 0.18409644439816475, 'kl': 0.14501953125, 'epoch': 0.02} 2%|▏ | 41/2500 [05:57<5:11:46, 7.61s/it] 2%|▏ | 42/2500 [06:05<5:15:05, 7.69s/it] {'loss': 0.0053, 'grad_norm': 3.0090501854485243, 'learning_rate': 9.832e-07, 'completion_length': 43.19643020629883, 'rewards/accuracy_reward': 0.8571428954601288, 'rewards/format_reward': 1.0, 'reward': 1.857142984867096, 'reward_std': 0.1428571529686451, 'kl': 0.1318359375, 'epoch': 0.02} 2%|▏ | 42/2500 [06:05<5:15:05, 7.69s/it] 2%|▏ | 43/2500 [06:18<6:22:35, 9.34s/it] {'loss': 0.0056, 'grad_norm': 2.3322938439040617, 'learning_rate': 9.828e-07, 'completion_length': 45.32143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9464285969734192, 'reward_std': 0.1071428656578064, 'kl': 0.140380859375, 'epoch': 0.02} 2%|▏ | 43/2500 [06:18<6:22:35, 9.34s/it] 2%|▏ | 44/2500 [06:25<5:56:52, 8.72s/it] {'loss': 0.0055, 
'grad_norm': 2.0552740616778506, 'learning_rate': 9.824e-07, 'completion_length': 38.75000190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.1376953125, 'epoch': 0.02} 2%|▏ | 44/2500 [06:25<5:56:52, 8.72s/it] 2%|▏ | 45/2500 [06:33<5:37:06, 8.24s/it] {'loss': 0.0059, 'grad_norm': 0.3970394348653877, 'learning_rate': 9.819999999999999e-07, 'completion_length': 42.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.146484375, 'epoch': 0.02} 2%|▏ | 45/2500 [06:33<5:37:06, 8.24s/it] 2%|▏ | 46/2500 [06:40<5:23:55, 7.92s/it] {'loss': 0.008, 'grad_norm': 0.9082254775430247, 'learning_rate': 9.816e-07, 'completion_length': 37.48214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.19970703125, 'epoch': 0.02} 2%|▏ | 46/2500 [06:40<5:23:55, 7.92s/it] 2%|▏ | 47/2500 [06:47<5:16:38, 7.74s/it] {'loss': 0.0067, 'grad_norm': 9.096722893561001, 'learning_rate': 9.811999999999998e-07, 'completion_length': 40.35714530944824, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928572535514832, 'reward_std': 0.0714285746216774, 'kl': 0.16650390625, 'epoch': 0.02} 2%|▏ | 47/2500 [06:47<5:16:38, 7.74s/it] 2%|▏ | 48/2500 [06:54<5:11:52, 7.63s/it] {'loss': 0.007, 'grad_norm': 1.8816110051241532, 'learning_rate': 9.808e-07, 'completion_length': 41.67857360839844, 'rewards/accuracy_reward': 0.8392857611179352, 'rewards/format_reward': 1.0, 'reward': 1.8392857909202576, 'reward_std': 0.0357142873108387, 'kl': 0.17431640625, 'epoch': 0.02} 2%|▏ | 48/2500 [06:54<5:11:52, 7.63s/it] 2%|▏ | 49/2500 [07:02<5:06:43, 7.51s/it] {'loss': 0.0061, 'grad_norm': 1.9095525278469587, 'learning_rate': 9.804e-07, 'completion_length': 42.33928680419922, 'rewards/accuracy_reward': 
0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.15283203125, 'epoch': 0.02} 2%|▏ | 49/2500 [07:02<5:06:43, 7.51s/it] 2%|▏ | 50/2500 [07:09<5:07:02, 7.52s/it] {'loss': 0.0037, 'grad_norm': 4.9193527475782535, 'learning_rate': 9.8e-07, 'completion_length': 46.69643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.09326171875, 'epoch': 0.02} 2%|▏ | 50/2500 [07:09<5:07:02, 7.52s/it] 2%|▏ | 51/2500 [07:17<5:05:17, 7.48s/it] {'loss': 0.0041, 'grad_norm': 0.46084274366749933, 'learning_rate': 9.796e-07, 'completion_length': 48.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.102783203125, 'epoch': 0.02} 2%|▏ | 51/2500 [07:17<5:05:17, 7.48s/it] 2%|▏ | 52/2500 [07:25<5:20:00, 7.84s/it] {'loss': 0.0025, 'grad_norm': 2.157076967726512, 'learning_rate': 9.791999999999999e-07, 'completion_length': 45.44643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0628662109375, 'epoch': 0.02} 2%|▏ | 52/2500 [07:25<5:20:00, 7.84s/it] 2%|▏ | 53/2500 [07:33<5:13:20, 7.68s/it] {'loss': 0.0017, 'grad_norm': 2.5798869048670467, 'learning_rate': 9.788e-07, 'completion_length': 43.50000190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.0714285746216774, 'kl': 0.04150390625, 'epoch': 0.02} 2%|▏ | 53/2500 [07:33<5:13:20, 7.68s/it] 2%|▏ | 54/2500 [07:40<5:14:45, 7.72s/it] {'loss': 0.0015, 'grad_norm': 1.6943869418478181, 'learning_rate': 9.784e-07, 'completion_length': 43.57143020629883, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928572535514832, 'reward_std': 0.11266787722706795, 'kl': 0.0374755859375, 'epoch': 
0.02} 2%|▏ | 54/2500 [07:40<5:14:45, 7.72s/it] 2%|▏ | 55/2500 [07:48<5:11:29, 7.64s/it] {'loss': 0.0019, 'grad_norm': 1.9191200680270557, 'learning_rate': 9.78e-07, 'completion_length': 45.42857360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.048095703125, 'epoch': 0.02} 2%|▏ | 55/2500 [07:48<5:11:29, 7.64s/it] 2%|▏ | 56/2500 [07:56<5:15:52, 7.75s/it] {'loss': 0.0016, 'grad_norm': 4.088663740839911, 'learning_rate': 9.776e-07, 'completion_length': 45.785715103149414, 'rewards/accuracy_reward': 0.8214285969734192, 'rewards/format_reward': 1.0, 'reward': 1.821428656578064, 'reward_std': 0.2253357544541359, 'kl': 0.0400390625, 'epoch': 0.02} 2%|▏ | 56/2500 [07:56<5:15:52, 7.75s/it] 2%|▏ | 57/2500 [08:04<5:19:36, 7.85s/it] {'loss': 0.0015, 'grad_norm': 2.1009682296865244, 'learning_rate': 9.772e-07, 'completion_length': 52.28571701049805, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.14838217198848724, 'kl': 0.0384521484375, 'epoch': 0.02} 2%|▏ | 57/2500 [08:04<5:19:36, 7.85s/it] 2%|▏ | 58/2500 [08:12<5:20:01, 7.86s/it] {'loss': 0.0019, 'grad_norm': 3.090914542033666, 'learning_rate': 9.768e-07, 'completion_length': 51.30357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.0474853515625, 'epoch': 0.02} 2%|▏ | 58/2500 [08:12<5:20:01, 7.86s/it] 2%|▏ | 59/2500 [08:20<5:19:30, 7.85s/it] {'loss': 0.0015, 'grad_norm': 2.011015421104532, 'learning_rate': 9.764e-07, 'completion_length': 51.50000190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.11266787722706795, 'kl': 0.0367431640625, 'epoch': 0.02} 2%|▏ | 59/2500 [08:20<5:19:30, 7.85s/it] 2%|▏ | 60/2500 [08:28<5:29:29, 8.10s/it] {'loss': 0.0013, 
'grad_norm': 1.751341549059088, 'learning_rate': 9.759999999999998e-07, 'completion_length': 52.30357551574707, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0335693359375, 'epoch': 0.02} 2%|▏ | 60/2500 [08:28<5:29:29, 8.10s/it] 2%|▏ | 61/2500 [08:36<5:27:06, 8.05s/it] {'loss': 0.0018, 'grad_norm': 1.5725902171265589, 'learning_rate': 9.756e-07, 'completion_length': 47.67857360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.0438232421875, 'epoch': 0.02} 2%|▏ | 61/2500 [08:36<5:27:06, 8.05s/it] 2%|▏ | 62/2500 [08:46<5:46:57, 8.54s/it] {'loss': 0.0018, 'grad_norm': 2.210181793577838, 'learning_rate': 9.752e-07, 'completion_length': 60.964290618896484, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.1181928962469101, 'kl': 0.04443359375, 'epoch': 0.02} 2%|▏ | 62/2500 [08:46<5:46:57, 8.54s/it] 3%|▎ | 63/2500 [08:54<5:40:22, 8.38s/it] {'loss': 0.0015, 'grad_norm': 2.1876256926302986, 'learning_rate': 9.748e-07, 'completion_length': 51.392860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.0369873046875, 'epoch': 0.03} 3%|▎ | 63/2500 [08:54<5:40:22, 8.38s/it] 3%|▎ | 64/2500 [09:02<5:38:54, 8.35s/it] {'loss': 0.0016, 'grad_norm': 1.7102735979702837, 'learning_rate': 9.744e-07, 'completion_length': 48.85714530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.04052734375, 'epoch': 0.03} 3%|▎ | 64/2500 [09:02<5:38:54, 8.35s/it] 3%|▎ | 65/2500 [09:11<5:44:33, 8.49s/it] {'loss': 0.0015, 'grad_norm': 1.6752760211351152, 'learning_rate': 9.74e-07, 'completion_length': 
54.21428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0380859375, 'epoch': 0.03} 3%|▎ | 65/2500 [09:11<5:44:33, 8.49s/it] 3%|▎ | 66/2500 [09:19<5:38:01, 8.33s/it] {'loss': 0.0016, 'grad_norm': 0.33936255899623724, 'learning_rate': 9.735999999999999e-07, 'completion_length': 50.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0396728515625, 'epoch': 0.03} 3%|▎ | 66/2500 [09:19<5:38:01, 8.33s/it] 3%|▎ | 67/2500 [09:27<5:31:03, 8.16s/it] {'loss': 0.0021, 'grad_norm': 2.9494227663564976, 'learning_rate': 9.731999999999998e-07, 'completion_length': 47.19643020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.0518798828125, 'epoch': 0.03} 3%|▎ | 67/2500 [09:27<5:31:03, 8.16s/it] 3%|▎ | 68/2500 [09:35<5:26:42, 8.06s/it] {'loss': 0.0018, 'grad_norm': 1.7919111230413993, 'learning_rate': 9.728e-07, 'completion_length': 47.35714530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.045654296875, 'epoch': 0.03} 3%|▎ | 68/2500 [09:35<5:26:42, 8.06s/it] 3%|▎ | 69/2500 [09:42<5:15:39, 7.79s/it] {'loss': 0.0021, 'grad_norm': 4.294949312068558, 'learning_rate': 9.724e-07, 'completion_length': 43.62500190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.05224609375, 'epoch': 0.03} 3%|▎ | 69/2500 [09:42<5:15:39, 7.79s/it] 3%|▎ | 70/2500 [09:50<5:15:57, 7.80s/it] {'loss': 0.0021, 'grad_norm': 3.9670964690445145, 'learning_rate': 9.72e-07, 'completion_length': 42.87500190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 
'reward_std': 0.0714285746216774, 'kl': 0.052001953125, 'epoch': 0.03} 3%|▎ | 70/2500 [09:50<5:15:57, 7.80s/it] 3%|▎ | 71/2500 [09:58<5:23:49, 8.00s/it] {'loss': 0.0017, 'grad_norm': 2.688626519385768, 'learning_rate': 9.716e-07, 'completion_length': 53.19643020629883, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.1181928962469101, 'kl': 0.0426025390625, 'epoch': 0.03} 3%|▎ | 71/2500 [09:58<5:23:49, 8.00s/it] 3%|▎ | 72/2500 [10:05<5:12:34, 7.72s/it] {'loss': 0.0025, 'grad_norm': 2.602951919825454, 'learning_rate': 9.712e-07, 'completion_length': 40.76785850524902, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.1181928999722004, 'kl': 0.0635986328125, 'epoch': 0.03} 3%|▎ | 72/2500 [10:05<5:12:34, 7.72s/it] 3%|▎ | 73/2500 [10:13<5:13:23, 7.75s/it] {'loss': 0.0023, 'grad_norm': 0.3175463729474886, 'learning_rate': 9.707999999999999e-07, 'completion_length': 42.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.058349609375, 'epoch': 0.03} 3%|▎ | 73/2500 [10:13<5:13:23, 7.75s/it] 3%|▎ | 74/2500 [10:21<5:12:00, 7.72s/it] {'loss': 0.0022, 'grad_norm': 1.3571518793971864, 'learning_rate': 9.704e-07, 'completion_length': 48.42857360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.05615234375, 'epoch': 0.03} 3%|▎ | 74/2500 [10:21<5:12:00, 7.72s/it] 3%|▎ | 75/2500 [10:28<5:07:51, 7.62s/it] {'loss': 0.0022, 'grad_norm': 0.8046483184450275, 'learning_rate': 9.7e-07, 'completion_length': 41.62500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0557861328125, 'epoch': 0.03} 3%|▎ | 75/2500 [10:28<5:07:51, 7.62s/it] 3%|▎ | 76/2500 [10:36<5:07:54, 
7.62s/it] {'loss': 0.0021, 'grad_norm': 1.862816788520276, 'learning_rate': 9.696e-07, 'completion_length': 47.87500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0518798828125, 'epoch': 0.03} 3%|▎ | 76/2500 [10:36<5:07:54, 7.62s/it] 3%|▎ | 77/2500 [10:43<5:06:06, 7.58s/it] {'loss': 0.002, 'grad_norm': 1.69728113274905, 'learning_rate': 9.692e-07, 'completion_length': 47.39285850524902, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0506591796875, 'epoch': 0.03} 3%|▎ | 77/2500 [10:43<5:06:06, 7.58s/it] 3%|▎ | 78/2500 [10:51<5:08:48, 7.65s/it] {'loss': 0.0021, 'grad_norm': 1.2689006386188142, 'learning_rate': 9.688e-07, 'completion_length': 47.33928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.05126953125, 'epoch': 0.03} 3%|▎ | 78/2500 [10:51<5:08:48, 7.65s/it] 3%|▎ | 79/2500 [10:58<5:07:45, 7.63s/it] {'loss': 0.002, 'grad_norm': 1.7613735807023239, 'learning_rate': 9.684e-07, 'completion_length': 43.58928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0491943359375, 'epoch': 0.03} 3%|▎ | 79/2500 [10:58<5:07:45, 7.63s/it] 3%|▎ | 80/2500 [11:06<5:08:41, 7.65s/it] {'loss': 0.002, 'grad_norm': 4.191101727168493, 'learning_rate': 9.679999999999999e-07, 'completion_length': 49.50000190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.0501708984375, 'epoch': 0.03} 3%|▎ | 80/2500 [11:06<5:08:41, 7.65s/it] 3%|▎ | 81/2500 [11:14<5:08:57, 7.66s/it] {'loss': 0.0017, 'grad_norm': 0.943011189490279, 'learning_rate': 9.676e-07, 
'completion_length': 47.32143020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04248046875, 'epoch': 0.03} 3%|▎ | 81/2500 [11:14<5:08:57, 7.66s/it] 3%|▎ | 82/2500 [11:22<5:13:44, 7.79s/it] {'loss': 0.0019, 'grad_norm': 0.16286320158001097, 'learning_rate': 9.671999999999998e-07, 'completion_length': 49.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0469970703125, 'epoch': 0.03} 3%|▎ | 82/2500 [11:22<5:13:44, 7.79s/it] 3%|▎ | 83/2500 [11:30<5:12:14, 7.75s/it] {'loss': 0.0016, 'grad_norm': 0.12407786789987872, 'learning_rate': 9.668e-07, 'completion_length': 45.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03961181640625, 'epoch': 0.03} 3%|▎ | 83/2500 [11:30<5:12:14, 7.75s/it] 3%|▎ | 84/2500 [11:37<5:10:10, 7.70s/it] {'loss': 0.0018, 'grad_norm': 4.760931613111456, 'learning_rate': 9.664e-07, 'completion_length': 42.428571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04449462890625, 'epoch': 0.03} 3%|▎ | 84/2500 [11:37<5:10:10, 7.70s/it] 3%|▎ | 85/2500 [11:45<5:13:44, 7.79s/it] {'loss': 0.0022, 'grad_norm': 0.295163613376099, 'learning_rate': 9.66e-07, 'completion_length': 48.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0543212890625, 'epoch': 0.03} 3%|▎ | 85/2500 [11:45<5:13:44, 7.79s/it] 3%|▎ | 86/2500 [11:53<5:11:37, 7.75s/it] {'loss': 0.0017, 'grad_norm': 0.4083692878863113, 'learning_rate': 9.656e-07, 'completion_length': 44.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.042236328125, 'epoch': 0.03} 3%|▎ | 86/2500 [11:53<5:11:37, 7.75s/it] 3%|▎ | 
87/2500 [12:01<5:14:16, 7.81s/it] {'loss': 0.0016, 'grad_norm': 4.435830729581053, 'learning_rate': 9.651999999999999e-07, 'completion_length': 51.23214530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0399169921875, 'epoch': 0.03} 3%|▎ | 87/2500 [12:01<5:14:16, 7.81s/it] 4%|▎ | 88/2500 [12:09<5:17:35, 7.90s/it] {'loss': 0.0019, 'grad_norm': 142.27452049638111, 'learning_rate': 9.647999999999999e-07, 'completion_length': 53.35714530944824, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.1181928962469101, 'kl': 0.046142578125, 'epoch': 0.04} 4%|▎ | 88/2500 [12:09<5:17:35, 7.90s/it] 4%|▎ | 89/2500 [12:16<5:12:06, 7.77s/it] {'loss': 0.0022, 'grad_norm': 0.18437906156027678, 'learning_rate': 9.644e-07, 'completion_length': 40.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0552978515625, 'epoch': 0.04} 4%|▎ | 89/2500 [12:16<5:12:06, 7.77s/it] 4%|▎ | 90/2500 [12:24<5:14:46, 7.84s/it] {'loss': 0.002, 'grad_norm': 1.461261701912831, 'learning_rate': 9.64e-07, 'completion_length': 52.01785850524902, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.0494384765625, 'epoch': 0.04} 4%|▎ | 90/2500 [12:24<5:14:46, 7.84s/it] 4%|▎ | 91/2500 [12:32<5:12:02, 7.77s/it] {'loss': 0.0022, 'grad_norm': 1.8079213389344742, 'learning_rate': 9.636e-07, 'completion_length': 41.07143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0540771484375, 'epoch': 0.04} 4%|▎ | 91/2500 [12:32<5:12:02, 7.77s/it] 4%|▎ | 92/2500 [12:40<5:10:46, 7.74s/it] {'loss': 0.002, 'grad_norm': 5.564253227908698, 'learning_rate': 9.632e-07, 'completion_length': 
48.08928871154785, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928572535514832, 'reward_std': 0.1428571529686451, 'kl': 0.0501708984375, 'epoch': 0.04} 4%|▎ | 92/2500 [12:40<5:10:46, 7.74s/it] 4%|▎ | 93/2500 [12:48<5:18:31, 7.94s/it] {'loss': 0.002, 'grad_norm': 2.5573837457197595, 'learning_rate': 9.628e-07, 'completion_length': 57.875003814697266, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.1071428656578064, 'kl': 0.050048828125, 'epoch': 0.04} 4%|▎ | 93/2500 [12:48<5:18:31, 7.94s/it] 4%|▍ | 94/2500 [12:56<5:20:08, 7.98s/it] {'loss': 0.0014, 'grad_norm': 1.299660967395917, 'learning_rate': 9.624e-07, 'completion_length': 49.44643020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.034912109375, 'epoch': 0.04} 4%|▍ | 94/2500 [12:56<5:20:08, 7.98s/it] 4%|▍ | 95/2500 [13:04<5:18:44, 7.95s/it] {'loss': 0.0021, 'grad_norm': 8.912312745124764, 'learning_rate': 9.619999999999999e-07, 'completion_length': 47.78571701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.05224609375, 'epoch': 0.04} 4%|▍ | 95/2500 [13:04<5:18:44, 7.95s/it] 4%|▍ | 96/2500 [13:12<5:18:25, 7.95s/it] {'loss': 0.0021, 'grad_norm': 2.733357010759711, 'learning_rate': 9.616e-07, 'completion_length': 48.035715103149414, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.11266787722706795, 'kl': 0.052734375, 'epoch': 0.04} 4%|▍ | 96/2500 [13:12<5:18:25, 7.95s/it] 4%|▍ | 97/2500 [13:21<5:27:41, 8.18s/it] {'loss': 0.0019, 'grad_norm': 1.5268355576832027, 'learning_rate': 9.612e-07, 'completion_length': 53.33928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 
'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.0489501953125, 'epoch': 0.04} 4%|▍ | 97/2500 [13:21<5:27:41, 8.18s/it] 4%|▍ | 98/2500 [13:28<5:20:38, 8.01s/it] {'loss': 0.0019, 'grad_norm': 0.15944891659904378, 'learning_rate': 9.608e-07, 'completion_length': 47.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0467529296875, 'epoch': 0.04} 4%|▍ | 98/2500 [13:28<5:20:38, 8.01s/it] 4%|▍ | 99/2500 [13:36<5:19:54, 7.99s/it] {'loss': 0.0019, 'grad_norm': 0.2168563839341889, 'learning_rate': 9.604e-07, 'completion_length': 49.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0467529296875, 'epoch': 0.04} 4%|▍ | 99/2500 [13:36<5:19:54, 7.99s/it] 4%|▍ | 100/2500 [13:44<5:22:07, 8.05s/it] {'loss': 0.002, 'grad_norm': 2.230148017180326, 'learning_rate': 9.6e-07, 'completion_length': 52.98214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.048828125, 'epoch': 0.04} 4%|▍ | 100/2500 [13:44<5:22:07, 8.05s/it] 4%|▍ | 101/2500 [14:56<18:01:02, 27.04s/it] {'loss': 0.0019, 'grad_norm': 1.5933523103106297, 'learning_rate': 9.595999999999999e-07, 'completion_length': 47.80357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0469970703125, 'epoch': 0.04} 4%|▍ | 101/2500 [14:56<18:01:02, 27.04s/it] 4%|▍ | 102/2500 [15:09<15:13:58, 22.87s/it] {'loss': 0.0021, 'grad_norm': 0.18255613232427267, 'learning_rate': 9.592e-07, 'completion_length': 50.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0521240234375, 'epoch': 0.04} 4%|▍ | 102/2500 [15:09<15:13:58, 22.87s/it] 4%|▍ | 103/2500 [15:23<13:24:40, 20.14s/it] {'loss': 0.0019, 'grad_norm': 
1.9672553137675317, 'learning_rate': 9.588e-07, 'completion_length': 51.785715103149414, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.04833984375, 'epoch': 0.04} 4%|▍ | 103/2500 [15:23<13:24:40, 20.14s/it] 4%|▍ | 104/2500 [15:35<11:56:10, 17.93s/it] {'loss': 0.0019, 'grad_norm': 0.24909351725749884, 'learning_rate': 9.584e-07, 'completion_length': 48.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0484619140625, 'epoch': 0.04} 4%|▍ | 104/2500 [15:35<11:56:10, 17.93s/it] 4%|▍ | 105/2500 [15:50<11:09:17, 16.77s/it] {'loss': 0.0021, 'grad_norm': 0.1802633795661363, 'learning_rate': 9.58e-07, 'completion_length': 51.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.052978515625, 'epoch': 0.04} 4%|▍ | 105/2500 [15:50<11:09:17, 16.77s/it] 4%|▍ | 106/2500 [16:03<10:28:37, 15.76s/it] {'loss': 0.0018, 'grad_norm': 0.20427748357191594, 'learning_rate': 9.576e-07, 'completion_length': 50.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045166015625, 'epoch': 0.04} 4%|▍ | 106/2500 [16:03<10:28:37, 15.76s/it] 4%|▍ | 107/2500 [16:16<9:54:39, 14.91s/it] {'loss': 0.0022, 'grad_norm': 4.346937637739627, 'learning_rate': 9.572e-07, 'completion_length': 46.21428680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.1428571492433548, 'kl': 0.0540771484375, 'epoch': 0.04} 4%|▍ | 107/2500 [16:16<9:54:39, 14.91s/it] 4%|▍ | 108/2500 [16:30<9:39:25, 14.53s/it] {'loss': 0.0021, 'grad_norm': 0.3287644718427525, 'learning_rate': 9.567999999999999e-07, 'completion_length': 51.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.051513671875, 'epoch': 0.04} 4%|▍ | 108/2500 [16:30<9:39:25, 14.53s/it] 4%|▍ | 109/2500 [16:44<9:33:35, 14.39s/it] {'loss': 0.0018, 'grad_norm': 0.7088055056416017, 'learning_rate': 9.564e-07, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0440673828125, 'epoch': 0.04} 4%|▍ | 109/2500 [16:44<9:33:35, 14.39s/it] 4%|▍ | 110/2500 [16:57<9:18:40, 14.03s/it] {'loss': 0.0023, 'grad_norm': 2.202124962817271, 'learning_rate': 9.559999999999998e-07, 'completion_length': 49.39285850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.058349609375, 'epoch': 0.04} 4%|▍ | 110/2500 [16:57<9:18:40, 14.03s/it] 4%|▍ | 111/2500 [17:10<9:11:56, 13.86s/it] {'loss': 0.0017, 'grad_norm': 2.3363189460654294, 'learning_rate': 9.556e-07, 'completion_length': 50.44643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.04266357421875, 'epoch': 0.04} 4%|▍ | 111/2500 [17:10<9:11:56, 13.86s/it] 4%|▍ | 112/2500 [17:24<9:09:23, 13.80s/it] {'loss': 0.0024, 'grad_norm': 2.7648615484505905, 'learning_rate': 9.552e-07, 'completion_length': 55.60714530944824, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.07695358991622925, 'kl': 0.0601806640625, 'epoch': 0.04} 4%|▍ | 112/2500 [17:24<9:09:23, 13.80s/it] 5%|▍ | 113/2500 [17:37<9:03:37, 13.66s/it] {'loss': 0.0023, 'grad_norm': 1.0642663249481858, 'learning_rate': 9.548e-07, 'completion_length': 60.10714530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.05859375, 'epoch': 0.05} 5%|▍ | 113/2500 [17:37<9:03:37, 13.66s/it] 
5%|▍ | 114/2500 [17:50<8:58:11, 13.53s/it] {'loss': 0.0015, 'grad_norm': 0.1271400228708715, 'learning_rate': 9.544e-07, 'completion_length': 50.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.036865234375, 'epoch': 0.05} 5%|▍ | 114/2500 [17:50<8:58:11, 13.53s/it] 5%|▍ | 115/2500 [18:04<8:57:02, 13.51s/it] {'loss': 0.0019, 'grad_norm': 0.18103552013128837, 'learning_rate': 9.539999999999999e-07, 'completion_length': 50.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.048095703125, 'epoch': 0.05} 5%|▍ | 115/2500 [18:04<8:57:02, 13.51s/it] 5%|▍ | 116/2500 [18:16<8:44:20, 13.20s/it] {'loss': 0.0026, 'grad_norm': 0.20442027231718998, 'learning_rate': 9.536e-07, 'completion_length': 46.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.064697265625, 'epoch': 0.05} 5%|▍ | 116/2500 [18:16<8:44:20, 13.20s/it] 5%|▍ | 117/2500 [18:30<8:48:04, 13.30s/it] {'loss': 0.0017, 'grad_norm': 1.062693264184625, 'learning_rate': 9.532e-07, 'completion_length': 59.67857551574707, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.041259765625, 'epoch': 0.05} 5%|▍ | 117/2500 [18:30<8:48:04, 13.30s/it] 5%|▍ | 118/2500 [18:44<8:52:16, 13.41s/it] {'loss': 0.0017, 'grad_norm': 0.11885603207903564, 'learning_rate': 9.527999999999999e-07, 'completion_length': 57.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0428466796875, 'epoch': 0.05} 5%|▍ | 118/2500 [18:44<8:52:16, 13.41s/it] 5%|▍ | 119/2500 [18:58<9:07:21, 13.79s/it] {'loss': 0.0021, 'grad_norm': 1.2741485377992603, 'learning_rate': 9.524e-07, 'completion_length': 57.57143211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 
'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0516357421875, 'epoch': 0.05} 5%|▍ | 119/2500 [18:58<9:07:21, 13.79s/it] 5%|▍ | 120/2500 [19:12<9:01:02, 13.64s/it] {'loss': 0.002, 'grad_norm': 0.18784028977000897, 'learning_rate': 9.52e-07, 'completion_length': 55.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0504150390625, 'epoch': 0.05} 5%|▍ | 120/2500 [19:12<9:01:02, 13.64s/it] 5%|▍ | 121/2500 [19:25<8:57:43, 13.56s/it] {'loss': 0.0023, 'grad_norm': 1.8673196211815766, 'learning_rate': 9.515999999999999e-07, 'completion_length': 59.142860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0587158203125, 'epoch': 0.05} 5%|▍ | 121/2500 [19:25<8:57:43, 13.56s/it] 5%|▍ | 122/2500 [19:39<8:58:40, 13.59s/it] {'loss': 0.0023, 'grad_norm': 2.0790543494058795, 'learning_rate': 9.512e-07, 'completion_length': 62.17857551574707, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0574951171875, 'epoch': 0.05} 5%|▍ | 122/2500 [19:39<8:58:40, 13.59s/it] 5%|▍ | 123/2500 [19:53<9:04:14, 13.74s/it] {'loss': 0.0015, 'grad_norm': 1.8097098234437539, 'learning_rate': 9.508e-07, 'completion_length': 58.69643211364746, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0382080078125, 'epoch': 0.05} 5%|▍ | 123/2500 [19:53<9:04:14, 13.74s/it] 5%|▍ | 124/2500 [20:06<9:01:30, 13.67s/it] {'loss': 0.0022, 'grad_norm': 0.22465112736948825, 'learning_rate': 9.503999999999999e-07, 'completion_length': 55.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05419921875, 'epoch': 0.05} 5%|▍ | 124/2500 [20:06<9:01:30, 13.67s/it] 5%|▌ 
| 125/2500 [20:21<9:11:46, 13.94s/it] {'loss': 0.002, 'grad_norm': 0.23188488442865393, 'learning_rate': 9.499999999999999e-07, 'completion_length': 64.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0499267578125, 'epoch': 0.05} 5%|▌ | 125/2500 [20:21<9:11:46, 13.94s/it] 5%|▌ | 126/2500 [20:35<9:18:14, 14.11s/it] {'loss': 0.0018, 'grad_norm': 0.13025376915084302, 'learning_rate': 9.496e-07, 'completion_length': 61.30357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0458984375, 'epoch': 0.05} 5%|▌ | 126/2500 [20:35<9:18:14, 14.11s/it] 5%|▌ | 127/2500 [20:48<9:06:42, 13.82s/it] {'loss': 0.0019, 'grad_norm': 4.860386552130475, 'learning_rate': 9.492e-07, 'completion_length': 54.91071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.0477294921875, 'epoch': 0.05} 5%|▌ | 127/2500 [20:48<9:06:42, 13.82s/it] 5%|▌ | 128/2500 [21:02<9:02:47, 13.73s/it] {'loss': 0.0024, 'grad_norm': 1.2062655230935861, 'learning_rate': 9.487999999999999e-07, 'completion_length': 58.19643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0601806640625, 'epoch': 0.05} 5%|▌ | 128/2500 [21:02<9:02:47, 13.73s/it] 5%|▌ | 129/2500 [21:15<8:56:50, 13.59s/it] {'loss': 0.0013, 'grad_norm': 0.27156603074321445, 'learning_rate': 9.484e-07, 'completion_length': 55.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.05} 5%|▌ | 129/2500 [21:15<8:56:50, 13.59s/it] 5%|▌ | 130/2500 [21:28<8:49:35, 13.41s/it] {'loss': 0.0019, 'grad_norm': 0.17020370297282628, 'learning_rate': 9.479999999999999e-07, 'completion_length': 52.32143020629883, 'rewards/accuracy_reward': 
1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0465087890625, 'epoch': 0.05} 5%|▌ | 130/2500 [21:28<8:49:35, 13.41s/it] 5%|▌ | 131/2500 [21:42<8:55:10, 13.55s/it] {'loss': 0.0021, 'grad_norm': 2.8206272330931763, 'learning_rate': 9.475999999999999e-07, 'completion_length': 49.23214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.05120849609375, 'epoch': 0.05} 5%|▌ | 131/2500 [21:42<8:55:10, 13.55s/it] 5%|▌ | 132/2500 [21:55<8:45:40, 13.32s/it] {'loss': 0.0017, 'grad_norm': 0.22199468854583598, 'learning_rate': 9.472e-07, 'completion_length': 52.32143020629883, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.04150390625, 'epoch': 0.05} 5%|▌ | 132/2500 [21:55<8:45:40, 13.32s/it] 5%|▌ | 133/2500 [22:08<8:42:25, 13.24s/it] {'loss': 0.0015, 'grad_norm': 0.9430965216133906, 'learning_rate': 9.468e-07, 'completion_length': 50.67857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.036376953125, 'epoch': 0.05} 5%|▌ | 133/2500 [22:08<8:42:25, 13.24s/it] 5%|▌ | 134/2500 [22:21<8:41:44, 13.23s/it] {'loss': 0.0013, 'grad_norm': 0.22632729179906388, 'learning_rate': 9.464e-07, 'completion_length': 55.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03271484375, 'epoch': 0.05} 5%|▌ | 134/2500 [22:21<8:41:44, 13.23s/it] 5%|▌ | 135/2500 [22:35<8:49:46, 13.44s/it] {'loss': 0.0022, 'grad_norm': 0.2615360457017647, 'learning_rate': 9.459999999999999e-07, 'completion_length': 52.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0540771484375, 'epoch': 0.05} 5%|▌ | 135/2500 [22:35<8:49:46, 13.44s/it] 5%|▌ | 136/2500 
[22:48<8:48:38, 13.42s/it] {'loss': 0.0013, 'grad_norm': 0.8510675595596776, 'learning_rate': 9.456e-07, 'completion_length': 57.10714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.031982421875, 'epoch': 0.05} 5%|▌ | 136/2500 [22:48<8:48:38, 13.42s/it] 5%|▌ | 137/2500 [23:02<8:56:20, 13.62s/it] {'loss': 0.0017, 'grad_norm': 0.20346944873463232, 'learning_rate': 9.452e-07, 'completion_length': 55.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043701171875, 'epoch': 0.05} 5%|▌ | 137/2500 [23:02<8:56:20, 13.62s/it] 6%|▌ | 138/2500 [23:16<8:53:38, 13.56s/it] {'loss': 0.0018, 'grad_norm': 0.29932407770460157, 'learning_rate': 9.447999999999999e-07, 'completion_length': 48.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.06} 6%|▌ | 138/2500 [23:16<8:53:38, 13.56s/it] 6%|▌ | 139/2500 [23:29<8:51:05, 13.50s/it] {'loss': 0.0014, 'grad_norm': 0.11823459988837812, 'learning_rate': 9.444e-07, 'completion_length': 52.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03436279296875, 'epoch': 0.06} 6%|▌ | 139/2500 [23:29<8:51:05, 13.50s/it] 6%|▌ | 140/2500 [23:43<8:48:04, 13.43s/it] {'loss': 0.0014, 'grad_norm': 0.21786623270144356, 'learning_rate': 9.439999999999999e-07, 'completion_length': 51.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0345458984375, 'epoch': 0.06} 6%|▌ | 140/2500 [23:43<8:48:04, 13.43s/it] 6%|▌ | 141/2500 [23:57<9:02:46, 13.81s/it] {'loss': 0.0014, 'grad_norm': 0.15068577327508254, 'learning_rate': 9.436e-07, 'completion_length': 63.91071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 
1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03460693359375, 'epoch': 0.06} 6%|▌ | 141/2500 [23:57<9:02:46, 13.81s/it] 6%|▌ | 142/2500 [24:11<9:06:04, 13.89s/it] {'loss': 0.0013, 'grad_norm': 0.24127529374092857, 'learning_rate': 9.432e-07, 'completion_length': 60.44643211364746, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0325927734375, 'epoch': 0.06} 6%|▌ | 142/2500 [24:11<9:06:04, 13.89s/it] 6%|▌ | 143/2500 [24:25<8:59:18, 13.73s/it] {'loss': 0.0026, 'grad_norm': 4.579682033772751, 'learning_rate': 9.427999999999999e-07, 'completion_length': 49.46428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.066162109375, 'epoch': 0.06} 6%|▌ | 143/2500 [24:25<8:59:18, 13.73s/it] 6%|▌ | 144/2500 [24:38<8:55:44, 13.64s/it] {'loss': 0.0012, 'grad_norm': 0.3294089815342532, 'learning_rate': 9.424e-07, 'completion_length': 56.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03106689453125, 'epoch': 0.06} 6%|▌ | 144/2500 [24:38<8:55:44, 13.64s/it] 6%|▌ | 145/2500 [24:52<9:03:27, 13.85s/it] {'loss': 0.0014, 'grad_norm': 0.11698723049904844, 'learning_rate': 9.419999999999999e-07, 'completion_length': 64.75000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0338134765625, 'epoch': 0.06} 6%|▌ | 145/2500 [24:52<9:03:27, 13.85s/it] 6%|▌ | 146/2500 [25:05<8:53:17, 13.59s/it] {'loss': 0.001, 'grad_norm': 0.13210152020982044, 'learning_rate': 9.415999999999999e-07, 'completion_length': 50.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0252685546875, 'epoch': 0.06} 6%|▌ | 146/2500 [25:05<8:53:17, 13.59s/it] 6%|▌ | 147/2500 [25:22<9:23:26, 14.37s/it] {'loss': 0.0012, 
'grad_norm': 1.0318164976817972, 'learning_rate': 9.412e-07, 'completion_length': 67.50000381469727, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.030517578125, 'epoch': 0.06} 6%|▌ | 147/2500 [25:22<9:23:26, 14.37s/it] 6%|▌ | 148/2500 [25:36<9:26:09, 14.44s/it] {'loss': 0.0017, 'grad_norm': 0.13835430471505636, 'learning_rate': 9.408e-07, 'completion_length': 72.87500381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043701171875, 'epoch': 0.06} 6%|▌ | 148/2500 [25:36<9:26:09, 14.44s/it] 6%|▌ | 149/2500 [25:50<9:18:47, 14.26s/it] {'loss': 0.0018, 'grad_norm': 0.15648601779633578, 'learning_rate': 9.403999999999999e-07, 'completion_length': 55.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045166015625, 'epoch': 0.06} 6%|▌ | 149/2500 [25:50<9:18:47, 14.26s/it] 6%|▌ | 150/2500 [26:05<9:25:12, 14.43s/it] {'loss': 0.002, 'grad_norm': 0.1616659018430096, 'learning_rate': 9.399999999999999e-07, 'completion_length': 64.64286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04931640625, 'epoch': 0.06} 6%|▌ | 150/2500 [26:05<9:25:12, 14.43s/it] 6%|▌ | 151/2500 [26:19<9:25:22, 14.44s/it] {'loss': 0.0011, 'grad_norm': 0.7253642017359166, 'learning_rate': 9.396e-07, 'completion_length': 60.51785850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0263671875, 'epoch': 0.06} 6%|▌ | 151/2500 [26:19<9:25:22, 14.44s/it] 6%|▌ | 152/2500 [26:35<9:42:01, 14.87s/it] {'loss': 0.001, 'grad_norm': 0.2984971805402914, 'learning_rate': 9.391999999999999e-07, 'completion_length': 72.51786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.0247802734375, 'epoch': 0.06} 6%|▌ | 152/2500 [26:35<9:42:01, 14.87s/it] 6%|▌ | 153/2500 [26:49<9:34:44, 14.69s/it] {'loss': 0.0014, 'grad_norm': 0.10843531269591301, 'learning_rate': 9.387999999999999e-07, 'completion_length': 63.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03387451171875, 'epoch': 0.06} 6%|▌ | 153/2500 [26:49<9:34:44, 14.69s/it] 6%|▌ | 154/2500 [27:03<9:25:58, 14.48s/it] {'loss': 0.0015, 'grad_norm': 1.691420997824465, 'learning_rate': 9.384e-07, 'completion_length': 60.03571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03741455078125, 'epoch': 0.06} 6%|▌ | 154/2500 [27:03<9:25:58, 14.48s/it] 6%|▌ | 155/2500 [27:18<9:24:19, 14.44s/it] {'loss': 0.0011, 'grad_norm': 0.11665816151484247, 'learning_rate': 9.379999999999998e-07, 'completion_length': 59.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.06} 6%|▌ | 155/2500 [27:18<9:24:19, 14.44s/it] 6%|▌ | 156/2500 [27:32<9:23:13, 14.42s/it] {'loss': 0.0012, 'grad_norm': 0.10124514234309785, 'learning_rate': 9.375999999999999e-07, 'completion_length': 64.16071891784668, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0306396484375, 'epoch': 0.06} 6%|▌ | 156/2500 [27:32<9:23:13, 14.42s/it] 6%|▋ | 157/2500 [27:47<9:26:49, 14.52s/it] {'loss': 0.0019, 'grad_norm': 2.309412081142838, 'learning_rate': 9.372e-07, 'completion_length': 66.51786041259766, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.04833984375, 'epoch': 0.06} 6%|▋ | 157/2500 [27:47<9:26:49, 14.52s/it] 6%|▋ | 158/2500 [28:00<9:13:27, 14.18s/it] {'loss': 
0.0007, 'grad_norm': 0.06496887754180301, 'learning_rate': 9.368e-07, 'completion_length': 64.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018402099609375, 'epoch': 0.06} 6%|▋ | 158/2500 [28:00<9:13:27, 14.18s/it] 6%|▋ | 159/2500 [28:15<9:13:57, 14.20s/it] {'loss': 0.0009, 'grad_norm': 0.12989026366932432, 'learning_rate': 9.363999999999999e-07, 'completion_length': 60.14286231994629, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0220947265625, 'epoch': 0.06} 6%|▋ | 159/2500 [28:15<9:13:57, 14.20s/it] 6%|▋ | 160/2500 [28:29<9:13:39, 14.20s/it] {'loss': 0.0015, 'grad_norm': 1.0029836737853237, 'learning_rate': 9.36e-07, 'completion_length': 56.42857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03643798828125, 'epoch': 0.06} 6%|▋ | 160/2500 [28:29<9:13:39, 14.20s/it] 6%|▋ | 161/2500 [28:43<9:17:08, 14.29s/it] {'loss': 0.0014, 'grad_norm': 1.3069833124976478, 'learning_rate': 9.356e-07, 'completion_length': 58.91071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0341796875, 'epoch': 0.06} 6%|▋ | 161/2500 [28:43<9:17:08, 14.29s/it] 6%|▋ | 162/2500 [28:56<9:00:36, 13.87s/it] {'loss': 0.0012, 'grad_norm': 0.317300737417184, 'learning_rate': 9.352e-07, 'completion_length': 46.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.06} 6%|▋ | 162/2500 [28:56<9:00:36, 13.87s/it] 7%|▋ | 163/2500 [29:11<9:16:11, 14.28s/it] {'loss': 0.0015, 'grad_norm': 0.3338760046179353, 'learning_rate': 9.347999999999999e-07, 'completion_length': 61.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.0382080078125, 'epoch': 0.07} 7%|▋ | 163/2500 [29:11<9:16:11, 14.28s/it] 7%|▋ | 164/2500 [29:26<9:19:10, 14.36s/it] {'loss': 0.0014, 'grad_norm': 0.8986619829814221, 'learning_rate': 9.344e-07, 'completion_length': 62.160715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0352783203125, 'epoch': 0.07} 7%|▋ | 164/2500 [29:26<9:19:10, 14.36s/it] 7%|▋ | 165/2500 [29:41<9:23:05, 14.47s/it] {'loss': 0.0011, 'grad_norm': 0.9092750581982901, 'learning_rate': 9.34e-07, 'completion_length': 57.67857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02813720703125, 'epoch': 0.07} 7%|▋ | 165/2500 [29:41<9:23:05, 14.47s/it] 7%|▋ | 166/2500 [29:55<9:25:56, 14.55s/it] {'loss': 0.0014, 'grad_norm': 0.21137344957989276, 'learning_rate': 9.335999999999999e-07, 'completion_length': 59.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0352783203125, 'epoch': 0.07} 7%|▋ | 166/2500 [29:55<9:25:56, 14.55s/it] 7%|▋ | 167/2500 [30:09<9:12:22, 14.21s/it] {'loss': 0.0011, 'grad_norm': 0.13672683539614833, 'learning_rate': 9.332e-07, 'completion_length': 47.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02716064453125, 'epoch': 0.07} 7%|▋ | 167/2500 [30:09<9:12:22, 14.21s/it] 7%|▋ | 168/2500 [30:25<9:31:38, 14.71s/it] {'loss': 0.0009, 'grad_norm': 0.1031665869655269, 'learning_rate': 9.327999999999999e-07, 'completion_length': 64.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02142333984375, 'epoch': 0.07} 7%|▋ | 168/2500 [30:25<9:31:38, 14.71s/it] 7%|▋ | 169/2500 [30:39<9:23:41, 14.51s/it] {'loss': 0.0009, 'grad_norm': 
0.09425029719062317, 'learning_rate': 9.324e-07, 'completion_length': 54.39285850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02349853515625, 'epoch': 0.07} 7%|▋ | 169/2500 [30:39<9:23:41, 14.51s/it] 7%|▋ | 170/2500 [30:53<9:15:07, 14.30s/it] {'loss': 0.0009, 'grad_norm': 0.25542333353217483, 'learning_rate': 9.32e-07, 'completion_length': 57.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.07} 7%|▋ | 170/2500 [30:53<9:15:07, 14.30s/it] 7%|▋ | 171/2500 [31:06<9:10:59, 14.19s/it] {'loss': 0.0013, 'grad_norm': 0.1428585502012657, 'learning_rate': 9.315999999999999e-07, 'completion_length': 56.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03131103515625, 'epoch': 0.07} 7%|▋ | 171/2500 [31:06<9:10:59, 14.19s/it] 7%|▋ | 172/2500 [31:20<9:07:42, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.10404684069182156, 'learning_rate': 9.312e-07, 'completion_length': 55.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0213623046875, 'epoch': 0.07} 7%|▋ | 172/2500 [31:20<9:07:42, 14.12s/it] 7%|▋ | 173/2500 [31:36<9:24:24, 14.55s/it] {'loss': 0.0014, 'grad_norm': 2.6095624618262296, 'learning_rate': 9.307999999999999e-07, 'completion_length': 57.28571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03424072265625, 'epoch': 0.07} 7%|▋ | 173/2500 [31:36<9:24:24, 14.55s/it] 7%|▋ | 174/2500 [31:50<9:21:03, 14.47s/it] {'loss': 0.0012, 'grad_norm': 1.695347524805757, 'learning_rate': 9.303999999999999e-07, 'completion_length': 61.60714530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 
'reward_std': 0.0824786126613617, 'kl': 0.03094482421875, 'epoch': 0.07} 7%|▋ | 174/2500 [31:50<9:21:03, 14.47s/it] 7%|▋ | 175/2500 [32:10<10:23:04, 16.08s/it] {'loss': 0.0012, 'grad_norm': 0.6885646470067937, 'learning_rate': 9.3e-07, 'completion_length': 71.87500381469727, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0289306640625, 'epoch': 0.07} 7%|▋ | 175/2500 [32:10<10:23:04, 16.08s/it] 7%|▋ | 176/2500 [32:24<9:54:17, 15.34s/it] {'loss': 0.0016, 'grad_norm': 0.15611859195097577, 'learning_rate': 9.296e-07, 'completion_length': 52.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03955078125, 'epoch': 0.07} 7%|▋ | 176/2500 [32:24<9:54:17, 15.34s/it] 7%|▋ | 177/2500 [32:37<9:30:01, 14.72s/it] {'loss': 0.0009, 'grad_norm': 0.9913215280201426, 'learning_rate': 9.292e-07, 'completion_length': 49.37500190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0235595703125, 'epoch': 0.07} 7%|▋ | 177/2500 [32:37<9:30:01, 14.72s/it] 7%|▋ | 178/2500 [32:51<9:21:36, 14.51s/it] {'loss': 0.0018, 'grad_norm': 1.9143244305694789, 'learning_rate': 9.287999999999999e-07, 'completion_length': 61.98214530944824, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.07695359364151955, 'kl': 0.044921875, 'epoch': 0.07} 7%|▋ | 178/2500 [32:51<9:21:36, 14.51s/it] 7%|▋ | 179/2500 [33:05<9:10:58, 14.24s/it] {'loss': 0.0011, 'grad_norm': 0.1461993939565281, 'learning_rate': 9.284e-07, 'completion_length': 55.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.07} 7%|▋ | 179/2500 [33:05<9:10:58, 14.24s/it] 7%|▋ | 180/2500 [33:18<9:04:38, 
14.09s/it] {'loss': 0.0009, 'grad_norm': 0.09982784129309277, 'learning_rate': 9.28e-07, 'completion_length': 59.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.07} 7%|▋ | 180/2500 [33:18<9:04:38, 14.09s/it] 7%|▋ | 181/2500 [33:33<9:07:03, 14.15s/it] {'loss': 0.0012, 'grad_norm': 0.17453098788748816, 'learning_rate': 9.275999999999999e-07, 'completion_length': 53.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0302734375, 'epoch': 0.07} 7%|▋ | 181/2500 [33:33<9:07:03, 14.15s/it] 7%|▋ | 182/2500 [33:47<9:06:37, 14.15s/it] {'loss': 0.0012, 'grad_norm': 1.0230696409844107, 'learning_rate': 9.272e-07, 'completion_length': 58.66071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02880859375, 'epoch': 0.07} 7%|▋ | 182/2500 [33:47<9:06:37, 14.15s/it] 7%|▋ | 183/2500 [34:02<9:15:43, 14.39s/it] {'loss': 0.0013, 'grad_norm': 0.21494025707574868, 'learning_rate': 9.268e-07, 'completion_length': 58.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03192138671875, 'epoch': 0.07} 7%|▋ | 183/2500 [34:02<9:15:43, 14.39s/it] 7%|▋ | 184/2500 [34:17<9:22:41, 14.58s/it] {'loss': 0.0015, 'grad_norm': 0.12711380765450203, 'learning_rate': 9.263999999999999e-07, 'completion_length': 59.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.07} 7%|▋ | 184/2500 [34:17<9:22:41, 14.58s/it] 7%|▋ | 185/2500 [34:30<9:09:18, 14.24s/it] {'loss': 0.0009, 'grad_norm': 0.10913191310914204, 'learning_rate': 9.26e-07, 'completion_length': 53.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.02191162109375, 'epoch': 0.07} 7%|▋ | 185/2500 [34:30<9:09:18, 14.24s/it] 7%|▋ | 186/2500 [34:44<9:00:12, 14.01s/it] {'loss': 0.0011, 'grad_norm': 0.09172965362722069, 'learning_rate': 9.256e-07, 'completion_length': 52.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0279541015625, 'epoch': 0.07} 7%|▋ | 186/2500 [34:44<9:00:12, 14.01s/it] 7%|▋ | 187/2500 [34:58<9:05:39, 14.15s/it] {'loss': 0.0012, 'grad_norm': 0.09791776126299782, 'learning_rate': 9.251999999999999e-07, 'completion_length': 58.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029296875, 'epoch': 0.07} 7%|▋ | 187/2500 [34:58<9:05:39, 14.15s/it] 8%|▊ | 188/2500 [35:12<9:00:15, 14.02s/it] {'loss': 0.0017, 'grad_norm': 1.0747679005323911, 'learning_rate': 9.247999999999999e-07, 'completion_length': 56.19643020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0421142578125, 'epoch': 0.08} 8%|▊ | 188/2500 [35:12<9:00:15, 14.02s/it] 8%|▊ | 189/2500 [35:25<8:53:42, 13.86s/it] {'loss': 0.0017, 'grad_norm': 2.5274708658050007, 'learning_rate': 9.244e-07, 'completion_length': 53.17857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0824786126613617, 'kl': 0.042724609375, 'epoch': 0.08} 8%|▊ | 189/2500 [35:25<8:53:42, 13.86s/it] 8%|▊ | 190/2500 [35:40<8:58:44, 13.99s/it] {'loss': 0.0011, 'grad_norm': 0.996141416390738, 'learning_rate': 9.24e-07, 'completion_length': 57.33928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02850341796875, 'epoch': 0.08} 8%|▊ | 190/2500 [35:40<8:58:44, 13.99s/it] 8%|▊ | 191/2500 [35:53<8:55:45, 13.92s/it] {'loss': 0.0014, 'grad_norm': 
0.14981324581697733, 'learning_rate': 9.235999999999999e-07, 'completion_length': 54.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034423828125, 'epoch': 0.08} 8%|▊ | 191/2500 [35:53<8:55:45, 13.92s/it] 8%|▊ | 192/2500 [36:08<9:07:57, 14.24s/it] {'loss': 0.0015, 'grad_norm': 0.11934364456791773, 'learning_rate': 9.232e-07, 'completion_length': 59.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0380859375, 'epoch': 0.08} 8%|▊ | 192/2500 [36:08<9:07:57, 14.24s/it] 8%|▊ | 193/2500 [36:23<9:07:15, 14.23s/it] {'loss': 0.0016, 'grad_norm': 0.8631152224146978, 'learning_rate': 9.227999999999999e-07, 'completion_length': 62.60714530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0404052734375, 'epoch': 0.08} 8%|▊ | 193/2500 [36:23<9:07:15, 14.23s/it] 8%|▊ | 194/2500 [36:37<9:05:48, 14.20s/it] {'loss': 0.001, 'grad_norm': 0.12292045219857756, 'learning_rate': 9.224e-07, 'completion_length': 54.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.08} 8%|▊ | 194/2500 [36:37<9:05:48, 14.20s/it] 8%|▊ | 195/2500 [36:50<8:53:51, 13.90s/it] {'loss': 0.0009, 'grad_norm': 2.5882547843536012, 'learning_rate': 9.22e-07, 'completion_length': 47.89285850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.022705078125, 'epoch': 0.08} 8%|▊ | 195/2500 [36:50<8:53:51, 13.90s/it] 8%|▊ | 196/2500 [37:06<9:16:58, 14.50s/it] {'loss': 0.0013, 'grad_norm': 0.15208650052545827, 'learning_rate': 9.215999999999999e-07, 'completion_length': 52.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 
'kl': 0.032470703125, 'epoch': 0.08} 8%|▊ | 196/2500 [37:06<9:16:58, 14.50s/it] 8%|▊ | 197/2500 [37:20<9:09:06, 14.31s/it] {'loss': 0.001, 'grad_norm': 0.1762362941287586, 'learning_rate': 9.212e-07, 'completion_length': 61.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.08} 8%|▊ | 197/2500 [37:20<9:09:06, 14.31s/it] 8%|▊ | 198/2500 [37:34<9:04:05, 14.18s/it] {'loss': 0.0012, 'grad_norm': 0.18930511899035546, 'learning_rate': 9.207999999999999e-07, 'completion_length': 53.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.08} 8%|▊ | 198/2500 [37:34<9:04:05, 14.18s/it] 8%|▊ | 199/2500 [37:47<9:00:15, 14.09s/it] {'loss': 0.0016, 'grad_norm': 1.693649516336547, 'learning_rate': 9.203999999999999e-07, 'completion_length': 62.42857551574707, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0391845703125, 'epoch': 0.08} 8%|▊ | 199/2500 [37:47<9:00:15, 14.09s/it] 8%|▊ | 200/2500 [38:01<8:50:25, 13.84s/it] {'loss': 0.0013, 'grad_norm': 1.674886782341409, 'learning_rate': 9.2e-07, 'completion_length': 51.67857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0325927734375, 'epoch': 0.08} 8%|▊ | 200/2500 [38:01<8:50:25, 13.84s/it] 8%|▊ | 201/2500 [39:17<20:45:25, 32.50s/it] {'loss': 0.0012, 'grad_norm': 0.511506653860972, 'learning_rate': 9.196e-07, 'completion_length': 61.83928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0299072265625, 'epoch': 0.08} 8%|▊ | 201/2500 [39:17<20:45:25, 32.50s/it] 8%|▊ | 202/2500 [39:34<17:52:59, 28.02s/it] {'loss': 
0.0013, 'grad_norm': 1.0327992264547294, 'learning_rate': 9.192e-07, 'completion_length': 73.92857360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.03240966796875, 'epoch': 0.08} 8%|▊ | 202/2500 [39:34<17:52:59, 28.02s/it] 8%|▊ | 203/2500 [39:49<15:18:02, 23.98s/it] {'loss': 0.0009, 'grad_norm': 0.12161331240970279, 'learning_rate': 9.187999999999999e-07, 'completion_length': 64.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0218505859375, 'epoch': 0.08} 8%|▊ | 203/2500 [39:49<15:18:02, 23.98s/it] 8%|▊ | 204/2500 [40:03<13:28:44, 21.13s/it] {'loss': 0.0011, 'grad_norm': 0.16723279694925924, 'learning_rate': 9.184e-07, 'completion_length': 57.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.08} 8%|▊ | 204/2500 [40:03<13:28:44, 21.13s/it] 8%|▊ | 205/2500 [40:17<11:57:41, 18.76s/it] {'loss': 0.0016, 'grad_norm': 0.13157071379552635, 'learning_rate': 9.18e-07, 'completion_length': 56.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038818359375, 'epoch': 0.08} 8%|▊ | 205/2500 [40:17<11:57:41, 18.76s/it] 8%|▊ | 206/2500 [40:30<10:52:36, 17.07s/it] {'loss': 0.001, 'grad_norm': 0.10972814314888407, 'learning_rate': 9.175999999999999e-07, 'completion_length': 54.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02435302734375, 'epoch': 0.08} 8%|▊ | 206/2500 [40:30<10:52:36, 17.07s/it] 8%|▊ | 207/2500 [40:43<10:09:17, 15.94s/it] {'loss': 0.0016, 'grad_norm': 0.7113116623826022, 'learning_rate': 9.172e-07, 'completion_length': 52.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9821429252624512, 
'reward_std': 0.0357142873108387, 'kl': 0.04010009765625, 'epoch': 0.08} 8%|▊ | 207/2500 [40:43<10:09:17, 15.94s/it] 8%|▊ | 208/2500 [40:56<9:40:11, 15.19s/it] {'loss': 0.0012, 'grad_norm': 0.11888270643208261, 'learning_rate': 9.168e-07, 'completion_length': 59.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.08} 8%|▊ | 208/2500 [40:56<9:40:11, 15.19s/it] 8%|▊ | 209/2500 [41:11<9:32:41, 15.00s/it] {'loss': 0.0018, 'grad_norm': 0.20497510565620772, 'learning_rate': 9.163999999999999e-07, 'completion_length': 60.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043701171875, 'epoch': 0.08} 8%|▊ | 209/2500 [41:11<9:32:41, 15.00s/it] 8%|▊ | 210/2500 [41:24<9:13:02, 14.49s/it] {'loss': 0.0016, 'grad_norm': 0.09899755776047041, 'learning_rate': 9.16e-07, 'completion_length': 52.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04058837890625, 'epoch': 0.08} 8%|▊ | 210/2500 [41:24<9:13:02, 14.49s/it] 8%|▊ | 211/2500 [41:39<9:12:12, 14.47s/it] {'loss': 0.001, 'grad_norm': 0.10231958810808639, 'learning_rate': 9.156e-07, 'completion_length': 58.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.08} 8%|▊ | 211/2500 [41:39<9:12:12, 14.47s/it] 8%|▊ | 212/2500 [41:54<9:23:31, 14.78s/it] {'loss': 0.0013, 'grad_norm': 0.641392349225121, 'learning_rate': 9.151999999999999e-07, 'completion_length': 63.46428871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.032470703125, 'epoch': 0.08} 8%|▊ | 212/2500 [41:54<9:23:31, 14.78s/it] 9%|▊ | 213/2500 [42:08<9:06:55, 14.35s/it] {'loss': 0.0012, 'grad_norm': 0.1512150777496226, 'learning_rate': 
9.147999999999999e-07, 'completion_length': 57.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.09} 9%|▊ | 213/2500 [42:08<9:06:55, 14.35s/it] 9%|▊ | 214/2500 [42:21<8:54:21, 14.03s/it] {'loss': 0.0021, 'grad_norm': 0.25416378548498525, 'learning_rate': 9.144e-07, 'completion_length': 51.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05224609375, 'epoch': 0.09} 9%|▊ | 214/2500 [42:21<8:54:21, 14.03s/it] 9%|▊ | 215/2500 [42:34<8:43:51, 13.76s/it] {'loss': 0.0014, 'grad_norm': 1.2267172387148142, 'learning_rate': 9.14e-07, 'completion_length': 50.19643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.035400390625, 'epoch': 0.09} 9%|▊ | 215/2500 [42:34<8:43:51, 13.76s/it] 9%|▊ | 216/2500 [42:53<9:40:41, 15.25s/it] {'loss': 0.0019, 'grad_norm': 1.1733977572926442, 'learning_rate': 9.135999999999999e-07, 'completion_length': 59.41071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9464285969734192, 'reward_std': 0.1071428656578064, 'kl': 0.048583984375, 'epoch': 0.09} 9%|▊ | 216/2500 [42:53<9:40:41, 15.25s/it] 9%|▊ | 217/2500 [43:06<9:20:18, 14.73s/it] {'loss': 0.0009, 'grad_norm': 1.667888873572049, 'learning_rate': 9.132e-07, 'completion_length': 53.85714530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02264404296875, 'epoch': 0.09} 9%|▊ | 217/2500 [43:06<9:20:18, 14.73s/it] 9%|▊ | 218/2500 [43:20<9:12:25, 14.52s/it] {'loss': 0.0012, 'grad_norm': 2.0359855030884, 'learning_rate': 9.127999999999999e-07, 'completion_length': 52.10714530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 
'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.03106689453125, 'epoch': 0.09} 9%|▊ | 218/2500 [43:20<9:12:25, 14.52s/it] 9%|▉ | 219/2500 [43:34<8:59:29, 14.19s/it] {'loss': 0.0017, 'grad_norm': 0.19051190104556873, 'learning_rate': 9.123999999999999e-07, 'completion_length': 49.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043701171875, 'epoch': 0.09} 9%|▉ | 219/2500 [43:34<8:59:29, 14.19s/it] 9%|▉ | 220/2500 [43:47<8:44:11, 13.79s/it] {'loss': 0.0011, 'grad_norm': 0.2090847944719581, 'learning_rate': 9.12e-07, 'completion_length': 44.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027587890625, 'epoch': 0.09} 9%|▉ | 220/2500 [43:47<8:44:11, 13.79s/it] 9%|▉ | 221/2500 [43:59<8:33:41, 13.52s/it] {'loss': 0.001, 'grad_norm': 0.12620587732939434, 'learning_rate': 9.115999999999999e-07, 'completion_length': 46.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.09} 9%|▉ | 221/2500 [43:59<8:33:41, 13.52s/it] 9%|▉ | 222/2500 [44:13<8:32:09, 13.49s/it] {'loss': 0.0015, 'grad_norm': 0.12412352465388399, 'learning_rate': 9.112e-07, 'completion_length': 50.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03759765625, 'epoch': 0.09} 9%|▉ | 222/2500 [44:13<8:32:09, 13.49s/it] 9%|▉ | 223/2500 [44:27<8:38:10, 13.65s/it] {'loss': 0.0018, 'grad_norm': 0.1415179609075741, 'learning_rate': 9.108e-07, 'completion_length': 56.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0438232421875, 'epoch': 0.09} 9%|▉ | 223/2500 [44:27<8:38:10, 13.65s/it] 9%|▉ | 224/2500 [44:40<8:31:49, 13.49s/it] {'loss': 0.0008, 'grad_norm': 0.12371945585606263, 'learning_rate': 9.103999999999999e-07, 
'completion_length': 52.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0205078125, 'epoch': 0.09} 9%|▉ | 224/2500 [44:40<8:31:49, 13.49s/it] 9%|▉ | 225/2500 [44:53<8:30:36, 13.47s/it] {'loss': 0.0013, 'grad_norm': 0.18593646014212906, 'learning_rate': 9.1e-07, 'completion_length': 51.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03265380859375, 'epoch': 0.09} 9%|▉ | 225/2500 [44:53<8:30:36, 13.47s/it] 9%|▉ | 226/2500 [45:06<8:25:26, 13.34s/it] {'loss': 0.0016, 'grad_norm': 0.14347938050564582, 'learning_rate': 9.095999999999999e-07, 'completion_length': 48.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03875732421875, 'epoch': 0.09} 9%|▉ | 226/2500 [45:06<8:25:26, 13.34s/it] 9%|▉ | 227/2500 [45:19<8:19:37, 13.19s/it] {'loss': 0.0016, 'grad_norm': 2.7319652222280992, 'learning_rate': 9.092e-07, 'completion_length': 43.32143211364746, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.0714285746216774, 'kl': 0.0408935546875, 'epoch': 0.09} 9%|▉ | 227/2500 [45:19<8:19:37, 13.19s/it] 9%|▉ | 228/2500 [45:33<8:23:04, 13.29s/it] {'loss': 0.0014, 'grad_norm': 0.16326066858696944, 'learning_rate': 9.088e-07, 'completion_length': 58.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.09} 9%|▉ | 228/2500 [45:33<8:23:04, 13.29s/it] 9%|▉ | 229/2500 [45:46<8:21:06, 13.24s/it] {'loss': 0.0012, 'grad_norm': 0.17029572645867447, 'learning_rate': 9.084e-07, 'completion_length': 52.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.09} 9%|▉ | 229/2500 [45:46<8:21:06, 13.24s/it] 9%|▉ | 230/2500 [46:00<8:35:05, 
13.61s/it] {'loss': 0.0016, 'grad_norm': 4.210631547338229, 'learning_rate': 9.08e-07, 'completion_length': 59.67857360839844, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.0357142873108387, 'kl': 0.04052734375, 'epoch': 0.09} 9%|▉ | 230/2500 [46:00<8:35:05, 13.61s/it] 9%|▉ | 231/2500 [46:14<8:33:53, 13.59s/it] {'loss': 0.0011, 'grad_norm': 1.4444384906785876, 'learning_rate': 9.075999999999999e-07, 'completion_length': 55.35714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02764892578125, 'epoch': 0.09} 9%|▉ | 231/2500 [46:14<8:33:53, 13.59s/it] 9%|▉ | 232/2500 [46:30<9:00:34, 14.30s/it] {'loss': 0.0013, 'grad_norm': 0.07759759568256835, 'learning_rate': 9.072e-07, 'completion_length': 63.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03314208984375, 'epoch': 0.09} 9%|▉ | 232/2500 [46:30<9:00:34, 14.30s/it] 9%|▉ | 233/2500 [46:43<8:50:51, 14.05s/it] {'loss': 0.0009, 'grad_norm': 1.7921559280505925, 'learning_rate': 9.068e-07, 'completion_length': 51.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021728515625, 'epoch': 0.09} 9%|▉ | 233/2500 [46:43<8:50:51, 14.05s/it] 9%|▉ | 234/2500 [46:57<8:47:00, 13.95s/it] {'loss': 0.0014, 'grad_norm': 0.12574793940764434, 'learning_rate': 9.063999999999999e-07, 'completion_length': 52.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03594970703125, 'epoch': 0.09} 9%|▉ | 234/2500 [46:57<8:47:00, 13.95s/it] 9%|▉ | 235/2500 [47:11<8:43:42, 13.87s/it] {'loss': 0.0007, 'grad_norm': 0.42882935918059184, 'learning_rate': 9.06e-07, 'completion_length': 56.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017578125, 'epoch': 0.09} 9%|▉ | 235/2500 [47:11<8:43:42, 13.87s/it] 9%|▉ | 236/2500 [47:24<8:33:20, 13.60s/it] {'loss': 0.0012, 'grad_norm': 1.1470278206337068, 'learning_rate': 9.056e-07, 'completion_length': 56.69643211364746, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02880859375, 'epoch': 0.09} 9%|▉ | 236/2500 [47:24<8:33:20, 13.60s/it] 9%|▉ | 237/2500 [47:37<8:32:06, 13.58s/it] {'loss': 0.0008, 'grad_norm': 0.09644147652412167, 'learning_rate': 9.051999999999999e-07, 'completion_length': 56.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0208740234375, 'epoch': 0.09} 9%|▉ | 237/2500 [47:37<8:32:06, 13.58s/it] 10%|▉ | 238/2500 [47:51<8:36:59, 13.71s/it] {'loss': 0.0014, 'grad_norm': 1.3761155258839683, 'learning_rate': 9.048e-07, 'completion_length': 58.46428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03411865234375, 'epoch': 0.1} 10%|▉ | 238/2500 [47:51<8:36:59, 13.71s/it] 10%|▉ | 239/2500 [48:05<8:34:45, 13.66s/it] {'loss': 0.0013, 'grad_norm': 0.7379556329780173, 'learning_rate': 9.044e-07, 'completion_length': 61.250003814697266, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0322265625, 'epoch': 0.1} 10%|▉ | 239/2500 [48:05<8:34:45, 13.66s/it] 10%|▉ | 240/2500 [48:19<8:39:38, 13.80s/it] {'loss': 0.0011, 'grad_norm': 0.11942669250017907, 'learning_rate': 9.039999999999999e-07, 'completion_length': 59.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.1} 10%|▉ | 240/2500 [48:19<8:39:38, 13.80s/it] 10%|▉ | 241/2500 [48:33<8:38:25, 
13.77s/it] {'loss': 0.0018, 'grad_norm': 1.0996901712995848, 'learning_rate': 9.035999999999999e-07, 'completion_length': 58.982147216796875, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0440673828125, 'epoch': 0.1} 10%|▉ | 241/2500 [48:33<8:38:25, 13.77s/it] 10%|▉ | 242/2500 [48:46<8:36:59, 13.74s/it] {'loss': 0.0009, 'grad_norm': 1.0798360745906366, 'learning_rate': 9.032e-07, 'completion_length': 62.08928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02276611328125, 'epoch': 0.1} 10%|▉ | 242/2500 [48:46<8:36:59, 13.74s/it] 10%|▉ | 243/2500 [48:59<8:24:07, 13.40s/it] {'loss': 0.0014, 'grad_norm': 1.3798223836292907, 'learning_rate': 9.028e-07, 'completion_length': 50.10714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0338134765625, 'epoch': 0.1} 10%|▉ | 243/2500 [48:59<8:24:07, 13.40s/it] 10%|▉ | 244/2500 [49:12<8:21:48, 13.35s/it] {'loss': 0.0008, 'grad_norm': 0.10131301927846939, 'learning_rate': 9.023999999999999e-07, 'completion_length': 49.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0208740234375, 'epoch': 0.1} 10%|▉ | 244/2500 [49:12<8:21:48, 13.35s/it] 10%|▉ | 245/2500 [49:26<8:24:09, 13.41s/it] {'loss': 0.0015, 'grad_norm': 0.9975774559558845, 'learning_rate': 9.02e-07, 'completion_length': 52.19643211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.037353515625, 'epoch': 0.1} 10%|▉ | 245/2500 [49:26<8:24:09, 13.41s/it] 10%|▉ | 246/2500 [49:38<8:15:43, 13.20s/it] {'loss': 0.0015, 'grad_norm': 1.701208461792009, 'learning_rate': 9.015999999999999e-07, 
'completion_length': 46.83928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.03662109375, 'epoch': 0.1} 10%|▉ | 246/2500 [49:38<8:15:43, 13.20s/it] 10%|▉ | 247/2500 [49:52<8:17:04, 13.24s/it] {'loss': 0.0012, 'grad_norm': 1.171575736486859, 'learning_rate': 9.011999999999999e-07, 'completion_length': 49.375003814697266, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.031005859375, 'epoch': 0.1} 10%|▉ | 247/2500 [49:52<8:17:04, 13.24s/it] 10%|▉ | 248/2500 [50:11<9:22:01, 14.97s/it] {'loss': 0.0014, 'grad_norm': 0.5662483619857837, 'learning_rate': 9.008e-07, 'completion_length': 63.30357551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.03564453125, 'epoch': 0.1} 10%|▉ | 248/2500 [50:11<9:22:01, 14.97s/it] 10%|▉ | 249/2500 [50:24<9:04:35, 14.52s/it] {'loss': 0.0018, 'grad_norm': 0.16830890536178703, 'learning_rate': 9.004e-07, 'completion_length': 54.26785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0450439453125, 'epoch': 0.1} 10%|▉ | 249/2500 [50:24<9:04:35, 14.52s/it] 10%|█ | 250/2500 [50:37<8:44:11, 13.98s/it] {'loss': 0.0013, 'grad_norm': 0.6787692445227346, 'learning_rate': 9e-07, 'completion_length': 54.44643020629883, 'rewards/accuracy_reward': 0.8571429252624512, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.1} 10%|█ | 250/2500 [50:37<8:44:11, 13.98s/it] 10%|█ | 251/2500 [50:50<8:29:51, 13.60s/it] {'loss': 0.0016, 'grad_norm': 0.16166694733095407, 'learning_rate': 8.995999999999999e-07, 'completion_length': 41.01785850524902, 'rewards/accuracy_reward': 
0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03961181640625, 'epoch': 0.1} 10%|█ | 251/2500 [50:50<8:29:51, 13.60s/it] 10%|█ | 252/2500 [51:03<8:23:30, 13.44s/it] {'loss': 0.0026, 'grad_norm': 1.7501841348035267, 'learning_rate': 8.992e-07, 'completion_length': 48.67857360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0643310546875, 'epoch': 0.1} 10%|█ | 252/2500 [51:03<8:23:30, 13.44s/it] 10%|█ | 253/2500 [51:16<8:20:39, 13.37s/it] {'loss': 0.001, 'grad_norm': 1.7802963334364783, 'learning_rate': 8.988e-07, 'completion_length': 49.250003814697266, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02508544921875, 'epoch': 0.1} 10%|█ | 253/2500 [51:16<8:20:39, 13.37s/it] 10%|█ | 254/2500 [51:34<9:18:00, 14.91s/it] {'loss': 0.0011, 'grad_norm': 1.0932574230842909, 'learning_rate': 8.983999999999999e-07, 'completion_length': 62.25000190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9285714626312256, 'reward_std': 0.11266788095235825, 'kl': 0.0269775390625, 'epoch': 0.1} 10%|█ | 254/2500 [51:34<9:18:00, 14.91s/it] 10%|█ | 255/2500 [51:49<9:09:39, 14.69s/it] {'loss': 0.0017, 'grad_norm': 0.14896971152468824, 'learning_rate': 8.98e-07, 'completion_length': 46.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041259765625, 'epoch': 0.1} 10%|█ | 255/2500 [51:49<9:09:39, 14.69s/it] 10%|█ | 256/2500 [52:02<8:55:01, 14.31s/it] {'loss': 0.0021, 'grad_norm': 1.5085747903742777, 'learning_rate': 8.975999999999999e-07, 'completion_length': 51.23214530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 
0.0714285746216774, 'kl': 0.052978515625, 'epoch': 0.1} 10%|█ | 256/2500 [52:02<8:55:01, 14.31s/it] 10%|█ | 257/2500 [52:15<8:43:25, 14.00s/it] {'loss': 0.0019, 'grad_norm': 0.23946702708435824, 'learning_rate': 8.972e-07, 'completion_length': 52.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.047607421875, 'epoch': 0.1} 10%|█ | 257/2500 [52:15<8:43:25, 14.00s/it] 10%|█ | 258/2500 [52:28<8:27:53, 13.59s/it] {'loss': 0.0018, 'grad_norm': 0.8327443865402813, 'learning_rate': 8.968e-07, 'completion_length': 49.67857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04571533203125, 'epoch': 0.1} 10%|█ | 258/2500 [52:28<8:27:53, 13.59s/it] 10%|█ | 259/2500 [52:41<8:22:24, 13.45s/it] {'loss': 0.0013, 'grad_norm': 22.6900554668045, 'learning_rate': 8.963999999999999e-07, 'completion_length': 49.19643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0321044921875, 'epoch': 0.1} 10%|█ | 259/2500 [52:41<8:22:24, 13.45s/it] 10%|█ | 260/2500 [52:54<8:15:37, 13.28s/it] {'loss': 0.0019, 'grad_norm': 0.28469317370624175, 'learning_rate': 8.96e-07, 'completion_length': 47.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.046875, 'epoch': 0.1} 10%|█ | 260/2500 [52:54<8:15:37, 13.28s/it] 10%|█ | 261/2500 [53:09<8:30:15, 13.67s/it] {'loss': 0.0021, 'grad_norm': 0.12401849802986362, 'learning_rate': 8.955999999999999e-07, 'completion_length': 51.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0523681640625, 'epoch': 0.1} 10%|█ | 261/2500 [53:09<8:30:15, 13.67s/it] 10%|█ | 262/2500 [53:22<8:24:22, 13.52s/it] {'loss': 0.0013, 'grad_norm': 0.11039445917815256, 
'learning_rate': 8.951999999999999e-07, 'completion_length': 49.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03192138671875, 'epoch': 0.1} 10%|█ | 262/2500 [53:22<8:24:22, 13.52s/it] 11%|█ | 263/2500 [53:35<8:24:15, 13.52s/it] {'loss': 0.0016, 'grad_norm': 0.1943202290212078, 'learning_rate': 8.948e-07, 'completion_length': 53.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0411376953125, 'epoch': 0.11} 11%|█ | 263/2500 [53:35<8:24:15, 13.52s/it] 11%|█ | 264/2500 [53:48<8:18:38, 13.38s/it] {'loss': 0.0016, 'grad_norm': 0.5834793970439, 'learning_rate': 8.944e-07, 'completion_length': 50.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038818359375, 'epoch': 0.11} 11%|█ | 264/2500 [53:48<8:18:38, 13.38s/it] 11%|█ | 265/2500 [54:01<8:11:53, 13.21s/it] {'loss': 0.0017, 'grad_norm': 0.11306862615514862, 'learning_rate': 8.939999999999999e-07, 'completion_length': 49.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0428466796875, 'epoch': 0.11} 11%|█ | 265/2500 [54:01<8:11:53, 13.21s/it] 11%|█ | 266/2500 [54:15<8:18:14, 13.38s/it] {'loss': 0.0021, 'grad_norm': 0.10674220741419889, 'learning_rate': 8.935999999999999e-07, 'completion_length': 56.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0526123046875, 'epoch': 0.11} 11%|█ | 266/2500 [54:15<8:18:14, 13.38s/it] 11%|█ | 267/2500 [54:28<8:14:57, 13.30s/it] {'loss': 0.0017, 'grad_norm': 1.25003029250924, 'learning_rate': 8.932e-07, 'completion_length': 49.26785850524902, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0430908203125, 'epoch': 0.11} 11%|█ | 267/2500 
[54:28<8:14:57, 13.30s/it] 11%|█ | 268/2500 [54:43<8:36:38, 13.89s/it] {'loss': 0.0013, 'grad_norm': 0.23274878935034998, 'learning_rate': 8.928e-07, 'completion_length': 58.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03363037109375, 'epoch': 0.11} 11%|█ | 268/2500 [54:43<8:36:38, 13.89s/it] 11%|█ | 269/2500 [54:57<8:38:44, 13.95s/it] {'loss': 0.0021, 'grad_norm': 2.2155037107514457, 'learning_rate': 8.923999999999999e-07, 'completion_length': 54.50000190734863, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.0714285746216774, 'kl': 0.0528564453125, 'epoch': 0.11} 11%|█ | 269/2500 [54:57<8:38:44, 13.95s/it] 11%|█ | 270/2500 [55:11<8:32:33, 13.79s/it] {'loss': 0.0014, 'grad_norm': 0.10168099610531232, 'learning_rate': 8.92e-07, 'completion_length': 51.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.11} 11%|█ | 270/2500 [55:11<8:32:33, 13.79s/it] 11%|█ | 271/2500 [55:24<8:24:57, 13.59s/it] {'loss': 0.0023, 'grad_norm': 2.208207519434844, 'learning_rate': 8.915999999999999e-07, 'completion_length': 53.83928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0574951171875, 'epoch': 0.11} 11%|█ | 271/2500 [55:24<8:24:57, 13.59s/it] 11%|█ | 272/2500 [55:37<8:21:42, 13.51s/it] {'loss': 0.0018, 'grad_norm': 0.3271975105930231, 'learning_rate': 8.911999999999999e-07, 'completion_length': 53.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044189453125, 'epoch': 0.11} 11%|█ | 272/2500 [55:37<8:21:42, 13.51s/it] 11%|█ | 273/2500 [55:51<8:27:11, 13.66s/it] {'loss': 0.002, 'grad_norm': 0.1862530328705457, 'learning_rate': 8.908e-07, 'completion_length': 
58.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05078125, 'epoch': 0.11} 11%|█ | 273/2500 [55:51<8:27:11, 13.66s/it] 11%|█ | 274/2500 [56:05<8:23:58, 13.58s/it] {'loss': 0.0013, 'grad_norm': 0.10130098801037303, 'learning_rate': 8.904e-07, 'completion_length': 53.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03375244140625, 'epoch': 0.11} 11%|█ | 274/2500 [56:05<8:23:58, 13.58s/it] 11%|█ | 275/2500 [56:21<8:51:20, 14.33s/it] {'loss': 0.0017, 'grad_norm': 0.11149912880688154, 'learning_rate': 8.9e-07, 'completion_length': 69.46429061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041748046875, 'epoch': 0.11} 11%|█ | 275/2500 [56:21<8:51:20, 14.33s/it] 11%|█ | 276/2500 [56:34<8:35:58, 13.92s/it] {'loss': 0.0021, 'grad_norm': 0.3766172662186461, 'learning_rate': 8.895999999999999e-07, 'completion_length': 50.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.053466796875, 'epoch': 0.11} 11%|█ | 276/2500 [56:34<8:35:58, 13.92s/it] 11%|█ | 277/2500 [56:47<8:25:21, 13.64s/it] {'loss': 0.0013, 'grad_norm': 0.1034429875497399, 'learning_rate': 8.892e-07, 'completion_length': 50.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0325927734375, 'epoch': 0.11} 11%|█ | 277/2500 [56:47<8:25:21, 13.64s/it] 11%|█ | 278/2500 [57:01<8:29:37, 13.76s/it] {'loss': 0.0013, 'grad_norm': 0.09404330442625303, 'learning_rate': 8.888e-07, 'completion_length': 61.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.11} 11%|█ | 278/2500 [57:01<8:29:37, 13.76s/it] 11%|█ | 279/2500 [57:14<8:22:32, 13.58s/it] {'loss': 0.0016, 'grad_norm': 0.12328586420113871, 
'learning_rate': 8.883999999999999e-07, 'completion_length': 54.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04058837890625, 'epoch': 0.11} 11%|█ | 279/2500 [57:14<8:22:32, 13.58s/it] 11%|█ | 280/2500 [57:28<8:30:35, 13.80s/it] {'loss': 0.0017, 'grad_norm': 3.5989429050981703, 'learning_rate': 8.88e-07, 'completion_length': 55.267860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.041748046875, 'epoch': 0.11} 11%|█ | 280/2500 [57:28<8:30:35, 13.80s/it] 11%|█ | 281/2500 [57:42<8:29:29, 13.78s/it] {'loss': 0.001, 'grad_norm': 0.07723356145229876, 'learning_rate': 8.875999999999999e-07, 'completion_length': 53.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0252685546875, 'epoch': 0.11} 11%|█ | 281/2500 [57:42<8:29:29, 13.78s/it] 11%|█▏ | 282/2500 [57:55<8:20:21, 13.54s/it] {'loss': 0.0008, 'grad_norm': 0.08454297745641938, 'learning_rate': 8.872e-07, 'completion_length': 49.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0203857421875, 'epoch': 0.11} 11%|█▏ | 282/2500 [57:55<8:20:21, 13.54s/it] 11%|█▏ | 283/2500 [58:09<8:22:20, 13.59s/it] {'loss': 0.0011, 'grad_norm': 0.11787247147560075, 'learning_rate': 8.868e-07, 'completion_length': 59.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0283203125, 'epoch': 0.11} 11%|█▏ | 283/2500 [58:09<8:22:20, 13.59s/it] 11%|█▏ | 284/2500 [58:23<8:31:41, 13.85s/it] {'loss': 0.002, 'grad_norm': 0.18223625257973491, 'learning_rate': 8.863999999999999e-07, 'completion_length': 56.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0509033203125, 'epoch': 0.11} 11%|█▏ | 
284/2500 [58:23<8:31:41, 13.85s/it] 11%|█▏ | 285/2500 [58:36<8:24:41, 13.67s/it] {'loss': 0.0015, 'grad_norm': 0.10896462662378117, 'learning_rate': 8.86e-07, 'completion_length': 48.17857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0382080078125, 'epoch': 0.11} 11%|█▏ | 285/2500 [58:36<8:24:41, 13.67s/it] 11%|█▏ | 286/2500 [58:50<8:29:48, 13.82s/it] {'loss': 0.0016, 'grad_norm': 0.11825805554927499, 'learning_rate': 8.856e-07, 'completion_length': 51.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039306640625, 'epoch': 0.11} 11%|█▏ | 286/2500 [58:51<8:29:48, 13.82s/it] 11%|█▏ | 287/2500 [59:05<8:35:07, 13.97s/it] {'loss': 0.001, 'grad_norm': 0.18514446596346198, 'learning_rate': 8.851999999999999e-07, 'completion_length': 53.08928680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02569580078125, 'epoch': 0.11} 11%|█▏ | 287/2500 [59:05<8:35:07, 13.97s/it] 12%|█▏ | 288/2500 [59:19<8:34:16, 13.95s/it] {'loss': 0.0014, 'grad_norm': 0.16506586718236005, 'learning_rate': 8.848e-07, 'completion_length': 58.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.12} 12%|█▏ | 288/2500 [59:19<8:34:16, 13.95s/it] 12%|█▏ | 289/2500 [59:33<8:32:44, 13.91s/it] {'loss': 0.0018, 'grad_norm': 13.058602084180777, 'learning_rate': 8.844e-07, 'completion_length': 53.23214530944824, 'rewards/accuracy_reward': 0.8928571939468384, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.11266787722706795, 'kl': 0.0462646484375, 'epoch': 0.12} 12%|█▏ | 289/2500 [59:33<8:32:44, 13.91s/it] 12%|█▏ | 290/2500 [59:47<8:38:22, 14.07s/it] {'loss': 0.0011, 'grad_norm': 0.21212363592556996, 'learning_rate': 
8.839999999999999e-07, 'completion_length': 58.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0272216796875, 'epoch': 0.12} 12%|█▏ | 290/2500 [59:47<8:38:22, 14.07s/it] 12%|█▏ | 291/2500 [1:00:01<8:39:50, 14.12s/it] {'loss': 0.001, 'grad_norm': 0.1415796219740443, 'learning_rate': 8.836e-07, 'completion_length': 59.57143211364746, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02581787109375, 'epoch': 0.12} 12%|█▏ | 291/2500 [1:00:01<8:39:50, 14.12s/it] 12%|█▏ | 292/2500 [1:00:14<8:28:00, 13.80s/it] {'loss': 0.001, 'grad_norm': 0.10756850829470417, 'learning_rate': 8.832e-07, 'completion_length': 54.01785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.12} 12%|█▏ | 292/2500 [1:00:14<8:28:00, 13.80s/it] 12%|█▏ | 293/2500 [1:00:28<8:28:32, 13.83s/it] {'loss': 0.0011, 'grad_norm': 0.08580497387005497, 'learning_rate': 8.827999999999999e-07, 'completion_length': 53.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.12} 12%|█▏ | 293/2500 [1:00:28<8:28:32, 13.83s/it] 12%|█▏ | 294/2500 [1:00:42<8:27:13, 13.80s/it] {'loss': 0.0013, 'grad_norm': 0.1462290179199122, 'learning_rate': 8.823999999999999e-07, 'completion_length': 56.08928871154785, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0322265625, 'epoch': 0.12} 12%|█▏ | 294/2500 [1:00:42<8:27:13, 13.80s/it] 12%|█▏ | 295/2500 [1:00:56<8:28:08, 13.83s/it] {'loss': 0.0006, 'grad_norm': 0.07881238475469524, 'learning_rate': 8.82e-07, 'completion_length': 58.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 
0.0, 'kl': 0.015380859375, 'epoch': 0.12} 12%|█▏ | 295/2500 [1:00:56<8:28:08, 13.83s/it] 12%|█▏ | 296/2500 [1:01:10<8:34:02, 13.99s/it] {'loss': 0.0011, 'grad_norm': 0.07293828736652003, 'learning_rate': 8.816000000000001e-07, 'completion_length': 59.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.12} 12%|█▏ | 296/2500 [1:01:10<8:34:02, 13.99s/it] 12%|█▏ | 297/2500 [1:01:23<8:25:43, 13.77s/it] {'loss': 0.0019, 'grad_norm': 0.08085702386598355, 'learning_rate': 8.811999999999999e-07, 'completion_length': 52.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04736328125, 'epoch': 0.12} 12%|█▏ | 297/2500 [1:01:23<8:25:43, 13.77s/it] 12%|█▏ | 298/2500 [1:01:38<8:37:41, 14.11s/it] {'loss': 0.0016, 'grad_norm': 0.7262316081762332, 'learning_rate': 8.808e-07, 'completion_length': 58.91071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0401611328125, 'epoch': 0.12} 12%|█▏ | 298/2500 [1:01:38<8:37:41, 14.11s/it] 12%|█▏ | 299/2500 [1:01:53<8:43:56, 14.28s/it] {'loss': 0.0019, 'grad_norm': 2.4450087824859694, 'learning_rate': 8.804e-07, 'completion_length': 62.767860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.048583984375, 'epoch': 0.12} 12%|█▏ | 299/2500 [1:01:53<8:43:56, 14.28s/it] 12%|█▏ | 300/2500 [1:02:08<8:46:57, 14.37s/it] {'loss': 0.0021, 'grad_norm': 0.9613598470781414, 'learning_rate': 8.799999999999999e-07, 'completion_length': 57.01785850524902, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.052978515625, 'epoch': 0.12} 12%|█▏ | 300/2500 [1:02:08<8:46:57, 14.37s/it] 12%|█▏ | 
301/2500 [1:03:20<19:23:16, 31.74s/it] {'loss': 0.0012, 'grad_norm': 2.6244542894124474, 'learning_rate': 8.796e-07, 'completion_length': 65.37500381469727, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.11266788095235825, 'kl': 0.02880859375, 'epoch': 0.12} 12%|█▏ | 301/2500 [1:03:20<19:23:16, 31.74s/it] 12%|█▏ | 302/2500 [1:03:34<16:04:45, 26.34s/it] {'loss': 0.0008, 'grad_norm': 0.09674265236627626, 'learning_rate': 8.792e-07, 'completion_length': 54.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0203857421875, 'epoch': 0.12} 12%|█▏ | 302/2500 [1:03:34<16:04:45, 26.34s/it] 12%|█▏ | 303/2500 [1:03:48<13:47:49, 22.61s/it] {'loss': 0.0015, 'grad_norm': 0.12913520864027314, 'learning_rate': 8.788e-07, 'completion_length': 53.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03668212890625, 'epoch': 0.12} 12%|█▏ | 303/2500 [1:03:48<13:47:49, 22.61s/it] 12%|█▏ | 304/2500 [1:04:02<12:14:48, 20.08s/it] {'loss': 0.001, 'grad_norm': 0.14820357655948596, 'learning_rate': 8.783999999999999e-07, 'completion_length': 50.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0247802734375, 'epoch': 0.12} 12%|█▏ | 304/2500 [1:04:02<12:14:48, 20.08s/it] 12%|█▏ | 305/2500 [1:04:15<11:00:13, 18.05s/it] {'loss': 0.0017, 'grad_norm': 0.09114607092277592, 'learning_rate': 8.78e-07, 'completion_length': 49.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041259765625, 'epoch': 0.12} 12%|█▏ | 305/2500 [1:04:15<11:00:13, 18.05s/it] 12%|█▏ | 306/2500 [1:04:28<10:08:14, 16.63s/it] {'loss': 0.0015, 'grad_norm': 0.16283566262323762, 'learning_rate': 8.776e-07, 'completion_length': 51.98214530944824, 'rewards/accuracy_reward': 
0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.037353515625, 'epoch': 0.12} 12%|█▏ | 306/2500 [1:04:28<10:08:14, 16.63s/it] 12%|█▏ | 307/2500 [1:04:42<9:37:10, 15.79s/it] {'loss': 0.002, 'grad_norm': 3.6104060908936484, 'learning_rate': 8.771999999999999e-07, 'completion_length': 46.98214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.0494384765625, 'epoch': 0.12} 12%|█▏ | 307/2500 [1:04:42<9:37:10, 15.79s/it] 12%|█▏ | 308/2500 [1:04:55<9:02:20, 14.85s/it] {'loss': 0.0011, 'grad_norm': 0.11951698500394942, 'learning_rate': 8.768e-07, 'completion_length': 50.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0274658203125, 'epoch': 0.12} 12%|█▏ | 308/2500 [1:04:55<9:02:20, 14.85s/it] 12%|█▏ | 309/2500 [1:05:09<8:50:09, 14.52s/it] {'loss': 0.0016, 'grad_norm': 0.10724991910211987, 'learning_rate': 8.763999999999999e-07, 'completion_length': 59.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04052734375, 'epoch': 0.12} 12%|█▏ | 309/2500 [1:05:09<8:50:09, 14.52s/it] 12%|█▏ | 310/2500 [1:05:22<8:33:22, 14.07s/it] {'loss': 0.0013, 'grad_norm': 0.20126124353143235, 'learning_rate': 8.76e-07, 'completion_length': 51.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031494140625, 'epoch': 0.12} 12%|█▏ | 310/2500 [1:05:22<8:33:22, 14.07s/it] 12%|█▏ | 311/2500 [1:05:35<8:29:41, 13.97s/it] {'loss': 0.002, 'grad_norm': 3.620253747519774, 'learning_rate': 8.756e-07, 'completion_length': 58.78571891784668, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.0506591796875, 'epoch': 0.12} 12%|█▏ | 311/2500 
[1:05:35<8:29:41, 13.97s/it] 12%|█▏ | 312/2500 [1:05:49<8:27:57, 13.93s/it] {'loss': 0.0018, 'grad_norm': 2.024523990520663, 'learning_rate': 8.751999999999999e-07, 'completion_length': 65.71428680419922, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.0357142873108387, 'kl': 0.04559326171875, 'epoch': 0.12} 12%|█▏ | 312/2500 [1:05:49<8:27:57, 13.93s/it] 13%|█▎ | 313/2500 [1:06:02<8:19:37, 13.71s/it] {'loss': 0.0008, 'grad_norm': 0.150668135327158, 'learning_rate': 8.748e-07, 'completion_length': 54.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02105712890625, 'epoch': 0.13} 13%|█▎ | 313/2500 [1:06:02<8:19:37, 13.71s/it] 13%|█▎ | 314/2500 [1:06:16<8:18:00, 13.67s/it] {'loss': 0.0015, 'grad_norm': 0.504125115668341, 'learning_rate': 8.743999999999999e-07, 'completion_length': 52.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03662109375, 'epoch': 0.13} 13%|█▎ | 314/2500 [1:06:16<8:18:00, 13.67s/it] 13%|█▎ | 315/2500 [1:06:29<8:12:21, 13.52s/it] {'loss': 0.0015, 'grad_norm': 4.492515187629274, 'learning_rate': 8.739999999999999e-07, 'completion_length': 53.87500190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0369873046875, 'epoch': 0.13} 13%|█▎ | 315/2500 [1:06:29<8:12:21, 13.52s/it] 13%|█▎ | 316/2500 [1:06:43<8:15:56, 13.62s/it] {'loss': 0.0011, 'grad_norm': 0.13258570584492865, 'learning_rate': 8.736e-07, 'completion_length': 59.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02801513671875, 'epoch': 0.13} 13%|█▎ | 316/2500 [1:06:43<8:15:56, 13.62s/it] 13%|█▎ | 317/2500 [1:06:56<8:08:42, 13.43s/it] {'loss': 0.0013, 'grad_norm': 0.16677588899874846, 'learning_rate': 
8.732e-07, 'completion_length': 50.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.13} 13%|█▎ | 317/2500 [1:06:56<8:08:42, 13.43s/it] 13%|█▎ | 318/2500 [1:07:11<8:21:56, 13.80s/it] {'loss': 0.0016, 'grad_norm': 0.08850130298152914, 'learning_rate': 8.728e-07, 'completion_length': 61.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039794921875, 'epoch': 0.13} 13%|█▎ | 318/2500 [1:07:11<8:21:56, 13.80s/it] 13%|█▎ | 319/2500 [1:07:24<8:12:14, 13.54s/it] {'loss': 0.0015, 'grad_norm': 0.18970018675834413, 'learning_rate': 8.723999999999999e-07, 'completion_length': 53.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0380859375, 'epoch': 0.13} 13%|█▎ | 319/2500 [1:07:24<8:12:14, 13.54s/it] 13%|█▎ | 320/2500 [1:07:37<8:13:57, 13.60s/it] {'loss': 0.0012, 'grad_norm': 0.1299080609947469, 'learning_rate': 8.72e-07, 'completion_length': 59.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03021240234375, 'epoch': 0.13} 13%|█▎ | 320/2500 [1:07:37<8:13:57, 13.60s/it] 13%|█▎ | 321/2500 [1:07:52<8:22:15, 13.83s/it] {'loss': 0.0012, 'grad_norm': 1.5935420890115228, 'learning_rate': 8.716e-07, 'completion_length': 61.625003814697266, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0286865234375, 'epoch': 0.13} 13%|█▎ | 321/2500 [1:07:52<8:22:15, 13.83s/it] 13%|█▎ | 322/2500 [1:08:06<8:25:33, 13.93s/it] {'loss': 0.0016, 'grad_norm': 0.15599192412127888, 'learning_rate': 8.711999999999999e-07, 'completion_length': 62.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04010009765625, 'epoch': 0.13} 13%|█▎ | 
322/2500 [1:08:06<8:25:33, 13.93s/it] 13%|█▎ | 323/2500 [1:08:20<8:28:03, 14.00s/it] {'loss': 0.0011, 'grad_norm': 0.12444246398716262, 'learning_rate': 8.708e-07, 'completion_length': 59.41071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0264892578125, 'epoch': 0.13} 13%|█▎ | 323/2500 [1:08:20<8:28:03, 14.00s/it] 13%|█▎ | 324/2500 [1:08:34<8:22:57, 13.87s/it] {'loss': 0.001, 'grad_norm': 3.137890844114671, 'learning_rate': 8.704e-07, 'completion_length': 57.517860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02435302734375, 'epoch': 0.13} 13%|█▎ | 324/2500 [1:08:34<8:22:57, 13.87s/it] 13%|█▎ | 325/2500 [1:08:48<8:27:32, 14.00s/it] {'loss': 0.001, 'grad_norm': 0.13127133405035837, 'learning_rate': 8.699999999999999e-07, 'completion_length': 56.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024658203125, 'epoch': 0.13} 13%|█▎ | 325/2500 [1:08:48<8:27:32, 14.00s/it] 13%|█▎ | 326/2500 [1:09:02<8:25:40, 13.96s/it] {'loss': 0.0013, 'grad_norm': 0.10700165922425883, 'learning_rate': 8.696e-07, 'completion_length': 56.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.13} 13%|█▎ | 326/2500 [1:09:02<8:25:40, 13.96s/it] 13%|█▎ | 327/2500 [1:09:16<8:27:04, 14.00s/it] {'loss': 0.0016, 'grad_norm': 0.08746698084191962, 'learning_rate': 8.692e-07, 'completion_length': 61.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03973388671875, 'epoch': 0.13} 13%|█▎ | 327/2500 [1:09:16<8:27:04, 14.00s/it] 13%|█▎ | 328/2500 [1:09:32<8:45:50, 14.53s/it] {'loss': 0.0013, 'grad_norm': 1.3237205918326902, 'learning_rate': 8.687999999999999e-07, 'completion_length': 
61.000003814697266, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.032470703125, 'epoch': 0.13} 13%|█▎ | 328/2500 [1:09:32<8:45:50, 14.53s/it] 13%|█▎ | 329/2500 [1:09:46<8:45:35, 14.53s/it] {'loss': 0.0011, 'grad_norm': 2.036391578205753, 'learning_rate': 8.683999999999999e-07, 'completion_length': 60.67857360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02862548828125, 'epoch': 0.13} 13%|█▎ | 329/2500 [1:09:46<8:45:35, 14.53s/it] 13%|█▎ | 330/2500 [1:10:01<8:47:30, 14.59s/it] {'loss': 0.0008, 'grad_norm': 0.10803373144303297, 'learning_rate': 8.68e-07, 'completion_length': 63.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020263671875, 'epoch': 0.13} 13%|█▎ | 330/2500 [1:10:01<8:47:30, 14.59s/it] 13%|█▎ | 331/2500 [1:10:15<8:42:09, 14.44s/it] {'loss': 0.001, 'grad_norm': 0.09414975686691565, 'learning_rate': 8.676e-07, 'completion_length': 58.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0252685546875, 'epoch': 0.13} 13%|█▎ | 331/2500 [1:10:15<8:42:09, 14.44s/it] 13%|█▎ | 332/2500 [1:10:30<8:45:14, 14.54s/it] {'loss': 0.0018, 'grad_norm': 3.6400944549379766, 'learning_rate': 8.671999999999999e-07, 'completion_length': 75.37500381469727, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.044921875, 'epoch': 0.13} 13%|█▎ | 332/2500 [1:10:30<8:45:14, 14.54s/it] 13%|█▎ | 333/2500 [1:10:45<8:49:26, 14.66s/it] {'loss': 0.0015, 'grad_norm': 9.026120436629993, 'learning_rate': 8.668e-07, 'completion_length': 64.14286231994629, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 
1.9107143878936768, 'reward_std': 0.1071428619325161, 'kl': 0.03729248046875, 'epoch': 0.13} 13%|█▎ | 333/2500 [1:10:45<8:49:26, 14.66s/it] 13%|█▎ | 334/2500 [1:10:59<8:45:55, 14.57s/it] {'loss': 0.0011, 'grad_norm': 3.6879664782003, 'learning_rate': 8.663999999999999e-07, 'completion_length': 65.80357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.0263671875, 'epoch': 0.13} 13%|█▎ | 334/2500 [1:10:59<8:45:55, 14.57s/it] 13%|█▎ | 335/2500 [1:11:13<8:44:04, 14.52s/it] {'loss': 0.0018, 'grad_norm': 2.419932216762632, 'learning_rate': 8.659999999999999e-07, 'completion_length': 64.64286041259766, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.07695359364151955, 'kl': 0.0447998046875, 'epoch': 0.13} 13%|█▎ | 335/2500 [1:11:13<8:44:04, 14.52s/it] 13%|█▎ | 336/2500 [1:11:28<8:48:41, 14.66s/it] {'loss': 0.0015, 'grad_norm': 1.0075869500716548, 'learning_rate': 8.656e-07, 'completion_length': 57.71428871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0364990234375, 'epoch': 0.13} 13%|█▎ | 336/2500 [1:11:28<8:48:41, 14.66s/it] 13%|█▎ | 337/2500 [1:11:42<8:34:23, 14.27s/it] {'loss': 0.0015, 'grad_norm': 0.09906689941462112, 'learning_rate': 8.651999999999999e-07, 'completion_length': 55.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03662109375, 'epoch': 0.13} 13%|█▎ | 337/2500 [1:11:42<8:34:23, 14.27s/it] 14%|█▎ | 338/2500 [1:11:56<8:35:11, 14.30s/it] {'loss': 0.0015, 'grad_norm': 0.8910054372380899, 'learning_rate': 8.648e-07, 'completion_length': 64.83928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 
0.0367431640625, 'epoch': 0.14} 14%|█▎ | 338/2500 [1:11:56<8:35:11, 14.30s/it] 14%|█▎ | 339/2500 [1:12:11<8:44:46, 14.57s/it] {'loss': 0.0014, 'grad_norm': 0.08281777865020326, 'learning_rate': 8.643999999999999e-07, 'completion_length': 61.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0350341796875, 'epoch': 0.14} 14%|█▎ | 339/2500 [1:12:11<8:44:46, 14.57s/it] 14%|█▎ | 340/2500 [1:12:25<8:37:45, 14.38s/it] {'loss': 0.0013, 'grad_norm': 0.19020610639230676, 'learning_rate': 8.639999999999999e-07, 'completion_length': 55.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0333251953125, 'epoch': 0.14} 14%|█▎ | 340/2500 [1:12:25<8:37:45, 14.38s/it] 14%|█▎ | 341/2500 [1:12:44<9:25:40, 15.72s/it] {'loss': 0.0011, 'grad_norm': 1.2847373657208878, 'learning_rate': 8.636e-07, 'completion_length': 63.82143020629883, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.8928571939468384, 'reward_std': 0.0714285746216774, 'kl': 0.02655029296875, 'epoch': 0.14} 14%|█▎ | 341/2500 [1:12:44<9:25:40, 15.72s/it] 14%|█▎ | 342/2500 [1:12:57<8:57:39, 14.95s/it] {'loss': 0.0011, 'grad_norm': 0.10852126054388771, 'learning_rate': 8.632e-07, 'completion_length': 54.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02655029296875, 'epoch': 0.14} 14%|█▎ | 342/2500 [1:12:57<8:57:39, 14.95s/it] 14%|█▎ | 343/2500 [1:13:11<8:40:21, 14.47s/it] {'loss': 0.001, 'grad_norm': 0.09013346829335829, 'learning_rate': 8.628e-07, 'completion_length': 49.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02520751953125, 'epoch': 0.14} 14%|█▎ | 343/2500 [1:13:11<8:40:21, 14.47s/it] 14%|█▍ | 344/2500 [1:13:24<8:24:11, 14.03s/it] {'loss': 0.0017, 'grad_norm': 0.10776356255540562, 
'learning_rate': 8.624e-07, 'completion_length': 49.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.14} 14%|█▍ | 344/2500 [1:13:24<8:24:11, 14.03s/it] 14%|█▍ | 345/2500 [1:13:37<8:20:08, 13.93s/it] {'loss': 0.0011, 'grad_norm': 0.27594798812027904, 'learning_rate': 8.62e-07, 'completion_length': 50.125003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.14} 14%|█▍ | 345/2500 [1:13:37<8:20:08, 13.93s/it] 14%|█▍ | 346/2500 [1:13:52<8:23:53, 14.04s/it] {'loss': 0.0016, 'grad_norm': 18.365825264591784, 'learning_rate': 8.616e-07, 'completion_length': 46.32143020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.04095458984375, 'epoch': 0.14} 14%|█▍ | 346/2500 [1:13:52<8:23:53, 14.04s/it] 14%|█▍ | 347/2500 [1:14:05<8:20:57, 13.96s/it] {'loss': 0.0013, 'grad_norm': 0.11231119871513655, 'learning_rate': 8.611999999999999e-07, 'completion_length': 54.035715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0325927734375, 'epoch': 0.14} 14%|█▍ | 347/2500 [1:14:05<8:20:57, 13.96s/it] 14%|█▍ | 348/2500 [1:14:19<8:15:14, 13.81s/it] {'loss': 0.0012, 'grad_norm': 0.08858815037784173, 'learning_rate': 8.608e-07, 'completion_length': 48.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02874755859375, 'epoch': 0.14} 14%|█▍ | 348/2500 [1:14:19<8:15:14, 13.81s/it] 14%|█▍ | 349/2500 [1:14:32<8:06:15, 13.56s/it] {'loss': 0.0014, 'grad_norm': 0.11962001645380528, 'learning_rate': 8.604000000000001e-07, 'completion_length': 49.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 
'kl': 0.03558349609375, 'epoch': 0.14} 14%|█▍ | 349/2500 [1:14:32<8:06:15, 13.56s/it] 14%|█▍ | 350/2500 [1:14:45<8:05:40, 13.55s/it] {'loss': 0.0017, 'grad_norm': 0.15613752004382775, 'learning_rate': 8.599999999999999e-07, 'completion_length': 50.410715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04296875, 'epoch': 0.14} 14%|█▍ | 350/2500 [1:14:45<8:05:40, 13.55s/it] 14%|█▍ | 351/2500 [1:14:58<7:56:09, 13.29s/it] {'loss': 0.0012, 'grad_norm': 0.09851407821599488, 'learning_rate': 8.596e-07, 'completion_length': 52.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031005859375, 'epoch': 0.14} 14%|█▍ | 351/2500 [1:14:58<7:56:09, 13.29s/it] 14%|█▍ | 352/2500 [1:15:12<8:00:58, 13.43s/it] {'loss': 0.0014, 'grad_norm': 0.10292879174129684, 'learning_rate': 8.592e-07, 'completion_length': 54.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.14} 14%|█▍ | 352/2500 [1:15:12<8:00:58, 13.43s/it] 14%|█▍ | 353/2500 [1:15:25<8:02:00, 13.47s/it] {'loss': 0.0017, 'grad_norm': 0.11700599148893431, 'learning_rate': 8.587999999999999e-07, 'completion_length': 46.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0426025390625, 'epoch': 0.14} 14%|█▍ | 353/2500 [1:15:25<8:02:00, 13.47s/it] 14%|█▍ | 354/2500 [1:15:39<8:00:56, 13.45s/it] {'loss': 0.0014, 'grad_norm': 1.4763593199133198, 'learning_rate': 8.584e-07, 'completion_length': 53.42857360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03436279296875, 'epoch': 0.14} 14%|█▍ | 354/2500 [1:15:39<8:00:56, 13.45s/it] 14%|█▍ | 355/2500 [1:15:52<7:56:17, 13.32s/it] {'loss': 0.0013, 'grad_norm': 0.22762044366317388, 'learning_rate': 
8.58e-07, 'completion_length': 51.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03179931640625, 'epoch': 0.14} 14%|█▍ | 355/2500 [1:15:52<7:56:17, 13.32s/it] 14%|█▍ | 356/2500 [1:16:06<8:01:03, 13.46s/it] {'loss': 0.002, 'grad_norm': 0.09330155566941592, 'learning_rate': 8.576e-07, 'completion_length': 53.785715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.050048828125, 'epoch': 0.14} 14%|█▍ | 356/2500 [1:16:06<8:01:03, 13.46s/it] 14%|█▍ | 357/2500 [1:16:21<8:27:23, 14.21s/it] {'loss': 0.0017, 'grad_norm': 1.3565490427003846, 'learning_rate': 8.571999999999999e-07, 'completion_length': 72.16071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.043212890625, 'epoch': 0.14} 14%|█▍ | 357/2500 [1:16:21<8:27:23, 14.21s/it] 14%|█▍ | 358/2500 [1:16:41<9:25:34, 15.84s/it] {'loss': 0.0011, 'grad_norm': 0.3579172388735342, 'learning_rate': 8.568e-07, 'completion_length': 71.94643020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.02764892578125, 'epoch': 0.14} 14%|█▍ | 358/2500 [1:16:41<9:25:34, 15.84s/it] 14%|█▍ | 359/2500 [1:16:57<9:20:37, 15.71s/it] {'loss': 0.0008, 'grad_norm': 0.8555929235255206, 'learning_rate': 8.564e-07, 'completion_length': 59.58928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.019287109375, 'epoch': 0.14} 14%|█▍ | 359/2500 [1:16:57<9:20:37, 15.71s/it] 14%|█▍ | 360/2500 [1:17:10<8:56:10, 15.03s/it] {'loss': 0.0014, 'grad_norm': 0.15865484318122425, 'learning_rate': 8.559999999999999e-07, 'completion_length': 59.10714530944824, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03411865234375, 'epoch': 0.14} 14%|█▍ | 360/2500 [1:17:10<8:56:10, 15.03s/it] 14%|█▍ | 361/2500 [1:17:25<8:50:41, 14.89s/it] {'loss': 0.0017, 'grad_norm': 0.10968812658873854, 'learning_rate': 8.556e-07, 'completion_length': 61.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0419921875, 'epoch': 0.14} 14%|█▍ | 361/2500 [1:17:25<8:50:41, 14.89s/it] 14%|█▍ | 362/2500 [1:17:43<9:31:52, 16.05s/it] {'loss': 0.0014, 'grad_norm': 0.8047852635623836, 'learning_rate': 8.551999999999999e-07, 'completion_length': 57.32143211364746, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.8928571939468384, 'reward_std': 0.0714285746216774, 'kl': 0.03375244140625, 'epoch': 0.14} 14%|█▍ | 362/2500 [1:17:43<9:31:52, 16.05s/it] 15%|█▍ | 363/2500 [1:17:58<9:14:13, 15.56s/it] {'loss': 0.001, 'grad_norm': 0.15285217031181292, 'learning_rate': 8.548e-07, 'completion_length': 52.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024658203125, 'epoch': 0.15} 15%|█▍ | 363/2500 [1:17:58<9:14:13, 15.56s/it] 15%|█▍ | 364/2500 [1:18:12<8:55:47, 15.05s/it] {'loss': 0.0011, 'grad_norm': 0.08382231185776597, 'learning_rate': 8.544e-07, 'completion_length': 51.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.15} 15%|█▍ | 364/2500 [1:18:12<8:55:47, 15.05s/it] 15%|█▍ | 365/2500 [1:18:26<8:45:10, 14.76s/it] {'loss': 0.0011, 'grad_norm': 0.15082249851776908, 'learning_rate': 8.539999999999999e-07, 'completion_length': 59.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0272216796875, 'epoch': 0.15} 15%|█▍ | 365/2500 [1:18:26<8:45:10, 14.76s/it] 15%|█▍ | 366/2500 [1:18:40<8:36:37, 
14.53s/it] {'loss': 0.0011, 'grad_norm': 0.10045097032501765, 'learning_rate': 8.536e-07, 'completion_length': 57.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028076171875, 'epoch': 0.15} 15%|█▍ | 366/2500 [1:18:40<8:36:37, 14.53s/it] 15%|█▍ | 367/2500 [1:18:53<8:22:47, 14.14s/it] {'loss': 0.0011, 'grad_norm': 0.15134743598803535, 'learning_rate': 8.531999999999999e-07, 'completion_length': 60.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0286865234375, 'epoch': 0.15} 15%|█▍ | 367/2500 [1:18:53<8:22:47, 14.14s/it] 15%|█▍ | 368/2500 [1:19:06<8:09:25, 13.77s/it] {'loss': 0.0012, 'grad_norm': 0.5149341935713923, 'learning_rate': 8.528e-07, 'completion_length': 52.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030517578125, 'epoch': 0.15} 15%|█▍ | 368/2500 [1:19:06<8:09:25, 13.77s/it] 15%|█▍ | 369/2500 [1:19:19<8:08:16, 13.75s/it] {'loss': 0.0016, 'grad_norm': 1.1171465422073705, 'learning_rate': 8.524e-07, 'completion_length': 59.16071701049805, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.0357142873108387, 'kl': 0.0394287109375, 'epoch': 0.15} 15%|█▍ | 369/2500 [1:19:19<8:08:16, 13.75s/it] 15%|█▍ | 370/2500 [1:19:33<8:09:03, 13.78s/it] {'loss': 0.0016, 'grad_norm': 0.1177963720643264, 'learning_rate': 8.52e-07, 'completion_length': 56.76785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0404052734375, 'epoch': 0.15} 15%|█▍ | 370/2500 [1:19:33<8:09:03, 13.78s/it] 15%|█▍ | 371/2500 [1:19:47<8:12:24, 13.88s/it] {'loss': 0.0015, 'grad_norm': 0.10750079305314861, 'learning_rate': 8.516e-07, 'completion_length': 63.89286231994629, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03857421875, 'epoch': 0.15} 15%|█▍ | 371/2500 [1:19:47<8:12:24, 13.88s/it] 15%|█▍ | 372/2500 [1:20:00<8:03:23, 13.63s/it] {'loss': 0.0009, 'grad_norm': 0.09134475105958575, 'learning_rate': 8.511999999999999e-07, 'completion_length': 52.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0213623046875, 'epoch': 0.15} 15%|█▍ | 372/2500 [1:20:00<8:03:23, 13.63s/it] 15%|█▍ | 373/2500 [1:20:14<8:04:53, 13.68s/it] {'loss': 0.0014, 'grad_norm': 0.13366793010702446, 'learning_rate': 8.508e-07, 'completion_length': 57.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03497314453125, 'epoch': 0.15} 15%|█▍ | 373/2500 [1:20:14<8:04:53, 13.68s/it] 15%|█▍ | 374/2500 [1:20:28<8:01:59, 13.60s/it] {'loss': 0.001, 'grad_norm': 0.16218035211843246, 'learning_rate': 8.504e-07, 'completion_length': 52.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0255126953125, 'epoch': 0.15} 15%|█▍ | 374/2500 [1:20:28<8:01:59, 13.60s/it] 15%|█▌ | 375/2500 [1:20:42<8:06:14, 13.73s/it] {'loss': 0.0012, 'grad_norm': 0.6671164620186704, 'learning_rate': 8.499999999999999e-07, 'completion_length': 60.000003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.15} 15%|█▌ | 375/2500 [1:20:42<8:06:14, 13.73s/it] 15%|█▌ | 376/2500 [1:20:55<7:58:26, 13.52s/it] {'loss': 0.0016, 'grad_norm': 0.1396205358061693, 'learning_rate': 8.496e-07, 'completion_length': 52.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039794921875, 'epoch': 0.15} 15%|█▌ | 376/2500 [1:20:55<7:58:26, 13.52s/it] 15%|█▌ | 377/2500 [1:21:09<8:02:44, 13.64s/it] {'loss': 0.0013, 'grad_norm': 0.9405683182817653, 
'learning_rate': 8.492e-07, 'completion_length': 61.142860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0318603515625, 'epoch': 0.15} 15%|█▌ | 377/2500 [1:21:09<8:02:44, 13.64s/it] 15%|█▌ | 378/2500 [1:21:23<8:10:32, 13.87s/it] {'loss': 0.0014, 'grad_norm': 0.18079910024043397, 'learning_rate': 8.487999999999999e-07, 'completion_length': 51.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0338134765625, 'epoch': 0.15} 15%|█▌ | 378/2500 [1:21:23<8:10:32, 13.87s/it] 15%|█▌ | 379/2500 [1:21:37<8:07:14, 13.78s/it] {'loss': 0.0014, 'grad_norm': 1.3675963933160304, 'learning_rate': 8.484e-07, 'completion_length': 59.55357360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03485107421875, 'epoch': 0.15} 15%|█▌ | 379/2500 [1:21:37<8:07:14, 13.78s/it] 15%|█▌ | 380/2500 [1:21:51<8:16:00, 14.04s/it] {'loss': 0.0013, 'grad_norm': 3.0898359455601567, 'learning_rate': 8.48e-07, 'completion_length': 61.21428871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03240966796875, 'epoch': 0.15} 15%|█▌ | 380/2500 [1:21:51<8:16:00, 14.04s/it] 15%|█▌ | 381/2500 [1:22:06<8:19:22, 14.14s/it] {'loss': 0.0014, 'grad_norm': 2.0140222238596954, 'learning_rate': 8.475999999999999e-07, 'completion_length': 60.80357551574707, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.033935546875, 'epoch': 0.15} 15%|█▌ | 381/2500 [1:22:06<8:19:22, 14.14s/it] 15%|█▌ | 382/2500 [1:22:19<8:09:59, 13.88s/it] {'loss': 0.0016, 'grad_norm': 0.1671827564505293, 'learning_rate': 8.471999999999999e-07, 'completion_length': 
50.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0408935546875, 'epoch': 0.15} 15%|█▌ | 382/2500 [1:22:19<8:09:59, 13.88s/it] 15%|█▌ | 383/2500 [1:22:32<8:05:33, 13.76s/it] {'loss': 0.0015, 'grad_norm': 0.10474875256006423, 'learning_rate': 8.468e-07, 'completion_length': 64.98214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03753662109375, 'epoch': 0.15} 15%|█▌ | 383/2500 [1:22:32<8:05:33, 13.76s/it] 15%|█▌ | 384/2500 [1:22:47<8:09:24, 13.88s/it] {'loss': 0.001, 'grad_norm': 0.13829858245556362, 'learning_rate': 8.464e-07, 'completion_length': 58.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02392578125, 'epoch': 0.15} 15%|█▌ | 384/2500 [1:22:47<8:09:24, 13.88s/it] 15%|█▌ | 385/2500 [1:23:00<8:06:07, 13.79s/it] {'loss': 0.0017, 'grad_norm': 3.7424939680716243, 'learning_rate': 8.459999999999999e-07, 'completion_length': 61.91071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04156494140625, 'epoch': 0.15} 15%|█▌ | 385/2500 [1:23:00<8:06:07, 13.79s/it] 15%|█▌ | 386/2500 [1:23:14<8:03:22, 13.72s/it] {'loss': 0.0015, 'grad_norm': 0.12253494296606746, 'learning_rate': 8.456e-07, 'completion_length': 54.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.036376953125, 'epoch': 0.15} 15%|█▌ | 386/2500 [1:23:14<8:03:22, 13.72s/it] 15%|█▌ | 387/2500 [1:23:27<7:59:37, 13.62s/it] {'loss': 0.001, 'grad_norm': 0.11145991408266168, 'learning_rate': 8.451999999999999e-07, 'completion_length': 58.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0255126953125, 'epoch': 0.15} 15%|█▌ | 387/2500 [1:23:27<7:59:37, 
13.62s/it] 16%|█▌ | 388/2500 [1:23:40<7:56:57, 13.55s/it] {'loss': 0.0012, 'grad_norm': 0.15355823876046107, 'learning_rate': 8.447999999999999e-07, 'completion_length': 63.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0291748046875, 'epoch': 0.16} 16%|█▌ | 388/2500 [1:23:40<7:56:57, 13.55s/it] 16%|█▌ | 389/2500 [1:23:54<7:54:06, 13.48s/it] {'loss': 0.0015, 'grad_norm': 0.16195331925159573, 'learning_rate': 8.444e-07, 'completion_length': 55.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0362548828125, 'epoch': 0.16} 16%|█▌ | 389/2500 [1:23:54<7:54:06, 13.48s/it] 16%|█▌ | 390/2500 [1:24:08<7:59:10, 13.63s/it] {'loss': 0.0018, 'grad_norm': 0.12292488346227579, 'learning_rate': 8.439999999999999e-07, 'completion_length': 60.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045654296875, 'epoch': 0.16} 16%|█▌ | 390/2500 [1:24:08<7:59:10, 13.63s/it] 16%|█▌ | 391/2500 [1:24:21<7:53:43, 13.48s/it] {'loss': 0.0012, 'grad_norm': 0.10143986984888305, 'learning_rate': 8.436e-07, 'completion_length': 53.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.16} 16%|█▌ | 391/2500 [1:24:21<7:53:43, 13.48s/it] 16%|█▌ | 392/2500 [1:24:35<8:03:49, 13.77s/it] {'loss': 0.001, 'grad_norm': 0.5991750737724139, 'learning_rate': 8.431999999999999e-07, 'completion_length': 59.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025634765625, 'epoch': 0.16} 16%|█▌ | 392/2500 [1:24:35<8:03:49, 13.77s/it] 16%|█▌ | 393/2500 [1:24:49<7:58:25, 13.62s/it] {'loss': 0.0008, 'grad_norm': 0.13930144737127642, 'learning_rate': 8.428e-07, 'completion_length': 59.76785850524902, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02069091796875, 'epoch': 0.16} 16%|█▌ | 393/2500 [1:24:49<7:58:25, 13.62s/it] 16%|█▌ | 394/2500 [1:25:02<8:00:13, 13.68s/it] {'loss': 0.0019, 'grad_norm': 1.3806393597645361, 'learning_rate': 8.424e-07, 'completion_length': 54.017860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0467529296875, 'epoch': 0.16} 16%|█▌ | 394/2500 [1:25:02<8:00:13, 13.68s/it] 16%|█▌ | 395/2500 [1:25:17<8:04:57, 13.82s/it] {'loss': 0.0012, 'grad_norm': 0.09103943695203759, 'learning_rate': 8.419999999999999e-07, 'completion_length': 53.410715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02886962890625, 'epoch': 0.16} 16%|█▌ | 395/2500 [1:25:17<8:04:57, 13.82s/it] 16%|█▌ | 396/2500 [1:25:29<7:54:25, 13.53s/it] {'loss': 0.0016, 'grad_norm': 2.006838981663528, 'learning_rate': 8.416e-07, 'completion_length': 47.10714340209961, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0411376953125, 'epoch': 0.16} 16%|█▌ | 396/2500 [1:25:29<7:54:25, 13.53s/it] 16%|█▌ | 397/2500 [1:25:43<7:52:33, 13.48s/it] {'loss': 0.0012, 'grad_norm': 0.11906294532119761, 'learning_rate': 8.411999999999999e-07, 'completion_length': 52.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.16} 16%|█▌ | 397/2500 [1:25:43<7:52:33, 13.48s/it] 16%|█▌ | 398/2500 [1:25:56<7:46:24, 13.31s/it] {'loss': 0.0008, 'grad_norm': 0.13745531244817777, 'learning_rate': 8.408e-07, 'completion_length': 51.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021270751953125, 'epoch': 0.16} 16%|█▌ | 398/2500 [1:25:56<7:46:24, 
13.31s/it] 16%|█▌ | 399/2500 [1:26:09<7:48:05, 13.37s/it] {'loss': 0.0012, 'grad_norm': 0.08189055457674145, 'learning_rate': 8.404e-07, 'completion_length': 52.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02880859375, 'epoch': 0.16} 16%|█▌ | 399/2500 [1:26:09<7:48:05, 13.37s/it] 16%|█▌ | 400/2500 [1:26:22<7:44:32, 13.27s/it] {'loss': 0.0014, 'grad_norm': 0.10754582258403375, 'learning_rate': 8.399999999999999e-07, 'completion_length': 51.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034423828125, 'epoch': 0.16} 16%|█▌ | 400/2500 [1:26:22<7:44:32, 13.27s/it] 16%|█▌ | 401/2500 [1:27:31<17:24:22, 29.85s/it] {'loss': 0.0011, 'grad_norm': 0.1239220952702497, 'learning_rate': 8.396e-07, 'completion_length': 48.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.16} 16%|█▌ | 401/2500 [1:27:31<17:24:22, 29.85s/it] 16%|█▌ | 402/2500 [1:27:46<14:45:46, 25.33s/it] {'loss': 0.002, 'grad_norm': 0.16791114030503954, 'learning_rate': 8.391999999999999e-07, 'completion_length': 57.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0491943359375, 'epoch': 0.16} 16%|█▌ | 402/2500 [1:27:46<14:45:46, 25.33s/it] 16%|█▌ | 403/2500 [1:27:59<12:36:19, 21.64s/it] {'loss': 0.0017, 'grad_norm': 0.15778381531856797, 'learning_rate': 8.387999999999999e-07, 'completion_length': 57.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043701171875, 'epoch': 0.16} 16%|█▌ | 403/2500 [1:27:59<12:36:19, 21.64s/it] 16%|█▌ | 404/2500 [1:28:12<11:04:23, 19.02s/it] {'loss': 0.0017, 'grad_norm': 0.2522997829544574, 'learning_rate': 8.384e-07, 'completion_length': 52.12500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 
'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0426025390625, 'epoch': 0.16} 16%|█▌ | 404/2500 [1:28:12<11:04:23, 19.02s/it] 16%|█▌ | 405/2500 [1:28:26<10:16:49, 17.67s/it] {'loss': 0.0015, 'grad_norm': 0.13186408851238446, 'learning_rate': 8.38e-07, 'completion_length': 57.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03753662109375, 'epoch': 0.16} 16%|█▌ | 405/2500 [1:28:26<10:16:49, 17.67s/it] 16%|█▌ | 406/2500 [1:28:39<9:26:16, 16.23s/it] {'loss': 0.0016, 'grad_norm': 0.21822146272434378, 'learning_rate': 8.375999999999999e-07, 'completion_length': 52.12500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0408935546875, 'epoch': 0.16} 16%|█▌ | 406/2500 [1:28:39<9:26:16, 16.23s/it] 16%|█▋ | 407/2500 [1:28:52<8:54:13, 15.31s/it] {'loss': 0.0023, 'grad_norm': 0.17728923500344432, 'learning_rate': 8.372e-07, 'completion_length': 53.642860412597656, 'rewards/accuracy_reward': 0.8571428656578064, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.0, 'kl': 0.05712890625, 'epoch': 0.16} 16%|█▋ | 407/2500 [1:28:52<8:54:13, 15.31s/it] 16%|█▋ | 408/2500 [1:29:07<8:52:16, 15.27s/it] {'loss': 0.0013, 'grad_norm': 2.1830358701137196, 'learning_rate': 8.368e-07, 'completion_length': 71.62500381469727, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0321044921875, 'epoch': 0.16} 16%|█▋ | 408/2500 [1:29:07<8:52:16, 15.27s/it] 16%|█▋ | 409/2500 [1:29:22<8:41:58, 14.98s/it] {'loss': 0.0013, 'grad_norm': 0.09377101089339704, 'learning_rate': 8.363999999999999e-07, 'completion_length': 61.607147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032958984375, 'epoch': 0.16} 16%|█▋ | 409/2500 
[1:29:22<8:41:58, 14.98s/it] 16%|█▋ | 410/2500 [1:29:37<8:45:06, 15.07s/it] {'loss': 0.0013, 'grad_norm': 0.13820718829106426, 'learning_rate': 8.359999999999999e-07, 'completion_length': 60.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0335693359375, 'epoch': 0.16} 16%|█▋ | 410/2500 [1:29:37<8:45:06, 15.07s/it] 16%|█▋ | 411/2500 [1:29:50<8:26:33, 14.55s/it] {'loss': 0.0012, 'grad_norm': 0.11412890474180247, 'learning_rate': 8.356e-07, 'completion_length': 53.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02960205078125, 'epoch': 0.16} 16%|█▋ | 411/2500 [1:29:50<8:26:33, 14.55s/it] 16%|█▋ | 412/2500 [1:30:04<8:13:36, 14.18s/it] {'loss': 0.0007, 'grad_norm': 0.11453479133874128, 'learning_rate': 8.352000000000001e-07, 'completion_length': 53.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01702880859375, 'epoch': 0.16} 16%|█▋ | 412/2500 [1:30:04<8:13:36, 14.18s/it] 17%|█▋ | 413/2500 [1:30:17<8:09:03, 14.06s/it] {'loss': 0.0014, 'grad_norm': 1.723210924474256, 'learning_rate': 8.347999999999999e-07, 'completion_length': 60.96428871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03387451171875, 'epoch': 0.17} 17%|█▋ | 413/2500 [1:30:17<8:09:03, 14.06s/it] 17%|█▋ | 414/2500 [1:30:31<8:07:39, 14.03s/it] {'loss': 0.0018, 'grad_norm': 0.06829741194790784, 'learning_rate': 8.344e-07, 'completion_length': 60.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0458984375, 'epoch': 0.17} 17%|█▋ | 414/2500 [1:30:31<8:07:39, 14.03s/it] 17%|█▋ | 415/2500 [1:30:45<8:06:08, 13.99s/it] {'loss': 0.0019, 'grad_norm': 0.10187573026244007, 'learning_rate': 8.34e-07, 'completion_length': 
54.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04736328125, 'epoch': 0.17} 17%|█▋ | 415/2500 [1:30:45<8:06:08, 13.99s/it] 17%|█▋ | 416/2500 [1:30:59<8:03:15, 13.91s/it] {'loss': 0.0016, 'grad_norm': 1.4946838978119517, 'learning_rate': 8.335999999999999e-07, 'completion_length': 55.07143020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04052734375, 'epoch': 0.17} 17%|█▋ | 416/2500 [1:30:59<8:03:15, 13.91s/it] 17%|█▋ | 417/2500 [1:31:13<8:04:48, 13.96s/it] {'loss': 0.0009, 'grad_norm': 0.08833519853099626, 'learning_rate': 8.332e-07, 'completion_length': 58.05357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0233154296875, 'epoch': 0.17} 17%|█▋ | 417/2500 [1:31:13<8:04:48, 13.96s/it] 17%|█▋ | 418/2500 [1:31:27<8:03:19, 13.93s/it] {'loss': 0.0012, 'grad_norm': 0.10762955964876778, 'learning_rate': 8.328e-07, 'completion_length': 55.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02911376953125, 'epoch': 0.17} 17%|█▋ | 418/2500 [1:31:27<8:03:19, 13.93s/it] 17%|█▋ | 419/2500 [1:31:40<7:58:57, 13.81s/it] {'loss': 0.0011, 'grad_norm': 0.10134045413136021, 'learning_rate': 8.324e-07, 'completion_length': 59.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02691650390625, 'epoch': 0.17} 17%|█▋ | 419/2500 [1:31:40<7:58:57, 13.81s/it] 17%|█▋ | 420/2500 [1:31:56<8:13:15, 14.23s/it] {'loss': 0.0018, 'grad_norm': 0.08669895289650797, 'learning_rate': 8.319999999999999e-07, 'completion_length': 63.214290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.17} 17%|█▋ | 420/2500 
[1:31:56<8:13:15, 14.23s/it] 17%|█▋ | 421/2500 [1:32:10<8:12:54, 14.23s/it] {'loss': 0.0009, 'grad_norm': 0.08147705273517226, 'learning_rate': 8.316e-07, 'completion_length': 52.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022308349609375, 'epoch': 0.17} 17%|█▋ | 421/2500 [1:32:10<8:12:54, 14.23s/it] 17%|█▋ | 422/2500 [1:32:25<8:20:37, 14.45s/it] {'loss': 0.0012, 'grad_norm': 0.09620815319315633, 'learning_rate': 8.312e-07, 'completion_length': 58.285715103149414, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.17} 17%|█▋ | 422/2500 [1:32:25<8:20:37, 14.45s/it] 17%|█▋ | 423/2500 [1:32:39<8:15:48, 14.32s/it] {'loss': 0.0011, 'grad_norm': 1.3927457036260718, 'learning_rate': 8.308e-07, 'completion_length': 62.50000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.028076171875, 'epoch': 0.17} 17%|█▋ | 423/2500 [1:32:39<8:15:48, 14.32s/it] 17%|█▋ | 424/2500 [1:32:53<8:19:28, 14.44s/it] {'loss': 0.0018, 'grad_norm': 1.0447908059069029, 'learning_rate': 8.304e-07, 'completion_length': 62.80357360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0452880859375, 'epoch': 0.17} 17%|█▋ | 424/2500 [1:32:53<8:19:28, 14.44s/it] 17%|█▋ | 425/2500 [1:33:07<8:10:14, 14.18s/it] {'loss': 0.0012, 'grad_norm': 0.0878047567145916, 'learning_rate': 8.299999999999999e-07, 'completion_length': 57.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02960205078125, 'epoch': 0.17} 17%|█▋ | 425/2500 [1:33:07<8:10:14, 14.18s/it] 17%|█▋ | 426/2500 [1:33:22<8:13:31, 14.28s/it] {'loss': 0.0016, 'grad_norm': 0.09017856098375766, 
'learning_rate': 8.296e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0406494140625, 'epoch': 0.17} 17%|█▋ | 426/2500 [1:33:22<8:13:31, 14.28s/it] 17%|█▋ | 427/2500 [1:33:35<8:07:18, 14.10s/it] {'loss': 0.0011, 'grad_norm': 0.1039746576096053, 'learning_rate': 8.292e-07, 'completion_length': 55.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02667236328125, 'epoch': 0.17} 17%|█▋ | 427/2500 [1:33:35<8:07:18, 14.10s/it] 17%|█▋ | 428/2500 [1:33:50<8:10:03, 14.19s/it] {'loss': 0.0017, 'grad_norm': 0.18100917000327987, 'learning_rate': 8.287999999999999e-07, 'completion_length': 55.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04296875, 'epoch': 0.17} 17%|█▋ | 428/2500 [1:33:50<8:10:03, 14.19s/it] 17%|█▋ | 429/2500 [1:34:04<8:06:45, 14.10s/it] {'loss': 0.001, 'grad_norm': 2.2530777363331795, 'learning_rate': 8.284e-07, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02581787109375, 'epoch': 0.17} 17%|█▋ | 429/2500 [1:34:04<8:06:45, 14.10s/it] 17%|█▋ | 430/2500 [1:34:18<8:05:55, 14.08s/it] {'loss': 0.0011, 'grad_norm': 0.08013469019274078, 'learning_rate': 8.28e-07, 'completion_length': 57.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0263671875, 'epoch': 0.17} 17%|█▋ | 430/2500 [1:34:18<8:05:55, 14.08s/it] 17%|█▋ | 431/2500 [1:34:32<8:05:58, 14.09s/it] {'loss': 0.0016, 'grad_norm': 0.1476051906980153, 'learning_rate': 8.275999999999999e-07, 'completion_length': 57.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039306640625, 'epoch': 0.17} 
17%|█▋ | 431/2500 [1:34:32<8:05:58, 14.09s/it] 17%|█▋ | 432/2500 [1:34:46<8:10:45, 14.24s/it] {'loss': 0.0017, 'grad_norm': 0.14600734245087257, 'learning_rate': 8.272e-07, 'completion_length': 59.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0430908203125, 'epoch': 0.17} 17%|█▋ | 432/2500 [1:34:46<8:10:45, 14.24s/it] 17%|█▋ | 433/2500 [1:35:00<8:02:21, 14.00s/it] {'loss': 0.0011, 'grad_norm': 0.07949481309889231, 'learning_rate': 8.268e-07, 'completion_length': 55.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0267333984375, 'epoch': 0.17} 17%|█▋ | 433/2500 [1:35:00<8:02:21, 14.00s/it] 17%|█▋ | 434/2500 [1:35:13<7:57:31, 13.87s/it] {'loss': 0.0019, 'grad_norm': 0.8756031836861622, 'learning_rate': 8.263999999999999e-07, 'completion_length': 49.23214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.046875, 'epoch': 0.17} 17%|█▋ | 434/2500 [1:35:13<7:57:31, 13.87s/it] 17%|█▋ | 435/2500 [1:35:33<8:52:54, 15.48s/it] {'loss': 0.0016, 'grad_norm': 0.8142268895815966, 'learning_rate': 8.259999999999999e-07, 'completion_length': 70.89285850524902, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9464285969734192, 'reward_std': 0.06838765740394592, 'kl': 0.0406494140625, 'epoch': 0.17} 17%|█▋ | 435/2500 [1:35:33<8:52:54, 15.48s/it] 17%|█▋ | 436/2500 [1:35:47<8:40:17, 15.12s/it] {'loss': 0.0012, 'grad_norm': 1.1243671434457805, 'learning_rate': 8.256e-07, 'completion_length': 59.57143211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03118896484375, 'epoch': 0.17} 17%|█▋ | 436/2500 [1:35:47<8:40:17, 15.12s/it] 17%|█▋ | 437/2500 [1:36:02<8:45:25, 15.28s/it] 
{'loss': 0.0022, 'grad_norm': 0.09946394042334611, 'learning_rate': 8.252000000000001e-07, 'completion_length': 62.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0543212890625, 'epoch': 0.17} 17%|█▋ | 437/2500 [1:36:02<8:45:25, 15.28s/it] 18%|█▊ | 438/2500 [1:36:16<8:27:47, 14.78s/it] {'loss': 0.0014, 'grad_norm': 0.117110135746425, 'learning_rate': 8.247999999999999e-07, 'completion_length': 60.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.18} 18%|█▊ | 438/2500 [1:36:16<8:27:47, 14.78s/it] 18%|█▊ | 439/2500 [1:36:30<8:22:50, 14.64s/it] {'loss': 0.0018, 'grad_norm': 0.3390027125361112, 'learning_rate': 8.244e-07, 'completion_length': 56.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045654296875, 'epoch': 0.18} 18%|█▊ | 439/2500 [1:36:30<8:22:50, 14.64s/it] 18%|█▊ | 440/2500 [1:36:44<8:15:47, 14.44s/it] {'loss': 0.0018, 'grad_norm': 2.199585602804065, 'learning_rate': 8.24e-07, 'completion_length': 53.69643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.04461669921875, 'epoch': 0.18} 18%|█▊ | 440/2500 [1:36:44<8:15:47, 14.44s/it] 18%|█▊ | 441/2500 [1:36:58<8:10:42, 14.30s/it] {'loss': 0.0009, 'grad_norm': 0.07881018775415614, 'learning_rate': 8.235999999999999e-07, 'completion_length': 53.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02130126953125, 'epoch': 0.18} 18%|█▊ | 441/2500 [1:36:58<8:10:42, 14.30s/it] 18%|█▊ | 442/2500 [1:37:12<8:07:08, 14.20s/it] {'loss': 0.0015, 'grad_norm': 0.14077631317524555, 'learning_rate': 8.232e-07, 'completion_length': 50.73214530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 
1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0372314453125, 'epoch': 0.18} 18%|█▊ | 442/2500 [1:37:12<8:07:08, 14.20s/it] 18%|█▊ | 443/2500 [1:37:27<8:11:26, 14.33s/it] {'loss': 0.0017, 'grad_norm': 0.10652988468922694, 'learning_rate': 8.228e-07, 'completion_length': 55.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043212890625, 'epoch': 0.18} 18%|█▊ | 443/2500 [1:37:27<8:11:26, 14.33s/it] 18%|█▊ | 444/2500 [1:37:40<8:00:48, 14.03s/it] {'loss': 0.0014, 'grad_norm': 0.1976174913681119, 'learning_rate': 8.224e-07, 'completion_length': 51.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03607177734375, 'epoch': 0.18} 18%|█▊ | 444/2500 [1:37:40<8:00:48, 14.03s/it] 18%|█▊ | 445/2500 [1:37:55<8:04:18, 14.14s/it] {'loss': 0.0027, 'grad_norm': 0.6978573824814968, 'learning_rate': 8.219999999999999e-07, 'completion_length': 56.21428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0665283203125, 'epoch': 0.18} 18%|█▊ | 445/2500 [1:37:55<8:04:18, 14.14s/it] 18%|█▊ | 446/2500 [1:38:08<7:55:47, 13.90s/it] {'loss': 0.0012, 'grad_norm': 0.1514008252781013, 'learning_rate': 8.216e-07, 'completion_length': 53.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03045654296875, 'epoch': 0.18} 18%|█▊ | 446/2500 [1:38:08<7:55:47, 13.90s/it] 18%|█▊ | 447/2500 [1:38:22<8:00:48, 14.05s/it] {'loss': 0.0025, 'grad_norm': 5.606049742609596, 'learning_rate': 8.212e-07, 'completion_length': 56.03571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0615234375, 'epoch': 0.18} 18%|█▊ | 447/2500 [1:38:22<8:00:48, 14.05s/it] 18%|█▊ | 448/2500 
[1:38:37<8:06:34, 14.23s/it] {'loss': 0.002, 'grad_norm': 3.1449289075085414, 'learning_rate': 8.207999999999999e-07, 'completion_length': 59.75000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0491943359375, 'epoch': 0.18} 18%|█▊ | 448/2500 [1:38:37<8:06:34, 14.23s/it] 18%|█▊ | 449/2500 [1:38:51<8:02:55, 14.13s/it] {'loss': 0.0021, 'grad_norm': 2.2837076762097297, 'learning_rate': 8.204e-07, 'completion_length': 51.17857360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.052490234375, 'epoch': 0.18} 18%|█▊ | 449/2500 [1:38:51<8:02:55, 14.13s/it] 18%|█▊ | 450/2500 [1:39:07<8:24:02, 14.75s/it] {'loss': 0.0012, 'grad_norm': 0.3243496215851178, 'learning_rate': 8.199999999999999e-07, 'completion_length': 52.285715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03125, 'epoch': 0.18} 18%|█▊ | 450/2500 [1:39:07<8:24:02, 14.75s/it] 18%|█▊ | 451/2500 [1:39:21<8:19:25, 14.62s/it] {'loss': 0.0018, 'grad_norm': 0.12207285576268993, 'learning_rate': 8.196e-07, 'completion_length': 58.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04541015625, 'epoch': 0.18} 18%|█▊ | 451/2500 [1:39:21<8:19:25, 14.62s/it] 18%|█▊ | 452/2500 [1:39:35<8:09:22, 14.34s/it] {'loss': 0.0019, 'grad_norm': 0.10842872144869758, 'learning_rate': 8.192e-07, 'completion_length': 53.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0474853515625, 'epoch': 0.18} 18%|█▊ | 452/2500 [1:39:35<8:09:22, 14.34s/it] 18%|█▊ | 453/2500 [1:39:49<8:05:43, 14.24s/it] {'loss': 0.002, 'grad_norm': 0.1196007604373062, 'learning_rate': 8.187999999999999e-07, 'completion_length': 53.46428871154785, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0489501953125, 'epoch': 0.18} 18%|█▊ | 453/2500 [1:39:49<8:05:43, 14.24s/it] 18%|█▊ | 454/2500 [1:40:03<8:00:36, 14.09s/it] {'loss': 0.002, 'grad_norm': 0.14109418944697694, 'learning_rate': 8.184e-07, 'completion_length': 60.12500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0506591796875, 'epoch': 0.18} 18%|█▊ | 454/2500 [1:40:03<8:00:36, 14.09s/it] 18%|█▊ | 455/2500 [1:40:16<7:51:48, 13.84s/it] {'loss': 0.0014, 'grad_norm': 0.09018305460057056, 'learning_rate': 8.179999999999999e-07, 'completion_length': 53.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03387451171875, 'epoch': 0.18} 18%|█▊ | 455/2500 [1:40:16<7:51:48, 13.84s/it] 18%|█▊ | 456/2500 [1:40:29<7:44:46, 13.64s/it] {'loss': 0.0017, 'grad_norm': 0.13345613943402218, 'learning_rate': 8.175999999999999e-07, 'completion_length': 47.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041748046875, 'epoch': 0.18} 18%|█▊ | 456/2500 [1:40:29<7:44:46, 13.64s/it] 18%|█▊ | 457/2500 [1:40:43<7:42:25, 13.58s/it] {'loss': 0.0018, 'grad_norm': 0.2167654560986977, 'learning_rate': 8.172e-07, 'completion_length': 48.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0462646484375, 'epoch': 0.18} 18%|█▊ | 457/2500 [1:40:43<7:42:25, 13.58s/it] 18%|█▊ | 458/2500 [1:40:58<8:01:15, 14.14s/it] {'loss': 0.0014, 'grad_norm': 0.1911464273557406, 'learning_rate': 8.168e-07, 'completion_length': 53.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03509521484375, 'epoch': 0.18} 18%|█▊ | 458/2500 [1:40:58<8:01:15, 14.14s/it] 18%|█▊ | 459/2500 
[1:41:12<7:56:39, 14.01s/it] {'loss': 0.0008, 'grad_norm': 0.08964187433448614, 'learning_rate': 8.163999999999999e-07, 'completion_length': 56.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02081298828125, 'epoch': 0.18} 18%|█▊ | 459/2500 [1:41:12<7:56:39, 14.01s/it] 18%|█▊ | 460/2500 [1:41:27<8:03:05, 14.21s/it] {'loss': 0.0015, 'grad_norm': 1.413609947177239, 'learning_rate': 8.159999999999999e-07, 'completion_length': 57.08928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0374755859375, 'epoch': 0.18} 18%|█▊ | 460/2500 [1:41:27<8:03:05, 14.21s/it] 18%|█▊ | 461/2500 [1:41:40<7:58:47, 14.09s/it] {'loss': 0.0021, 'grad_norm': 0.09692820879473507, 'learning_rate': 8.156e-07, 'completion_length': 52.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05126953125, 'epoch': 0.18} 18%|█▊ | 461/2500 [1:41:40<7:58:47, 14.09s/it] 18%|█▊ | 462/2500 [1:41:55<7:59:03, 14.10s/it] {'loss': 0.0024, 'grad_norm': 0.11139576730283093, 'learning_rate': 8.152e-07, 'completion_length': 54.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0592041015625, 'epoch': 0.18} 18%|█▊ | 462/2500 [1:41:55<7:59:03, 14.10s/it] 19%|█▊ | 463/2500 [1:42:09<7:59:20, 14.12s/it] {'loss': 0.0015, 'grad_norm': 0.16014670600662195, 'learning_rate': 8.147999999999999e-07, 'completion_length': 52.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03851318359375, 'epoch': 0.19} 19%|█▊ | 463/2500 [1:42:09<7:59:20, 14.12s/it] 19%|█▊ | 464/2500 [1:42:22<7:55:42, 14.02s/it] {'loss': 0.0014, 'grad_norm': 1.0329880371206932, 'learning_rate': 8.144e-07, 'completion_length': 57.91071701049805, 'rewards/accuracy_reward': 
0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0361328125, 'epoch': 0.19} 19%|█▊ | 464/2500 [1:42:22<7:55:42, 14.02s/it] 19%|█▊ | 465/2500 [1:42:37<7:59:14, 14.13s/it] {'loss': 0.0014, 'grad_norm': 1.456745206652413, 'learning_rate': 8.14e-07, 'completion_length': 60.92857551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03594970703125, 'epoch': 0.19} 19%|█▊ | 465/2500 [1:42:37<7:59:14, 14.13s/it] 19%|█▊ | 466/2500 [1:42:52<8:09:23, 14.44s/it] {'loss': 0.0012, 'grad_norm': 0.6798888221854864, 'learning_rate': 8.135999999999999e-07, 'completion_length': 60.107147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03118896484375, 'epoch': 0.19} 19%|█▊ | 466/2500 [1:42:52<8:09:23, 14.44s/it] 19%|█▊ | 467/2500 [1:43:06<8:01:01, 14.20s/it] {'loss': 0.0013, 'grad_norm': 0.07993692239287668, 'learning_rate': 8.132e-07, 'completion_length': 55.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.19} 19%|█▊ | 467/2500 [1:43:06<8:01:01, 14.20s/it] 19%|█▊ | 468/2500 [1:43:20<8:03:44, 14.28s/it] {'loss': 0.0011, 'grad_norm': 1.7951425883904406, 'learning_rate': 8.128e-07, 'completion_length': 59.41071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.02789306640625, 'epoch': 0.19} 19%|█▊ | 468/2500 [1:43:20<8:03:44, 14.28s/it] 19%|█▉ | 469/2500 [1:43:34<7:56:35, 14.08s/it] {'loss': 0.0015, 'grad_norm': 1.5813821215813177, 'learning_rate': 8.123999999999999e-07, 'completion_length': 55.80357360839844, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 
0.1071428619325161, 'kl': 0.036376953125, 'epoch': 0.19} 19%|█▉ | 469/2500 [1:43:34<7:56:35, 14.08s/it] 19%|█▉ | 470/2500 [1:43:48<7:56:35, 14.09s/it] {'loss': 0.0011, 'grad_norm': 1.9524617290676836, 'learning_rate': 8.12e-07, 'completion_length': 57.55357551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02783203125, 'epoch': 0.19} 19%|█▉ | 470/2500 [1:43:48<7:56:35, 14.09s/it] 19%|█▉ | 471/2500 [1:44:01<7:49:50, 13.89s/it] {'loss': 0.0024, 'grad_norm': 2.9073814088063843, 'learning_rate': 8.116e-07, 'completion_length': 54.39285850524902, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.060791015625, 'epoch': 0.19} 19%|█▉ | 471/2500 [1:44:01<7:49:50, 13.89s/it] 19%|█▉ | 472/2500 [1:44:15<7:44:35, 13.75s/it] {'loss': 0.0011, 'grad_norm': 0.1474765117990341, 'learning_rate': 8.112e-07, 'completion_length': 55.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0283203125, 'epoch': 0.19} 19%|█▉ | 472/2500 [1:44:15<7:44:35, 13.75s/it] 19%|█▉ | 473/2500 [1:44:28<7:44:44, 13.76s/it] {'loss': 0.0014, 'grad_norm': 0.12467465921446666, 'learning_rate': 8.107999999999999e-07, 'completion_length': 56.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03607177734375, 'epoch': 0.19} 19%|█▉ | 473/2500 [1:44:28<7:44:44, 13.76s/it] 19%|█▉ | 474/2500 [1:44:42<7:44:00, 13.74s/it] {'loss': 0.0014, 'grad_norm': 2.6774620771496314, 'learning_rate': 8.104e-07, 'completion_length': 61.37500190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.035400390625, 'epoch': 0.19} 19%|█▉ | 474/2500 [1:44:42<7:44:00, 13.74s/it] 19%|█▉ | 475/2500 
[1:44:56<7:39:37, 13.62s/it] {'loss': 0.0016, 'grad_norm': 0.11216357302576692, 'learning_rate': 8.1e-07, 'completion_length': 52.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0390625, 'epoch': 0.19} 19%|█▉ | 475/2500 [1:44:56<7:39:37, 13.62s/it] 19%|█▉ | 476/2500 [1:45:09<7:41:09, 13.67s/it] {'loss': 0.0008, 'grad_norm': 0.08317032969797844, 'learning_rate': 8.095999999999999e-07, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02069091796875, 'epoch': 0.19} 19%|█▉ | 476/2500 [1:45:09<7:41:09, 13.67s/it] 19%|█▉ | 477/2500 [1:45:22<7:35:42, 13.52s/it] {'loss': 0.0011, 'grad_norm': 0.13910080372224187, 'learning_rate': 8.092e-07, 'completion_length': 50.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02703857421875, 'epoch': 0.19} 19%|█▉ | 477/2500 [1:45:22<7:35:42, 13.52s/it] 19%|█▉ | 478/2500 [1:45:36<7:37:27, 13.57s/it] {'loss': 0.0011, 'grad_norm': 0.12524113405039322, 'learning_rate': 8.087999999999999e-07, 'completion_length': 56.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.19} 19%|█▉ | 478/2500 [1:45:36<7:37:27, 13.57s/it] 19%|█▉ | 479/2500 [1:45:49<7:30:35, 13.38s/it] {'loss': 0.0007, 'grad_norm': 0.09890029945448572, 'learning_rate': 8.084e-07, 'completion_length': 48.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01824951171875, 'epoch': 0.19} 19%|█▉ | 479/2500 [1:45:49<7:30:35, 13.38s/it] 19%|█▉ | 480/2500 [1:46:03<7:31:42, 13.42s/it] {'loss': 0.0013, 'grad_norm': 5.867777095048563, 'learning_rate': 8.08e-07, 'completion_length': 57.41071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 
1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.0316162109375, 'epoch': 0.19} 19%|█▉ | 480/2500 [1:46:03<7:31:42, 13.42s/it] 19%|█▉ | 481/2500 [1:46:16<7:29:00, 13.34s/it] {'loss': 0.001, 'grad_norm': 1.4196042232352402, 'learning_rate': 8.075999999999999e-07, 'completion_length': 54.07143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.025634765625, 'epoch': 0.19} 19%|█▉ | 481/2500 [1:46:16<7:29:00, 13.34s/it] 19%|█▉ | 482/2500 [1:46:29<7:31:00, 13.41s/it] {'loss': 0.0014, 'grad_norm': 0.28597468856590735, 'learning_rate': 8.072e-07, 'completion_length': 52.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034423828125, 'epoch': 0.19} 19%|█▉ | 482/2500 [1:46:29<7:31:00, 13.41s/it] 19%|█▉ | 483/2500 [1:46:45<7:53:15, 14.08s/it] {'loss': 0.0022, 'grad_norm': 0.09791927903139762, 'learning_rate': 8.067999999999999e-07, 'completion_length': 64.08928871154785, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.05609130859375, 'epoch': 0.19} 19%|█▉ | 483/2500 [1:46:45<7:53:15, 14.08s/it] 19%|█▉ | 484/2500 [1:46:59<7:56:31, 14.18s/it] {'loss': 0.0021, 'grad_norm': 0.08597558647731238, 'learning_rate': 8.064e-07, 'completion_length': 58.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.052001953125, 'epoch': 0.19} 19%|█▉ | 484/2500 [1:46:59<7:56:31, 14.18s/it] 19%|█▉ | 485/2500 [1:47:13<7:51:23, 14.04s/it] {'loss': 0.0014, 'grad_norm': 0.17555472911695247, 'learning_rate': 8.06e-07, 'completion_length': 61.03571891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.19} 19%|█▉ | 485/2500 [1:47:13<7:51:23, 14.04s/it] 19%|█▉ | 486/2500 
[1:47:26<7:39:06, 13.68s/it] {'loss': 0.0015, 'grad_norm': 0.13765255580279864, 'learning_rate': 8.056e-07, 'completion_length': 47.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038330078125, 'epoch': 0.19} 19%|█▉ | 486/2500 [1:47:26<7:39:06, 13.68s/it] 19%|█▉ | 487/2500 [1:47:39<7:34:35, 13.55s/it] {'loss': 0.0011, 'grad_norm': 0.15304010558737097, 'learning_rate': 8.052e-07, 'completion_length': 56.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02691650390625, 'epoch': 0.19} 19%|█▉ | 487/2500 [1:47:39<7:34:35, 13.55s/it] 20%|█▉ | 488/2500 [1:47:53<7:32:29, 13.49s/it] {'loss': 0.0016, 'grad_norm': 4.745496234577397, 'learning_rate': 8.047999999999999e-07, 'completion_length': 53.87500190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.039306640625, 'epoch': 0.2} 20%|█▉ | 488/2500 [1:47:53<7:32:29, 13.49s/it] 20%|█▉ | 489/2500 [1:48:06<7:33:43, 13.54s/it] {'loss': 0.001, 'grad_norm': 0.1022926928904197, 'learning_rate': 8.044e-07, 'completion_length': 52.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02490234375, 'epoch': 0.2} 20%|█▉ | 489/2500 [1:48:06<7:33:43, 13.54s/it] 20%|█▉ | 490/2500 [1:48:20<7:32:33, 13.51s/it] {'loss': 0.0017, 'grad_norm': 0.1368431866233337, 'learning_rate': 8.04e-07, 'completion_length': 52.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04351806640625, 'epoch': 0.2} 20%|█▉ | 490/2500 [1:48:20<7:32:33, 13.51s/it] 20%|█▉ | 491/2500 [1:48:34<7:41:15, 13.78s/it] {'loss': 0.0016, 'grad_norm': 1.3875133149741656, 'learning_rate': 8.035999999999999e-07, 'completion_length': 61.232147216796875, 'rewards/accuracy_reward': 0.9642857313156128, 
'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.038818359375, 'epoch': 0.2} 20%|█▉ | 491/2500 [1:48:34<7:41:15, 13.78s/it] 20%|█▉ | 492/2500 [1:48:47<7:37:41, 13.68s/it] {'loss': 0.001, 'grad_norm': 0.10118293946756733, 'learning_rate': 8.032e-07, 'completion_length': 51.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0255126953125, 'epoch': 0.2} 20%|█▉ | 492/2500 [1:48:47<7:37:41, 13.68s/it] 20%|█▉ | 493/2500 [1:49:03<7:50:53, 14.08s/it] {'loss': 0.0011, 'grad_norm': 0.09155458783950769, 'learning_rate': 8.028e-07, 'completion_length': 59.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02691650390625, 'epoch': 0.2} 20%|█▉ | 493/2500 [1:49:03<7:50:53, 14.08s/it] 20%|█▉ | 494/2500 [1:49:17<7:52:34, 14.13s/it] {'loss': 0.0012, 'grad_norm': 1.447495038433746, 'learning_rate': 8.023999999999999e-07, 'completion_length': 56.16071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0296630859375, 'epoch': 0.2} 20%|█▉ | 494/2500 [1:49:17<7:52:34, 14.13s/it] 20%|█▉ | 495/2500 [1:49:30<7:44:09, 13.89s/it] {'loss': 0.0013, 'grad_norm': 2.413184380570108, 'learning_rate': 8.02e-07, 'completion_length': 54.58928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03314208984375, 'epoch': 0.2} 20%|█▉ | 495/2500 [1:49:30<7:44:09, 13.89s/it] 20%|█▉ | 496/2500 [1:49:45<7:50:44, 14.09s/it] {'loss': 0.0018, 'grad_norm': 0.12823181648789525, 'learning_rate': 8.016e-07, 'completion_length': 54.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04443359375, 'epoch': 0.2} 20%|█▉ | 496/2500 [1:49:45<7:50:44, 
14.09s/it] 20%|█▉ | 497/2500 [1:49:59<7:49:44, 14.07s/it] {'loss': 0.0015, 'grad_norm': 0.11349125141044478, 'learning_rate': 8.012e-07, 'completion_length': 56.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0369873046875, 'epoch': 0.2} 20%|█▉ | 497/2500 [1:49:59<7:49:44, 14.07s/it] 20%|█▉ | 498/2500 [1:50:12<7:45:12, 13.94s/it] {'loss': 0.0016, 'grad_norm': 0.1574123202409659, 'learning_rate': 8.007999999999999e-07, 'completion_length': 49.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038818359375, 'epoch': 0.2} 20%|█▉ | 498/2500 [1:50:12<7:45:12, 13.94s/it] 20%|█▉ | 499/2500 [1:50:25<7:36:00, 13.67s/it] {'loss': 0.0019, 'grad_norm': 0.19341094293246472, 'learning_rate': 8.004e-07, 'completion_length': 51.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.046630859375, 'epoch': 0.2} 20%|█▉ | 499/2500 [1:50:25<7:36:00, 13.67s/it] 20%|██ | 500/2500 [1:50:39<7:33:51, 13.62s/it] {'loss': 0.0018, 'grad_norm': 1.5695730492422793, 'learning_rate': 8e-07, 'completion_length': 51.21428871154785, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.04400634765625, 'epoch': 0.2} 20%|██ | 500/2500 [1:50:39<7:33:51, 13.62s/it] 20%|██ | 501/2500 [1:51:49<17:03:36, 30.72s/it] {'loss': 0.001, 'grad_norm': 0.14585924699985925, 'learning_rate': 7.995999999999999e-07, 'completion_length': 55.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.2} 20%|██ | 501/2500 [1:51:50<17:03:36, 30.72s/it] 20%|██ | 502/2500 [1:52:04<14:17:29, 25.75s/it] {'loss': 0.001, 'grad_norm': 0.15752377113633048, 'learning_rate': 7.992e-07, 'completion_length': 54.55357551574707, 'rewards/accuracy_reward': 
1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023956298828125, 'epoch': 0.2} 20%|██ | 502/2500 [1:52:04<14:17:29, 25.75s/it] 20%|██ | 503/2500 [1:52:18<12:21:43, 22.28s/it] {'loss': 0.0017, 'grad_norm': 0.08002869879382464, 'learning_rate': 7.987999999999999e-07, 'completion_length': 51.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0413818359375, 'epoch': 0.2} 20%|██ | 503/2500 [1:52:18<12:21:43, 22.28s/it] 20%|██ | 504/2500 [1:52:32<11:01:59, 19.90s/it] {'loss': 0.0011, 'grad_norm': 0.0860311549069076, 'learning_rate': 7.984e-07, 'completion_length': 60.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02716064453125, 'epoch': 0.2} 20%|██ | 504/2500 [1:52:32<11:01:59, 19.90s/it] 20%|██ | 505/2500 [1:52:46<10:01:10, 18.08s/it] {'loss': 0.0018, 'grad_norm': 0.1920046438284545, 'learning_rate': 7.98e-07, 'completion_length': 59.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044189453125, 'epoch': 0.2} 20%|██ | 505/2500 [1:52:46<10:01:10, 18.08s/it] 20%|██ | 506/2500 [1:53:00<9:20:57, 16.88s/it] {'loss': 0.0017, 'grad_norm': 0.42443042060997016, 'learning_rate': 7.975999999999999e-07, 'completion_length': 60.589290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04150390625, 'epoch': 0.2} 20%|██ | 506/2500 [1:53:00<9:20:57, 16.88s/it] 20%|██ | 507/2500 [1:53:17<9:20:48, 16.88s/it] {'loss': 0.0011, 'grad_norm': 0.0804792788101105, 'learning_rate': 7.972e-07, 'completion_length': 65.26786231994629, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02777099609375, 'epoch': 0.2} 20%|██ | 507/2500 [1:53:17<9:20:48, 16.88s/it] 20%|██ | 508/2500 [1:53:31<8:50:14, 15.97s/it] {'loss': 0.0011, 'grad_norm': 
0.08546949844107674, 'learning_rate': 7.967999999999999e-07, 'completion_length': 56.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.2} 20%|██ | 508/2500 [1:53:31<8:50:14, 15.97s/it] 20%|██ | 509/2500 [1:53:46<8:38:08, 15.61s/it] {'loss': 0.0014, 'grad_norm': 0.13122914443250752, 'learning_rate': 7.964e-07, 'completion_length': 58.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03607177734375, 'epoch': 0.2} 20%|██ | 509/2500 [1:53:46<8:38:08, 15.61s/it] 20%|██ | 510/2500 [1:53:59<8:15:03, 14.93s/it] {'loss': 0.0016, 'grad_norm': 2.314452754207779, 'learning_rate': 7.96e-07, 'completion_length': 53.53571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0406494140625, 'epoch': 0.2} 20%|██ | 510/2500 [1:53:59<8:15:03, 14.93s/it] 20%|██ | 511/2500 [1:54:13<8:04:12, 14.61s/it] {'loss': 0.0017, 'grad_norm': 0.21556425821202474, 'learning_rate': 7.956e-07, 'completion_length': 55.625003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0418701171875, 'epoch': 0.2} 20%|██ | 511/2500 [1:54:13<8:04:12, 14.61s/it] 20%|██ | 512/2500 [1:54:27<7:57:35, 14.41s/it] {'loss': 0.0016, 'grad_norm': 0.07082254508201755, 'learning_rate': 7.952e-07, 'completion_length': 59.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03948974609375, 'epoch': 0.2} 20%|██ | 512/2500 [1:54:27<7:57:35, 14.41s/it] 21%|██ | 513/2500 [1:54:40<7:49:26, 14.18s/it] {'loss': 0.0017, 'grad_norm': 0.8373076902644754, 'learning_rate': 7.947999999999999e-07, 'completion_length': 58.58928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 
'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.04180908203125, 'epoch': 0.21} 21%|██ | 513/2500 [1:54:40<7:49:26, 14.18s/it] 21%|██ | 514/2500 [1:54:54<7:44:20, 14.03s/it] {'loss': 0.0009, 'grad_norm': 0.08565523088893261, 'learning_rate': 7.944e-07, 'completion_length': 58.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02349853515625, 'epoch': 0.21} 21%|██ | 514/2500 [1:54:54<7:44:20, 14.03s/it] 21%|██ | 515/2500 [1:55:07<7:35:24, 13.77s/it] {'loss': 0.0016, 'grad_norm': 1.7249958650030692, 'learning_rate': 7.94e-07, 'completion_length': 56.10714530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0401611328125, 'epoch': 0.21} 21%|██ | 515/2500 [1:55:07<7:35:24, 13.77s/it] 21%|██ | 516/2500 [1:55:24<8:07:13, 14.73s/it] {'loss': 0.0029, 'grad_norm': 0.9967538818727476, 'learning_rate': 7.935999999999999e-07, 'completion_length': 71.05357360839844, 'rewards/accuracy_reward': 0.8392857313156128, 'rewards/format_reward': 1.0, 'reward': 1.8392857313156128, 'reward_std': 0.0357142873108387, 'kl': 0.073486328125, 'epoch': 0.21} 21%|██ | 516/2500 [1:55:24<8:07:13, 14.73s/it] 21%|██ | 517/2500 [1:55:37<7:51:08, 14.26s/it] {'loss': 0.001, 'grad_norm': 0.11192260170433246, 'learning_rate': 7.932e-07, 'completion_length': 51.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024658203125, 'epoch': 0.21} 21%|██ | 517/2500 [1:55:37<7:51:08, 14.26s/it] 21%|██ | 518/2500 [1:55:51<7:43:35, 14.03s/it] {'loss': 0.0019, 'grad_norm': 0.14807606125534462, 'learning_rate': 7.928e-07, 'completion_length': 59.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0478515625, 'epoch': 0.21} 21%|██ | 518/2500 [1:55:51<7:43:35, 14.03s/it] 21%|██ | 519/2500 
[1:56:04<7:34:07, 13.75s/it] {'loss': 0.0015, 'grad_norm': 0.10653277353838134, 'learning_rate': 7.923999999999999e-07, 'completion_length': 47.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.21} 21%|██ | 519/2500 [1:56:04<7:34:07, 13.75s/it] 21%|██ | 520/2500 [1:56:18<7:33:22, 13.74s/it] {'loss': 0.0016, 'grad_norm': 0.08434201004480697, 'learning_rate': 7.92e-07, 'completion_length': 53.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0389404296875, 'epoch': 0.21} 21%|██ | 520/2500 [1:56:18<7:33:22, 13.74s/it] 21%|██ | 521/2500 [1:56:31<7:29:59, 13.64s/it] {'loss': 0.0027, 'grad_norm': 0.7159523818731053, 'learning_rate': 7.916e-07, 'completion_length': 54.42857360839844, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.0357142873108387, 'kl': 0.068359375, 'epoch': 0.21} 21%|██ | 521/2500 [1:56:31<7:29:59, 13.64s/it] 21%|██ | 522/2500 [1:56:47<7:53:00, 14.35s/it] {'loss': 0.0012, 'grad_norm': 0.0903252910199644, 'learning_rate': 7.911999999999999e-07, 'completion_length': 70.73214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02886962890625, 'epoch': 0.21} 21%|██ | 522/2500 [1:56:47<7:53:00, 14.35s/it] 21%|██ | 523/2500 [1:57:01<7:50:01, 14.26s/it] {'loss': 0.0013, 'grad_norm': 0.15792338200641223, 'learning_rate': 7.907999999999999e-07, 'completion_length': 57.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.21} 21%|██ | 523/2500 [1:57:01<7:50:01, 14.26s/it] 21%|██ | 524/2500 [1:57:15<7:41:50, 14.02s/it] {'loss': 0.0014, 'grad_norm': 0.12584613873853087, 'learning_rate': 7.904e-07, 'completion_length': 56.80357360839844, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03436279296875, 'epoch': 0.21} 21%|██ | 524/2500 [1:57:15<7:41:50, 14.02s/it] 21%|██ | 525/2500 [1:57:30<7:50:23, 14.29s/it] {'loss': 0.002, 'grad_norm': 0.0946502228582285, 'learning_rate': 7.9e-07, 'completion_length': 62.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05078125, 'epoch': 0.21} 21%|██ | 525/2500 [1:57:30<7:50:23, 14.29s/it] 21%|██ | 526/2500 [1:57:46<8:09:03, 14.86s/it] {'loss': 0.001, 'grad_norm': 0.45970182058241943, 'learning_rate': 7.895999999999999e-07, 'completion_length': 66.53571510314941, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02410888671875, 'epoch': 0.21} 21%|██ | 526/2500 [1:57:46<8:09:03, 14.86s/it] 21%|██ | 527/2500 [1:57:59<7:54:15, 14.42s/it] {'loss': 0.0012, 'grad_norm': 2.344351401444678, 'learning_rate': 7.892e-07, 'completion_length': 52.23214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.030517578125, 'epoch': 0.21} 21%|██ | 527/2500 [1:57:59<7:54:15, 14.42s/it] 21%|██ | 528/2500 [1:58:13<7:47:04, 14.21s/it] {'loss': 0.002, 'grad_norm': 0.727100039834431, 'learning_rate': 7.887999999999999e-07, 'completion_length': 53.30357551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.048828125, 'epoch': 0.21} 21%|██ | 528/2500 [1:58:13<7:47:04, 14.21s/it] 21%|██ | 529/2500 [1:58:27<7:46:08, 14.19s/it] {'loss': 0.0006, 'grad_norm': 0.10372954284474392, 'learning_rate': 7.883999999999999e-07, 'completion_length': 62.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015472412109375, 'epoch': 0.21} 21%|██ 
| 529/2500 [1:58:27<7:46:08, 14.19s/it] 21%|██ | 530/2500 [1:58:40<7:37:19, 13.93s/it] {'loss': 0.0008, 'grad_norm': 0.38140260378389, 'learning_rate': 7.88e-07, 'completion_length': 50.035715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019866943359375, 'epoch': 0.21} 21%|██ | 530/2500 [1:58:40<7:37:19, 13.93s/it] 21%|██ | 531/2500 [1:58:55<7:41:16, 14.06s/it] {'loss': 0.0009, 'grad_norm': 0.09074217225329, 'learning_rate': 7.875999999999999e-07, 'completion_length': 51.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021240234375, 'epoch': 0.21} 21%|██ | 531/2500 [1:58:55<7:41:16, 14.06s/it] 21%|██▏ | 532/2500 [1:59:08<7:30:26, 13.73s/it] {'loss': 0.0014, 'grad_norm': 0.14989551964856093, 'learning_rate': 7.872e-07, 'completion_length': 43.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033935546875, 'epoch': 0.21} 21%|██▏ | 532/2500 [1:59:08<7:30:26, 13.73s/it] 21%|██▏ | 533/2500 [1:59:22<7:31:47, 13.78s/it] {'loss': 0.0012, 'grad_norm': 0.08767572629883624, 'learning_rate': 7.868e-07, 'completion_length': 60.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03009033203125, 'epoch': 0.21} 21%|██▏ | 533/2500 [1:59:22<7:31:47, 13.78s/it] 21%|██▏ | 534/2500 [1:59:35<7:28:50, 13.70s/it] {'loss': 0.0013, 'grad_norm': 0.12940810370332614, 'learning_rate': 7.864e-07, 'completion_length': 59.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03271484375, 'epoch': 0.21} 21%|██▏ | 534/2500 [1:59:35<7:28:50, 13.70s/it] 21%|██▏ | 535/2500 [1:59:49<7:27:51, 13.67s/it] {'loss': 0.0009, 'grad_norm': 0.09303680841669618, 'learning_rate': 7.86e-07, 'completion_length': 53.46428871154785, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0233154296875, 'epoch': 0.21} 21%|██▏ | 535/2500 [1:59:49<7:27:51, 13.67s/it] 21%|██▏ | 536/2500 [2:00:04<7:43:28, 14.16s/it] {'loss': 0.0012, 'grad_norm': 0.09012925675106415, 'learning_rate': 7.855999999999999e-07, 'completion_length': 61.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02996826171875, 'epoch': 0.21} 21%|██▏ | 536/2500 [2:00:04<7:43:28, 14.16s/it] 21%|██▏ | 537/2500 [2:00:19<7:56:26, 14.56s/it] {'loss': 0.0011, 'grad_norm': 1.6651509937539093, 'learning_rate': 7.852e-07, 'completion_length': 60.96428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.0284423828125, 'epoch': 0.21} 21%|██▏ | 537/2500 [2:00:19<7:56:26, 14.56s/it] 22%|██▏ | 538/2500 [2:00:35<8:09:12, 14.96s/it] {'loss': 0.0008, 'grad_norm': 0.2520635439123829, 'learning_rate': 7.848e-07, 'completion_length': 65.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020751953125, 'epoch': 0.22} 22%|██▏ | 538/2500 [2:00:35<8:09:12, 14.96s/it] 22%|██▏ | 539/2500 [2:00:49<7:55:02, 14.53s/it] {'loss': 0.001, 'grad_norm': 0.12680938474901435, 'learning_rate': 7.844e-07, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02496337890625, 'epoch': 0.22} 22%|██▏ | 539/2500 [2:00:49<7:55:02, 14.53s/it] 22%|██▏ | 540/2500 [2:01:05<8:09:48, 14.99s/it] {'loss': 0.0016, 'grad_norm': 0.17957404001721405, 'learning_rate': 7.84e-07, 'completion_length': 63.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.040283203125, 'epoch': 0.22} 22%|██▏ | 540/2500 [2:01:05<8:09:48, 14.99s/it] 22%|██▏ | 541/2500 
[2:01:20<8:06:49, 14.91s/it] {'loss': 0.0015, 'grad_norm': 3.296756526977308, 'learning_rate': 7.835999999999999e-07, 'completion_length': 62.58928871154785, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.0357142873108387, 'kl': 0.0386962890625, 'epoch': 0.22} 22%|██▏ | 541/2500 [2:01:20<8:06:49, 14.91s/it] 22%|██▏ | 542/2500 [2:01:35<8:08:18, 14.96s/it] {'loss': 0.0019, 'grad_norm': 0.09525766093750815, 'learning_rate': 7.832e-07, 'completion_length': 61.392860412597656, 'rewards/accuracy_reward': 0.8571428656578064, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.0, 'kl': 0.048095703125, 'epoch': 0.22} 22%|██▏ | 542/2500 [2:01:35<8:08:18, 14.96s/it] 22%|██▏ | 543/2500 [2:01:48<7:50:14, 14.42s/it] {'loss': 0.0011, 'grad_norm': 1.4251618280719291, 'learning_rate': 7.828e-07, 'completion_length': 51.10714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0286865234375, 'epoch': 0.22} 22%|██▏ | 543/2500 [2:01:48<7:50:14, 14.42s/it] 22%|██▏ | 544/2500 [2:02:02<7:50:05, 14.42s/it] {'loss': 0.0012, 'grad_norm': 0.09890065278152432, 'learning_rate': 7.823999999999999e-07, 'completion_length': 63.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0301513671875, 'epoch': 0.22} 22%|██▏ | 544/2500 [2:02:02<7:50:05, 14.42s/it] 22%|██▏ | 545/2500 [2:02:17<7:50:10, 14.43s/it] {'loss': 0.0012, 'grad_norm': 1.5747090645783535, 'learning_rate': 7.82e-07, 'completion_length': 62.000003814697266, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.031005859375, 'epoch': 0.22} 22%|██▏ | 545/2500 [2:02:17<7:50:10, 14.43s/it] 22%|██▏ | 546/2500 [2:02:30<7:40:42, 14.15s/it] {'loss': 0.0013, 'grad_norm': 
0.10527433209102609, 'learning_rate': 7.816e-07, 'completion_length': 61.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032958984375, 'epoch': 0.22} 22%|██▏ | 546/2500 [2:02:30<7:40:42, 14.15s/it] 22%|██▏ | 547/2500 [2:02:50<8:32:04, 15.73s/it] {'loss': 0.0015, 'grad_norm': 2.9116336626028496, 'learning_rate': 7.811999999999999e-07, 'completion_length': 73.3214340209961, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9285715222358704, 'reward_std': 0.10410194471478462, 'kl': 0.03790283203125, 'epoch': 0.22} 22%|██▏ | 547/2500 [2:02:50<8:32:04, 15.73s/it] 22%|██▏ | 548/2500 [2:03:03<8:09:31, 15.05s/it] {'loss': 0.0014, 'grad_norm': 0.197755200003853, 'learning_rate': 7.808e-07, 'completion_length': 54.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.22} 22%|██▏ | 548/2500 [2:03:03<8:09:31, 15.05s/it] 22%|██▏ | 549/2500 [2:03:20<8:25:00, 15.53s/it] {'loss': 0.0014, 'grad_norm': 0.14408457653276244, 'learning_rate': 7.804e-07, 'completion_length': 64.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 0.22} 22%|██▏ | 549/2500 [2:03:20<8:25:00, 15.53s/it] 22%|██▏ | 550/2500 [2:03:33<8:06:06, 14.96s/it] {'loss': 0.0013, 'grad_norm': 0.13377961884289588, 'learning_rate': 7.799999999999999e-07, 'completion_length': 61.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03173828125, 'epoch': 0.22} 22%|██▏ | 550/2500 [2:03:33<8:06:06, 14.96s/it] 22%|██▏ | 551/2500 [2:03:48<7:59:43, 14.77s/it] {'loss': 0.0016, 'grad_norm': 0.1149502066561918, 'learning_rate': 7.795999999999999e-07, 'completion_length': 69.39286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.038818359375, 'epoch': 0.22} 22%|██▏ | 551/2500 [2:03:48<7:59:43, 14.77s/it] 22%|██▏ | 552/2500 [2:04:02<7:50:38, 14.50s/it] {'loss': 0.001, 'grad_norm': 0.09308279480948756, 'learning_rate': 7.792e-07, 'completion_length': 62.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.22} 22%|██▏ | 552/2500 [2:04:02<7:50:38, 14.50s/it] 22%|██▏ | 553/2500 [2:04:15<7:40:39, 14.20s/it] {'loss': 0.0022, 'grad_norm': 1.1502880200352887, 'learning_rate': 7.788000000000001e-07, 'completion_length': 53.910715103149414, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0555419921875, 'epoch': 0.22} 22%|██▏ | 553/2500 [2:04:15<7:40:39, 14.20s/it] 22%|██▏ | 554/2500 [2:04:29<7:36:13, 14.07s/it] {'loss': 0.0012, 'grad_norm': 0.07801866488456169, 'learning_rate': 7.783999999999999e-07, 'completion_length': 59.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0311279296875, 'epoch': 0.22} 22%|██▏ | 554/2500 [2:04:29<7:36:13, 14.07s/it] 22%|██▏ | 555/2500 [2:04:44<7:43:07, 14.29s/it] {'loss': 0.0019, 'grad_norm': 3.1392343052871334, 'learning_rate': 7.78e-07, 'completion_length': 61.28571891784668, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.07695359364151955, 'kl': 0.0487060546875, 'epoch': 0.22} 22%|██▏ | 555/2500 [2:04:44<7:43:07, 14.29s/it] 22%|██▏ | 556/2500 [2:04:59<7:54:53, 14.66s/it] {'loss': 0.0021, 'grad_norm': 2.5238654610708435, 'learning_rate': 7.776e-07, 'completion_length': 64.12500381469727, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.11266788095235825, 'kl': 0.052734375, 'epoch': 0.22} 22%|██▏ | 556/2500 [2:04:59<7:54:53, 
14.66s/it] 22%|██▏ | 557/2500 [2:05:12<7:39:21, 14.18s/it] {'loss': 0.0017, 'grad_norm': 1.7865984899574487, 'learning_rate': 7.771999999999999e-07, 'completion_length': 54.66071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0419921875, 'epoch': 0.22} 22%|██▏ | 557/2500 [2:05:12<7:39:21, 14.18s/it] 22%|██▏ | 558/2500 [2:05:27<7:40:17, 14.22s/it] {'loss': 0.0025, 'grad_norm': 1.3500083281031163, 'learning_rate': 7.768e-07, 'completion_length': 65.21428680419922, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.0635986328125, 'epoch': 0.22} 22%|██▏ | 558/2500 [2:05:27<7:40:17, 14.22s/it] 22%|██▏ | 559/2500 [2:05:40<7:32:52, 14.00s/it] {'loss': 0.001, 'grad_norm': 0.10401288666358154, 'learning_rate': 7.764e-07, 'completion_length': 58.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0242919921875, 'epoch': 0.22} 22%|██▏ | 559/2500 [2:05:40<7:32:52, 14.00s/it] 22%|██▏ | 560/2500 [2:05:54<7:33:56, 14.04s/it] {'loss': 0.0014, 'grad_norm': 0.7077341982127799, 'learning_rate': 7.76e-07, 'completion_length': 55.76785850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03485107421875, 'epoch': 0.22} 22%|██▏ | 560/2500 [2:05:54<7:33:56, 14.04s/it] 22%|██▏ | 561/2500 [2:06:09<7:40:46, 14.26s/it] {'loss': 0.0013, 'grad_norm': 0.1400884216720984, 'learning_rate': 7.755999999999999e-07, 'completion_length': 64.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.22} 22%|██▏ | 561/2500 [2:06:09<7:40:46, 14.26s/it] 22%|██▏ | 562/2500 [2:06:24<7:45:00, 14.40s/it] {'loss': 0.0012, 'grad_norm': 
0.16517076733911748, 'learning_rate': 7.752e-07, 'completion_length': 66.17857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.22} 22%|██▏ | 562/2500 [2:06:24<7:45:00, 14.40s/it] 23%|██▎ | 563/2500 [2:06:39<7:50:04, 14.56s/it] {'loss': 0.0011, 'grad_norm': 5.278849428675341, 'learning_rate': 7.748e-07, 'completion_length': 58.08928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02734375, 'epoch': 0.23} 23%|██▎ | 563/2500 [2:06:39<7:50:04, 14.56s/it] 23%|██▎ | 564/2500 [2:06:53<7:49:56, 14.56s/it] {'loss': 0.0011, 'grad_norm': 0.09627506391207313, 'learning_rate': 7.743999999999999e-07, 'completion_length': 69.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026611328125, 'epoch': 0.23} 23%|██▎ | 564/2500 [2:06:53<7:49:56, 14.56s/it] 23%|██▎ | 565/2500 [2:07:06<7:37:27, 14.18s/it] {'loss': 0.0018, 'grad_norm': 0.2599826637743689, 'learning_rate': 7.74e-07, 'completion_length': 52.375003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0450439453125, 'epoch': 0.23} 23%|██▎ | 565/2500 [2:07:06<7:37:27, 14.18s/it] 23%|██▎ | 566/2500 [2:07:21<7:45:18, 14.44s/it] {'loss': 0.0012, 'grad_norm': 1.4680452736348446, 'learning_rate': 7.735999999999999e-07, 'completion_length': 54.73214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02886962890625, 'epoch': 0.23} 23%|██▎ | 566/2500 [2:07:21<7:45:18, 14.44s/it] 23%|██▎ | 567/2500 [2:07:35<7:38:17, 14.23s/it] {'loss': 0.0019, 'grad_norm': 1.064731034055635, 'learning_rate': 7.732e-07, 'completion_length': 63.46428871154785, 'rewards/accuracy_reward': 
0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.046630859375, 'epoch': 0.23} 23%|██▎ | 567/2500 [2:07:35<7:38:17, 14.23s/it] 23%|██▎ | 568/2500 [2:07:49<7:34:30, 14.12s/it] {'loss': 0.0023, 'grad_norm': 1.2905789853819656, 'learning_rate': 7.728e-07, 'completion_length': 57.16071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0582275390625, 'epoch': 0.23} 23%|██▎ | 568/2500 [2:07:49<7:34:30, 14.12s/it] 23%|██▎ | 569/2500 [2:08:03<7:34:01, 14.11s/it] {'loss': 0.0013, 'grad_norm': 1.6349682888013468, 'learning_rate': 7.723999999999999e-07, 'completion_length': 59.767860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.03277587890625, 'epoch': 0.23} 23%|██▎ | 569/2500 [2:08:03<7:34:01, 14.11s/it] 23%|██▎ | 570/2500 [2:08:18<7:40:20, 14.31s/it] {'loss': 0.0022, 'grad_norm': 0.11012860133670342, 'learning_rate': 7.72e-07, 'completion_length': 56.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0548095703125, 'epoch': 0.23} 23%|██▎ | 570/2500 [2:08:18<7:40:20, 14.31s/it] 23%|██▎ | 571/2500 [2:08:32<7:35:24, 14.17s/it] {'loss': 0.0018, 'grad_norm': 1.369486962014182, 'learning_rate': 7.716e-07, 'completion_length': 54.62500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04443359375, 'epoch': 0.23} 23%|██▎ | 571/2500 [2:08:32<7:35:24, 14.17s/it] 23%|██▎ | 572/2500 [2:08:46<7:40:06, 14.32s/it] {'loss': 0.001, 'grad_norm': 0.13516710869608384, 'learning_rate': 7.711999999999999e-07, 'completion_length': 64.96428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.02447509765625, 'epoch': 0.23} 23%|██▎ | 572/2500 [2:08:46<7:40:06, 14.32s/it] 23%|██▎ | 573/2500 [2:09:01<7:37:54, 14.26s/it] {'loss': 0.0012, 'grad_norm': 0.11882251769462722, 'learning_rate': 7.708e-07, 'completion_length': 64.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.23} 23%|██▎ | 573/2500 [2:09:01<7:37:54, 14.26s/it] 23%|██▎ | 574/2500 [2:09:14<7:27:46, 13.95s/it] {'loss': 0.0008, 'grad_norm': 0.14827485364492954, 'learning_rate': 7.704e-07, 'completion_length': 51.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01910400390625, 'epoch': 0.23} 23%|██▎ | 574/2500 [2:09:14<7:27:46, 13.95s/it] 23%|██▎ | 575/2500 [2:09:29<7:40:16, 14.35s/it] {'loss': 0.0022, 'grad_norm': 1.5704600465924425, 'learning_rate': 7.699999999999999e-07, 'completion_length': 67.39286041259766, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0543212890625, 'epoch': 0.23} 23%|██▎ | 575/2500 [2:09:29<7:40:16, 14.35s/it] 23%|██▎ | 576/2500 [2:09:44<7:41:46, 14.40s/it] {'loss': 0.0018, 'grad_norm': 0.10480987095022251, 'learning_rate': 7.695999999999999e-07, 'completion_length': 56.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.23} 23%|██▎ | 576/2500 [2:09:44<7:41:46, 14.40s/it] 23%|██▎ | 577/2500 [2:09:57<7:33:53, 14.16s/it] {'loss': 0.002, 'grad_norm': 0.14363564057100092, 'learning_rate': 7.692e-07, 'completion_length': 53.01785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.051025390625, 'epoch': 0.23} 23%|██▎ | 577/2500 [2:09:57<7:33:53, 14.16s/it] 23%|██▎ | 578/2500 [2:10:11<7:31:33, 14.10s/it] 
{'loss': 0.0015, 'grad_norm': 0.09988364835504213, 'learning_rate': 7.688000000000001e-07, 'completion_length': 51.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0367431640625, 'epoch': 0.23} 23%|██▎ | 578/2500 [2:10:11<7:31:33, 14.10s/it] 23%|██▎ | 579/2500 [2:10:25<7:28:31, 14.01s/it] {'loss': 0.001, 'grad_norm': 0.1392339023007377, 'learning_rate': 7.683999999999999e-07, 'completion_length': 63.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025390625, 'epoch': 0.23} 23%|██▎ | 579/2500 [2:10:25<7:28:31, 14.01s/it] 23%|██▎ | 580/2500 [2:10:44<8:14:43, 15.46s/it] {'loss': 0.0015, 'grad_norm': 2.4579297513524025, 'learning_rate': 7.68e-07, 'completion_length': 66.96429061889648, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9285714626312256, 'reward_std': 0.11266788095235825, 'kl': 0.0384521484375, 'epoch': 0.23} 23%|██▎ | 580/2500 [2:10:44<8:14:43, 15.46s/it] 23%|██▎ | 581/2500 [2:10:57<7:52:55, 14.79s/it] {'loss': 0.0009, 'grad_norm': 0.07677915329165722, 'learning_rate': 7.676e-07, 'completion_length': 52.000003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.23} 23%|██▎ | 581/2500 [2:10:57<7:52:55, 14.79s/it] 23%|██▎ | 582/2500 [2:11:11<7:44:18, 14.52s/it] {'loss': 0.002, 'grad_norm': 0.08739343757548869, 'learning_rate': 7.671999999999999e-07, 'completion_length': 55.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.050048828125, 'epoch': 0.23} 23%|██▎ | 582/2500 [2:11:11<7:44:18, 14.52s/it] 23%|██▎ | 583/2500 [2:11:25<7:42:25, 14.47s/it] {'loss': 0.0014, 'grad_norm': 0.12242280944913218, 'learning_rate': 7.668e-07, 'completion_length': 60.48214530944824, 
'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0340576171875, 'epoch': 0.23} 23%|██▎ | 583/2500 [2:11:25<7:42:25, 14.47s/it] 23%|██▎ | 584/2500 [2:11:40<7:48:34, 14.67s/it] {'loss': 0.0017, 'grad_norm': 4.080543204177776, 'learning_rate': 7.664e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0433349609375, 'epoch': 0.23} 23%|██▎ | 584/2500 [2:11:40<7:48:34, 14.67s/it] 23%|██▎ | 585/2500 [2:11:55<7:48:17, 14.67s/it] {'loss': 0.0023, 'grad_norm': 0.12128201275388086, 'learning_rate': 7.66e-07, 'completion_length': 64.73214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.056884765625, 'epoch': 0.23} 23%|██▎ | 585/2500 [2:11:55<7:48:17, 14.67s/it] 23%|██▎ | 586/2500 [2:12:09<7:38:21, 14.37s/it] {'loss': 0.0014, 'grad_norm': 0.1275405870614131, 'learning_rate': 7.655999999999999e-07, 'completion_length': 57.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03369140625, 'epoch': 0.23} 23%|██▎ | 586/2500 [2:12:09<7:38:21, 14.37s/it] 23%|██▎ | 587/2500 [2:12:22<7:30:36, 14.13s/it] {'loss': 0.0017, 'grad_norm': 0.1681039378351692, 'learning_rate': 7.652e-07, 'completion_length': 59.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.23} 23%|██▎ | 587/2500 [2:12:22<7:30:36, 14.13s/it] 24%|██▎ | 588/2500 [2:12:36<7:23:37, 13.92s/it] {'loss': 0.0023, 'grad_norm': 0.2102387906577779, 'learning_rate': 7.648e-07, 'completion_length': 51.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.056396484375, 'epoch': 0.24} 24%|██▎ | 588/2500 [2:12:36<7:23:37, 
13.92s/it] 24%|██▎ | 589/2500 [2:12:49<7:16:01, 13.69s/it] {'loss': 0.0019, 'grad_norm': 0.10732278647021079, 'learning_rate': 7.643999999999999e-07, 'completion_length': 52.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0484619140625, 'epoch': 0.24} 24%|██▎ | 589/2500 [2:12:49<7:16:01, 13.69s/it] 24%|██▎ | 590/2500 [2:13:02<7:13:47, 13.63s/it] {'loss': 0.0009, 'grad_norm': 0.2597631193837064, 'learning_rate': 7.64e-07, 'completion_length': 54.375003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.24} 24%|██▎ | 590/2500 [2:13:02<7:13:47, 13.63s/it] 24%|██▎ | 591/2500 [2:13:17<7:20:06, 13.83s/it] {'loss': 0.0011, 'grad_norm': 1.5417499326114328, 'learning_rate': 7.635999999999999e-07, 'completion_length': 61.12500190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02734375, 'epoch': 0.24} 24%|██▎ | 591/2500 [2:13:17<7:20:06, 13.83s/it] 24%|██▎ | 592/2500 [2:13:30<7:15:48, 13.70s/it] {'loss': 0.0016, 'grad_norm': 0.32281049748861224, 'learning_rate': 7.632e-07, 'completion_length': 55.785715103149414, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0391845703125, 'epoch': 0.24} 24%|██▎ | 592/2500 [2:13:30<7:15:48, 13.70s/it] 24%|██▎ | 593/2500 [2:13:45<7:27:03, 14.07s/it] {'loss': 0.0016, 'grad_norm': 0.1173816462883045, 'learning_rate': 7.628e-07, 'completion_length': 56.80357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03900146484375, 'epoch': 0.24} 24%|██▎ | 593/2500 [2:13:45<7:27:03, 14.07s/it] 24%|██▍ | 594/2500 [2:13:59<7:30:01, 14.17s/it] {'loss': 0.0015, 'grad_norm': 0.09604365605239086, 'learning_rate': 7.623999999999999e-07, 
'completion_length': 60.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0369873046875, 'epoch': 0.24} 24%|██▍ | 594/2500 [2:13:59<7:30:01, 14.17s/it] 24%|██▍ | 595/2500 [2:14:15<7:42:16, 14.56s/it] {'loss': 0.0011, 'grad_norm': 0.09674718492200698, 'learning_rate': 7.62e-07, 'completion_length': 64.46429061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.24} 24%|██▍ | 595/2500 [2:14:15<7:42:16, 14.56s/it] 24%|██▍ | 596/2500 [2:14:28<7:26:09, 14.06s/it] {'loss': 0.0007, 'grad_norm': 0.10526153620177925, 'learning_rate': 7.616e-07, 'completion_length': 52.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01727294921875, 'epoch': 0.24} 24%|██▍ | 596/2500 [2:14:28<7:26:09, 14.06s/it] 24%|██▍ | 597/2500 [2:14:43<7:37:05, 14.41s/it] {'loss': 0.0018, 'grad_norm': 0.1645604322448615, 'learning_rate': 7.611999999999999e-07, 'completion_length': 65.30357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0450439453125, 'epoch': 0.24} 24%|██▍ | 597/2500 [2:14:43<7:37:05, 14.41s/it] 24%|██▍ | 598/2500 [2:14:57<7:28:15, 14.14s/it] {'loss': 0.0019, 'grad_norm': 0.4327946526961024, 'learning_rate': 7.608e-07, 'completion_length': 58.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0478515625, 'epoch': 0.24} 24%|██▍ | 598/2500 [2:14:57<7:28:15, 14.14s/it] 24%|██▍ | 599/2500 [2:15:10<7:24:06, 14.02s/it] {'loss': 0.0007, 'grad_norm': 0.09205973875510975, 'learning_rate': 7.604e-07, 'completion_length': 52.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0181884765625, 'epoch': 0.24} 24%|██▍ | 599/2500 [2:15:10<7:24:06, 14.02s/it] 24%|██▍ | 600/2500 
[2:15:25<7:30:26, 14.22s/it] {'loss': 0.0011, 'grad_norm': 0.8479663745726513, 'learning_rate': 7.599999999999999e-07, 'completion_length': 63.75000190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02850341796875, 'epoch': 0.24} 24%|██▍ | 600/2500 [2:15:25<7:30:26, 14.22s/it] 24%|██▍ | 601/2500 [2:16:31<15:42:15, 29.77s/it] {'loss': 0.0013, 'grad_norm': 0.08980563658543221, 'learning_rate': 7.596e-07, 'completion_length': 67.01786231994629, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0335693359375, 'epoch': 0.24} 24%|██▍ | 601/2500 [2:16:31<15:42:15, 29.77s/it] 24%|██▍ | 602/2500 [2:16:40<12:22:21, 23.47s/it] {'loss': 0.0019, 'grad_norm': 0.10604311309931175, 'learning_rate': 7.592e-07, 'completion_length': 65.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0482177734375, 'epoch': 0.24} 24%|██▍ | 602/2500 [2:16:40<12:22:21, 23.47s/it] 24%|██▍ | 603/2500 [2:16:49<10:02:16, 19.05s/it] {'loss': 0.0018, 'grad_norm': 1.761721655008679, 'learning_rate': 7.588e-07, 'completion_length': 53.89285850524902, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0447998046875, 'epoch': 0.24} 24%|██▍ | 603/2500 [2:16:49<10:02:16, 19.05s/it] 24%|██▍ | 604/2500 [2:17:01<8:55:36, 16.95s/it] {'loss': 0.001, 'grad_norm': 0.09631390969574408, 'learning_rate': 7.583999999999999e-07, 'completion_length': 60.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02618408203125, 'epoch': 0.24} 24%|██▍ | 604/2500 [2:17:01<8:55:36, 16.95s/it] 24%|██▍ | 605/2500 [2:17:10<7:39:59, 14.56s/it] {'loss': 0.0017, 'grad_norm': 0.14072420520257395, 'learning_rate': 7.58e-07, 'completion_length': 
50.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0416259765625, 'epoch': 0.24} 24%|██▍ | 605/2500 [2:17:10<7:39:59, 14.56s/it] 24%|██▍ | 606/2500 [2:17:18<6:44:03, 12.80s/it] {'loss': 0.0024, 'grad_norm': 0.16907414939103316, 'learning_rate': 7.576000000000001e-07, 'completion_length': 49.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0606689453125, 'epoch': 0.24} 24%|██▍ | 606/2500 [2:17:18<6:44:03, 12.80s/it] 24%|██▍ | 607/2500 [2:17:26<5:57:56, 11.35s/it] {'loss': 0.0009, 'grad_norm': 0.10286782256208599, 'learning_rate': 7.571999999999999e-07, 'completion_length': 47.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02349853515625, 'epoch': 0.24} 24%|██▍ | 607/2500 [2:17:26<5:57:56, 11.35s/it] 24%|██▍ | 608/2500 [2:17:38<6:00:26, 11.43s/it] {'loss': 0.001, 'grad_norm': 0.12039163216563488, 'learning_rate': 7.568e-07, 'completion_length': 54.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02459716796875, 'epoch': 0.24} 24%|██▍ | 608/2500 [2:17:38<6:00:26, 11.43s/it] 24%|██▍ | 609/2500 [2:17:47<5:36:48, 10.69s/it] {'loss': 0.0012, 'grad_norm': 0.12034803802050005, 'learning_rate': 7.564e-07, 'completion_length': 59.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03009033203125, 'epoch': 0.24} 24%|██▍ | 609/2500 [2:17:47<5:36:48, 10.69s/it] 24%|██▍ | 610/2500 [2:17:56<5:19:43, 10.15s/it] {'loss': 0.0009, 'grad_norm': 0.16026013649989418, 'learning_rate': 7.559999999999999e-07, 'completion_length': 52.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02362060546875, 'epoch': 0.24} 24%|██▍ | 610/2500 [2:17:56<5:19:43, 10.15s/it] 24%|██▍ | 
611/2500 [2:18:04<5:02:53, 9.62s/it] {'loss': 0.0017, 'grad_norm': 0.10380054389983508, 'learning_rate': 7.556e-07, 'completion_length': 51.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0433349609375, 'epoch': 0.24} 24%|██▍ | 611/2500 [2:18:04<5:02:53, 9.62s/it] 24%|██▍ | 612/2500 [2:18:12<4:47:31, 9.14s/it] {'loss': 0.0012, 'grad_norm': 0.08927373863974948, 'learning_rate': 7.552e-07, 'completion_length': 48.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0302734375, 'epoch': 0.24} 24%|██▍ | 612/2500 [2:18:12<4:47:31, 9.14s/it] 25%|██▍ | 613/2500 [2:18:20<4:39:59, 8.90s/it] {'loss': 0.0012, 'grad_norm': 0.8197172708906688, 'learning_rate': 7.548e-07, 'completion_length': 54.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0296630859375, 'epoch': 0.25} 25%|██▍ | 613/2500 [2:18:20<4:39:59, 8.90s/it] 25%|██▍ | 614/2500 [2:18:29<4:33:32, 8.70s/it] {'loss': 0.0006, 'grad_norm': 0.0812878029972795, 'learning_rate': 7.543999999999999e-07, 'completion_length': 55.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01568603515625, 'epoch': 0.25} 25%|██▍ | 614/2500 [2:18:29<4:33:32, 8.70s/it] 25%|██▍ | 615/2500 [2:18:38<4:41:58, 8.98s/it] {'loss': 0.0009, 'grad_norm': 0.1128795573938503, 'learning_rate': 7.54e-07, 'completion_length': 63.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02203369140625, 'epoch': 0.25} 25%|██▍ | 615/2500 [2:18:38<4:41:58, 8.98s/it] 25%|██▍ | 616/2500 [2:18:47<4:35:43, 8.78s/it] {'loss': 0.0023, 'grad_norm': 0.1761943312033771, 'learning_rate': 7.536e-07, 'completion_length': 55.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.056640625, 'epoch': 0.25} 25%|██▍ | 616/2500 [2:18:47<4:35:43, 8.78s/it] 25%|██▍ | 617/2500 [2:18:55<4:33:12, 8.71s/it] {'loss': 0.0018, 'grad_norm': 3.623524504703856, 'learning_rate': 7.531999999999999e-07, 'completion_length': 58.21428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0439453125, 'epoch': 0.25} 25%|██▍ | 617/2500 [2:18:55<4:33:12, 8.71s/it] 25%|██▍ | 618/2500 [2:19:04<4:31:58, 8.67s/it] {'loss': 0.0019, 'grad_norm': 0.1195240031169403, 'learning_rate': 7.528e-07, 'completion_length': 57.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04833984375, 'epoch': 0.25} 25%|██▍ | 618/2500 [2:19:04<4:31:58, 8.67s/it] 25%|██▍ | 619/2500 [2:19:12<4:29:54, 8.61s/it] {'loss': 0.0015, 'grad_norm': 1.5018385118056212, 'learning_rate': 7.523999999999999e-07, 'completion_length': 61.83928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0386962890625, 'epoch': 0.25} 25%|██▍ | 619/2500 [2:19:12<4:29:54, 8.61s/it] 25%|██▍ | 620/2500 [2:19:21<4:34:38, 8.76s/it] {'loss': 0.0018, 'grad_norm': 0.11977954004563905, 'learning_rate': 7.52e-07, 'completion_length': 66.50000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0455322265625, 'epoch': 0.25} 25%|██▍ | 620/2500 [2:19:21<4:34:38, 8.76s/it] 25%|██▍ | 621/2500 [2:19:31<4:45:13, 9.11s/it] {'loss': 0.0019, 'grad_norm': 0.9573762560674822, 'learning_rate': 7.516e-07, 'completion_length': 66.85714721679688, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.047119140625, 'epoch': 0.25} 25%|██▍ | 621/2500 [2:19:31<4:45:13, 9.11s/it] 25%|██▍ | 622/2500 [2:19:41<4:47:47, 
9.19s/it] {'loss': 0.0011, 'grad_norm': 0.18284576997712818, 'learning_rate': 7.511999999999999e-07, 'completion_length': 55.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838134765625, 'epoch': 0.25} 25%|██▍ | 622/2500 [2:19:41<4:47:47, 9.19s/it] 25%|██▍ | 623/2500 [2:19:55<5:38:51, 10.83s/it] {'loss': 0.002, 'grad_norm': 0.1543826771475713, 'learning_rate': 7.508e-07, 'completion_length': 61.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.049072265625, 'epoch': 0.25} 25%|██▍ | 623/2500 [2:19:55<5:38:51, 10.83s/it] 25%|██▍ | 624/2500 [2:20:10<6:14:20, 11.97s/it] {'loss': 0.0022, 'grad_norm': 0.13198490324412082, 'learning_rate': 7.503999999999999e-07, 'completion_length': 52.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0555419921875, 'epoch': 0.25} 25%|██▍ | 624/2500 [2:20:10<6:14:20, 11.97s/it] 25%|██▌ | 625/2500 [2:20:24<6:31:23, 12.52s/it] {'loss': 0.0012, 'grad_norm': 0.8068611615401341, 'learning_rate': 7.5e-07, 'completion_length': 56.89285850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0296630859375, 'epoch': 0.25} 25%|██▌ | 625/2500 [2:20:24<6:31:23, 12.52s/it] 25%|██▌ | 626/2500 [2:20:40<7:02:45, 13.54s/it] {'loss': 0.0019, 'grad_norm': 0.10931143460888017, 'learning_rate': 7.496e-07, 'completion_length': 61.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0465087890625, 'epoch': 0.25} 25%|██▌ | 626/2500 [2:20:40<7:02:45, 13.54s/it] 25%|██▌ | 627/2500 [2:20:54<7:09:18, 13.75s/it] {'loss': 0.0007, 'grad_norm': 0.10526075200168265, 'learning_rate': 7.492e-07, 'completion_length': 58.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0166015625, 'epoch': 0.25} 25%|██▌ | 627/2500 [2:20:54<7:09:18, 13.75s/it] 25%|██▌ | 628/2500 [2:21:07<7:07:08, 13.69s/it] {'loss': 0.0012, 'grad_norm': 0.0912843804630203, 'learning_rate': 7.488e-07, 'completion_length': 47.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030364990234375, 'epoch': 0.25} 25%|██▌ | 628/2500 [2:21:07<7:07:08, 13.69s/it] 25%|██▌ | 629/2500 [2:21:22<7:10:37, 13.81s/it] {'loss': 0.0015, 'grad_norm': 0.11406509808622398, 'learning_rate': 7.483999999999999e-07, 'completion_length': 63.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03662109375, 'epoch': 0.25} 25%|██▌ | 629/2500 [2:21:22<7:10:37, 13.81s/it] 25%|██▌ | 630/2500 [2:21:35<7:10:29, 13.81s/it] {'loss': 0.0011, 'grad_norm': 0.11340960894859235, 'learning_rate': 7.48e-07, 'completion_length': 50.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028564453125, 'epoch': 0.25} 25%|██▌ | 630/2500 [2:21:35<7:10:29, 13.81s/it] 25%|██▌ | 631/2500 [2:21:50<7:16:05, 14.00s/it] {'loss': 0.0011, 'grad_norm': 0.1063101329219235, 'learning_rate': 7.476e-07, 'completion_length': 57.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.25} 25%|██▌ | 631/2500 [2:21:50<7:16:05, 14.00s/it] 25%|██▌ | 632/2500 [2:22:03<7:10:03, 13.81s/it] {'loss': 0.0016, 'grad_norm': 0.10251603941047477, 'learning_rate': 7.471999999999999e-07, 'completion_length': 52.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03900146484375, 'epoch': 0.25} 25%|██▌ | 632/2500 [2:22:03<7:10:03, 13.81s/it] 25%|██▌ | 633/2500 [2:22:17<7:12:14, 13.89s/it] {'loss': 0.0016, 'grad_norm': 0.09307982671615761, 'learning_rate': 
7.468e-07, 'completion_length': 60.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0404052734375, 'epoch': 0.25} 25%|██▌ | 633/2500 [2:22:17<7:12:14, 13.89s/it] 25%|██▌ | 634/2500 [2:22:31<7:09:06, 13.80s/it] {'loss': 0.0007, 'grad_norm': 0.08411933598135891, 'learning_rate': 7.464e-07, 'completion_length': 60.982147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01629638671875, 'epoch': 0.25} 25%|██▌ | 634/2500 [2:22:31<7:09:06, 13.80s/it] 25%|██▌ | 635/2500 [2:22:44<7:05:12, 13.68s/it] {'loss': 0.0018, 'grad_norm': 1.140466681169588, 'learning_rate': 7.459999999999999e-07, 'completion_length': 61.214290618896484, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0440673828125, 'epoch': 0.25} 25%|██▌ | 635/2500 [2:22:44<7:05:12, 13.68s/it] 25%|██▌ | 636/2500 [2:22:57<6:56:29, 13.41s/it] {'loss': 0.001, 'grad_norm': 0.07334738206451127, 'learning_rate': 7.456e-07, 'completion_length': 51.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02569580078125, 'epoch': 0.25} 25%|██▌ | 636/2500 [2:22:57<6:56:29, 13.41s/it] 25%|██▌ | 637/2500 [2:23:12<7:13:34, 13.96s/it] {'loss': 0.0016, 'grad_norm': 0.08297262423767227, 'learning_rate': 7.452e-07, 'completion_length': 62.607147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0408935546875, 'epoch': 0.25} 25%|██▌ | 637/2500 [2:23:12<7:13:34, 13.96s/it] 26%|██▌ | 638/2500 [2:23:26<7:11:02, 13.89s/it] {'loss': 0.0009, 'grad_norm': 0.1221955111844992, 'learning_rate': 7.447999999999999e-07, 'completion_length': 52.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0216064453125, 'epoch': 0.26} 
26%|██▌ | 638/2500 [2:23:26<7:11:02, 13.89s/it] 26%|██▌ | 639/2500 [2:23:40<7:12:10, 13.93s/it] {'loss': 0.0018, 'grad_norm': 0.1238724361700813, 'learning_rate': 7.443999999999999e-07, 'completion_length': 62.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045166015625, 'epoch': 0.26} 26%|██▌ | 639/2500 [2:23:40<7:12:10, 13.93s/it] 26%|██▌ | 640/2500 [2:23:54<7:11:07, 13.91s/it] {'loss': 0.0011, 'grad_norm': 0.08141563275208821, 'learning_rate': 7.44e-07, 'completion_length': 61.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0274658203125, 'epoch': 0.26} 26%|██▌ | 640/2500 [2:23:54<7:11:07, 13.91s/it] 26%|██▌ | 641/2500 [2:24:08<7:10:48, 13.90s/it] {'loss': 0.0013, 'grad_norm': 0.08235452889918932, 'learning_rate': 7.436e-07, 'completion_length': 54.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03192138671875, 'epoch': 0.26} 26%|██▌ | 641/2500 [2:24:08<7:10:48, 13.90s/it] 26%|██▌ | 642/2500 [2:24:23<7:20:42, 14.23s/it] {'loss': 0.0006, 'grad_norm': 0.07965189521134475, 'learning_rate': 7.431999999999999e-07, 'completion_length': 61.35714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0157470703125, 'epoch': 0.26} 26%|██▌ | 642/2500 [2:24:23<7:20:42, 14.23s/it] 26%|██▌ | 643/2500 [2:24:37<7:17:09, 14.12s/it] {'loss': 0.0007, 'grad_norm': 0.05594625487372498, 'learning_rate': 7.428e-07, 'completion_length': 59.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01641845703125, 'epoch': 0.26} 26%|██▌ | 643/2500 [2:24:37<7:17:09, 14.12s/it] 26%|██▌ | 644/2500 [2:24:56<8:01:10, 15.56s/it] {'loss': 0.0013, 'grad_norm': 0.8003528905084624, 'learning_rate': 7.423999999999999e-07, 'completion_length': 56.64286231994629, 
'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.03350830078125, 'epoch': 0.26} 26%|██▌ | 644/2500 [2:24:56<8:01:10, 15.56s/it] 26%|██▌ | 645/2500 [2:25:09<7:43:58, 15.01s/it] {'loss': 0.0007, 'grad_norm': 1.630515203252549, 'learning_rate': 7.42e-07, 'completion_length': 53.767860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0186767578125, 'epoch': 0.26} 26%|██▌ | 645/2500 [2:25:09<7:43:58, 15.01s/it] 26%|██▌ | 646/2500 [2:25:23<7:29:22, 14.54s/it] {'loss': 0.0013, 'grad_norm': 0.0752892876652509, 'learning_rate': 7.416e-07, 'completion_length': 50.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03289794921875, 'epoch': 0.26} 26%|██▌ | 646/2500 [2:25:23<7:29:22, 14.54s/it] 26%|██▌ | 647/2500 [2:25:36<7:17:18, 14.16s/it] {'loss': 0.0018, 'grad_norm': 0.2858918770173109, 'learning_rate': 7.411999999999999e-07, 'completion_length': 58.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0452880859375, 'epoch': 0.26} 26%|██▌ | 647/2500 [2:25:36<7:17:18, 14.16s/it] 26%|██▌ | 648/2500 [2:25:50<7:19:58, 14.25s/it] {'loss': 0.0009, 'grad_norm': 0.1495838420476563, 'learning_rate': 7.408e-07, 'completion_length': 60.03571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.26} 26%|██▌ | 648/2500 [2:25:50<7:19:58, 14.25s/it] 26%|██▌ | 649/2500 [2:26:05<7:19:05, 14.23s/it] {'loss': 0.0015, 'grad_norm': 0.1675529016723948, 'learning_rate': 7.403999999999999e-07, 'completion_length': 60.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 
'kl': 0.03857421875, 'epoch': 0.26} 26%|██▌ | 649/2500 [2:26:05<7:19:05, 14.23s/it] 26%|██▌ | 650/2500 [2:26:19<7:19:17, 14.25s/it] {'loss': 0.0012, 'grad_norm': 0.0724432064140073, 'learning_rate': 7.4e-07, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02935791015625, 'epoch': 0.26} 26%|██▌ | 650/2500 [2:26:19<7:19:17, 14.25s/it] 26%|██▌ | 651/2500 [2:26:33<7:19:44, 14.27s/it] {'loss': 0.0012, 'grad_norm': 0.10075479497266199, 'learning_rate': 7.396e-07, 'completion_length': 55.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03076171875, 'epoch': 0.26} 26%|██▌ | 651/2500 [2:26:33<7:19:44, 14.27s/it] 26%|██▌ | 652/2500 [2:26:49<7:34:13, 14.75s/it] {'loss': 0.0021, 'grad_norm': 0.08251686931466798, 'learning_rate': 7.392e-07, 'completion_length': 65.73214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0518798828125, 'epoch': 0.26} 26%|██▌ | 652/2500 [2:26:49<7:34:13, 14.75s/it] 26%|██▌ | 653/2500 [2:27:03<7:29:52, 14.61s/it] {'loss': 0.0018, 'grad_norm': 0.0905698092773059, 'learning_rate': 7.388e-07, 'completion_length': 53.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0445556640625, 'epoch': 0.26} 26%|██▌ | 653/2500 [2:27:03<7:29:52, 14.61s/it] 26%|██▌ | 654/2500 [2:27:18<7:29:00, 14.59s/it] {'loss': 0.0014, 'grad_norm': 1.8698121018032974, 'learning_rate': 7.383999999999999e-07, 'completion_length': 54.53571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.034912109375, 'epoch': 0.26} 26%|██▌ | 654/2500 [2:27:18<7:29:00, 14.59s/it] 26%|██▌ | 655/2500 [2:27:32<7:24:34, 14.46s/it] {'loss': 0.0012, 'grad_norm': 0.08529778513420559, 'learning_rate': 
7.38e-07, 'completion_length': 57.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03057861328125, 'epoch': 0.26} 26%|██▌ | 655/2500 [2:27:32<7:24:34, 14.46s/it] 26%|██▌ | 656/2500 [2:27:46<7:16:07, 14.19s/it] {'loss': 0.0011, 'grad_norm': 0.12011259745115022, 'learning_rate': 7.376e-07, 'completion_length': 60.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028594970703125, 'epoch': 0.26} 26%|██▌ | 656/2500 [2:27:46<7:16:07, 14.19s/it] 26%|██▋ | 657/2500 [2:27:59<7:11:34, 14.05s/it] {'loss': 0.0025, 'grad_norm': 4.7346434349152995, 'learning_rate': 7.371999999999999e-07, 'completion_length': 58.28571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.0714285746216774, 'kl': 0.0621337890625, 'epoch': 0.26} 26%|██▋ | 657/2500 [2:27:59<7:11:34, 14.05s/it] 26%|██▋ | 658/2500 [2:28:13<7:10:04, 14.01s/it] {'loss': 0.0017, 'grad_norm': 0.14135356349655273, 'learning_rate': 7.368e-07, 'completion_length': 57.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0421142578125, 'epoch': 0.26} 26%|██▋ | 658/2500 [2:28:13<7:10:04, 14.01s/it] 26%|██▋ | 659/2500 [2:28:27<7:06:26, 13.90s/it] {'loss': 0.0008, 'grad_norm': 0.13529297970486226, 'learning_rate': 7.364000000000001e-07, 'completion_length': 51.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01910400390625, 'epoch': 0.26} 26%|██▋ | 659/2500 [2:28:27<7:06:26, 13.90s/it] 26%|██▋ | 660/2500 [2:28:41<7:07:25, 13.94s/it] {'loss': 0.0013, 'grad_norm': 1.5058505380975153, 'learning_rate': 7.359999999999999e-07, 'completion_length': 56.14285850524902, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 
0.04123930633068085, 'kl': 0.0323486328125, 'epoch': 0.26} 26%|██▋ | 660/2500 [2:28:41<7:07:25, 13.94s/it] 26%|██▋ | 661/2500 [2:28:56<7:14:32, 14.18s/it] {'loss': 0.0009, 'grad_norm': 0.06783831879852091, 'learning_rate': 7.356e-07, 'completion_length': 53.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021453857421875, 'epoch': 0.26} 26%|██▋ | 661/2500 [2:28:56<7:14:32, 14.18s/it] 26%|██▋ | 662/2500 [2:29:09<7:02:58, 13.81s/it] {'loss': 0.0016, 'grad_norm': 0.11409408472088303, 'learning_rate': 7.352e-07, 'completion_length': 52.92857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0391845703125, 'epoch': 0.26} 26%|██▋ | 662/2500 [2:29:09<7:02:58, 13.81s/it] 27%|██▋ | 663/2500 [2:29:23<7:05:55, 13.91s/it] {'loss': 0.0008, 'grad_norm': 0.0741704000002197, 'learning_rate': 7.347999999999999e-07, 'completion_length': 54.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018951416015625, 'epoch': 0.27} 27%|██▋ | 663/2500 [2:29:23<7:05:55, 13.91s/it] 27%|██▋ | 664/2500 [2:29:36<6:57:29, 13.64s/it] {'loss': 0.0014, 'grad_norm': 0.1009380741534091, 'learning_rate': 7.344e-07, 'completion_length': 53.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034912109375, 'epoch': 0.27} 27%|██▋ | 664/2500 [2:29:36<6:57:29, 13.64s/it] 27%|██▋ | 665/2500 [2:29:50<6:58:48, 13.69s/it] {'loss': 0.0011, 'grad_norm': 0.07233416230211254, 'learning_rate': 7.34e-07, 'completion_length': 54.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028717041015625, 'epoch': 0.27} 27%|██▋ | 665/2500 [2:29:50<6:58:48, 13.69s/it] 27%|██▋ | 666/2500 [2:30:04<7:03:18, 13.85s/it] {'loss': 0.0017, 'grad_norm': 0.09787596332062909, 
'learning_rate': 7.336e-07, 'completion_length': 55.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.042236328125, 'epoch': 0.27} 27%|██▋ | 666/2500 [2:30:04<7:03:18, 13.85s/it] 27%|██▋ | 667/2500 [2:30:17<6:59:36, 13.74s/it] {'loss': 0.0019, 'grad_norm': 0.12694532698349295, 'learning_rate': 7.331999999999999e-07, 'completion_length': 49.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04638671875, 'epoch': 0.27} 27%|██▋ | 667/2500 [2:30:17<6:59:36, 13.74s/it] 27%|██▋ | 668/2500 [2:30:31<6:57:08, 13.66s/it] {'loss': 0.0016, 'grad_norm': 0.1278005286009749, 'learning_rate': 7.328e-07, 'completion_length': 53.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0406494140625, 'epoch': 0.27} 27%|██▋ | 668/2500 [2:30:31<6:57:08, 13.66s/it] 27%|██▋ | 669/2500 [2:30:44<6:55:34, 13.62s/it] {'loss': 0.0019, 'grad_norm': 0.7607444215785176, 'learning_rate': 7.324e-07, 'completion_length': 53.58928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0482177734375, 'epoch': 0.27} 27%|██▋ | 669/2500 [2:30:44<6:55:34, 13.62s/it] 27%|██▋ | 670/2500 [2:30:58<6:55:34, 13.63s/it] {'loss': 0.0012, 'grad_norm': 0.07634234792020571, 'learning_rate': 7.319999999999999e-07, 'completion_length': 52.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.27} 27%|██▋ | 670/2500 [2:30:58<6:55:34, 13.63s/it] 27%|██▋ | 671/2500 [2:31:11<6:50:36, 13.47s/it] {'loss': 0.0022, 'grad_norm': 1.0786130000377685, 'learning_rate': 7.316e-07, 'completion_length': 52.160715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 
0.0357142873108387, 'kl': 0.054931640625, 'epoch': 0.27} 27%|██▋ | 671/2500 [2:31:11<6:50:36, 13.47s/it] 27%|██▋ | 672/2500 [2:31:26<7:06:16, 13.99s/it] {'loss': 0.0007, 'grad_norm': 0.09430918893412657, 'learning_rate': 7.311999999999999e-07, 'completion_length': 61.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016448974609375, 'epoch': 0.27} 27%|██▋ | 672/2500 [2:31:26<7:06:16, 13.99s/it] 27%|██▋ | 673/2500 [2:31:41<7:10:36, 14.14s/it] {'loss': 0.0012, 'grad_norm': 0.08252518569793342, 'learning_rate': 7.308e-07, 'completion_length': 59.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03033447265625, 'epoch': 0.27} 27%|██▋ | 673/2500 [2:31:41<7:10:36, 14.14s/it] 27%|██▋ | 674/2500 [2:31:56<7:17:37, 14.38s/it] {'loss': 0.0007, 'grad_norm': 0.06930677632491988, 'learning_rate': 7.304e-07, 'completion_length': 58.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01776123046875, 'epoch': 0.27} 27%|██▋ | 674/2500 [2:31:56<7:17:37, 14.38s/it] 27%|██▋ | 675/2500 [2:32:10<7:12:38, 14.22s/it] {'loss': 0.0018, 'grad_norm': 2.815672954010588, 'learning_rate': 7.3e-07, 'completion_length': 57.535715103149414, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0457763671875, 'epoch': 0.27} 27%|██▋ | 675/2500 [2:32:10<7:12:38, 14.22s/it] 27%|██▋ | 676/2500 [2:32:23<7:07:41, 14.07s/it] {'loss': 0.0007, 'grad_norm': 0.23559077532845876, 'learning_rate': 7.296e-07, 'completion_length': 56.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017822265625, 'epoch': 0.27} 27%|██▋ | 676/2500 [2:32:23<7:07:41, 14.07s/it] 27%|██▋ | 677/2500 [2:32:37<7:05:58, 14.02s/it] {'loss': 0.0012, 'grad_norm': 
0.08052310358150043, 'learning_rate': 7.291999999999999e-07, 'completion_length': 58.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029052734375, 'epoch': 0.27} 27%|██▋ | 677/2500 [2:32:37<7:05:58, 14.02s/it] 27%|██▋ | 678/2500 [2:32:50<6:58:15, 13.77s/it] {'loss': 0.001, 'grad_norm': 0.10763099487697383, 'learning_rate': 7.288e-07, 'completion_length': 50.48214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025390625, 'epoch': 0.27} 27%|██▋ | 678/2500 [2:32:50<6:58:15, 13.77s/it] 27%|██▋ | 679/2500 [2:33:04<6:59:49, 13.83s/it] {'loss': 0.0011, 'grad_norm': 0.09418618422519029, 'learning_rate': 7.284e-07, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0283203125, 'epoch': 0.27} 27%|██▋ | 679/2500 [2:33:04<6:59:49, 13.83s/it] 27%|██▋ | 680/2500 [2:33:19<7:06:02, 14.05s/it] {'loss': 0.001, 'grad_norm': 0.10650309577097826, 'learning_rate': 7.28e-07, 'completion_length': 65.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.27} 27%|██▋ | 680/2500 [2:33:19<7:06:02, 14.05s/it] 27%|██▋ | 681/2500 [2:33:33<7:05:31, 14.04s/it] {'loss': 0.0014, 'grad_norm': 0.19808159455215882, 'learning_rate': 7.276e-07, 'completion_length': 50.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03399658203125, 'epoch': 0.27} 27%|██▋ | 681/2500 [2:33:33<7:05:31, 14.04s/it] 27%|██▋ | 682/2500 [2:33:46<6:58:22, 13.81s/it] {'loss': 0.0011, 'grad_norm': 2.7561294455843015, 'learning_rate': 7.271999999999999e-07, 'completion_length': 50.58928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 
0.02850341796875, 'epoch': 0.27} 27%|██▋ | 682/2500 [2:33:46<6:58:22, 13.81s/it] 27%|██▋ | 683/2500 [2:34:00<6:56:26, 13.75s/it] {'loss': 0.0013, 'grad_norm': 0.176552676512657, 'learning_rate': 7.268e-07, 'completion_length': 53.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.27} 27%|██▋ | 683/2500 [2:34:00<6:56:26, 13.75s/it] 27%|██▋ | 684/2500 [2:34:14<7:01:12, 13.92s/it] {'loss': 0.0015, 'grad_norm': 1.4037466465979331, 'learning_rate': 7.264e-07, 'completion_length': 63.39286231994629, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03656005859375, 'epoch': 0.27} 27%|██▋ | 684/2500 [2:34:14<7:01:12, 13.92s/it] 27%|██▋ | 685/2500 [2:34:29<7:08:57, 14.18s/it] {'loss': 0.0012, 'grad_norm': 0.7185673804908498, 'learning_rate': 7.259999999999999e-07, 'completion_length': 59.53571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0308837890625, 'epoch': 0.27} 27%|██▋ | 685/2500 [2:34:29<7:08:57, 14.18s/it] 27%|██▋ | 686/2500 [2:34:43<7:09:23, 14.20s/it] {'loss': 0.0012, 'grad_norm': 0.10127289689259363, 'learning_rate': 7.256e-07, 'completion_length': 58.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.27} 27%|██▋ | 686/2500 [2:34:43<7:09:23, 14.20s/it] 27%|██▋ | 687/2500 [2:34:57<7:09:42, 14.22s/it] {'loss': 0.0013, 'grad_norm': 2.0564432519627456, 'learning_rate': 7.252e-07, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.03228759765625, 'epoch': 0.27} 27%|██▋ | 687/2500 [2:34:57<7:09:42, 
14.22s/it] 28%|██▊ | 688/2500 [2:35:11<7:07:31, 14.16s/it] {'loss': 0.0011, 'grad_norm': 2.3022605789022323, 'learning_rate': 7.247999999999999e-07, 'completion_length': 55.517860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02850341796875, 'epoch': 0.28} 28%|██▊ | 688/2500 [2:35:11<7:07:31, 14.16s/it] 28%|██▊ | 689/2500 [2:35:25<7:04:34, 14.07s/it] {'loss': 0.0019, 'grad_norm': 4.449061092501443, 'learning_rate': 7.244e-07, 'completion_length': 60.58928871154785, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.0465087890625, 'epoch': 0.28} 28%|██▊ | 689/2500 [2:35:25<7:04:34, 14.07s/it] 28%|██▊ | 690/2500 [2:35:39<6:58:41, 13.88s/it] {'loss': 0.0006, 'grad_norm': 0.08648189458469277, 'learning_rate': 7.24e-07, 'completion_length': 53.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01568603515625, 'epoch': 0.28} 28%|██▊ | 690/2500 [2:35:39<6:58:41, 13.88s/it] 28%|██▊ | 691/2500 [2:35:52<6:53:18, 13.71s/it] {'loss': 0.0008, 'grad_norm': 0.069352686669724, 'learning_rate': 7.235999999999999e-07, 'completion_length': 50.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01953125, 'epoch': 0.28} 28%|██▊ | 691/2500 [2:35:52<6:53:18, 13.71s/it] 28%|██▊ | 692/2500 [2:36:06<6:56:34, 13.82s/it] {'loss': 0.0014, 'grad_norm': 0.13222491533402816, 'learning_rate': 7.231999999999999e-07, 'completion_length': 54.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0352783203125, 'epoch': 0.28} 28%|██▊ | 692/2500 [2:36:06<6:56:34, 13.82s/it] 28%|██▊ | 693/2500 [2:36:19<6:52:08, 13.68s/it] {'loss': 0.0007, 'grad_norm': 0.08435792235742831, 'learning_rate': 7.228e-07, 
'completion_length': 57.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017913818359375, 'epoch': 0.28} 28%|██▊ | 693/2500 [2:36:19<6:52:08, 13.68s/it] 28%|██▊ | 694/2500 [2:36:33<6:50:17, 13.63s/it] {'loss': 0.0021, 'grad_norm': 1.6780464902785055, 'learning_rate': 7.224e-07, 'completion_length': 61.125003814697266, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.053466796875, 'epoch': 0.28} 28%|██▊ | 694/2500 [2:36:33<6:50:17, 13.63s/it] 28%|██▊ | 695/2500 [2:36:48<7:03:48, 14.09s/it] {'loss': 0.0015, 'grad_norm': 7.625757285098363, 'learning_rate': 7.219999999999999e-07, 'completion_length': 60.750003814697266, 'rewards/accuracy_reward': 0.8571429252624512, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.0824786126613617, 'kl': 0.0382080078125, 'epoch': 0.28} 28%|██▊ | 695/2500 [2:36:48<7:03:48, 14.09s/it] 28%|██▊ | 696/2500 [2:37:01<6:55:49, 13.83s/it] {'loss': 0.0015, 'grad_norm': 2.4128319113513186, 'learning_rate': 7.216e-07, 'completion_length': 49.94643211364746, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.0357142873108387, 'kl': 0.0364990234375, 'epoch': 0.28} 28%|██▊ | 696/2500 [2:37:01<6:55:49, 13.83s/it] 28%|██▊ | 697/2500 [2:37:15<6:53:45, 13.77s/it] {'loss': 0.001, 'grad_norm': 1.062864825805295, 'learning_rate': 7.211999999999999e-07, 'completion_length': 59.92857360839844, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.0357142873108387, 'kl': 0.02490234375, 'epoch': 0.28} 28%|██▊ | 697/2500 [2:37:15<6:53:45, 13.77s/it] 28%|██▊ | 698/2500 [2:37:29<6:59:29, 13.97s/it] {'loss': 0.002, 'grad_norm': 0.11897958007371734, 'learning_rate': 7.207999999999999e-07, 'completion_length': 60.96428871154785, 
'rewards/accuracy_reward': 0.8571428656578064, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.0, 'kl': 0.049560546875, 'epoch': 0.28} 28%|██▊ | 698/2500 [2:37:29<6:59:29, 13.97s/it] 28%|██▊ | 699/2500 [2:37:44<7:07:11, 14.23s/it] {'loss': 0.0011, 'grad_norm': 2.8181113732187386, 'learning_rate': 7.204e-07, 'completion_length': 60.44643211364746, 'rewards/accuracy_reward': 0.8571428954601288, 'rewards/format_reward': 1.0, 'reward': 1.857142984867096, 'reward_std': 0.0714285746216774, 'kl': 0.02685546875, 'epoch': 0.28} 28%|██▊ | 699/2500 [2:37:44<7:07:11, 14.23s/it] 28%|██▊ | 700/2500 [2:37:58<6:59:13, 13.97s/it] {'loss': 0.0014, 'grad_norm': 2.697757927941478, 'learning_rate': 7.2e-07, 'completion_length': 55.60714530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03521728515625, 'epoch': 0.28} 28%|██▊ | 700/2500 [2:37:58<6:59:13, 13.97s/it] 28%|██▊ | 701/2500 [2:39:10<15:45:32, 31.54s/it] {'loss': 0.0015, 'grad_norm': 0.09669848289833162, 'learning_rate': 7.196e-07, 'completion_length': 59.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0382080078125, 'epoch': 0.28} 28%|██▊ | 701/2500 [2:39:10<15:45:32, 31.54s/it] 28%|██▊ | 702/2500 [2:39:25<13:17:00, 26.60s/it] {'loss': 0.0008, 'grad_norm': 0.17553586900678783, 'learning_rate': 7.191999999999999e-07, 'completion_length': 56.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01910400390625, 'epoch': 0.28} 28%|██▊ | 702/2500 [2:39:25<13:17:00, 26.60s/it] 28%|██▊ | 703/2500 [2:39:39<11:19:46, 22.70s/it] {'loss': 0.0008, 'grad_norm': 0.06286616867209587, 'learning_rate': 7.188e-07, 'completion_length': 56.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018768310546875, 
'epoch': 0.28} 28%|██▊ | 703/2500 [2:39:39<11:19:46, 22.70s/it] 28%|██▊ | 704/2500 [2:39:53<10:04:02, 20.18s/it] {'loss': 0.0014, 'grad_norm': 0.08913896681124467, 'learning_rate': 7.184e-07, 'completion_length': 56.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0345458984375, 'epoch': 0.28} 28%|██▊ | 704/2500 [2:39:53<10:04:02, 20.18s/it] 28%|██▊ | 705/2500 [2:40:06<9:00:02, 18.05s/it] {'loss': 0.0007, 'grad_norm': 0.07783987358637695, 'learning_rate': 7.179999999999999e-07, 'completion_length': 56.375003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01629638671875, 'epoch': 0.28} 28%|██▊ | 705/2500 [2:40:06<9:00:02, 18.05s/it] 28%|██▊ | 706/2500 [2:40:25<9:08:11, 18.33s/it] {'loss': 0.0015, 'grad_norm': 0.4863095968052153, 'learning_rate': 7.176e-07, 'completion_length': 70.17857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.03826904296875, 'epoch': 0.28} 28%|██▊ | 706/2500 [2:40:25<9:08:11, 18.33s/it] 28%|██▊ | 707/2500 [2:40:40<8:32:43, 17.16s/it] {'loss': 0.0011, 'grad_norm': 0.5923431208559592, 'learning_rate': 7.171999999999999e-07, 'completion_length': 61.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.28} 28%|██▊ | 707/2500 [2:40:40<8:32:43, 17.16s/it] 28%|██▊ | 708/2500 [2:40:53<7:58:43, 16.03s/it] {'loss': 0.0017, 'grad_norm': 1.9156230753800396, 'learning_rate': 7.168e-07, 'completion_length': 54.33928680419922, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0413818359375, 'epoch': 0.28} 28%|██▊ | 708/2500 [2:40:53<7:58:43, 16.03s/it] 28%|██▊ | 709/2500 [2:41:07<7:44:12, 15.55s/it] {'loss': 
0.0019, 'grad_norm': 0.06962642690537019, 'learning_rate': 7.164e-07, 'completion_length': 53.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04638671875, 'epoch': 0.28} 28%|██▊ | 709/2500 [2:41:07<7:44:12, 15.55s/it] 28%|██▊ | 710/2500 [2:41:22<7:31:55, 15.15s/it] {'loss': 0.0017, 'grad_norm': 1.2303092719763773, 'learning_rate': 7.159999999999999e-07, 'completion_length': 57.80357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.04345703125, 'epoch': 0.28} 28%|██▊ | 710/2500 [2:41:22<7:31:55, 15.15s/it] 28%|██▊ | 711/2500 [2:41:35<7:17:13, 14.66s/it] {'loss': 0.0012, 'grad_norm': 0.12854133075265015, 'learning_rate': 7.156e-07, 'completion_length': 52.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0296630859375, 'epoch': 0.28} 28%|██▊ | 711/2500 [2:41:35<7:17:13, 14.66s/it] 28%|██▊ | 712/2500 [2:41:50<7:14:23, 14.58s/it] {'loss': 0.0011, 'grad_norm': 1.243423085308731, 'learning_rate': 7.151999999999999e-07, 'completion_length': 62.00000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02850341796875, 'epoch': 0.28} 28%|██▊ | 712/2500 [2:41:50<7:14:23, 14.58s/it] 29%|██▊ | 713/2500 [2:42:03<7:07:21, 14.35s/it] {'loss': 0.0008, 'grad_norm': 0.08878780840853344, 'learning_rate': 7.147999999999999e-07, 'completion_length': 58.267860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.020172119140625, 'epoch': 0.29} 29%|██▊ | 713/2500 [2:42:03<7:07:21, 14.35s/it] 29%|██▊ | 714/2500 [2:42:17<6:58:22, 14.06s/it] {'loss': 0.0007, 'grad_norm': 0.11913449005649634, 'learning_rate': 7.144e-07, 'completion_length': 
55.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01666259765625, 'epoch': 0.29} 29%|██▊ | 714/2500 [2:42:17<6:58:22, 14.06s/it] 29%|██▊ | 715/2500 [2:42:31<7:02:32, 14.20s/it] {'loss': 0.0012, 'grad_norm': 2.543331916881282, 'learning_rate': 7.14e-07, 'completion_length': 66.07143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03094482421875, 'epoch': 0.29} 29%|██▊ | 715/2500 [2:42:31<7:02:32, 14.20s/it] 29%|██▊ | 716/2500 [2:42:46<7:04:22, 14.27s/it] {'loss': 0.001, 'grad_norm': 0.09677527780908447, 'learning_rate': 7.135999999999999e-07, 'completion_length': 63.875003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0245361328125, 'epoch': 0.29} 29%|██▊ | 716/2500 [2:42:46<7:04:22, 14.27s/it] 29%|██▊ | 717/2500 [2:43:00<7:04:06, 14.27s/it] {'loss': 0.0014, 'grad_norm': 0.1165768933855483, 'learning_rate': 7.131999999999999e-07, 'completion_length': 57.232147216796875, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 0.29} 29%|██▊ | 717/2500 [2:43:00<7:04:06, 14.27s/it] 29%|██▊ | 718/2500 [2:43:15<7:08:53, 14.44s/it] {'loss': 0.0009, 'grad_norm': 0.11697276255012551, 'learning_rate': 7.128e-07, 'completion_length': 50.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02166748046875, 'epoch': 0.29} 29%|██▊ | 718/2500 [2:43:15<7:08:53, 14.44s/it] 29%|██▉ | 719/2500 [2:43:28<6:56:54, 14.05s/it] {'loss': 0.0007, 'grad_norm': 0.08834665523188744, 'learning_rate': 7.124e-07, 'completion_length': 50.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.01708984375, 'epoch': 0.29} 29%|██▉ | 719/2500 [2:43:28<6:56:54, 14.05s/it] 29%|██▉ | 720/2500 [2:43:45<7:19:26, 14.81s/it] {'loss': 0.0007, 'grad_norm': 0.11665509484914113, 'learning_rate': 7.119999999999999e-07, 'completion_length': 66.00000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01849365234375, 'epoch': 0.29} 29%|██▉ | 720/2500 [2:43:45<7:19:26, 14.81s/it] 29%|██▉ | 721/2500 [2:43:58<7:11:27, 14.55s/it] {'loss': 0.0013, 'grad_norm': 0.08984058138783793, 'learning_rate': 7.116e-07, 'completion_length': 62.321434020996094, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03131103515625, 'epoch': 0.29} 29%|██▉ | 721/2500 [2:43:58<7:11:27, 14.55s/it] 29%|██▉ | 722/2500 [2:44:13<7:13:11, 14.62s/it] {'loss': 0.0007, 'grad_norm': 1.069304792525965, 'learning_rate': 7.112000000000001e-07, 'completion_length': 54.05357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0179443359375, 'epoch': 0.29} 29%|██▉ | 722/2500 [2:44:13<7:13:11, 14.62s/it] 29%|██▉ | 723/2500 [2:44:27<7:05:27, 14.37s/it] {'loss': 0.0007, 'grad_norm': 1.8131241497151103, 'learning_rate': 7.107999999999999e-07, 'completion_length': 60.48214530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.01641845703125, 'epoch': 0.29} 29%|██▉ | 723/2500 [2:44:27<7:05:27, 14.37s/it] 29%|██▉ | 724/2500 [2:44:41<7:03:47, 14.32s/it] {'loss': 0.0008, 'grad_norm': 0.07812443135629664, 'learning_rate': 7.104e-07, 'completion_length': 56.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019256591796875, 'epoch': 0.29} 29%|██▉ | 724/2500 [2:44:41<7:03:47, 14.32s/it] 29%|██▉ | 725/2500 [2:44:55<6:57:39, 
14.12s/it] {'loss': 0.0014, 'grad_norm': 7.117217570825064, 'learning_rate': 7.1e-07, 'completion_length': 62.250003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.0714285746216774, 'kl': 0.03564453125, 'epoch': 0.29} 29%|██▉ | 725/2500 [2:44:55<6:57:39, 14.12s/it] 29%|██▉ | 726/2500 [2:45:08<6:49:28, 13.85s/it] {'loss': 0.0005, 'grad_norm': 0.08034319878246897, 'learning_rate': 7.096e-07, 'completion_length': 55.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01251220703125, 'epoch': 0.29} 29%|██▉ | 726/2500 [2:45:08<6:49:28, 13.85s/it] 29%|██▉ | 727/2500 [2:45:22<6:49:05, 13.84s/it] {'loss': 0.0011, 'grad_norm': 0.0966625273698343, 'learning_rate': 7.092e-07, 'completion_length': 59.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02667236328125, 'epoch': 0.29} 29%|██▉ | 727/2500 [2:45:22<6:49:05, 13.84s/it] 29%|██▉ | 728/2500 [2:45:38<7:12:10, 14.63s/it] {'loss': 0.0015, 'grad_norm': 0.1361400873171404, 'learning_rate': 7.088e-07, 'completion_length': 61.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0367431640625, 'epoch': 0.29} 29%|██▉ | 728/2500 [2:45:38<7:12:10, 14.63s/it] 29%|██▉ | 729/2500 [2:45:52<7:02:06, 14.30s/it] {'loss': 0.0011, 'grad_norm': 0.14835404783314896, 'learning_rate': 7.084e-07, 'completion_length': 50.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02850341796875, 'epoch': 0.29} 29%|██▉ | 729/2500 [2:45:52<7:02:06, 14.30s/it] 29%|██▉ | 730/2500 [2:46:07<7:05:27, 14.42s/it] {'loss': 0.0014, 'grad_norm': 3.3272854088974424, 'learning_rate': 7.079999999999999e-07, 'completion_length': 57.03571891784668, 'rewards/accuracy_reward': 0.9464285969734192, 
'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.035400390625, 'epoch': 0.29} 29%|██▉ | 730/2500 [2:46:07<7:05:27, 14.42s/it] 29%|██▉ | 731/2500 [2:46:20<6:57:54, 14.17s/it] {'loss': 0.0008, 'grad_norm': 0.10925893486236321, 'learning_rate': 7.076e-07, 'completion_length': 49.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0205078125, 'epoch': 0.29} 29%|██▉ | 731/2500 [2:46:20<6:57:54, 14.17s/it] 29%|██▉ | 732/2500 [2:46:35<6:59:15, 14.23s/it] {'loss': 0.0009, 'grad_norm': 0.09684702631424215, 'learning_rate': 7.072e-07, 'completion_length': 58.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02215576171875, 'epoch': 0.29} 29%|██▉ | 732/2500 [2:46:35<6:59:15, 14.23s/it] 29%|██▉ | 733/2500 [2:46:48<6:54:59, 14.09s/it] {'loss': 0.0007, 'grad_norm': 0.14142896112986453, 'learning_rate': 7.068e-07, 'completion_length': 58.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01861572265625, 'epoch': 0.29} 29%|██▉ | 733/2500 [2:46:48<6:54:59, 14.09s/it] 29%|██▉ | 734/2500 [2:47:02<6:51:32, 13.98s/it] {'loss': 0.0012, 'grad_norm': 0.12841806737167435, 'learning_rate': 7.064e-07, 'completion_length': 53.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03106689453125, 'epoch': 0.29} 29%|██▉ | 734/2500 [2:47:02<6:51:32, 13.98s/it] 29%|██▉ | 735/2500 [2:47:16<6:50:14, 13.95s/it] {'loss': 0.001, 'grad_norm': 1.9474883221828392, 'learning_rate': 7.059999999999999e-07, 'completion_length': 53.78571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.025390625, 'epoch': 0.29} 29%|██▉ | 735/2500 [2:47:16<6:50:14, 13.95s/it] 29%|██▉ | 736/2500 
[2:47:30<6:52:37, 14.03s/it] {'loss': 0.0011, 'grad_norm': 0.08697220416283523, 'learning_rate': 7.056e-07, 'completion_length': 57.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838134765625, 'epoch': 0.29} 29%|██▉ | 736/2500 [2:47:30<6:52:37, 14.03s/it] 29%|██▉ | 737/2500 [2:47:44<6:46:59, 13.85s/it] {'loss': 0.0017, 'grad_norm': 0.09536533218374876, 'learning_rate': 7.052e-07, 'completion_length': 52.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0419921875, 'epoch': 0.29} 29%|██▉ | 737/2500 [2:47:44<6:46:59, 13.85s/it] 30%|██▉ | 738/2500 [2:47:57<6:44:24, 13.77s/it] {'loss': 0.0009, 'grad_norm': 4.551512520922931, 'learning_rate': 7.047999999999999e-07, 'completion_length': 59.00000190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.022857666015625, 'epoch': 0.3} 30%|██▉ | 738/2500 [2:47:57<6:44:24, 13.77s/it] 30%|██▉ | 739/2500 [2:48:11<6:45:31, 13.82s/it] {'loss': 0.0008, 'grad_norm': 0.07282668331894177, 'learning_rate': 7.044e-07, 'completion_length': 55.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0189208984375, 'epoch': 0.3} 30%|██▉ | 739/2500 [2:48:11<6:45:31, 13.82s/it] 30%|██▉ | 740/2500 [2:48:25<6:44:16, 13.78s/it] {'loss': 0.001, 'grad_norm': 0.10498165332480865, 'learning_rate': 7.04e-07, 'completion_length': 57.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0250244140625, 'epoch': 0.3} 30%|██▉ | 740/2500 [2:48:25<6:44:16, 13.78s/it] 30%|██▉ | 741/2500 [2:48:39<6:46:33, 13.87s/it] {'loss': 0.0018, 'grad_norm': 1.1809196186140976, 'learning_rate': 7.035999999999999e-07, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 
0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0452880859375, 'epoch': 0.3} 30%|██▉ | 741/2500 [2:48:39<6:46:33, 13.87s/it] 30%|██▉ | 742/2500 [2:48:54<6:57:05, 14.24s/it] {'loss': 0.0009, 'grad_norm': 0.46349978160961813, 'learning_rate': 7.032e-07, 'completion_length': 60.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022308349609375, 'epoch': 0.3} 30%|██▉ | 742/2500 [2:48:54<6:57:05, 14.24s/it] 30%|██▉ | 743/2500 [2:49:07<6:49:12, 13.97s/it] {'loss': 0.0008, 'grad_norm': 0.11686743213255103, 'learning_rate': 7.028e-07, 'completion_length': 54.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018768310546875, 'epoch': 0.3} 30%|██▉ | 743/2500 [2:49:07<6:49:12, 13.97s/it] 30%|██▉ | 744/2500 [2:49:22<6:58:32, 14.30s/it] {'loss': 0.0018, 'grad_norm': 0.09585193371894497, 'learning_rate': 7.024e-07, 'completion_length': 58.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04541015625, 'epoch': 0.3} 30%|██▉ | 744/2500 [2:49:22<6:58:32, 14.30s/it] 30%|██▉ | 745/2500 [2:49:36<6:50:43, 14.04s/it] {'loss': 0.0012, 'grad_norm': 2.6818257284488087, 'learning_rate': 7.019999999999999e-07, 'completion_length': 56.16071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02886962890625, 'epoch': 0.3} 30%|██▉ | 745/2500 [2:49:36<6:50:43, 14.04s/it] 30%|██▉ | 746/2500 [2:49:49<6:43:19, 13.80s/it] {'loss': 0.0008, 'grad_norm': 0.08406623853520942, 'learning_rate': 7.016e-07, 'completion_length': 55.87500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.019439697265625, 'epoch': 0.3} 30%|██▉ | 
746/2500 [2:49:49<6:43:19, 13.80s/it] 30%|██▉ | 747/2500 [2:50:03<6:43:35, 13.81s/it] {'loss': 0.0009, 'grad_norm': 0.06078019759826697, 'learning_rate': 7.012000000000001e-07, 'completion_length': 55.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.3} 30%|██▉ | 747/2500 [2:50:03<6:43:35, 13.81s/it] 30%|██▉ | 748/2500 [2:50:20<7:09:04, 14.69s/it] {'loss': 0.0011, 'grad_norm': 11.008052615099668, 'learning_rate': 7.007999999999999e-07, 'completion_length': 61.91071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02642822265625, 'epoch': 0.3} 30%|██▉ | 748/2500 [2:50:20<7:09:04, 14.69s/it] 30%|██▉ | 749/2500 [2:50:33<6:56:54, 14.29s/it] {'loss': 0.0012, 'grad_norm': 3.2711287896765096, 'learning_rate': 7.004e-07, 'completion_length': 56.517860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0301513671875, 'epoch': 0.3} 30%|██▉ | 749/2500 [2:50:33<6:56:54, 14.29s/it] 30%|███ | 750/2500 [2:50:47<6:52:09, 14.13s/it] {'loss': 0.0007, 'grad_norm': 0.11858311679318627, 'learning_rate': 7e-07, 'completion_length': 52.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01812744140625, 'epoch': 0.3} 30%|███ | 750/2500 [2:50:47<6:52:09, 14.13s/it] 30%|███ | 751/2500 [2:51:00<6:46:44, 13.95s/it] {'loss': 0.0012, 'grad_norm': 0.09695316300991451, 'learning_rate': 6.995999999999999e-07, 'completion_length': 61.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02899169921875, 'epoch': 0.3} 30%|███ | 751/2500 [2:51:00<6:46:44, 13.95s/it] 30%|███ | 752/2500 [2:51:14<6:43:22, 13.85s/it] {'loss': 0.0022, 'grad_norm': 
1.5963898023328593, 'learning_rate': 6.992e-07, 'completion_length': 55.55357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0557861328125, 'epoch': 0.3} 30%|███ | 752/2500 [2:51:14<6:43:22, 13.85s/it] 30%|███ | 753/2500 [2:51:27<6:39:41, 13.73s/it] {'loss': 0.0011, 'grad_norm': 0.10391109233733205, 'learning_rate': 6.988e-07, 'completion_length': 49.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027496337890625, 'epoch': 0.3} 30%|███ | 753/2500 [2:51:27<6:39:41, 13.73s/it] 30%|███ | 754/2500 [2:51:41<6:35:44, 13.60s/it] {'loss': 0.0016, 'grad_norm': 0.10059524487337207, 'learning_rate': 6.984e-07, 'completion_length': 51.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.040283203125, 'epoch': 0.3} 30%|███ | 754/2500 [2:51:41<6:35:44, 13.60s/it] 30%|███ | 755/2500 [2:51:54<6:31:22, 13.46s/it] {'loss': 0.0014, 'grad_norm': 0.0898317155524682, 'learning_rate': 6.979999999999999e-07, 'completion_length': 56.107147216796875, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0355224609375, 'epoch': 0.3} 30%|███ | 755/2500 [2:51:54<6:31:22, 13.46s/it] 30%|███ | 756/2500 [2:52:11<7:02:20, 14.53s/it] {'loss': 0.0022, 'grad_norm': 0.08151857714485095, 'learning_rate': 6.976e-07, 'completion_length': 58.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05615234375, 'epoch': 0.3} 30%|███ | 756/2500 [2:52:11<7:02:20, 14.53s/it] 30%|███ | 757/2500 [2:52:24<6:50:52, 14.14s/it] {'loss': 0.0007, 'grad_norm': 0.17153668353729848, 'learning_rate': 6.972e-07, 'completion_length': 60.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.01763916015625, 'epoch': 0.3} 30%|███ | 757/2500 [2:52:24<6:50:52, 14.14s/it] 30%|███ | 758/2500 [2:52:39<6:59:43, 14.46s/it] {'loss': 0.002, 'grad_norm': 0.1250370298568543, 'learning_rate': 6.967999999999999e-07, 'completion_length': 54.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0506591796875, 'epoch': 0.3} 30%|███ | 758/2500 [2:52:39<6:59:43, 14.46s/it] 30%|███ | 759/2500 [2:52:53<6:51:16, 14.17s/it] {'loss': 0.0012, 'grad_norm': 0.17065713416504036, 'learning_rate': 6.964e-07, 'completion_length': 50.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03094482421875, 'epoch': 0.3} 30%|███ | 759/2500 [2:52:53<6:51:16, 14.17s/it] 30%|███ | 760/2500 [2:53:12<7:33:35, 15.64s/it] {'loss': 0.0011, 'grad_norm': 0.41741199166913145, 'learning_rate': 6.959999999999999e-07, 'completion_length': 65.00000381469727, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.02655029296875, 'epoch': 0.3} 30%|███ | 760/2500 [2:53:12<7:33:35, 15.64s/it] 30%|███ | 761/2500 [2:53:26<7:21:19, 15.23s/it] {'loss': 0.0012, 'grad_norm': 2.2795283688028394, 'learning_rate': 6.956e-07, 'completion_length': 63.55357551574707, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03131103515625, 'epoch': 0.3} 30%|███ | 761/2500 [2:53:26<7:21:19, 15.23s/it] 30%|███ | 762/2500 [2:53:41<7:15:42, 15.04s/it] {'loss': 0.0013, 'grad_norm': 0.19234471987093626, 'learning_rate': 6.952e-07, 'completion_length': 63.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0335693359375, 'epoch': 0.3} 30%|███ | 762/2500 [2:53:41<7:15:42, 15.04s/it] 31%|███ | 763/2500 
[2:53:54<7:02:37, 14.60s/it] {'loss': 0.0013, 'grad_norm': 0.08483073826307225, 'learning_rate': 6.947999999999999e-07, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03173828125, 'epoch': 0.31} 31%|███ | 763/2500 [2:53:54<7:02:37, 14.60s/it] 31%|███ | 764/2500 [2:54:09<7:02:32, 14.60s/it] {'loss': 0.0018, 'grad_norm': 0.09932821891035463, 'learning_rate': 6.944e-07, 'completion_length': 66.50000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045654296875, 'epoch': 0.31} 31%|███ | 764/2500 [2:54:09<7:02:32, 14.60s/it] 31%|███ | 765/2500 [2:54:23<6:54:53, 14.35s/it] {'loss': 0.0016, 'grad_norm': 0.10545727402089511, 'learning_rate': 6.939999999999999e-07, 'completion_length': 52.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0389404296875, 'epoch': 0.31} 31%|███ | 765/2500 [2:54:23<6:54:53, 14.35s/it] 31%|███ | 766/2500 [2:54:36<6:46:41, 14.07s/it] {'loss': 0.0027, 'grad_norm': 0.14860977070992012, 'learning_rate': 6.935999999999999e-07, 'completion_length': 55.410715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0673828125, 'epoch': 0.31} 31%|███ | 766/2500 [2:54:36<6:46:41, 14.07s/it] 31%|███ | 767/2500 [2:54:50<6:46:04, 14.06s/it] {'loss': 0.0016, 'grad_norm': 2.517915567882457, 'learning_rate': 6.932e-07, 'completion_length': 60.392860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.0390625, 'epoch': 0.31} 31%|███ | 767/2500 [2:54:50<6:46:04, 14.06s/it] 31%|███ | 768/2500 [2:55:04<6:43:22, 13.97s/it] {'loss': 0.0013, 'grad_norm': 0.1252606522167723, 'learning_rate': 6.928e-07, 'completion_length': 57.44643211364746, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.31} 31%|███ | 768/2500 [2:55:04<6:43:22, 13.97s/it] 31%|███ | 769/2500 [2:55:17<6:35:45, 13.72s/it] {'loss': 0.0018, 'grad_norm': 0.10795295951911772, 'learning_rate': 6.924e-07, 'completion_length': 47.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.31} 31%|███ | 769/2500 [2:55:17<6:35:45, 13.72s/it] 31%|███ | 770/2500 [2:55:31<6:38:51, 13.83s/it] {'loss': 0.0015, 'grad_norm': 0.12955170217366468, 'learning_rate': 6.919999999999999e-07, 'completion_length': 62.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03759765625, 'epoch': 0.31} 31%|███ | 770/2500 [2:55:31<6:38:51, 13.83s/it] 31%|███ | 771/2500 [2:55:46<6:43:31, 14.00s/it] {'loss': 0.0017, 'grad_norm': 0.24183961555454211, 'learning_rate': 6.916e-07, 'completion_length': 57.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04229736328125, 'epoch': 0.31} 31%|███ | 771/2500 [2:55:46<6:43:31, 14.00s/it] 31%|███ | 772/2500 [2:55:59<6:39:11, 13.86s/it] {'loss': 0.0019, 'grad_norm': 0.13999902897375363, 'learning_rate': 6.912e-07, 'completion_length': 56.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0478515625, 'epoch': 0.31} 31%|███ | 772/2500 [2:55:59<6:39:11, 13.86s/it] 31%|███ | 773/2500 [2:56:13<6:39:14, 13.87s/it] {'loss': 0.0008, 'grad_norm': 0.8597788653732019, 'learning_rate': 6.907999999999999e-07, 'completion_length': 57.142860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0208740234375, 'epoch': 0.31} 31%|███ | 773/2500 [2:56:13<6:39:14, 13.87s/it] 31%|███ | 774/2500 [2:56:27<6:37:20, 
13.81s/it] {'loss': 0.0009, 'grad_norm': 0.34089006581350234, 'learning_rate': 6.904e-07, 'completion_length': 61.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02197265625, 'epoch': 0.31} 31%|███ | 774/2500 [2:56:27<6:37:20, 13.81s/it] 31%|███ | 775/2500 [2:56:41<6:40:36, 13.93s/it] {'loss': 0.0013, 'grad_norm': 0.10077012644310021, 'learning_rate': 6.9e-07, 'completion_length': 54.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03277587890625, 'epoch': 0.31} 31%|███ | 775/2500 [2:56:41<6:40:36, 13.93s/it] 31%|███ | 776/2500 [2:56:55<6:40:07, 13.93s/it] {'loss': 0.0017, 'grad_norm': 0.05932619134320799, 'learning_rate': 6.895999999999999e-07, 'completion_length': 63.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.31} 31%|███ | 776/2500 [2:56:55<6:40:07, 13.93s/it] 31%|███ | 777/2500 [2:57:09<6:43:54, 14.07s/it] {'loss': 0.0013, 'grad_norm': 0.10025836093274983, 'learning_rate': 6.892e-07, 'completion_length': 60.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.31} 31%|███ | 777/2500 [2:57:09<6:43:54, 14.07s/it] 31%|███ | 778/2500 [2:57:23<6:45:15, 14.12s/it] {'loss': 0.0023, 'grad_norm': 0.12189080740950849, 'learning_rate': 6.888e-07, 'completion_length': 64.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0572509765625, 'epoch': 0.31} 31%|███ | 778/2500 [2:57:23<6:45:15, 14.12s/it] 31%|███ | 779/2500 [2:57:36<6:35:47, 13.80s/it] {'loss': 0.0011, 'grad_norm': 0.1332029161432603, 'learning_rate': 6.883999999999999e-07, 'completion_length': 49.75000190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 
1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02691650390625, 'epoch': 0.31} 31%|███ | 779/2500 [2:57:36<6:35:47, 13.80s/it] 31%|███ | 780/2500 [2:57:52<6:54:39, 14.46s/it] {'loss': 0.001, 'grad_norm': 0.06084210833666039, 'learning_rate': 6.879999999999999e-07, 'completion_length': 63.78571891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0247802734375, 'epoch': 0.31} 31%|███ | 780/2500 [2:57:52<6:54:39, 14.46s/it] 31%|███ | 781/2500 [2:58:07<6:51:37, 14.37s/it] {'loss': 0.0019, 'grad_norm': 2.628801973713936, 'learning_rate': 6.876e-07, 'completion_length': 55.80357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.047607421875, 'epoch': 0.31} 31%|███ | 781/2500 [2:58:07<6:51:37, 14.37s/it] 31%|███▏ | 782/2500 [2:58:21<6:53:51, 14.45s/it] {'loss': 0.0017, 'grad_norm': 1.2697045387061163, 'learning_rate': 6.872e-07, 'completion_length': 61.62500190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0419921875, 'epoch': 0.31} 31%|███▏ | 782/2500 [2:58:21<6:53:51, 14.45s/it] 31%|███▏ | 783/2500 [2:58:38<7:16:37, 15.26s/it] {'loss': 0.0011, 'grad_norm': 4.642971520104478, 'learning_rate': 6.867999999999999e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02667236328125, 'epoch': 0.31} 31%|███▏ | 783/2500 [2:58:38<7:16:37, 15.26s/it] 31%|███▏ | 784/2500 [2:58:52<6:59:39, 14.67s/it] {'loss': 0.0006, 'grad_norm': 0.07861457081305712, 'learning_rate': 6.864e-07, 'completion_length': 53.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013824462890625, 'epoch': 0.31} 31%|███▏ | 
784/2500 [2:58:52<6:59:39, 14.67s/it] 31%|███▏ | 785/2500 [2:59:06<6:52:42, 14.44s/it] {'loss': 0.0014, 'grad_norm': 0.09647798306430239, 'learning_rate': 6.86e-07, 'completion_length': 57.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03533935546875, 'epoch': 0.31} 31%|███▏ | 785/2500 [2:59:06<6:52:42, 14.44s/it] 31%|███▏ | 786/2500 [2:59:20<6:54:19, 14.50s/it] {'loss': 0.001, 'grad_norm': 0.2533622594855885, 'learning_rate': 6.855999999999999e-07, 'completion_length': 59.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02508544921875, 'epoch': 0.31} 31%|███▏ | 786/2500 [2:59:20<6:54:19, 14.50s/it] 31%|███▏ | 787/2500 [2:59:34<6:43:48, 14.14s/it] {'loss': 0.0018, 'grad_norm': 1.7543244324457385, 'learning_rate': 6.852e-07, 'completion_length': 52.285715103149414, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.07695359364151955, 'kl': 0.0445556640625, 'epoch': 0.31} 31%|███▏ | 787/2500 [2:59:34<6:43:48, 14.14s/it] 32%|███▏ | 788/2500 [2:59:48<6:45:02, 14.20s/it] {'loss': 0.0009, 'grad_norm': 0.0883462788907039, 'learning_rate': 6.847999999999999e-07, 'completion_length': 54.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022216796875, 'epoch': 0.32} 32%|███▏ | 788/2500 [2:59:48<6:45:02, 14.20s/it] 32%|███▏ | 789/2500 [3:00:02<6:43:08, 14.14s/it] {'loss': 0.0012, 'grad_norm': 3.562738238159449, 'learning_rate': 6.844e-07, 'completion_length': 62.00000190734863, 'rewards/accuracy_reward': 0.8928571939468384, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.04123930633068085, 'kl': 0.0311279296875, 'epoch': 0.32} 32%|███▏ | 789/2500 [3:00:02<6:43:08, 14.14s/it] 32%|███▏ | 790/2500 [3:00:16<6:42:46, 14.13s/it] {'loss': 0.0014, 'grad_norm': 
0.5410677083329736, 'learning_rate': 6.84e-07, 'completion_length': 52.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.32} 32%|███▏ | 790/2500 [3:00:16<6:42:46, 14.13s/it] 32%|███▏ | 791/2500 [3:00:29<6:36:33, 13.92s/it] {'loss': 0.0013, 'grad_norm': 2.006390592718714, 'learning_rate': 6.836e-07, 'completion_length': 56.17857551574707, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.033203125, 'epoch': 0.32} 32%|███▏ | 791/2500 [3:00:29<6:36:33, 13.92s/it] 32%|███▏ | 792/2500 [3:00:43<6:31:09, 13.74s/it] {'loss': 0.001, 'grad_norm': 0.15764790240148116, 'learning_rate': 6.832e-07, 'completion_length': 51.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0252685546875, 'epoch': 0.32} 32%|███▏ | 792/2500 [3:00:43<6:31:09, 13.74s/it] 32%|███▏ | 793/2500 [3:00:58<6:44:30, 14.22s/it] {'loss': 0.0009, 'grad_norm': 0.13649750403606647, 'learning_rate': 6.827999999999999e-07, 'completion_length': 62.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.32} 32%|███▏ | 793/2500 [3:00:58<6:44:30, 14.22s/it] 32%|███▏ | 794/2500 [3:01:12<6:44:17, 14.22s/it] {'loss': 0.0013, 'grad_norm': 1.8606986889228336, 'learning_rate': 6.824e-07, 'completion_length': 67.28571701049805, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.03167724609375, 'epoch': 0.32} 32%|███▏ | 794/2500 [3:01:12<6:44:17, 14.22s/it] 32%|███▏ | 795/2500 [3:01:27<6:44:47, 14.25s/it] {'loss': 0.0011, 'grad_norm': 0.12484584437852456, 'learning_rate': 6.82e-07, 'completion_length': 64.80357551574707, 'rewards/accuracy_reward': 0.9285714626312256, 
'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.026611328125, 'epoch': 0.32} 32%|███▏ | 795/2500 [3:01:27<6:44:47, 14.25s/it] 32%|███▏ | 796/2500 [3:01:42<6:50:29, 14.45s/it] {'loss': 0.0018, 'grad_norm': 0.19792019791836227, 'learning_rate': 6.816e-07, 'completion_length': 65.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.043701171875, 'epoch': 0.32} 32%|███▏ | 796/2500 [3:01:42<6:50:29, 14.45s/it] 32%|███▏ | 797/2500 [3:01:59<7:18:46, 15.46s/it] {'loss': 0.0011, 'grad_norm': 0.07455815379814612, 'learning_rate': 6.812e-07, 'completion_length': 61.589290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02642822265625, 'epoch': 0.32} 32%|███▏ | 797/2500 [3:01:59<7:18:46, 15.46s/it] 32%|███▏ | 798/2500 [3:02:13<7:05:44, 15.01s/it] {'loss': 0.0021, 'grad_norm': 1.1308767035485485, 'learning_rate': 6.807999999999999e-07, 'completion_length': 52.660715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0513916015625, 'epoch': 0.32} 32%|███▏ | 798/2500 [3:02:13<7:05:44, 15.01s/it] 32%|███▏ | 799/2500 [3:02:28<7:04:23, 14.97s/it] {'loss': 0.0009, 'grad_norm': 0.06274844197307876, 'learning_rate': 6.804e-07, 'completion_length': 59.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.32} 32%|███▏ | 799/2500 [3:02:28<7:04:23, 14.97s/it] 32%|███▏ | 800/2500 [3:02:42<6:54:01, 14.61s/it] {'loss': 0.0007, 'grad_norm': 0.11378686636067654, 'learning_rate': 6.800000000000001e-07, 'completion_length': 55.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01678466796875, 'epoch': 0.32} 32%|███▏ | 800/2500 [3:02:42<6:54:01, 14.61s/it] 
32%|███▏ | 801/2500 [3:03:52<14:46:59, 31.32s/it] {'loss': 0.0015, 'grad_norm': 0.986354540783464, 'learning_rate': 6.795999999999999e-07, 'completion_length': 60.94643211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03759765625, 'epoch': 0.32} 32%|███▏ | 801/2500 [3:03:52<14:46:59, 31.32s/it] 32%|███▏ | 802/2500 [3:04:05<12:11:39, 25.85s/it] {'loss': 0.0014, 'grad_norm': 0.15744961930821125, 'learning_rate': 6.792e-07, 'completion_length': 54.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033935546875, 'epoch': 0.32} 32%|███▏ | 802/2500 [3:04:05<12:11:39, 25.85s/it] 32%|███▏ | 803/2500 [3:04:18<10:20:33, 21.94s/it] {'loss': 0.0012, 'grad_norm': 0.1476991355496387, 'learning_rate': 6.788e-07, 'completion_length': 50.85714530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.029571533203125, 'epoch': 0.32} 32%|███▏ | 803/2500 [3:04:18<10:20:33, 21.94s/it] 32%|███▏ | 804/2500 [3:04:32<9:10:24, 19.47s/it] {'loss': 0.0023, 'grad_norm': 0.15333258992736043, 'learning_rate': 6.783999999999999e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0572509765625, 'epoch': 0.32} 32%|███▏ | 804/2500 [3:04:32<9:10:24, 19.47s/it] 32%|███▏ | 805/2500 [3:04:46<8:22:05, 17.77s/it] {'loss': 0.0009, 'grad_norm': 0.10365020138619581, 'learning_rate': 6.78e-07, 'completion_length': 50.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.32} 32%|███▏ | 805/2500 [3:04:46<8:22:05, 17.77s/it] 32%|███▏ | 806/2500 [3:04:59<7:40:19, 16.30s/it] {'loss': 0.0011, 'grad_norm': 0.14052191604646452, 'learning_rate': 6.776e-07, 
'completion_length': 47.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02801513671875, 'epoch': 0.32} 32%|███▏ | 806/2500 [3:04:59<7:40:19, 16.30s/it] 32%|███▏ | 807/2500 [3:05:13<7:23:18, 15.71s/it] {'loss': 0.0016, 'grad_norm': 0.1317139868012151, 'learning_rate': 6.772e-07, 'completion_length': 55.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0400390625, 'epoch': 0.32} 32%|███▏ | 807/2500 [3:05:13<7:23:18, 15.71s/it] 32%|███▏ | 808/2500 [3:05:26<7:03:13, 15.01s/it] {'loss': 0.0009, 'grad_norm': 0.1021876549106448, 'learning_rate': 6.767999999999999e-07, 'completion_length': 50.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022369384765625, 'epoch': 0.32} 32%|███▏ | 808/2500 [3:05:26<7:03:13, 15.01s/it] 32%|███▏ | 809/2500 [3:05:40<6:50:35, 14.57s/it] {'loss': 0.0019, 'grad_norm': 1.02095642952281, 'learning_rate': 6.764e-07, 'completion_length': 54.80357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0472412109375, 'epoch': 0.32} 32%|███▏ | 809/2500 [3:05:40<6:50:35, 14.57s/it] 32%|███▏ | 810/2500 [3:05:54<6:45:58, 14.41s/it] {'loss': 0.0015, 'grad_norm': 10.936218795159943, 'learning_rate': 6.76e-07, 'completion_length': 61.392860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03759765625, 'epoch': 0.32} 32%|███▏ | 810/2500 [3:05:54<6:45:58, 14.41s/it] 32%|███▏ | 811/2500 [3:06:09<6:50:02, 14.57s/it] {'loss': 0.0017, 'grad_norm': 0.09095204192663289, 'learning_rate': 6.755999999999999e-07, 'completion_length': 63.982147216796875, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 
'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.32} 32%|███▏ | 811/2500 [3:06:09<6:50:02, 14.57s/it] 32%|███▏ | 812/2500 [3:06:22<6:41:50, 14.28s/it] {'loss': 0.0006, 'grad_norm': 0.12373427887298602, 'learning_rate': 6.752e-07, 'completion_length': 51.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014373779296875, 'epoch': 0.32} 32%|███▏ | 812/2500 [3:06:22<6:41:50, 14.28s/it] 33%|███▎ | 813/2500 [3:06:37<6:42:04, 14.30s/it] {'loss': 0.0013, 'grad_norm': 0.06414625089647978, 'learning_rate': 6.747999999999999e-07, 'completion_length': 57.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03155517578125, 'epoch': 0.33} 33%|███▎ | 813/2500 [3:06:37<6:42:04, 14.30s/it] 33%|███▎ | 814/2500 [3:06:51<6:38:27, 14.18s/it] {'loss': 0.0008, 'grad_norm': 4.674028327809063, 'learning_rate': 6.744e-07, 'completion_length': 55.142860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.020751953125, 'epoch': 0.33} 33%|███▎ | 814/2500 [3:06:51<6:38:27, 14.18s/it] 33%|███▎ | 815/2500 [3:07:04<6:33:29, 14.01s/it] {'loss': 0.001, 'grad_norm': 0.35009969665835244, 'learning_rate': 6.74e-07, 'completion_length': 58.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0255126953125, 'epoch': 0.33} 33%|███▎ | 815/2500 [3:07:04<6:33:29, 14.01s/it] 33%|███▎ | 816/2500 [3:07:18<6:34:00, 14.04s/it] {'loss': 0.0013, 'grad_norm': 0.09103655689850575, 'learning_rate': 6.736e-07, 'completion_length': 52.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03173828125, 'epoch': 0.33} 33%|███▎ | 816/2500 [3:07:18<6:34:00, 14.04s/it] 33%|███▎ | 817/2500 [3:07:32<6:32:35, 14.00s/it] 
{'loss': 0.0011, 'grad_norm': 0.0885253034087936, 'learning_rate': 6.732e-07, 'completion_length': 61.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.33} 33%|███▎ | 817/2500 [3:07:32<6:32:35, 14.00s/it] 33%|███▎ | 818/2500 [3:07:47<6:41:23, 14.32s/it] {'loss': 0.0016, 'grad_norm': 0.9852323319690413, 'learning_rate': 6.727999999999999e-07, 'completion_length': 61.55357551574707, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.041015625, 'epoch': 0.33} 33%|███▎ | 818/2500 [3:07:47<6:41:23, 14.32s/it] 33%|███▎ | 819/2500 [3:08:01<6:35:21, 14.11s/it] {'loss': 0.0013, 'grad_norm': 0.10738002990809563, 'learning_rate': 6.724e-07, 'completion_length': 54.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03350830078125, 'epoch': 0.33} 33%|███▎ | 819/2500 [3:08:01<6:35:21, 14.11s/it] 33%|███▎ | 820/2500 [3:08:15<6:33:09, 14.04s/it] {'loss': 0.0015, 'grad_norm': 2.0658587321522814, 'learning_rate': 6.72e-07, 'completion_length': 62.732147216796875, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0824786126613617, 'kl': 0.037109375, 'epoch': 0.33} 33%|███▎ | 820/2500 [3:08:15<6:33:09, 14.04s/it] 33%|███▎ | 821/2500 [3:08:29<6:37:36, 14.21s/it] {'loss': 0.001, 'grad_norm': 0.11199887003301462, 'learning_rate': 6.716e-07, 'completion_length': 58.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02508544921875, 'epoch': 0.33} 33%|███▎ | 821/2500 [3:08:29<6:37:36, 14.21s/it] 33%|███▎ | 822/2500 [3:08:43<6:34:20, 14.10s/it] {'loss': 0.0012, 'grad_norm': 0.06734112791632953, 'learning_rate': 6.712e-07, 'completion_length': 54.21428871154785, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029296875, 'epoch': 0.33} 33%|███▎ | 822/2500 [3:08:43<6:34:20, 14.10s/it] 33%|███▎ | 823/2500 [3:08:59<6:50:19, 14.68s/it] {'loss': 0.0018, 'grad_norm': 0.10806141751143905, 'learning_rate': 6.707999999999999e-07, 'completion_length': 66.89286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044921875, 'epoch': 0.33} 33%|███▎ | 823/2500 [3:08:59<6:50:19, 14.68s/it] 33%|███▎ | 824/2500 [3:09:13<6:42:31, 14.41s/it] {'loss': 0.0014, 'grad_norm': 0.09137742368935035, 'learning_rate': 6.704e-07, 'completion_length': 54.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03460693359375, 'epoch': 0.33} 33%|███▎ | 824/2500 [3:09:13<6:42:31, 14.41s/it] 33%|███▎ | 825/2500 [3:09:27<6:40:28, 14.35s/it] {'loss': 0.0012, 'grad_norm': 1.10278607487176, 'learning_rate': 6.7e-07, 'completion_length': 58.69643211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0302734375, 'epoch': 0.33} 33%|███▎ | 825/2500 [3:09:27<6:40:28, 14.35s/it] 33%|███▎ | 826/2500 [3:09:41<6:32:13, 14.06s/it] {'loss': 0.0014, 'grad_norm': 0.1371188983089447, 'learning_rate': 6.695999999999999e-07, 'completion_length': 52.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03466796875, 'epoch': 0.33} 33%|███▎ | 826/2500 [3:09:41<6:32:13, 14.06s/it] 33%|███▎ | 827/2500 [3:09:57<6:48:04, 14.63s/it] {'loss': 0.0008, 'grad_norm': 0.12088665341634625, 'learning_rate': 6.692e-07, 'completion_length': 57.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02020263671875, 'epoch': 0.33} 33%|███▎ | 827/2500 [3:09:57<6:48:04, 14.63s/it] 33%|███▎ | 828/2500 [3:10:11<6:43:56, 
14.50s/it] {'loss': 0.0017, 'grad_norm': 2.595034934539072, 'learning_rate': 6.688e-07, 'completion_length': 57.428571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0419921875, 'epoch': 0.33} 33%|███▎ | 828/2500 [3:10:11<6:43:56, 14.50s/it] 33%|███▎ | 829/2500 [3:10:24<6:35:15, 14.19s/it] {'loss': 0.0008, 'grad_norm': 0.12189526738549288, 'learning_rate': 6.683999999999999e-07, 'completion_length': 50.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02105712890625, 'epoch': 0.33} 33%|███▎ | 829/2500 [3:10:24<6:35:15, 14.19s/it] 33%|███▎ | 830/2500 [3:10:38<6:32:03, 14.09s/it] {'loss': 0.0014, 'grad_norm': 1.3344864928154494, 'learning_rate': 6.68e-07, 'completion_length': 58.42857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.03607177734375, 'epoch': 0.33} 33%|███▎ | 830/2500 [3:10:38<6:32:03, 14.09s/it] 33%|███▎ | 831/2500 [3:10:52<6:27:51, 13.94s/it] {'loss': 0.0008, 'grad_norm': 0.07338032035206629, 'learning_rate': 6.676e-07, 'completion_length': 57.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020965576171875, 'epoch': 0.33} 33%|███▎ | 831/2500 [3:10:52<6:27:51, 13.94s/it] 33%|███▎ | 832/2500 [3:11:05<6:22:56, 13.78s/it] {'loss': 0.0012, 'grad_norm': 0.08360580608668221, 'learning_rate': 6.671999999999999e-07, 'completion_length': 56.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.33} 33%|███▎ | 832/2500 [3:11:05<6:22:56, 13.78s/it] 33%|███▎ | 833/2500 [3:11:20<6:34:36, 14.20s/it] {'loss': 0.0011, 'grad_norm': 0.15316694932619365, 'learning_rate': 6.667999999999999e-07, 'completion_length': 
58.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027587890625, 'epoch': 0.33} 33%|███▎ | 833/2500 [3:11:20<6:34:36, 14.20s/it] 33%|███▎ | 834/2500 [3:11:34<6:31:40, 14.11s/it] {'loss': 0.0008, 'grad_norm': 0.0869924374144073, 'learning_rate': 6.664e-07, 'completion_length': 59.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02032470703125, 'epoch': 0.33} 33%|███▎ | 834/2500 [3:11:34<6:31:40, 14.11s/it] 33%|███▎ | 835/2500 [3:11:47<6:24:18, 13.85s/it] {'loss': 0.001, 'grad_norm': 0.08418510620502631, 'learning_rate': 6.66e-07, 'completion_length': 51.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026123046875, 'epoch': 0.33} 33%|███▎ | 835/2500 [3:11:47<6:24:18, 13.85s/it] 33%|███▎ | 836/2500 [3:12:01<6:19:07, 13.67s/it] {'loss': 0.0012, 'grad_norm': 0.10920755844908031, 'learning_rate': 6.655999999999999e-07, 'completion_length': 54.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03094482421875, 'epoch': 0.33} 33%|███▎ | 836/2500 [3:12:01<6:19:07, 13.67s/it] 33%|███▎ | 837/2500 [3:12:14<6:19:04, 13.68s/it] {'loss': 0.0007, 'grad_norm': 0.08623781122531364, 'learning_rate': 6.652e-07, 'completion_length': 61.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0181884765625, 'epoch': 0.33} 33%|███▎ | 837/2500 [3:12:14<6:19:04, 13.68s/it] 34%|███▎ | 838/2500 [3:12:28<6:16:09, 13.58s/it] {'loss': 0.0008, 'grad_norm': 0.07463947606002541, 'learning_rate': 6.647999999999999e-07, 'completion_length': 55.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019439697265625, 'epoch': 0.34} 34%|███▎ | 838/2500 [3:12:28<6:16:09, 13.58s/it] 34%|███▎ | 
839/2500 [3:12:43<6:26:37, 13.97s/it] {'loss': 0.0016, 'grad_norm': 0.19978032449384292, 'learning_rate': 6.643999999999999e-07, 'completion_length': 68.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0401611328125, 'epoch': 0.34} 34%|███▎ | 839/2500 [3:12:43<6:26:37, 13.97s/it] 34%|███▎ | 840/2500 [3:12:56<6:23:24, 13.86s/it] {'loss': 0.001, 'grad_norm': 0.08440463782695921, 'learning_rate': 6.64e-07, 'completion_length': 62.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.34} 34%|███▎ | 840/2500 [3:12:56<6:23:24, 13.86s/it] 34%|███▎ | 841/2500 [3:13:10<6:23:06, 13.86s/it] {'loss': 0.0014, 'grad_norm': 0.20763684714779934, 'learning_rate': 6.636e-07, 'completion_length': 55.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0361328125, 'epoch': 0.34} 34%|███▎ | 841/2500 [3:13:10<6:23:06, 13.86s/it] 34%|███▎ | 842/2500 [3:13:25<6:33:20, 14.23s/it] {'loss': 0.0016, 'grad_norm': 0.11075787035948238, 'learning_rate': 6.632e-07, 'completion_length': 65.98214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0408935546875, 'epoch': 0.34} 34%|███▎ | 842/2500 [3:13:25<6:33:20, 14.23s/it] 34%|███▎ | 843/2500 [3:13:39<6:25:21, 13.95s/it] {'loss': 0.0012, 'grad_norm': 0.11189429982127536, 'learning_rate': 6.627999999999999e-07, 'completion_length': 55.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02886962890625, 'epoch': 0.34} 34%|███▎ | 843/2500 [3:13:39<6:25:21, 13.95s/it] 34%|███▍ | 844/2500 [3:13:56<6:54:15, 15.01s/it] {'loss': 0.0016, 'grad_norm': 0.08899438926386984, 'learning_rate': 6.624e-07, 'completion_length': 66.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0390625, 'epoch': 0.34} 34%|███▍ | 844/2500 [3:13:56<6:54:15, 15.01s/it] 34%|███▍ | 845/2500 [3:14:10<6:42:57, 14.61s/it] {'loss': 0.0008, 'grad_norm': 0.09026868781913784, 'learning_rate': 6.62e-07, 'completion_length': 58.607147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01910400390625, 'epoch': 0.34} 34%|███▍ | 845/2500 [3:14:10<6:42:57, 14.61s/it] 34%|███▍ | 846/2500 [3:14:23<6:35:06, 14.33s/it] {'loss': 0.0019, 'grad_norm': 0.1098010377663239, 'learning_rate': 6.615999999999999e-07, 'completion_length': 58.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0482177734375, 'epoch': 0.34} 34%|███▍ | 846/2500 [3:14:23<6:35:06, 14.33s/it] 34%|███▍ | 847/2500 [3:14:39<6:49:06, 14.85s/it] {'loss': 0.0013, 'grad_norm': 1.6724147895877093, 'learning_rate': 6.612e-07, 'completion_length': 63.30357551574707, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.07695358991622925, 'kl': 0.03271484375, 'epoch': 0.34} 34%|███▍ | 847/2500 [3:14:39<6:49:06, 14.85s/it] 34%|███▍ | 848/2500 [3:14:53<6:42:21, 14.61s/it] {'loss': 0.0011, 'grad_norm': 0.07517654600702925, 'learning_rate': 6.608e-07, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02783203125, 'epoch': 0.34} 34%|███▍ | 848/2500 [3:14:53<6:42:21, 14.61s/it] 34%|███▍ | 849/2500 [3:15:07<6:32:57, 14.28s/it] {'loss': 0.0008, 'grad_norm': 0.07286375829061237, 'learning_rate': 6.604e-07, 'completion_length': 53.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021240234375, 'epoch': 0.34} 34%|███▍ | 849/2500 [3:15:07<6:32:57, 14.28s/it] 34%|███▍ | 850/2500 [3:15:20<6:25:52, 14.03s/it] {'loss': 0.001, 'grad_norm': 
0.08001066022201002, 'learning_rate': 6.6e-07, 'completion_length': 45.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0247802734375, 'epoch': 0.34} 34%|███▍ | 850/2500 [3:15:20<6:25:52, 14.03s/it] 34%|███▍ | 851/2500 [3:15:35<6:26:59, 14.08s/it] {'loss': 0.0013, 'grad_norm': 1.6007316686724973, 'learning_rate': 6.595999999999999e-07, 'completion_length': 59.55357551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.033447265625, 'epoch': 0.34} 34%|███▍ | 851/2500 [3:15:35<6:26:59, 14.08s/it] 34%|███▍ | 852/2500 [3:15:48<6:19:13, 13.81s/it] {'loss': 0.0011, 'grad_norm': 0.09464751550403738, 'learning_rate': 6.592e-07, 'completion_length': 49.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0264892578125, 'epoch': 0.34} 34%|███▍ | 852/2500 [3:15:48<6:19:13, 13.81s/it] 34%|███▍ | 853/2500 [3:16:01<6:14:36, 13.65s/it] {'loss': 0.0009, 'grad_norm': 0.07578089215519276, 'learning_rate': 6.588e-07, 'completion_length': 58.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.34} 34%|███▍ | 853/2500 [3:16:01<6:14:36, 13.65s/it] 34%|███▍ | 854/2500 [3:16:15<6:16:48, 13.74s/it] {'loss': 0.0011, 'grad_norm': 0.09594422545819015, 'learning_rate': 6.583999999999999e-07, 'completion_length': 52.285715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0286865234375, 'epoch': 0.34} 34%|███▍ | 854/2500 [3:16:15<6:16:48, 13.74s/it] 34%|███▍ | 855/2500 [3:16:28<6:12:59, 13.60s/it] {'loss': 0.0016, 'grad_norm': 0.19781435537943634, 'learning_rate': 6.58e-07, 'completion_length': 57.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 
0.0, 'kl': 0.04052734375, 'epoch': 0.34} 34%|███▍ | 855/2500 [3:16:28<6:12:59, 13.60s/it] 34%|███▍ | 856/2500 [3:16:41<6:09:24, 13.48s/it] {'loss': 0.0007, 'grad_norm': 0.06473994418220828, 'learning_rate': 6.576e-07, 'completion_length': 54.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01629638671875, 'epoch': 0.34} 34%|███▍ | 856/2500 [3:16:41<6:09:24, 13.48s/it] 34%|███▍ | 857/2500 [3:16:54<6:04:25, 13.31s/it] {'loss': 0.0013, 'grad_norm': 0.29241444498013086, 'learning_rate': 6.571999999999999e-07, 'completion_length': 52.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03326416015625, 'epoch': 0.34} 34%|███▍ | 857/2500 [3:16:54<6:04:25, 13.31s/it] 34%|███▍ | 858/2500 [3:17:09<6:14:00, 13.67s/it] {'loss': 0.0009, 'grad_norm': 0.09398579626536684, 'learning_rate': 6.568e-07, 'completion_length': 60.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02166748046875, 'epoch': 0.34} 34%|███▍ | 858/2500 [3:17:09<6:14:00, 13.67s/it] 34%|███▍ | 859/2500 [3:17:22<6:11:10, 13.57s/it] {'loss': 0.0016, 'grad_norm': 0.243294808417007, 'learning_rate': 6.564e-07, 'completion_length': 51.33928871154785, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03955078125, 'epoch': 0.34} 34%|███▍ | 859/2500 [3:17:22<6:11:10, 13.57s/it] 34%|███▍ | 860/2500 [3:17:36<6:13:26, 13.66s/it] {'loss': 0.0008, 'grad_norm': 0.10528299799602424, 'learning_rate': 6.56e-07, 'completion_length': 57.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01904296875, 'epoch': 0.34} 34%|███▍ | 860/2500 [3:17:36<6:13:26, 13.66s/it] 34%|███▍ | 861/2500 [3:17:49<6:09:44, 13.54s/it] {'loss': 0.0007, 'grad_norm': 0.08819493857048413, 'learning_rate': 
6.555999999999999e-07, 'completion_length': 48.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017608642578125, 'epoch': 0.34} 34%|███▍ | 861/2500 [3:17:49<6:09:44, 13.54s/it] 34%|███▍ | 862/2500 [3:18:05<6:25:17, 14.11s/it] {'loss': 0.0019, 'grad_norm': 0.7013897809564896, 'learning_rate': 6.552e-07, 'completion_length': 59.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.046630859375, 'epoch': 0.34} 34%|███▍ | 862/2500 [3:18:05<6:25:17, 14.11s/it] 35%|███▍ | 863/2500 [3:18:19<6:24:02, 14.08s/it] {'loss': 0.0012, 'grad_norm': 1.5716281565208607, 'learning_rate': 6.548000000000001e-07, 'completion_length': 52.66071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03021240234375, 'epoch': 0.35} 35%|███▍ | 863/2500 [3:18:19<6:24:02, 14.08s/it] 35%|███▍ | 864/2500 [3:18:33<6:28:51, 14.26s/it] {'loss': 0.0009, 'grad_norm': 0.08364698836860075, 'learning_rate': 6.543999999999999e-07, 'completion_length': 55.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0223388671875, 'epoch': 0.35} 35%|███▍ | 864/2500 [3:18:34<6:28:51, 14.26s/it] 35%|███▍ | 865/2500 [3:18:47<6:25:04, 14.13s/it] {'loss': 0.0013, 'grad_norm': 0.07243735055299931, 'learning_rate': 6.54e-07, 'completion_length': 57.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03314208984375, 'epoch': 0.35} 35%|███▍ | 865/2500 [3:18:47<6:25:04, 14.13s/it] 35%|███▍ | 866/2500 [3:19:01<6:18:07, 13.88s/it] {'loss': 0.0015, 'grad_norm': 0.08006935282152117, 'learning_rate': 6.536e-07, 'completion_length': 47.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.0374755859375, 'epoch': 0.35} 35%|███▍ | 866/2500 [3:19:01<6:18:07, 13.88s/it] 35%|███▍ | 867/2500 [3:19:14<6:13:41, 13.73s/it] {'loss': 0.0018, 'grad_norm': 0.10318041399305222, 'learning_rate': 6.531999999999999e-07, 'completion_length': 48.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045806884765625, 'epoch': 0.35} 35%|███▍ | 867/2500 [3:19:14<6:13:41, 13.73s/it] 35%|███▍ | 868/2500 [3:19:27<6:10:27, 13.62s/it] {'loss': 0.0012, 'grad_norm': 0.07803504208680825, 'learning_rate': 6.528e-07, 'completion_length': 49.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03009033203125, 'epoch': 0.35} 35%|███▍ | 868/2500 [3:19:27<6:10:27, 13.62s/it] 35%|███▍ | 869/2500 [3:19:41<6:12:40, 13.71s/it] {'loss': 0.001, 'grad_norm': 0.07833548994552623, 'learning_rate': 6.524e-07, 'completion_length': 57.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.35} 35%|███▍ | 869/2500 [3:19:41<6:12:40, 13.71s/it] 35%|███▍ | 870/2500 [3:19:56<6:19:16, 13.96s/it] {'loss': 0.0015, 'grad_norm': 0.16168235338666687, 'learning_rate': 6.52e-07, 'completion_length': 60.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.35} 35%|███▍ | 870/2500 [3:19:56<6:19:16, 13.96s/it] 35%|███▍ | 871/2500 [3:20:10<6:17:24, 13.90s/it] {'loss': 0.0008, 'grad_norm': 0.06383204022923289, 'learning_rate': 6.515999999999999e-07, 'completion_length': 52.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019500732421875, 'epoch': 0.35} 35%|███▍ | 871/2500 [3:20:10<6:17:24, 13.90s/it] 35%|███▍ | 872/2500 [3:20:24<6:21:29, 14.06s/it] {'loss': 0.0005, 'grad_norm': 0.06429162943030034, 'learning_rate': 6.512e-07, 
'completion_length': 61.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01129150390625, 'epoch': 0.35} 35%|███▍ | 872/2500 [3:20:24<6:21:29, 14.06s/it] 35%|███▍ | 873/2500 [3:20:37<6:16:06, 13.87s/it] {'loss': 0.0009, 'grad_norm': 0.0991466950483286, 'learning_rate': 6.508e-07, 'completion_length': 59.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0235595703125, 'epoch': 0.35} 35%|███▍ | 873/2500 [3:20:37<6:16:06, 13.87s/it] 35%|███▍ | 874/2500 [3:20:51<6:11:17, 13.70s/it] {'loss': 0.0008, 'grad_norm': 0.1331843883160042, 'learning_rate': 6.504e-07, 'completion_length': 50.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018798828125, 'epoch': 0.35} 35%|███▍ | 874/2500 [3:20:51<6:11:17, 13.70s/it] 35%|███▌ | 875/2500 [3:21:04<6:11:18, 13.71s/it] {'loss': 0.0016, 'grad_norm': 1.3815385113760668, 'learning_rate': 6.5e-07, 'completion_length': 53.910715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0390625, 'epoch': 0.35} 35%|███▌ | 875/2500 [3:21:04<6:11:18, 13.71s/it] 35%|███▌ | 876/2500 [3:21:19<6:16:52, 13.92s/it] {'loss': 0.0018, 'grad_norm': 0.069310813769072, 'learning_rate': 6.495999999999999e-07, 'completion_length': 59.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0445556640625, 'epoch': 0.35} 35%|███▌ | 876/2500 [3:21:19<6:16:52, 13.92s/it] 35%|███▌ | 877/2500 [3:21:32<6:11:06, 13.72s/it] {'loss': 0.0009, 'grad_norm': 0.07625853564796779, 'learning_rate': 6.492e-07, 'completion_length': 57.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021820068359375, 'epoch': 0.35} 35%|███▌ | 877/2500 
[3:21:32<6:11:06, 13.72s/it] 35%|███▌ | 878/2500 [3:21:47<6:16:18, 13.92s/it] {'loss': 0.0012, 'grad_norm': 0.11930817426257322, 'learning_rate': 6.488e-07, 'completion_length': 59.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.35} 35%|███▌ | 878/2500 [3:21:47<6:16:18, 13.92s/it] 35%|███▌ | 879/2500 [3:22:01<6:17:49, 13.98s/it] {'loss': 0.0011, 'grad_norm': 0.3430062663020313, 'learning_rate': 6.483999999999999e-07, 'completion_length': 62.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.35} 35%|███▌ | 879/2500 [3:22:01<6:17:49, 13.98s/it] 35%|███▌ | 880/2500 [3:22:15<6:22:15, 14.16s/it] {'loss': 0.0011, 'grad_norm': 0.08058786147281445, 'learning_rate': 6.48e-07, 'completion_length': 57.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.35} 35%|███▌ | 880/2500 [3:22:15<6:22:15, 14.16s/it] 35%|███▌ | 881/2500 [3:22:29<6:16:25, 13.95s/it] {'loss': 0.0013, 'grad_norm': 0.0818159592931752, 'learning_rate': 6.476e-07, 'completion_length': 55.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0321044921875, 'epoch': 0.35} 35%|███▌ | 881/2500 [3:22:29<6:16:25, 13.95s/it] 35%|███▌ | 882/2500 [3:22:42<6:07:07, 13.61s/it] {'loss': 0.001, 'grad_norm': 0.06473107549858997, 'learning_rate': 6.471999999999999e-07, 'completion_length': 51.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0252685546875, 'epoch': 0.35} 35%|███▌ | 882/2500 [3:22:42<6:07:07, 13.61s/it] 35%|███▌ | 883/2500 [3:22:54<6:00:38, 13.38s/it] {'loss': 0.0014, 'grad_norm': 1.5121758054946144, 'learning_rate': 6.468e-07, 'completion_length': 47.94643020629883, 'rewards/accuracy_reward': 
0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03515625, 'epoch': 0.35} 35%|███▌ | 883/2500 [3:22:54<6:00:38, 13.38s/it] 35%|███▌ | 884/2500 [3:23:08<6:00:15, 13.38s/it] {'loss': 0.0018, 'grad_norm': 0.08210821281148512, 'learning_rate': 6.464e-07, 'completion_length': 49.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04473876953125, 'epoch': 0.35} 35%|███▌ | 884/2500 [3:23:08<6:00:15, 13.38s/it] 35%|███▌ | 885/2500 [3:23:22<6:05:55, 13.59s/it] {'loss': 0.0011, 'grad_norm': 0.07115163965117381, 'learning_rate': 6.46e-07, 'completion_length': 53.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0272674560546875, 'epoch': 0.35} 35%|███▌ | 885/2500 [3:23:22<6:05:55, 13.59s/it] 35%|███▌ | 886/2500 [3:23:38<6:23:37, 14.26s/it] {'loss': 0.0018, 'grad_norm': 1.1674974023652105, 'learning_rate': 6.455999999999999e-07, 'completion_length': 64.60714721679688, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0457763671875, 'epoch': 0.35} 35%|███▌ | 886/2500 [3:23:38<6:23:37, 14.26s/it] 35%|███▌ | 887/2500 [3:23:52<6:23:33, 14.27s/it] {'loss': 0.0011, 'grad_norm': 1.2081404461355207, 'learning_rate': 6.452e-07, 'completion_length': 55.00000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.02685546875, 'epoch': 0.35} 35%|███▌ | 887/2500 [3:23:52<6:23:33, 14.27s/it] 36%|███▌ | 888/2500 [3:24:07<6:32:41, 14.62s/it] {'loss': 0.0015, 'grad_norm': 1.0131280220584398, 'learning_rate': 6.448000000000001e-07, 'completion_length': 60.69643020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 
1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0364990234375, 'epoch': 0.36} 36%|███▌ | 888/2500 [3:24:07<6:32:41, 14.62s/it] 36%|███▌ | 889/2500 [3:24:22<6:32:45, 14.63s/it] {'loss': 0.0014, 'grad_norm': 0.07850991168512651, 'learning_rate': 6.443999999999999e-07, 'completion_length': 61.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034423828125, 'epoch': 0.36} 36%|███▌ | 889/2500 [3:24:22<6:32:45, 14.63s/it] 36%|███▌ | 890/2500 [3:24:36<6:25:55, 14.38s/it] {'loss': 0.0006, 'grad_norm': 0.09379257760171889, 'learning_rate': 6.44e-07, 'completion_length': 52.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0139923095703125, 'epoch': 0.36} 36%|███▌ | 890/2500 [3:24:36<6:25:55, 14.38s/it] 36%|███▌ | 891/2500 [3:24:50<6:21:49, 14.24s/it] {'loss': 0.0011, 'grad_norm': 3.2049234019661212, 'learning_rate': 6.436e-07, 'completion_length': 56.62500190734863, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.02642822265625, 'epoch': 0.36} 36%|███▌ | 891/2500 [3:24:50<6:21:49, 14.24s/it] 36%|███▌ | 892/2500 [3:25:03<6:14:21, 13.97s/it] {'loss': 0.0014, 'grad_norm': 0.11237375615973787, 'learning_rate': 6.431999999999999e-07, 'completion_length': 49.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03375244140625, 'epoch': 0.36} 36%|███▌ | 892/2500 [3:25:03<6:14:21, 13.97s/it] 36%|███▌ | 893/2500 [3:25:17<6:13:22, 13.94s/it] {'loss': 0.0014, 'grad_norm': 0.1769961880406567, 'learning_rate': 6.428e-07, 'completion_length': 55.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03448486328125, 'epoch': 0.36} 36%|███▌ | 893/2500 [3:25:17<6:13:22, 13.94s/it] 36%|███▌ | 894/2500 
[3:25:30<6:05:55, 13.67s/it] {'loss': 0.0015, 'grad_norm': 2.0598698114585274, 'learning_rate': 6.424e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0364990234375, 'epoch': 0.36} 36%|███▌ | 894/2500 [3:25:30<6:05:55, 13.67s/it] 36%|███▌ | 895/2500 [3:25:44<6:05:57, 13.68s/it] {'loss': 0.0009, 'grad_norm': 0.09599402780364111, 'learning_rate': 6.42e-07, 'completion_length': 55.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02154541015625, 'epoch': 0.36} 36%|███▌ | 895/2500 [3:25:44<6:05:57, 13.68s/it] 36%|███▌ | 896/2500 [3:25:58<6:07:11, 13.74s/it] {'loss': 0.001, 'grad_norm': 0.1100861349714222, 'learning_rate': 6.415999999999999e-07, 'completion_length': 53.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.36} 36%|███▌ | 896/2500 [3:25:58<6:07:11, 13.74s/it] 36%|███▌ | 897/2500 [3:26:11<6:07:56, 13.77s/it] {'loss': 0.0009, 'grad_norm': 0.0984367955291419, 'learning_rate': 6.412e-07, 'completion_length': 58.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0216064453125, 'epoch': 0.36} 36%|███▌ | 897/2500 [3:26:11<6:07:56, 13.77s/it] 36%|███▌ | 898/2500 [3:26:25<6:09:01, 13.82s/it] {'loss': 0.0007, 'grad_norm': 0.1857624625293374, 'learning_rate': 6.408e-07, 'completion_length': 58.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017242431640625, 'epoch': 0.36} 36%|███▌ | 898/2500 [3:26:25<6:09:01, 13.82s/it] 36%|███▌ | 899/2500 [3:26:39<6:05:34, 13.70s/it] {'loss': 0.0008, 'grad_norm': 0.08373940449442888, 'learning_rate': 6.403999999999999e-07, 'completion_length': 51.50000190734863, 'rewards/accuracy_reward': 
1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02032470703125, 'epoch': 0.36} 36%|███▌ | 899/2500 [3:26:39<6:05:34, 13.70s/it] 36%|███▌ | 900/2500 [3:26:52<6:01:46, 13.57s/it] {'loss': 0.0014, 'grad_norm': 0.22133590584985272, 'learning_rate': 6.4e-07, 'completion_length': 57.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03558349609375, 'epoch': 0.36} 36%|███▌ | 900/2500 [3:26:52<6:01:46, 13.57s/it] 36%|███▌ | 901/2500 [3:28:05<13:54:14, 31.30s/it] {'loss': 0.0008, 'grad_norm': 0.1964178937170366, 'learning_rate': 6.395999999999999e-07, 'completion_length': 56.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0208740234375, 'epoch': 0.36} 36%|███▌ | 901/2500 [3:28:05<13:54:14, 31.30s/it] 36%|███▌ | 902/2500 [3:28:18<11:29:59, 25.91s/it] {'loss': 0.0017, 'grad_norm': 0.08393105854746684, 'learning_rate': 6.392e-07, 'completion_length': 55.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0426025390625, 'epoch': 0.36} 36%|███▌ | 902/2500 [3:28:18<11:29:59, 25.91s/it] 36%|███▌ | 903/2500 [3:28:34<10:13:09, 23.04s/it] {'loss': 0.001, 'grad_norm': 0.22353662103390382, 'learning_rate': 6.388e-07, 'completion_length': 69.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0238037109375, 'epoch': 0.36} 36%|███▌ | 903/2500 [3:28:34<10:13:09, 23.04s/it] 36%|███▌ | 904/2500 [3:28:50<9:13:06, 20.79s/it] {'loss': 0.0012, 'grad_norm': 0.07199758742591472, 'learning_rate': 6.383999999999999e-07, 'completion_length': 59.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03009033203125, 'epoch': 0.36} 36%|███▌ | 904/2500 [3:28:50<9:13:06, 20.79s/it] 36%|███▌ | 905/2500 [3:29:04<8:16:31, 18.68s/it] {'loss': 
0.0009, 'grad_norm': 0.2007513703642929, 'learning_rate': 6.38e-07, 'completion_length': 62.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.36} 36%|███▌ | 905/2500 [3:29:04<8:16:31, 18.68s/it] 36%|███▌ | 906/2500 [3:29:18<7:40:05, 17.32s/it] {'loss': 0.0017, 'grad_norm': 0.06762432151022353, 'learning_rate': 6.375999999999999e-07, 'completion_length': 58.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0428466796875, 'epoch': 0.36} 36%|███▌ | 906/2500 [3:29:18<7:40:05, 17.32s/it] 36%|███▋ | 907/2500 [3:29:33<7:23:15, 16.70s/it] {'loss': 0.0022, 'grad_norm': 1.5068860159666804, 'learning_rate': 6.371999999999999e-07, 'completion_length': 59.660715103149414, 'rewards/accuracy_reward': 0.8392857611179352, 'rewards/format_reward': 1.0, 'reward': 1.8392857909202576, 'reward_std': 0.0357142873108387, 'kl': 0.0552978515625, 'epoch': 0.36} 36%|███▋ | 907/2500 [3:29:33<7:23:15, 16.70s/it] 36%|███▋ | 908/2500 [3:29:46<6:50:55, 15.49s/it] {'loss': 0.0014, 'grad_norm': 0.6559248025050372, 'learning_rate': 6.368e-07, 'completion_length': 49.142860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0341796875, 'epoch': 0.36} 36%|███▋ | 908/2500 [3:29:46<6:50:55, 15.49s/it] 36%|███▋ | 909/2500 [3:30:00<6:37:48, 15.00s/it] {'loss': 0.0014, 'grad_norm': 0.09297395190691801, 'learning_rate': 6.364e-07, 'completion_length': 55.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0360107421875, 'epoch': 0.36} 36%|███▋ | 909/2500 [3:30:00<6:37:48, 15.00s/it] 36%|███▋ | 910/2500 [3:30:13<6:26:27, 14.58s/it] {'loss': 0.0009, 'grad_norm': 0.06886269853258986, 'learning_rate': 6.36e-07, 'completion_length': 59.03571891784668, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.36} 36%|███▋ | 910/2500 [3:30:13<6:26:27, 14.58s/it] 36%|███▋ | 911/2500 [3:30:26<6:15:43, 14.19s/it] {'loss': 0.0007, 'grad_norm': 0.12839473919339817, 'learning_rate': 6.356e-07, 'completion_length': 50.785715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01702880859375, 'epoch': 0.36} 36%|███▋ | 911/2500 [3:30:26<6:15:43, 14.19s/it] 36%|███▋ | 912/2500 [3:30:41<6:20:49, 14.39s/it] {'loss': 0.0008, 'grad_norm': 0.11149867482088646, 'learning_rate': 6.352e-07, 'completion_length': 61.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02093505859375, 'epoch': 0.36} 36%|███▋ | 912/2500 [3:30:41<6:20:49, 14.39s/it] 37%|███▋ | 913/2500 [3:30:56<6:23:38, 14.50s/it] {'loss': 0.0013, 'grad_norm': 0.10393507814145254, 'learning_rate': 6.348e-07, 'completion_length': 54.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03271484375, 'epoch': 0.37} 37%|███▋ | 913/2500 [3:30:56<6:23:38, 14.50s/it] 37%|███▋ | 914/2500 [3:31:11<6:24:25, 14.54s/it] {'loss': 0.0013, 'grad_norm': 0.21685053130589987, 'learning_rate': 6.343999999999999e-07, 'completion_length': 63.732147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03314208984375, 'epoch': 0.37} 37%|███▋ | 914/2500 [3:31:11<6:24:25, 14.54s/it] 37%|███▋ | 915/2500 [3:31:24<6:10:42, 14.03s/it] {'loss': 0.0009, 'grad_norm': 0.07720551535494295, 'learning_rate': 6.34e-07, 'completion_length': 49.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02203369140625, 'epoch': 0.37} 37%|███▋ | 915/2500 [3:31:24<6:10:42, 14.03s/it] 37%|███▋ | 916/2500 [3:31:43<6:52:48, 15.64s/it] {'loss': 0.0012, 'grad_norm': 
0.42545561694039624, 'learning_rate': 6.336000000000001e-07, 'completion_length': 67.94643020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.02947998046875, 'epoch': 0.37} 37%|███▋ | 916/2500 [3:31:43<6:52:48, 15.64s/it] 37%|███▋ | 917/2500 [3:31:56<6:32:51, 14.89s/it] {'loss': 0.0009, 'grad_norm': 0.08835339140145688, 'learning_rate': 6.331999999999999e-07, 'completion_length': 49.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02337646484375, 'epoch': 0.37} 37%|███▋ | 917/2500 [3:31:56<6:32:51, 14.89s/it] 37%|███▋ | 918/2500 [3:32:10<6:24:31, 14.58s/it] {'loss': 0.0012, 'grad_norm': 1.9677951768001825, 'learning_rate': 6.328e-07, 'completion_length': 59.00000190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.029632568359375, 'epoch': 0.37} 37%|███▋ | 918/2500 [3:32:10<6:24:31, 14.58s/it] 37%|███▋ | 919/2500 [3:32:25<6:24:09, 14.58s/it] {'loss': 0.0009, 'grad_norm': 0.07412561931283537, 'learning_rate': 6.324e-07, 'completion_length': 58.857147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.37} 37%|███▋ | 919/2500 [3:32:25<6:24:09, 14.58s/it] 37%|███▋ | 920/2500 [3:32:39<6:20:13, 14.44s/it] {'loss': 0.0012, 'grad_norm': 0.07132424863571282, 'learning_rate': 6.319999999999999e-07, 'completion_length': 60.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03045654296875, 'epoch': 0.37} 37%|███▋ | 920/2500 [3:32:39<6:20:13, 14.44s/it] 37%|███▋ | 921/2500 [3:32:53<6:18:38, 14.39s/it] {'loss': 0.0007, 'grad_norm': 1.1105453699773828, 'learning_rate': 6.316e-07, 'completion_length': 62.32143211364746, 
'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.017242431640625, 'epoch': 0.37} 37%|███▋ | 921/2500 [3:32:53<6:18:38, 14.39s/it] 37%|███▋ | 922/2500 [3:33:09<6:30:23, 14.84s/it] {'loss': 0.0013, 'grad_norm': 0.08680161518562751, 'learning_rate': 6.312e-07, 'completion_length': 59.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0335693359375, 'epoch': 0.37} 37%|███▋ | 922/2500 [3:33:09<6:30:23, 14.84s/it] 37%|███▋ | 923/2500 [3:33:22<6:19:07, 14.42s/it] {'loss': 0.0011, 'grad_norm': 1.602684687901735, 'learning_rate': 6.308e-07, 'completion_length': 53.60714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02813720703125, 'epoch': 0.37} 37%|███▋ | 923/2500 [3:33:22<6:19:07, 14.42s/it] 37%|███▋ | 924/2500 [3:33:37<6:24:53, 14.65s/it] {'loss': 0.0012, 'grad_norm': 0.07165513937781147, 'learning_rate': 6.303999999999999e-07, 'completion_length': 62.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029052734375, 'epoch': 0.37} 37%|███▋ | 924/2500 [3:33:37<6:24:53, 14.65s/it] 37%|███▋ | 925/2500 [3:33:51<6:17:11, 14.37s/it] {'loss': 0.0012, 'grad_norm': 0.10108255930927267, 'learning_rate': 6.3e-07, 'completion_length': 49.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0303955078125, 'epoch': 0.37} 37%|███▋ | 925/2500 [3:33:51<6:17:11, 14.37s/it] 37%|███▋ | 926/2500 [3:34:05<6:09:16, 14.08s/it] {'loss': 0.0019, 'grad_norm': 0.10005261342366852, 'learning_rate': 6.296e-07, 'completion_length': 52.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0462646484375, 'epoch': 0.37} 37%|███▋ 
| 926/2500 [3:34:05<6:09:16, 14.08s/it] 37%|███▋ | 927/2500 [3:34:19<6:13:23, 14.24s/it] {'loss': 0.0011, 'grad_norm': 0.11975377567592874, 'learning_rate': 6.291999999999999e-07, 'completion_length': 54.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.37} 37%|███▋ | 927/2500 [3:34:19<6:13:23, 14.24s/it] 37%|███▋ | 928/2500 [3:34:32<6:03:03, 13.86s/it] {'loss': 0.0014, 'grad_norm': 0.07734143344348965, 'learning_rate': 6.288e-07, 'completion_length': 51.42857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03619384765625, 'epoch': 0.37} 37%|███▋ | 928/2500 [3:34:32<6:03:03, 13.86s/it] 37%|███▋ | 929/2500 [3:34:46<6:02:11, 13.83s/it] {'loss': 0.0006, 'grad_norm': 0.11888680407957869, 'learning_rate': 6.283999999999999e-07, 'completion_length': 54.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01580810546875, 'epoch': 0.37} 37%|███▋ | 929/2500 [3:34:46<6:02:11, 13.83s/it] 37%|███▋ | 930/2500 [3:35:00<6:00:34, 13.78s/it] {'loss': 0.0015, 'grad_norm': 0.1684109586535366, 'learning_rate': 6.28e-07, 'completion_length': 54.69643020629883, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.036376953125, 'epoch': 0.37} 37%|███▋ | 930/2500 [3:35:00<6:00:34, 13.78s/it] 37%|███▋ | 931/2500 [3:35:14<6:06:16, 14.01s/it] {'loss': 0.0021, 'grad_norm': 0.12915946136011766, 'learning_rate': 6.276e-07, 'completion_length': 56.58928680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.05126953125, 'epoch': 0.37} 37%|███▋ | 931/2500 [3:35:14<6:06:16, 14.01s/it] 37%|███▋ | 932/2500 [3:35:28<6:02:14, 13.86s/it] {'loss': 0.0013, 'grad_norm': 
0.10015548575129599, 'learning_rate': 6.271999999999999e-07, 'completion_length': 53.96428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.37} 37%|███▋ | 932/2500 [3:35:28<6:02:14, 13.86s/it] 37%|███▋ | 933/2500 [3:35:42<6:09:41, 14.16s/it] {'loss': 0.0014, 'grad_norm': 0.09760948230084915, 'learning_rate': 6.268e-07, 'completion_length': 56.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.37} 37%|███▋ | 933/2500 [3:35:42<6:09:41, 14.16s/it] 37%|███▋ | 934/2500 [3:35:59<6:27:01, 14.83s/it] {'loss': 0.001, 'grad_norm': 0.08496573716712051, 'learning_rate': 6.263999999999999e-07, 'completion_length': 54.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02532958984375, 'epoch': 0.37} 37%|███▋ | 934/2500 [3:35:59<6:27:01, 14.83s/it] 37%|███▋ | 935/2500 [3:36:13<6:20:22, 14.58s/it] {'loss': 0.0009, 'grad_norm': 0.10614279389202061, 'learning_rate': 6.26e-07, 'completion_length': 67.50000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02288818359375, 'epoch': 0.37} 37%|███▋ | 935/2500 [3:36:13<6:20:22, 14.58s/it] 37%|███▋ | 936/2500 [3:36:28<6:23:19, 14.71s/it] {'loss': 0.0011, 'grad_norm': 0.06550100037926275, 'learning_rate': 6.256e-07, 'completion_length': 62.964290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028076171875, 'epoch': 0.37} 37%|███▋ | 936/2500 [3:36:28<6:23:19, 14.71s/it] 37%|███▋ | 937/2500 [3:36:42<6:16:24, 14.45s/it] {'loss': 0.0014, 'grad_norm': 0.06346421084990364, 'learning_rate': 6.252e-07, 'completion_length': 56.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 0.37} 
37%|███▋ | 937/2500 [3:36:42<6:16:24, 14.45s/it] 38%|███▊ | 938/2500 [3:36:57<6:24:46, 14.78s/it] {'loss': 0.0008, 'grad_norm': 0.08289359641058018, 'learning_rate': 6.248e-07, 'completion_length': 61.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020660400390625, 'epoch': 0.38} 38%|███▊ | 938/2500 [3:36:57<6:24:46, 14.78s/it] 38%|███▊ | 939/2500 [3:37:11<6:15:08, 14.42s/it] {'loss': 0.0023, 'grad_norm': 1.8860618004898644, 'learning_rate': 6.243999999999999e-07, 'completion_length': 55.41071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.056640625, 'epoch': 0.38} 38%|███▊ | 939/2500 [3:37:11<6:15:08, 14.42s/it] 38%|███▊ | 940/2500 [3:37:25<6:10:39, 14.26s/it] {'loss': 0.0015, 'grad_norm': 0.09806283711780438, 'learning_rate': 6.24e-07, 'completion_length': 57.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03662109375, 'epoch': 0.38} 38%|███▊ | 940/2500 [3:37:25<6:10:39, 14.26s/it] 38%|███▊ | 941/2500 [3:37:41<6:23:44, 14.77s/it] {'loss': 0.001, 'grad_norm': 0.1470456208214165, 'learning_rate': 6.236e-07, 'completion_length': 60.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02386474609375, 'epoch': 0.38} 38%|███▊ | 941/2500 [3:37:41<6:23:44, 14.77s/it] 38%|███▊ | 942/2500 [3:37:55<6:20:35, 14.66s/it] {'loss': 0.0007, 'grad_norm': 0.06949455940788747, 'learning_rate': 6.231999999999999e-07, 'completion_length': 56.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0177001953125, 'epoch': 0.38} 38%|███▊ | 942/2500 [3:37:55<6:20:35, 14.66s/it] 38%|███▊ | 943/2500 [3:38:11<6:28:38, 14.98s/it] {'loss': 0.0009, 'grad_norm': 0.09026298766877337, 'learning_rate': 6.228e-07, 
'completion_length': 64.58929061889648, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0223388671875, 'epoch': 0.38} 38%|███▊ | 943/2500 [3:38:11<6:28:38, 14.98s/it] 38%|███▊ | 944/2500 [3:38:24<6:14:19, 14.43s/it] {'loss': 0.0012, 'grad_norm': 2.134143491658505, 'learning_rate': 6.224e-07, 'completion_length': 53.53571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03045654296875, 'epoch': 0.38} 38%|███▊ | 944/2500 [3:38:24<6:14:19, 14.43s/it] 38%|███▊ | 945/2500 [3:38:38<6:12:44, 14.38s/it] {'loss': 0.0009, 'grad_norm': 0.0647406099619624, 'learning_rate': 6.219999999999999e-07, 'completion_length': 53.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02203369140625, 'epoch': 0.38} 38%|███▊ | 945/2500 [3:38:38<6:12:44, 14.38s/it] 38%|███▊ | 946/2500 [3:38:54<6:20:49, 14.70s/it] {'loss': 0.0005, 'grad_norm': 0.09792849225876757, 'learning_rate': 6.216e-07, 'completion_length': 60.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01336669921875, 'epoch': 0.38} 38%|███▊ | 946/2500 [3:38:54<6:20:49, 14.70s/it] 38%|███▊ | 947/2500 [3:39:08<6:20:17, 14.69s/it] {'loss': 0.0018, 'grad_norm': 0.06962197181918345, 'learning_rate': 6.212e-07, 'completion_length': 59.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04541015625, 'epoch': 0.38} 38%|███▊ | 947/2500 [3:39:08<6:20:17, 14.69s/it] 38%|███▊ | 948/2500 [3:39:22<6:11:47, 14.37s/it] {'loss': 0.0011, 'grad_norm': 0.0841070142901976, 'learning_rate': 6.208e-07, 'completion_length': 60.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 
'epoch': 0.38} 38%|███▊ | 948/2500 [3:39:22<6:11:47, 14.37s/it] 38%|███▊ | 949/2500 [3:39:35<6:02:37, 14.03s/it] {'loss': 0.0008, 'grad_norm': 0.07055748780241786, 'learning_rate': 6.203999999999999e-07, 'completion_length': 52.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02032470703125, 'epoch': 0.38} 38%|███▊ | 949/2500 [3:39:35<6:02:37, 14.03s/it] 38%|███▊ | 950/2500 [3:39:49<5:57:03, 13.82s/it] {'loss': 0.0005, 'grad_norm': 0.08209088985617445, 'learning_rate': 6.2e-07, 'completion_length': 54.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013671875, 'epoch': 0.38} 38%|███▊ | 950/2500 [3:39:49<5:57:03, 13.82s/it] 38%|███▊ | 951/2500 [3:40:03<5:59:13, 13.91s/it] {'loss': 0.001, 'grad_norm': 0.22697535960590426, 'learning_rate': 6.196e-07, 'completion_length': 65.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02435302734375, 'epoch': 0.38} 38%|███▊ | 951/2500 [3:40:03<5:59:13, 13.91s/it] 38%|███▊ | 952/2500 [3:40:17<6:02:44, 14.06s/it] {'loss': 0.0016, 'grad_norm': 0.2804300956150772, 'learning_rate': 6.191999999999999e-07, 'completion_length': 65.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0386962890625, 'epoch': 0.38} 38%|███▊ | 952/2500 [3:40:17<6:02:44, 14.06s/it] 38%|███▊ | 953/2500 [3:40:31<6:03:28, 14.10s/it] {'loss': 0.0011, 'grad_norm': 2.2096704824470774, 'learning_rate': 6.188e-07, 'completion_length': 61.80357551574707, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0267333984375, 'epoch': 0.38} 38%|███▊ | 953/2500 [3:40:31<6:03:28, 14.10s/it] 38%|███▊ | 954/2500 [3:40:45<6:01:09, 14.02s/it] {'loss': 0.0013, 'grad_norm': 0.22770850438515225, 'learning_rate': 
6.183999999999999e-07, 'completion_length': 59.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03350830078125, 'epoch': 0.38} 38%|███▊ | 954/2500 [3:40:45<6:01:09, 14.02s/it] 38%|███▊ | 955/2500 [3:40:59<6:03:00, 14.10s/it] {'loss': 0.0011, 'grad_norm': 0.06426605692653095, 'learning_rate': 6.18e-07, 'completion_length': 54.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02740478515625, 'epoch': 0.38} 38%|███▊ | 955/2500 [3:40:59<6:03:00, 14.10s/it] 38%|███▊ | 956/2500 [3:41:15<6:11:37, 14.44s/it] {'loss': 0.0015, 'grad_norm': 0.21707063935665247, 'learning_rate': 6.176e-07, 'completion_length': 61.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.38} 38%|███▊ | 956/2500 [3:41:15<6:11:37, 14.44s/it] 38%|███▊ | 957/2500 [3:41:29<6:12:19, 14.48s/it] {'loss': 0.0011, 'grad_norm': 0.0781806579377065, 'learning_rate': 6.172e-07, 'completion_length': 69.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02801513671875, 'epoch': 0.38} 38%|███▊ | 957/2500 [3:41:29<6:12:19, 14.48s/it] 38%|███▊ | 958/2500 [3:41:43<6:03:04, 14.13s/it] {'loss': 0.0008, 'grad_norm': 0.08535178164557769, 'learning_rate': 6.168e-07, 'completion_length': 56.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01873779296875, 'epoch': 0.38} 38%|███▊ | 958/2500 [3:41:43<6:03:04, 14.13s/it] 38%|███▊ | 959/2500 [3:41:58<6:13:04, 14.53s/it] {'loss': 0.0008, 'grad_norm': 1.155500107859008, 'learning_rate': 6.163999999999999e-07, 'completion_length': 61.875003814697266, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.021026611328125, 
'epoch': 0.38} 38%|███▊ | 959/2500 [3:41:58<6:13:04, 14.53s/it] 38%|███▊ | 960/2500 [3:42:11<6:02:54, 14.14s/it] {'loss': 0.0009, 'grad_norm': 0.0790630107635321, 'learning_rate': 6.16e-07, 'completion_length': 54.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02130126953125, 'epoch': 0.38} 38%|███▊ | 960/2500 [3:42:11<6:02:54, 14.14s/it] 38%|███▊ | 961/2500 [3:42:26<6:05:23, 14.25s/it] {'loss': 0.0013, 'grad_norm': 0.0592286900300209, 'learning_rate': 6.156e-07, 'completion_length': 60.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03369140625, 'epoch': 0.38} 38%|███▊ | 961/2500 [3:42:26<6:05:23, 14.25s/it] 38%|███▊ | 962/2500 [3:42:39<6:00:06, 14.05s/it] {'loss': 0.0014, 'grad_norm': 0.16589812497670828, 'learning_rate': 6.152e-07, 'completion_length': 54.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03424072265625, 'epoch': 0.38} 38%|███▊ | 962/2500 [3:42:39<6:00:06, 14.05s/it] 39%|███▊ | 963/2500 [3:42:53<5:57:37, 13.96s/it] {'loss': 0.0019, 'grad_norm': 0.05375277719818683, 'learning_rate': 6.148e-07, 'completion_length': 60.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04656982421875, 'epoch': 0.39} 39%|███▊ | 963/2500 [3:42:53<5:57:37, 13.96s/it] 39%|███▊ | 964/2500 [3:43:07<5:53:54, 13.82s/it] {'loss': 0.0009, 'grad_norm': 1.2690925705364877, 'learning_rate': 6.143999999999999e-07, 'completion_length': 61.62500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.021881103515625, 'epoch': 0.39} 39%|███▊ | 964/2500 [3:43:07<5:53:54, 13.82s/it] 39%|███▊ | 965/2500 [3:43:22<6:04:16, 14.24s/it] {'loss': 0.0012, 'grad_norm': 1.3837821938384363, 'learning_rate': 
6.14e-07, 'completion_length': 64.87500381469727, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0296630859375, 'epoch': 0.39} 39%|███▊ | 965/2500 [3:43:22<6:04:16, 14.24s/it] 39%|███▊ | 966/2500 [3:43:36<6:03:11, 14.21s/it] {'loss': 0.0022, 'grad_norm': 2.2841592518676874, 'learning_rate': 6.136e-07, 'completion_length': 55.48214530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0557861328125, 'epoch': 0.39} 39%|███▊ | 966/2500 [3:43:36<6:03:11, 14.21s/it] 39%|███▊ | 967/2500 [3:43:50<6:02:44, 14.20s/it] {'loss': 0.0013, 'grad_norm': 0.10291130685540681, 'learning_rate': 6.131999999999999e-07, 'completion_length': 55.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03155517578125, 'epoch': 0.39} 39%|███▊ | 967/2500 [3:43:50<6:02:44, 14.20s/it] 39%|███▊ | 968/2500 [3:44:05<6:07:48, 14.41s/it] {'loss': 0.0013, 'grad_norm': 0.5207754668992418, 'learning_rate': 6.128e-07, 'completion_length': 53.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.39} 39%|███▊ | 968/2500 [3:44:05<6:07:48, 14.41s/it] 39%|███▉ | 969/2500 [3:44:18<6:00:38, 14.13s/it] {'loss': 0.0011, 'grad_norm': 0.09819384490111403, 'learning_rate': 6.124000000000001e-07, 'completion_length': 48.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0286865234375, 'epoch': 0.39} 39%|███▉ | 969/2500 [3:44:18<6:00:38, 14.13s/it] 39%|███▉ | 970/2500 [3:44:32<5:59:06, 14.08s/it] {'loss': 0.0011, 'grad_norm': 0.0803806522315354, 'learning_rate': 6.119999999999999e-07, 'completion_length': 54.160715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02801513671875, 'epoch': 0.39} 39%|███▉ | 970/2500 [3:44:32<5:59:06, 14.08s/it] 39%|███▉ | 971/2500 [3:44:46<5:55:45, 13.96s/it] {'loss': 0.0012, 'grad_norm': 0.09410310418434233, 'learning_rate': 6.116e-07, 'completion_length': 57.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029144287109375, 'epoch': 0.39} 39%|███▉ | 971/2500 [3:44:46<5:55:45, 13.96s/it] 39%|███▉ | 972/2500 [3:45:00<5:55:42, 13.97s/it] {'loss': 0.0014, 'grad_norm': 1.9902932828267235, 'learning_rate': 6.112e-07, 'completion_length': 53.32143020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03411865234375, 'epoch': 0.39} 39%|███▉ | 972/2500 [3:45:00<5:55:42, 13.97s/it] 39%|███▉ | 973/2500 [3:45:16<6:09:05, 14.50s/it] {'loss': 0.0012, 'grad_norm': 0.12067521340657825, 'learning_rate': 6.107999999999999e-07, 'completion_length': 55.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03106689453125, 'epoch': 0.39} 39%|███▉ | 973/2500 [3:45:16<6:09:05, 14.50s/it] 39%|███▉ | 974/2500 [3:45:30<6:05:56, 14.39s/it] {'loss': 0.0017, 'grad_norm': 0.08198564012739543, 'learning_rate': 6.104e-07, 'completion_length': 54.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04376220703125, 'epoch': 0.39} 39%|███▉ | 974/2500 [3:45:30<6:05:56, 14.39s/it] 39%|███▉ | 975/2500 [3:45:44<6:03:29, 14.30s/it] {'loss': 0.0012, 'grad_norm': 0.06190154466248263, 'learning_rate': 6.1e-07, 'completion_length': 55.910715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02935791015625, 'epoch': 0.39} 39%|███▉ | 975/2500 [3:45:44<6:03:29, 14.30s/it] 39%|███▉ | 976/2500 [3:45:59<6:07:07, 14.45s/it] {'loss': 
0.0009, 'grad_norm': 0.10185640300089009, 'learning_rate': 6.096e-07, 'completion_length': 69.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02191162109375, 'epoch': 0.39} 39%|███▉ | 976/2500 [3:45:59<6:07:07, 14.45s/it] 39%|███▉ | 977/2500 [3:46:13<6:04:12, 14.35s/it] {'loss': 0.0025, 'grad_norm': 0.09882398366515724, 'learning_rate': 6.091999999999999e-07, 'completion_length': 55.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0618896484375, 'epoch': 0.39} 39%|███▉ | 977/2500 [3:46:13<6:04:12, 14.35s/it] 39%|███▉ | 978/2500 [3:46:28<6:05:55, 14.43s/it] {'loss': 0.0017, 'grad_norm': 0.0880111370006699, 'learning_rate': 6.088e-07, 'completion_length': 64.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.39} 39%|███▉ | 978/2500 [3:46:28<6:05:55, 14.43s/it] 39%|███▉ | 979/2500 [3:46:42<6:04:37, 14.38s/it] {'loss': 0.0012, 'grad_norm': 0.07388120257175532, 'learning_rate': 6.084000000000001e-07, 'completion_length': 57.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03021240234375, 'epoch': 0.39} 39%|███▉ | 979/2500 [3:46:42<6:04:37, 14.38s/it] 39%|███▉ | 980/2500 [3:46:55<5:57:13, 14.10s/it] {'loss': 0.0007, 'grad_norm': 0.08077072624971196, 'learning_rate': 6.079999999999999e-07, 'completion_length': 58.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018341064453125, 'epoch': 0.39} 39%|███▉ | 980/2500 [3:46:55<5:57:13, 14.10s/it] 39%|███▉ | 981/2500 [3:47:10<6:02:52, 14.33s/it] {'loss': 0.0015, 'grad_norm': 0.08626741274022388, 'learning_rate': 6.076e-07, 'completion_length': 64.14286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 
'kl': 0.03851318359375, 'epoch': 0.39} 39%|███▉ | 981/2500 [3:47:10<6:02:52, 14.33s/it] 39%|███▉ | 982/2500 [3:47:24<5:59:47, 14.22s/it] {'loss': 0.001, 'grad_norm': 1.7370546893158967, 'learning_rate': 6.072e-07, 'completion_length': 61.285715103149414, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02490234375, 'epoch': 0.39} 39%|███▉ | 982/2500 [3:47:24<5:59:47, 14.22s/it] 39%|███▉ | 983/2500 [3:47:38<5:59:46, 14.23s/it] {'loss': 0.0018, 'grad_norm': 0.09258719572488826, 'learning_rate': 6.068e-07, 'completion_length': 59.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0443115234375, 'epoch': 0.39} 39%|███▉ | 983/2500 [3:47:38<5:59:46, 14.23s/it] 39%|███▉ | 984/2500 [3:47:58<6:42:17, 15.92s/it] {'loss': 0.0023, 'grad_norm': 0.4475349230906974, 'learning_rate': 6.064e-07, 'completion_length': 63.80357360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0572509765625, 'epoch': 0.39} 39%|███▉ | 984/2500 [3:47:58<6:42:17, 15.92s/it] 39%|███▉ | 985/2500 [3:48:12<6:24:03, 15.21s/it] {'loss': 0.001, 'grad_norm': 0.07129356268746202, 'learning_rate': 6.06e-07, 'completion_length': 50.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024261474609375, 'epoch': 0.39} 39%|███▉ | 985/2500 [3:48:12<6:24:03, 15.21s/it] 39%|███▉ | 986/2500 [3:48:26<6:13:42, 14.81s/it] {'loss': 0.0009, 'grad_norm': 0.11236347842362787, 'learning_rate': 6.056e-07, 'completion_length': 57.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0234375, 'epoch': 0.39} 39%|███▉ | 986/2500 [3:48:26<6:13:42, 14.81s/it] 39%|███▉ | 987/2500 [3:48:40<6:10:18, 14.69s/it] {'loss': 
0.0014, 'grad_norm': 0.1512161952019647, 'learning_rate': 6.051999999999999e-07, 'completion_length': 63.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03497314453125, 'epoch': 0.39} 39%|███▉ | 987/2500 [3:48:40<6:10:18, 14.69s/it] 40%|███▉ | 988/2500 [3:48:53<5:57:12, 14.18s/it] {'loss': 0.0019, 'grad_norm': 1.4260331218903397, 'learning_rate': 6.048e-07, 'completion_length': 47.78571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.048583984375, 'epoch': 0.4} 40%|███▉ | 988/2500 [3:48:53<5:57:12, 14.18s/it] 40%|███▉ | 989/2500 [3:49:07<5:53:47, 14.05s/it] {'loss': 0.0017, 'grad_norm': 0.08326659066800923, 'learning_rate': 6.044e-07, 'completion_length': 52.80357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.041748046875, 'epoch': 0.4} 40%|███▉ | 989/2500 [3:49:07<5:53:47, 14.05s/it] 40%|███▉ | 990/2500 [3:49:22<6:04:28, 14.48s/it] {'loss': 0.0013, 'grad_norm': 0.08884574108948261, 'learning_rate': 6.04e-07, 'completion_length': 63.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.4} 40%|███▉ | 990/2500 [3:49:22<6:04:28, 14.48s/it] 40%|███▉ | 991/2500 [3:49:37<6:03:06, 14.44s/it] {'loss': 0.0011, 'grad_norm': 0.0560020556816866, 'learning_rate': 6.036e-07, 'completion_length': 66.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.4} 40%|███▉ | 991/2500 [3:49:37<6:03:06, 14.44s/it] 40%|███▉ | 992/2500 [3:49:50<5:54:41, 14.11s/it] {'loss': 0.0012, 'grad_norm': 0.2702875559603002, 'learning_rate': 6.031999999999999e-07, 'completion_length': 52.92857551574707, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0302734375, 'epoch': 0.4} 40%|███▉ | 992/2500 [3:49:50<5:54:41, 14.11s/it] 40%|███▉ | 993/2500 [3:50:05<5:57:37, 14.24s/it] {'loss': 0.0013, 'grad_norm': 0.08481235366779157, 'learning_rate': 6.028e-07, 'completion_length': 55.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03350830078125, 'epoch': 0.4} 40%|███▉ | 993/2500 [3:50:05<5:57:37, 14.24s/it] 40%|███▉ | 994/2500 [3:50:19<5:58:50, 14.30s/it] {'loss': 0.0018, 'grad_norm': 0.11861382409819758, 'learning_rate': 6.024e-07, 'completion_length': 61.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044921875, 'epoch': 0.4} 40%|███▉ | 994/2500 [3:50:19<5:58:50, 14.30s/it] 40%|███▉ | 995/2500 [3:50:33<5:55:15, 14.16s/it] {'loss': 0.0012, 'grad_norm': 0.1317177408264261, 'learning_rate': 6.019999999999999e-07, 'completion_length': 59.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.4} 40%|███▉ | 995/2500 [3:50:33<5:55:15, 14.16s/it] 40%|███▉ | 996/2500 [3:50:47<5:53:25, 14.10s/it] {'loss': 0.0008, 'grad_norm': 0.08776550692360854, 'learning_rate': 6.016e-07, 'completion_length': 59.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0203857421875, 'epoch': 0.4} 40%|███▉ | 996/2500 [3:50:47<5:53:25, 14.10s/it] 40%|███▉ | 997/2500 [3:51:01<5:52:06, 14.06s/it] {'loss': 0.0015, 'grad_norm': 0.09235904020096818, 'learning_rate': 6.012e-07, 'completion_length': 60.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0386962890625, 'epoch': 0.4} 40%|███▉ | 997/2500 [3:51:01<5:52:06, 14.06s/it] 40%|███▉ | 998/2500 [3:51:14<5:47:09, 13.87s/it] {'loss': 0.0005, 'grad_norm': 
0.12008989492595155, 'learning_rate': 6.007999999999999e-07, 'completion_length': 53.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013519287109375, 'epoch': 0.4} 40%|███▉ | 998/2500 [3:51:14<5:47:09, 13.87s/it] 40%|███▉ | 999/2500 [3:51:28<5:45:27, 13.81s/it] {'loss': 0.0016, 'grad_norm': 2.1196386116534374, 'learning_rate': 6.004e-07, 'completion_length': 63.83928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642858505249023, 'reward_std': 0.0714285746216774, 'kl': 0.0404052734375, 'epoch': 0.4} 40%|███▉ | 999/2500 [3:51:28<5:45:27, 13.81s/it] 40%|████ | 1000/2500 [3:51:42<5:47:54, 13.92s/it] {'loss': 0.0017, 'grad_norm': 0.08099372114021308, 'learning_rate': 6e-07, 'completion_length': 58.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04278564453125, 'epoch': 0.4} 40%|████ | 1000/2500 [3:51:42<5:47:54, 13.92s/it] 40%|████ | 1001/2500 [3:52:46<12:01:55, 28.90s/it] {'loss': 0.0015, 'grad_norm': 0.09034622136724846, 'learning_rate': 5.995999999999999e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03826904296875, 'epoch': 0.4} 40%|████ | 1001/2500 [3:52:46<12:01:55, 28.90s/it] 40%|████ | 1002/2500 [3:52:54<9:23:44, 22.58s/it] {'loss': 0.0013, 'grad_norm': 0.1018186957585425, 'learning_rate': 5.991999999999999e-07, 'completion_length': 48.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03277587890625, 'epoch': 0.4} 40%|████ | 1002/2500 [3:52:54<9:23:44, 22.58s/it] 40%|████ | 1003/2500 [3:53:01<7:33:00, 18.16s/it] {'loss': 0.0015, 'grad_norm': 0.09716588725464204, 'learning_rate': 5.988e-07, 'completion_length': 54.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 
'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03662109375, 'epoch': 0.4} 40%|████ | 1003/2500 [3:53:01<7:33:00, 18.16s/it] 40%|████ | 1004/2500 [3:53:10<6:21:48, 15.31s/it] {'loss': 0.0012, 'grad_norm': 0.09508612621265257, 'learning_rate': 5.984000000000001e-07, 'completion_length': 61.23214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0301513671875, 'epoch': 0.4} 40%|████ | 1004/2500 [3:53:10<6:21:48, 15.31s/it] 40%|████ | 1005/2500 [3:53:19<5:31:49, 13.32s/it] {'loss': 0.0006, 'grad_norm': 0.15988339045045835, 'learning_rate': 5.979999999999999e-07, 'completion_length': 58.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015380859375, 'epoch': 0.4} 40%|████ | 1005/2500 [3:53:19<5:31:49, 13.32s/it] 40%|████ | 1006/2500 [3:53:27<4:56:40, 11.91s/it] {'loss': 0.0008, 'grad_norm': 1.9601983991487557, 'learning_rate': 5.976e-07, 'completion_length': 59.51785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.02093505859375, 'epoch': 0.4} 40%|████ | 1006/2500 [3:53:27<4:56:40, 11.91s/it] 40%|████ | 1007/2500 [3:53:36<4:28:09, 10.78s/it] {'loss': 0.0007, 'grad_norm': 0.05534018470527618, 'learning_rate': 5.972e-07, 'completion_length': 51.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01812744140625, 'epoch': 0.4} 40%|████ | 1007/2500 [3:53:36<4:28:09, 10.78s/it] 40%|████ | 1008/2500 [3:53:45<4:16:02, 10.30s/it] {'loss': 0.0011, 'grad_norm': 0.09463534350118742, 'learning_rate': 5.967999999999999e-07, 'completion_length': 57.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838134765625, 'epoch': 0.4} 40%|████ | 1008/2500 [3:53:45<4:16:02, 
10.30s/it] 40%|████ | 1009/2500 [3:53:54<4:07:42, 9.97s/it] {'loss': 0.0013, 'grad_norm': 0.08153764022775631, 'learning_rate': 5.964e-07, 'completion_length': 53.71428680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.031982421875, 'epoch': 0.4} 40%|████ | 1009/2500 [3:53:54<4:07:42, 9.97s/it] 40%|████ | 1010/2500 [3:54:03<4:00:04, 9.67s/it] {'loss': 0.0015, 'grad_norm': 0.07395910157278127, 'learning_rate': 5.96e-07, 'completion_length': 56.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0381927490234375, 'epoch': 0.4} 40%|████ | 1010/2500 [3:54:03<4:00:04, 9.67s/it] 40%|████ | 1011/2500 [3:54:11<3:50:45, 9.30s/it] {'loss': 0.0006, 'grad_norm': 0.1449245342155658, 'learning_rate': 5.956e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014129638671875, 'epoch': 0.4} 40%|████ | 1011/2500 [3:54:11<3:50:45, 9.30s/it] 40%|████ | 1012/2500 [3:54:20<3:43:51, 9.03s/it] {'loss': 0.0013, 'grad_norm': 1.526031783509496, 'learning_rate': 5.951999999999999e-07, 'completion_length': 60.357147216796875, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0316162109375, 'epoch': 0.4} 40%|████ | 1012/2500 [3:54:20<3:43:51, 9.03s/it] 41%|████ | 1013/2500 [3:54:29<3:46:54, 9.16s/it] {'loss': 0.001, 'grad_norm': 0.0767387406701204, 'learning_rate': 5.948e-07, 'completion_length': 56.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.41} 41%|████ | 1013/2500 [3:54:29<3:46:54, 9.16s/it] 41%|████ | 1014/2500 [3:54:39<3:49:28, 9.27s/it] {'loss': 0.0007, 'grad_norm': 0.06610130175458845, 'learning_rate': 5.944e-07, 'completion_length': 
64.12500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.01861572265625, 'epoch': 0.41} 41%|████ | 1014/2500 [3:54:39<3:49:28, 9.27s/it] 41%|████ | 1015/2500 [3:54:47<3:39:38, 8.87s/it] {'loss': 0.0017, 'grad_norm': 1.5573469687272752, 'learning_rate': 5.939999999999999e-07, 'completion_length': 52.285715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0416259765625, 'epoch': 0.41} 41%|████ | 1015/2500 [3:54:47<3:39:38, 8.87s/it] 41%|████ | 1016/2500 [3:54:55<3:38:20, 8.83s/it] {'loss': 0.0011, 'grad_norm': 0.10062821314717335, 'learning_rate': 5.936e-07, 'completion_length': 54.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.41} 41%|████ | 1016/2500 [3:54:55<3:38:20, 8.83s/it] 41%|████ | 1017/2500 [3:55:04<3:38:50, 8.85s/it] {'loss': 0.0015, 'grad_norm': 0.06967743279390869, 'learning_rate': 5.931999999999999e-07, 'completion_length': 55.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03656005859375, 'epoch': 0.41} 41%|████ | 1017/2500 [3:55:04<3:38:50, 8.85s/it] 41%|████ | 1018/2500 [3:55:14<3:42:46, 9.02s/it] {'loss': 0.0015, 'grad_norm': 0.09594264097493616, 'learning_rate': 5.928e-07, 'completion_length': 62.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03704833984375, 'epoch': 0.41} 41%|████ | 1018/2500 [3:55:14<3:42:46, 9.02s/it] 41%|████ | 1019/2500 [3:55:22<3:37:39, 8.82s/it] {'loss': 0.0009, 'grad_norm': 0.3126307091683797, 'learning_rate': 5.924e-07, 'completion_length': 54.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 
'epoch': 0.41} 41%|████ | 1019/2500 [3:55:22<3:37:39, 8.82s/it] 41%|████ | 1020/2500 [3:55:31<3:38:02, 8.84s/it] {'loss': 0.0014, 'grad_norm': 0.1501135361616415, 'learning_rate': 5.919999999999999e-07, 'completion_length': 57.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03533935546875, 'epoch': 0.41} 41%|████ | 1020/2500 [3:55:31<3:38:02, 8.84s/it] 41%|████ | 1021/2500 [3:55:40<3:35:33, 8.74s/it] {'loss': 0.0008, 'grad_norm': 0.10275118406105566, 'learning_rate': 5.916e-07, 'completion_length': 51.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0203857421875, 'epoch': 0.41} 41%|████ | 1021/2500 [3:55:40<3:35:33, 8.74s/it] 41%|████ | 1022/2500 [3:55:50<3:51:19, 9.39s/it] {'loss': 0.0019, 'grad_norm': 0.129892270017347, 'learning_rate': 5.911999999999999e-07, 'completion_length': 72.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0482177734375, 'epoch': 0.41} 41%|████ | 1022/2500 [3:55:50<3:51:19, 9.39s/it] 41%|████ | 1023/2500 [3:55:59<3:48:11, 9.27s/it] {'loss': 0.0008, 'grad_norm': 0.07192621376241141, 'learning_rate': 5.907999999999999e-07, 'completion_length': 57.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02056884765625, 'epoch': 0.41} 41%|████ | 1023/2500 [3:55:59<3:48:11, 9.27s/it] 41%|████ | 1024/2500 [3:56:08<3:41:28, 9.00s/it] {'loss': 0.0013, 'grad_norm': 0.07854903233828309, 'learning_rate': 5.904e-07, 'completion_length': 50.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0333251953125, 'epoch': 0.41} 41%|████ | 1024/2500 [3:56:08<3:41:28, 9.00s/it] 41%|████ | 1025/2500 [3:56:16<3:34:58, 8.75s/it] {'loss': 0.0011, 'grad_norm': 0.09694982443623133, 'learning_rate': 5.9e-07, 
'completion_length': 63.267860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02862548828125, 'epoch': 0.41} 41%|████ | 1025/2500 [3:56:16<3:34:58, 8.75s/it] 41%|████ | 1026/2500 [3:56:24<3:28:20, 8.48s/it] {'loss': 0.0012, 'grad_norm': 0.10166638444042193, 'learning_rate': 5.896e-07, 'completion_length': 48.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029052734375, 'epoch': 0.41} 41%|████ | 1026/2500 [3:56:24<3:28:20, 8.48s/it] 41%|████ | 1027/2500 [3:56:35<3:46:08, 9.21s/it] {'loss': 0.0013, 'grad_norm': 0.07593848546251183, 'learning_rate': 5.891999999999999e-07, 'completion_length': 62.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032196044921875, 'epoch': 0.41} 41%|████ | 1027/2500 [3:56:35<3:46:08, 9.21s/it] 41%|████ | 1028/2500 [3:56:43<3:42:36, 9.07s/it] {'loss': 0.0017, 'grad_norm': 0.08376091747585444, 'learning_rate': 5.888e-07, 'completion_length': 54.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0421142578125, 'epoch': 0.41} 41%|████ | 1028/2500 [3:56:43<3:42:36, 9.07s/it] 41%|████ | 1029/2500 [3:56:51<3:30:50, 8.60s/it] {'loss': 0.0007, 'grad_norm': 0.12495086228824405, 'learning_rate': 5.884000000000001e-07, 'completion_length': 47.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01641845703125, 'epoch': 0.41} 41%|████ | 1029/2500 [3:56:51<3:30:50, 8.60s/it] 41%|████ | 1030/2500 [3:57:00<3:34:26, 8.75s/it] {'loss': 0.0014, 'grad_norm': 0.06268489682224523, 'learning_rate': 5.879999999999999e-07, 'completion_length': 57.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03582763671875, 'epoch': 0.41} 
41%|████ | 1030/2500 [3:57:00<3:34:26, 8.75s/it] 41%|████ | 1031/2500 [3:57:11<3:48:07, 9.32s/it] {'loss': 0.0015, 'grad_norm': 0.0948600085136292, 'learning_rate': 5.876e-07, 'completion_length': 67.39286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03668212890625, 'epoch': 0.41} 41%|████ | 1031/2500 [3:57:11<3:48:07, 9.32s/it] 41%|████▏ | 1032/2500 [3:57:19<3:42:52, 9.11s/it] {'loss': 0.0008, 'grad_norm': 8.508115345965228, 'learning_rate': 5.872000000000001e-07, 'completion_length': 61.46428871154785, 'rewards/accuracy_reward': 0.8571428954601288, 'rewards/format_reward': 1.0, 'reward': 1.857142984867096, 'reward_std': 0.0714285746216774, 'kl': 0.02020263671875, 'epoch': 0.41} 41%|████▏ | 1032/2500 [3:57:19<3:42:52, 9.11s/it] 41%|████▏ | 1033/2500 [3:57:28<3:40:09, 9.00s/it] {'loss': 0.0011, 'grad_norm': 0.12384622036115575, 'learning_rate': 5.867999999999999e-07, 'completion_length': 55.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0263671875, 'epoch': 0.41} 41%|████▏ | 1033/2500 [3:57:28<3:40:09, 9.00s/it] 41%|████▏ | 1034/2500 [3:57:36<3:33:17, 8.73s/it] {'loss': 0.0014, 'grad_norm': 0.12733407034826566, 'learning_rate': 5.864e-07, 'completion_length': 59.39285850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0360107421875, 'epoch': 0.41} 41%|████▏ | 1034/2500 [3:57:36<3:33:17, 8.73s/it] 41%|████▏ | 1035/2500 [3:57:44<3:28:36, 8.54s/it] {'loss': 0.0014, 'grad_norm': 0.14743359989937782, 'learning_rate': 5.86e-07, 'completion_length': 55.19643020629883, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03448486328125, 'epoch': 0.41} 41%|████▏ | 1035/2500 [3:57:44<3:28:36, 8.54s/it] 41%|████▏ | 1036/2500 [3:57:54<3:39:53, 9.01s/it] {'loss': 
0.0009, 'grad_norm': 0.06519177502732634, 'learning_rate': 5.856e-07, 'completion_length': 53.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0225830078125, 'epoch': 0.41} 41%|████▏ | 1036/2500 [3:57:54<3:39:53, 9.01s/it] 41%|████▏ | 1037/2500 [3:58:08<4:10:12, 10.26s/it] {'loss': 0.002, 'grad_norm': 0.0685760174542288, 'learning_rate': 5.852e-07, 'completion_length': 55.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0499267578125, 'epoch': 0.41} 41%|████▏ | 1037/2500 [3:58:08<4:10:12, 10.26s/it] 42%|████▏ | 1038/2500 [3:58:22<4:41:05, 11.54s/it] {'loss': 0.0007, 'grad_norm': 1.598870869963966, 'learning_rate': 5.848e-07, 'completion_length': 60.21428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.016387939453125, 'epoch': 0.42} 42%|████▏ | 1038/2500 [3:58:22<4:41:05, 11.54s/it] 42%|████▏ | 1039/2500 [3:58:35<4:53:08, 12.04s/it] {'loss': 0.0012, 'grad_norm': 0.20321497018910703, 'learning_rate': 5.844e-07, 'completion_length': 51.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031005859375, 'epoch': 0.42} 42%|████▏ | 1039/2500 [3:58:35<4:53:08, 12.04s/it] 42%|████▏ | 1040/2500 [3:58:49<5:06:17, 12.59s/it] {'loss': 0.0008, 'grad_norm': 0.3323884156631259, 'learning_rate': 5.839999999999999e-07, 'completion_length': 58.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.42} 42%|████▏ | 1040/2500 [3:58:49<5:06:17, 12.59s/it] 42%|████▏ | 1041/2500 [3:59:03<5:17:41, 13.06s/it] {'loss': 0.0006, 'grad_norm': 1.1170095490328773, 'learning_rate': 5.836e-07, 'completion_length': 54.73214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 
'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.014739990234375, 'epoch': 0.42} 42%|████▏ | 1041/2500 [3:59:03<5:17:41, 13.06s/it] 42%|████▏ | 1042/2500 [3:59:19<5:37:19, 13.88s/it] {'loss': 0.0008, 'grad_norm': 0.08706101399720755, 'learning_rate': 5.832e-07, 'completion_length': 66.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019256591796875, 'epoch': 0.42} 42%|████▏ | 1042/2500 [3:59:19<5:37:19, 13.88s/it] 42%|████▏ | 1043/2500 [3:59:32<5:32:39, 13.70s/it] {'loss': 0.0008, 'grad_norm': 0.09565016506822228, 'learning_rate': 5.828e-07, 'completion_length': 56.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021148681640625, 'epoch': 0.42} 42%|████▏ | 1043/2500 [3:59:32<5:32:39, 13.70s/it] 42%|████▏ | 1044/2500 [3:59:49<5:53:14, 14.56s/it] {'loss': 0.0017, 'grad_norm': 2.2433259236033263, 'learning_rate': 5.824e-07, 'completion_length': 63.44643020629883, 'rewards/accuracy_reward': 0.8928571939468384, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.04123930633068085, 'kl': 0.0430908203125, 'epoch': 0.42} 42%|████▏ | 1044/2500 [3:59:49<5:53:14, 14.56s/it] 42%|████▏ | 1045/2500 [4:00:03<5:52:41, 14.54s/it] {'loss': 0.0009, 'grad_norm': 0.14680874843247027, 'learning_rate': 5.819999999999999e-07, 'completion_length': 57.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.42} 42%|████▏ | 1045/2500 [4:00:03<5:52:41, 14.54s/it] 42%|████▏ | 1046/2500 [4:00:18<5:51:20, 14.50s/it] {'loss': 0.0012, 'grad_norm': 0.060120221660062347, 'learning_rate': 5.816e-07, 'completion_length': 56.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03076171875, 'epoch': 0.42} 42%|████▏ | 1046/2500 
[4:00:18<5:51:20, 14.50s/it] 42%|████▏ | 1047/2500 [4:00:32<5:50:06, 14.46s/it] {'loss': 0.0012, 'grad_norm': 0.6348761055870736, 'learning_rate': 5.812e-07, 'completion_length': 56.33928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03033447265625, 'epoch': 0.42} 42%|████▏ | 1047/2500 [4:00:32<5:50:06, 14.46s/it] 42%|████▏ | 1048/2500 [4:00:45<5:39:59, 14.05s/it] {'loss': 0.0016, 'grad_norm': 0.08817846927707565, 'learning_rate': 5.807999999999999e-07, 'completion_length': 54.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039215087890625, 'epoch': 0.42} 42%|████▏ | 1048/2500 [4:00:45<5:39:59, 14.05s/it] 42%|████▏ | 1049/2500 [4:00:59<5:34:08, 13.82s/it] {'loss': 0.001, 'grad_norm': 0.09284691585067391, 'learning_rate': 5.804e-07, 'completion_length': 53.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024322509765625, 'epoch': 0.42} 42%|████▏ | 1049/2500 [4:00:59<5:34:08, 13.82s/it] 42%|████▏ | 1050/2500 [4:01:12<5:28:09, 13.58s/it] {'loss': 0.0009, 'grad_norm': 0.05913824585091619, 'learning_rate': 5.8e-07, 'completion_length': 49.30357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.42} 42%|████▏ | 1050/2500 [4:01:12<5:28:09, 13.58s/it] 42%|████▏ | 1051/2500 [4:01:24<5:21:47, 13.33s/it] {'loss': 0.0009, 'grad_norm': 3.063175651781858, 'learning_rate': 5.796e-07, 'completion_length': 49.21428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02215576171875, 'epoch': 0.42} 42%|████▏ | 1051/2500 [4:01:24<5:21:47, 13.33s/it] 42%|████▏ | 1052/2500 [4:01:40<5:36:27, 13.94s/it] {'loss': 
0.0009, 'grad_norm': 8.103680746100562, 'learning_rate': 5.792e-07, 'completion_length': 53.03571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0714285746216774, 'kl': 0.02294921875, 'epoch': 0.42} 42%|████▏ | 1052/2500 [4:01:40<5:36:27, 13.94s/it] 42%|████▏ | 1053/2500 [4:01:54<5:38:05, 14.02s/it] {'loss': 0.0008, 'grad_norm': 6.681198682260687, 'learning_rate': 5.788e-07, 'completion_length': 63.17857551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.020263671875, 'epoch': 0.42} 42%|████▏ | 1053/2500 [4:01:54<5:38:05, 14.02s/it] 42%|████▏ | 1054/2500 [4:02:08<5:37:31, 14.00s/it] {'loss': 0.0021, 'grad_norm': 2.1290587747605816, 'learning_rate': 5.784e-07, 'completion_length': 55.71428680419922, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.8571429252624512, 'reward_std': 0.0714285746216774, 'kl': 0.052978515625, 'epoch': 0.42} 42%|████▏ | 1054/2500 [4:02:08<5:37:31, 14.00s/it] 42%|████▏ | 1055/2500 [4:02:22<5:40:05, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.08720003573873873, 'learning_rate': 5.779999999999999e-07, 'completion_length': 61.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02276611328125, 'epoch': 0.42} 42%|████▏ | 1055/2500 [4:02:22<5:40:05, 14.12s/it] 42%|████▏ | 1056/2500 [4:02:36<5:40:14, 14.14s/it] {'loss': 0.0006, 'grad_norm': 0.08525884225939993, 'learning_rate': 5.776e-07, 'completion_length': 54.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015899658203125, 'epoch': 0.42} 42%|████▏ | 1056/2500 [4:02:36<5:40:14, 14.14s/it] 42%|████▏ | 1057/2500 [4:02:50<5:38:35, 14.08s/it] {'loss': 0.0015, 'grad_norm': 2.730783782230386, 'learning_rate': 
5.772000000000001e-07, 'completion_length': 61.92857551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0382080078125, 'epoch': 0.42} 42%|████▏ | 1057/2500 [4:02:50<5:38:35, 14.08s/it] 42%|████▏ | 1058/2500 [4:03:04<5:33:22, 13.87s/it] {'loss': 0.0013, 'grad_norm': 13.313190383886855, 'learning_rate': 5.767999999999999e-07, 'completion_length': 57.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03173828125, 'epoch': 0.42} 42%|████▏ | 1058/2500 [4:03:04<5:33:22, 13.87s/it] 42%|████▏ | 1059/2500 [4:03:18<5:32:27, 13.84s/it] {'loss': 0.0008, 'grad_norm': 0.07561052651400481, 'learning_rate': 5.764e-07, 'completion_length': 57.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02093505859375, 'epoch': 0.42} 42%|████▏ | 1059/2500 [4:03:18<5:32:27, 13.84s/it] 42%|████▏ | 1060/2500 [4:03:31<5:29:42, 13.74s/it] {'loss': 0.0009, 'grad_norm': 2.599880044634989, 'learning_rate': 5.76e-07, 'completion_length': 60.33928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0216064453125, 'epoch': 0.42} 42%|████▏ | 1060/2500 [4:03:31<5:29:42, 13.74s/it] 42%|████▏ | 1061/2500 [4:03:45<5:34:07, 13.93s/it] {'loss': 0.0006, 'grad_norm': 1.2957204737491912, 'learning_rate': 5.755999999999999e-07, 'completion_length': 55.41071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0140380859375, 'epoch': 0.42} 42%|████▏ | 1061/2500 [4:03:45<5:34:07, 13.93s/it] 42%|████▏ | 1062/2500 [4:03:59<5:33:44, 13.93s/it] {'loss': 0.0016, 'grad_norm': 4.784980681697074, 'learning_rate': 5.752e-07, 'completion_length': 59.357147216796875, 
'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.0714285746216774, 'kl': 0.0396728515625, 'epoch': 0.42} 42%|████▏ | 1062/2500 [4:03:59<5:33:44, 13.93s/it] 43%|████▎ | 1063/2500 [4:04:14<5:41:42, 14.27s/it] {'loss': 0.0012, 'grad_norm': 0.08065388052132648, 'learning_rate': 5.748e-07, 'completion_length': 53.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031005859375, 'epoch': 0.43} 43%|████▎ | 1063/2500 [4:04:14<5:41:42, 14.27s/it] 43%|████▎ | 1064/2500 [4:04:28<5:35:49, 14.03s/it] {'loss': 0.001, 'grad_norm': 0.07803397670063852, 'learning_rate': 5.744e-07, 'completion_length': 50.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023834228515625, 'epoch': 0.43} 43%|████▎ | 1064/2500 [4:04:28<5:35:49, 14.03s/it] 43%|████▎ | 1065/2500 [4:04:41<5:28:32, 13.74s/it] {'loss': 0.0005, 'grad_norm': 0.22681341187806267, 'learning_rate': 5.739999999999999e-07, 'completion_length': 49.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013275146484375, 'epoch': 0.43} 43%|████▎ | 1065/2500 [4:04:41<5:28:32, 13.74s/it] 43%|████▎ | 1066/2500 [4:04:54<5:20:16, 13.40s/it] {'loss': 0.0009, 'grad_norm': 0.09957465862862794, 'learning_rate': 5.736e-07, 'completion_length': 43.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02227783203125, 'epoch': 0.43} 43%|████▎ | 1066/2500 [4:04:54<5:20:16, 13.40s/it] 43%|████▎ | 1067/2500 [4:05:08<5:25:37, 13.63s/it] {'loss': 0.0018, 'grad_norm': 0.10380176885467164, 'learning_rate': 5.732e-07, 'completion_length': 57.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0450439453125, 'epoch': 0.43} 43%|████▎ | 1067/2500 
[4:05:08<5:25:37, 13.63s/it] 43%|████▎ | 1068/2500 [4:05:22<5:32:22, 13.93s/it] {'loss': 0.0016, 'grad_norm': 0.13165784194586805, 'learning_rate': 5.727999999999999e-07, 'completion_length': 57.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04034423828125, 'epoch': 0.43} 43%|████▎ | 1068/2500 [4:05:22<5:32:22, 13.93s/it] 43%|████▎ | 1069/2500 [4:05:36<5:28:59, 13.79s/it] {'loss': 0.0008, 'grad_norm': 0.07086847065004787, 'learning_rate': 5.724e-07, 'completion_length': 47.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02044677734375, 'epoch': 0.43} 43%|████▎ | 1069/2500 [4:05:36<5:28:59, 13.79s/it] 43%|████▎ | 1070/2500 [4:05:49<5:23:54, 13.59s/it] {'loss': 0.001, 'grad_norm': 0.12200747846911819, 'learning_rate': 5.719999999999999e-07, 'completion_length': 54.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.43} 43%|████▎ | 1070/2500 [4:05:49<5:23:54, 13.59s/it] 43%|████▎ | 1071/2500 [4:06:03<5:27:11, 13.74s/it] {'loss': 0.0012, 'grad_norm': 0.11686893983273981, 'learning_rate': 5.716e-07, 'completion_length': 60.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02972412109375, 'epoch': 0.43} 43%|████▎ | 1071/2500 [4:06:03<5:27:11, 13.74s/it] 43%|████▎ | 1072/2500 [4:06:17<5:30:31, 13.89s/it] {'loss': 0.001, 'grad_norm': 0.08837477926176791, 'learning_rate': 5.712e-07, 'completion_length': 50.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02398681640625, 'epoch': 0.43} 43%|████▎ | 1072/2500 [4:06:17<5:30:31, 13.89s/it] 43%|████▎ | 1073/2500 [4:06:31<5:30:51, 13.91s/it] {'loss': 0.0015, 'grad_norm': 0.08504154350637581, 'learning_rate': 5.707999999999999e-07, 'completion_length': 
55.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03631591796875, 'epoch': 0.43} 43%|████▎ | 1073/2500 [4:06:31<5:30:51, 13.91s/it] 43%|████▎ | 1074/2500 [4:06:45<5:30:24, 13.90s/it] {'loss': 0.001, 'grad_norm': 0.15232747502466557, 'learning_rate': 5.704e-07, 'completion_length': 54.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02557373046875, 'epoch': 0.43} 43%|████▎ | 1074/2500 [4:06:45<5:30:24, 13.90s/it] 43%|████▎ | 1075/2500 [4:07:00<5:35:26, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.08090164109092395, 'learning_rate': 5.699999999999999e-07, 'completion_length': 57.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02197265625, 'epoch': 0.43} 43%|████▎ | 1075/2500 [4:07:00<5:35:26, 14.12s/it] 43%|████▎ | 1076/2500 [4:07:13<5:28:37, 13.85s/it] {'loss': 0.0009, 'grad_norm': 2.9889199774694517, 'learning_rate': 5.696e-07, 'completion_length': 48.392860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02362060546875, 'epoch': 0.43} 43%|████▎ | 1076/2500 [4:07:13<5:28:37, 13.85s/it] 43%|████▎ | 1077/2500 [4:07:27<5:32:22, 14.01s/it] {'loss': 0.0009, 'grad_norm': 0.07096149073898156, 'learning_rate': 5.692e-07, 'completion_length': 69.76786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02349853515625, 'epoch': 0.43} 43%|████▎ | 1077/2500 [4:07:27<5:32:22, 14.01s/it] 43%|████▎ | 1078/2500 [4:07:41<5:26:42, 13.79s/it] {'loss': 0.0017, 'grad_norm': 0.13143400397351365, 'learning_rate': 5.688e-07, 'completion_length': 49.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.042205810546875, 'epoch': 0.43} 43%|████▎ 
| 1078/2500 [4:07:41<5:26:42, 13.79s/it] 43%|████▎ | 1079/2500 [4:07:54<5:26:04, 13.77s/it] {'loss': 0.0009, 'grad_norm': 0.08605176846000569, 'learning_rate': 5.684e-07, 'completion_length': 51.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02288818359375, 'epoch': 0.43} 43%|████▎ | 1079/2500 [4:07:54<5:26:04, 13.77s/it] 43%|████▎ | 1080/2500 [4:08:11<5:44:45, 14.57s/it] {'loss': 0.0009, 'grad_norm': 2.560974373832466, 'learning_rate': 5.679999999999999e-07, 'completion_length': 63.94643020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.022705078125, 'epoch': 0.43} 43%|████▎ | 1080/2500 [4:08:11<5:44:45, 14.57s/it] 43%|████▎ | 1081/2500 [4:08:25<5:41:36, 14.44s/it] {'loss': 0.001, 'grad_norm': 0.10910737796956763, 'learning_rate': 5.676e-07, 'completion_length': 61.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02532958984375, 'epoch': 0.43} 43%|████▎ | 1081/2500 [4:08:25<5:41:36, 14.44s/it] 43%|████▎ | 1082/2500 [4:08:40<5:48:27, 14.74s/it] {'loss': 0.0009, 'grad_norm': 1.6928731482913852, 'learning_rate': 5.672e-07, 'completion_length': 59.78571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02301025390625, 'epoch': 0.43} 43%|████▎ | 1082/2500 [4:08:40<5:48:27, 14.74s/it] 43%|████▎ | 1083/2500 [4:08:57<5:59:00, 15.20s/it] {'loss': 0.0008, 'grad_norm': 0.06227475856559006, 'learning_rate': 5.667999999999999e-07, 'completion_length': 60.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.43} 43%|████▎ | 1083/2500 [4:08:57<5:59:00, 15.20s/it] 43%|████▎ | 1084/2500 [4:09:11<5:49:51, 14.82s/it] {'loss': 0.0022, 
'grad_norm': 0.08221309793574344, 'learning_rate': 5.664e-07, 'completion_length': 60.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0552978515625, 'epoch': 0.43} 43%|████▎ | 1084/2500 [4:09:11<5:49:51, 14.82s/it] 43%|████▎ | 1085/2500 [4:09:24<5:39:58, 14.42s/it] {'loss': 0.0011, 'grad_norm': 0.08507165398802241, 'learning_rate': 5.66e-07, 'completion_length': 57.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028564453125, 'epoch': 0.43} 43%|████▎ | 1085/2500 [4:09:24<5:39:58, 14.42s/it] 43%|████▎ | 1086/2500 [4:09:37<5:31:47, 14.08s/it] {'loss': 0.0004, 'grad_norm': 0.07318633403230322, 'learning_rate': 5.655999999999999e-07, 'completion_length': 50.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.008758544921875, 'epoch': 0.43} 43%|████▎ | 1086/2500 [4:09:37<5:31:47, 14.08s/it] 43%|████▎ | 1087/2500 [4:09:51<5:26:39, 13.87s/it] {'loss': 0.0007, 'grad_norm': 0.09390998490776685, 'learning_rate': 5.652e-07, 'completion_length': 52.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0166015625, 'epoch': 0.43} 43%|████▎ | 1087/2500 [4:09:51<5:26:39, 13.87s/it] 44%|████▎ | 1088/2500 [4:10:04<5:24:55, 13.81s/it] {'loss': 0.0014, 'grad_norm': 0.1874272494667261, 'learning_rate': 5.648e-07, 'completion_length': 56.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03399658203125, 'epoch': 0.44} 44%|████▎ | 1088/2500 [4:10:04<5:24:55, 13.81s/it] 44%|████▎ | 1089/2500 [4:10:18<5:19:50, 13.60s/it] {'loss': 0.0008, 'grad_norm': 0.056743878495920184, 'learning_rate': 5.643999999999999e-07, 'completion_length': 55.73214530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 
1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02099609375, 'epoch': 0.44} 44%|████▎ | 1089/2500 [4:10:18<5:19:50, 13.60s/it] 44%|████▎ | 1090/2500 [4:10:32<5:23:02, 13.75s/it] {'loss': 0.001, 'grad_norm': 0.07049699231127905, 'learning_rate': 5.639999999999999e-07, 'completion_length': 54.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026123046875, 'epoch': 0.44} 44%|████▎ | 1090/2500 [4:10:32<5:23:02, 13.75s/it] 44%|████▎ | 1091/2500 [4:10:45<5:22:59, 13.75s/it] {'loss': 0.0019, 'grad_norm': 0.08222483297897883, 'learning_rate': 5.636e-07, 'completion_length': 56.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.046142578125, 'epoch': 0.44} 44%|████▎ | 1091/2500 [4:10:45<5:22:59, 13.75s/it] 44%|████▎ | 1092/2500 [4:10:59<5:22:56, 13.76s/it] {'loss': 0.0016, 'grad_norm': 0.08774935236119942, 'learning_rate': 5.632e-07, 'completion_length': 58.46428680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0408935546875, 'epoch': 0.44} 44%|████▎ | 1092/2500 [4:10:59<5:22:56, 13.76s/it] 44%|████▎ | 1093/2500 [4:11:15<5:39:11, 14.46s/it] {'loss': 0.0014, 'grad_norm': 0.09343060288651296, 'learning_rate': 5.627999999999999e-07, 'completion_length': 59.553571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03533935546875, 'epoch': 0.44} 44%|████▎ | 1093/2500 [4:11:15<5:39:11, 14.46s/it] 44%|████▍ | 1094/2500 [4:11:29<5:31:16, 14.14s/it] {'loss': 0.0012, 'grad_norm': 0.25402704450904456, 'learning_rate': 5.624e-07, 'completion_length': 57.500003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03076171875, 'epoch': 0.44} 44%|████▍ | 1094/2500 [4:11:29<5:31:16, 14.14s/it] 44%|████▍ | 
1095/2500 [4:11:43<5:30:14, 14.10s/it] {'loss': 0.0012, 'grad_norm': 0.07567514522929449, 'learning_rate': 5.620000000000001e-07, 'completion_length': 58.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02911376953125, 'epoch': 0.44} 44%|████▍ | 1095/2500 [4:11:43<5:30:14, 14.10s/it] 44%|████▍ | 1096/2500 [4:11:56<5:22:13, 13.77s/it] {'loss': 0.001, 'grad_norm': 0.06926288512994883, 'learning_rate': 5.615999999999999e-07, 'completion_length': 46.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.44} 44%|████▍ | 1096/2500 [4:11:56<5:22:13, 13.77s/it] 44%|████▍ | 1097/2500 [4:12:10<5:28:50, 14.06s/it] {'loss': 0.001, 'grad_norm': 0.08053955138388147, 'learning_rate': 5.612e-07, 'completion_length': 54.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02392578125, 'epoch': 0.44} 44%|████▍ | 1097/2500 [4:12:10<5:28:50, 14.06s/it] 44%|████▍ | 1098/2500 [4:12:26<5:39:34, 14.53s/it] {'loss': 0.0018, 'grad_norm': 0.06836070799219418, 'learning_rate': 5.608e-07, 'completion_length': 67.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0452880859375, 'epoch': 0.44} 44%|████▍ | 1098/2500 [4:12:26<5:39:34, 14.53s/it] 44%|████▍ | 1099/2500 [4:12:39<5:31:00, 14.18s/it] {'loss': 0.0013, 'grad_norm': 0.0712975318159512, 'learning_rate': 5.604e-07, 'completion_length': 46.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.44} 44%|████▍ | 1099/2500 [4:12:39<5:31:00, 14.18s/it] 44%|████▍ | 1100/2500 [4:12:54<5:31:07, 14.19s/it] {'loss': 0.001, 'grad_norm': 0.06454062390892344, 'learning_rate': 5.6e-07, 'completion_length': 57.92857360839844, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.44} 44%|████▍ | 1100/2500 [4:12:54<5:31:07, 14.19s/it] 44%|████▍ | 1101/2500 [4:14:04<12:06:29, 31.16s/it] {'loss': 0.0015, 'grad_norm': 0.06217668819621835, 'learning_rate': 5.596e-07, 'completion_length': 52.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03851318359375, 'epoch': 0.44} 44%|████▍ | 1101/2500 [4:14:04<12:06:29, 31.16s/it] 44%|████▍ | 1102/2500 [4:14:19<10:10:20, 26.19s/it] {'loss': 0.002, 'grad_norm': 0.15491277693268155, 'learning_rate': 5.592e-07, 'completion_length': 63.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.048828125, 'epoch': 0.44} 44%|████▍ | 1102/2500 [4:14:19<10:10:20, 26.19s/it] 44%|████▍ | 1103/2500 [4:14:33<8:42:09, 22.43s/it] {'loss': 0.0016, 'grad_norm': 0.2598128072409604, 'learning_rate': 5.588e-07, 'completion_length': 58.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041259765625, 'epoch': 0.44} 44%|████▍ | 1103/2500 [4:14:33<8:42:09, 22.43s/it] 44%|████▍ | 1104/2500 [4:14:46<7:41:12, 19.82s/it] {'loss': 0.0017, 'grad_norm': 0.17744591176187802, 'learning_rate': 5.584e-07, 'completion_length': 57.589290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.44} 44%|████▍ | 1104/2500 [4:14:46<7:41:12, 19.82s/it] 44%|████▍ | 1105/2500 [4:15:00<6:57:13, 17.95s/it] {'loss': 0.0013, 'grad_norm': 0.0555595589388318, 'learning_rate': 5.58e-07, 'completion_length': 56.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0313720703125, 'epoch': 0.44} 44%|████▍ | 1105/2500 [4:15:00<6:57:13, 17.95s/it] 44%|████▍ | 1106/2500 [4:15:16<6:46:57, 17.52s/it] {'loss': 0.001, 
'grad_norm': 0.093424673048878, 'learning_rate': 5.576e-07, 'completion_length': 72.87500381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025634765625, 'epoch': 0.44} 44%|████▍ | 1106/2500 [4:15:16<6:46:57, 17.52s/it] 44%|████▍ | 1107/2500 [4:15:31<6:23:49, 16.53s/it] {'loss': 0.001, 'grad_norm': 0.08201859500390703, 'learning_rate': 5.572e-07, 'completion_length': 55.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026123046875, 'epoch': 0.44} 44%|████▍ | 1107/2500 [4:15:31<6:23:49, 16.53s/it] 44%|████▍ | 1108/2500 [4:15:46<6:13:28, 16.10s/it] {'loss': 0.0007, 'grad_norm': 0.07730426031002204, 'learning_rate': 5.567999999999999e-07, 'completion_length': 59.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018310546875, 'epoch': 0.44} 44%|████▍ | 1108/2500 [4:15:46<6:13:28, 16.10s/it] 44%|████▍ | 1109/2500 [4:15:59<5:51:12, 15.15s/it] {'loss': 0.001, 'grad_norm': 0.12583552399732637, 'learning_rate': 5.564e-07, 'completion_length': 47.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.44} 44%|████▍ | 1109/2500 [4:15:59<5:51:12, 15.15s/it] 44%|████▍ | 1110/2500 [4:16:12<5:40:39, 14.70s/it] {'loss': 0.0012, 'grad_norm': 2.450509592966594, 'learning_rate': 5.560000000000001e-07, 'completion_length': 56.94643020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.029541015625, 'epoch': 0.44} 44%|████▍ | 1110/2500 [4:16:12<5:40:39, 14.70s/it] 44%|████▍ | 1111/2500 [4:16:26<5:31:19, 14.31s/it] {'loss': 0.0012, 'grad_norm': 0.15950136145009663, 'learning_rate': 5.555999999999999e-07, 'completion_length': 48.73214530944824, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.44} 44%|████▍ | 1111/2500 [4:16:26<5:31:19, 14.31s/it] 44%|████▍ | 1112/2500 [4:16:40<5:32:39, 14.38s/it] {'loss': 0.0006, 'grad_norm': 1.3290476341234034, 'learning_rate': 5.552e-07, 'completion_length': 57.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.014068603515625, 'epoch': 0.44} 44%|████▍ | 1112/2500 [4:16:40<5:32:39, 14.38s/it] 45%|████▍ | 1113/2500 [4:16:54<5:29:12, 14.24s/it] {'loss': 0.0005, 'grad_norm': 0.06676787646074257, 'learning_rate': 5.548e-07, 'completion_length': 58.535715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012420654296875, 'epoch': 0.45} 45%|████▍ | 1113/2500 [4:16:54<5:29:12, 14.24s/it] 45%|████▍ | 1114/2500 [4:17:09<5:30:04, 14.29s/it] {'loss': 0.0006, 'grad_norm': 2.637225763340455, 'learning_rate': 5.543999999999999e-07, 'completion_length': 55.91071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0146484375, 'epoch': 0.45} 45%|████▍ | 1114/2500 [4:17:09<5:30:04, 14.29s/it] 45%|████▍ | 1115/2500 [4:17:23<5:31:45, 14.37s/it] {'loss': 0.0012, 'grad_norm': 0.11898152421041965, 'learning_rate': 5.54e-07, 'completion_length': 58.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03118896484375, 'epoch': 0.45} 45%|████▍ | 1115/2500 [4:17:23<5:31:45, 14.37s/it] 45%|████▍ | 1116/2500 [4:17:38<5:33:06, 14.44s/it] {'loss': 0.0014, 'grad_norm': 0.08795494489120645, 'learning_rate': 5.536e-07, 'completion_length': 55.67857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03460693359375, 
'epoch': 0.45} 45%|████▍ | 1116/2500 [4:17:38<5:33:06, 14.44s/it] 45%|████▍ | 1117/2500 [4:17:52<5:29:08, 14.28s/it] {'loss': 0.0009, 'grad_norm': 1.5238099225695956, 'learning_rate': 5.532e-07, 'completion_length': 56.392860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.022979736328125, 'epoch': 0.45} 45%|████▍ | 1117/2500 [4:17:52<5:29:08, 14.28s/it] 45%|████▍ | 1118/2500 [4:18:07<5:35:27, 14.56s/it] {'loss': 0.0011, 'grad_norm': 0.07439040687181663, 'learning_rate': 5.527999999999999e-07, 'completion_length': 65.23214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026336669921875, 'epoch': 0.45} 45%|████▍ | 1118/2500 [4:18:07<5:35:27, 14.56s/it] 45%|████▍ | 1119/2500 [4:18:20<5:22:50, 14.03s/it] {'loss': 0.0012, 'grad_norm': 0.07768293237516878, 'learning_rate': 5.524e-07, 'completion_length': 51.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.45} 45%|████▍ | 1119/2500 [4:18:20<5:22:50, 14.03s/it] 45%|████▍ | 1120/2500 [4:18:35<5:28:54, 14.30s/it] {'loss': 0.0023, 'grad_norm': 0.8705024727663639, 'learning_rate': 5.520000000000001e-07, 'completion_length': 64.07143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0570068359375, 'epoch': 0.45} 45%|████▍ | 1120/2500 [4:18:35<5:28:54, 14.30s/it] 45%|████▍ | 1121/2500 [4:18:49<5:29:21, 14.33s/it] {'loss': 0.0012, 'grad_norm': 0.15148017827967988, 'learning_rate': 5.515999999999999e-07, 'completion_length': 62.35714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02899169921875, 'epoch': 0.45} 45%|████▍ | 1121/2500 [4:18:49<5:29:21, 14.33s/it] 45%|████▍ | 1122/2500 
[4:19:04<5:30:24, 14.39s/it] {'loss': 0.0011, 'grad_norm': 0.7549342324811402, 'learning_rate': 5.512e-07, 'completion_length': 58.01785850524902, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02813720703125, 'epoch': 0.45} 45%|████▍ | 1122/2500 [4:19:04<5:30:24, 14.39s/it] 45%|████▍ | 1123/2500 [4:19:18<5:31:47, 14.46s/it] {'loss': 0.0014, 'grad_norm': 0.07989878456011527, 'learning_rate': 5.508e-07, 'completion_length': 63.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03466796875, 'epoch': 0.45} 45%|████▍ | 1123/2500 [4:19:18<5:31:47, 14.46s/it] 45%|████▍ | 1124/2500 [4:19:32<5:28:12, 14.31s/it] {'loss': 0.0013, 'grad_norm': 0.07975811605106102, 'learning_rate': 5.504e-07, 'completion_length': 57.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.45} 45%|████▍ | 1124/2500 [4:19:32<5:28:12, 14.31s/it] 45%|████▌ | 1125/2500 [4:19:46<5:25:23, 14.20s/it] {'loss': 0.0013, 'grad_norm': 0.11288682509398781, 'learning_rate': 5.5e-07, 'completion_length': 54.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03179931640625, 'epoch': 0.45} 45%|████▌ | 1125/2500 [4:19:46<5:25:23, 14.20s/it] 45%|████▌ | 1126/2500 [4:20:00<5:25:27, 14.21s/it] {'loss': 0.0009, 'grad_norm': 0.2574769206880075, 'learning_rate': 5.496e-07, 'completion_length': 61.642860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.022857666015625, 'epoch': 0.45} 45%|████▌ | 1126/2500 [4:20:00<5:25:27, 14.21s/it] 45%|████▌ | 1127/2500 [4:20:14<5:23:24, 14.13s/it] {'loss': 0.0016, 'grad_norm': 0.20348804399965836, 'learning_rate': 5.492e-07, 'completion_length': 60.57143211364746, 
'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0391845703125, 'epoch': 0.45} 45%|████▌ | 1127/2500 [4:20:14<5:23:24, 14.13s/it] 45%|████▌ | 1128/2500 [4:20:29<5:24:34, 14.19s/it] {'loss': 0.001, 'grad_norm': 0.06908886648444884, 'learning_rate': 5.487999999999999e-07, 'completion_length': 64.85714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0242919921875, 'epoch': 0.45} 45%|████▌ | 1128/2500 [4:20:29<5:24:34, 14.19s/it] 45%|████▌ | 1129/2500 [4:20:44<5:32:06, 14.53s/it] {'loss': 0.0015, 'grad_norm': 0.12695039900191493, 'learning_rate': 5.484e-07, 'completion_length': 55.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0362548828125, 'epoch': 0.45} 45%|████▌ | 1129/2500 [4:20:44<5:32:06, 14.53s/it] 45%|████▌ | 1130/2500 [4:20:57<5:22:47, 14.14s/it] {'loss': 0.0008, 'grad_norm': 0.09482837769515239, 'learning_rate': 5.48e-07, 'completion_length': 51.30357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.019439697265625, 'epoch': 0.45} 45%|████▌ | 1130/2500 [4:20:57<5:22:47, 14.14s/it] 45%|████▌ | 1131/2500 [4:21:11<5:22:40, 14.14s/it] {'loss': 0.001, 'grad_norm': 0.16600877385082236, 'learning_rate': 5.476e-07, 'completion_length': 56.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02545166015625, 'epoch': 0.45} 45%|████▌ | 1131/2500 [4:21:11<5:22:40, 14.14s/it] 45%|████▌ | 1132/2500 [4:21:24<5:15:46, 13.85s/it] {'loss': 0.0017, 'grad_norm': 5.434514415986347, 'learning_rate': 5.472e-07, 'completion_length': 56.642860412597656, 'rewards/accuracy_reward': 0.8392857611179352, 'rewards/format_reward': 1.0, 'reward': 1.8392857909202576, 'reward_std': 0.0357142873108387, 'kl': 
0.042236328125, 'epoch': 0.45} 45%|████▌ | 1132/2500 [4:21:24<5:15:46, 13.85s/it] 45%|████▌ | 1133/2500 [4:21:39<5:17:18, 13.93s/it] {'loss': 0.0011, 'grad_norm': 0.07556559792363081, 'learning_rate': 5.467999999999999e-07, 'completion_length': 53.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02752685546875, 'epoch': 0.45} 45%|████▌ | 1133/2500 [4:21:39<5:17:18, 13.93s/it] 45%|████▌ | 1134/2500 [4:21:52<5:15:09, 13.84s/it] {'loss': 0.0017, 'grad_norm': 0.1141643953564871, 'learning_rate': 5.464e-07, 'completion_length': 51.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0423583984375, 'epoch': 0.45} 45%|████▌ | 1134/2500 [4:21:52<5:15:09, 13.84s/it] 45%|████▌ | 1135/2500 [4:22:06<5:12:10, 13.72s/it] {'loss': 0.0019, 'grad_norm': 1.7040262179506462, 'learning_rate': 5.46e-07, 'completion_length': 56.12500190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.046630859375, 'epoch': 0.45} 45%|████▌ | 1135/2500 [4:22:06<5:12:10, 13.72s/it] 45%|████▌ | 1136/2500 [4:22:19<5:11:14, 13.69s/it] {'loss': 0.0018, 'grad_norm': 0.08762675019360983, 'learning_rate': 5.455999999999999e-07, 'completion_length': 59.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04400634765625, 'epoch': 0.45} 45%|████▌ | 1136/2500 [4:22:19<5:11:14, 13.69s/it] 45%|████▌ | 1137/2500 [4:22:33<5:14:08, 13.83s/it] {'loss': 0.0013, 'grad_norm': 0.07166634098748864, 'learning_rate': 5.452e-07, 'completion_length': 51.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03369140625, 'epoch': 0.45} 45%|████▌ | 1137/2500 [4:22:33<5:14:08, 13.83s/it] 46%|████▌ | 1138/2500 [4:22:47<5:10:11, 13.66s/it] {'loss': 0.0011, 
'grad_norm': 0.21098184035197026, 'learning_rate': 5.448e-07, 'completion_length': 50.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.46} 46%|████▌ | 1138/2500 [4:22:47<5:10:11, 13.66s/it] 46%|████▌ | 1139/2500 [4:23:00<5:09:35, 13.65s/it] {'loss': 0.001, 'grad_norm': 0.06867285974002652, 'learning_rate': 5.443999999999999e-07, 'completion_length': 50.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02398681640625, 'epoch': 0.46} 46%|████▌ | 1139/2500 [4:23:00<5:09:35, 13.65s/it] 46%|████▌ | 1140/2500 [4:23:14<5:09:45, 13.67s/it] {'loss': 0.0009, 'grad_norm': 0.07997968135788779, 'learning_rate': 5.44e-07, 'completion_length': 57.035715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023468017578125, 'epoch': 0.46} 46%|████▌ | 1140/2500 [4:23:14<5:09:45, 13.67s/it] 46%|████▌ | 1141/2500 [4:23:28<5:13:04, 13.82s/it] {'loss': 0.0009, 'grad_norm': 2.5714923156960743, 'learning_rate': 5.436e-07, 'completion_length': 54.60714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02362060546875, 'epoch': 0.46} 46%|████▌ | 1141/2500 [4:23:28<5:13:04, 13.82s/it] 46%|████▌ | 1142/2500 [4:23:42<5:13:33, 13.85s/it] {'loss': 0.0013, 'grad_norm': 0.1159832179145531, 'learning_rate': 5.431999999999999e-07, 'completion_length': 54.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033203125, 'epoch': 0.46} 46%|████▌ | 1142/2500 [4:23:42<5:13:33, 13.85s/it] 46%|████▌ | 1143/2500 [4:23:55<5:09:44, 13.70s/it] {'loss': 0.0005, 'grad_norm': 0.11720792730741694, 'learning_rate': 5.427999999999999e-07, 'completion_length': 50.035715103149414, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012908935546875, 'epoch': 0.46} 46%|████▌ | 1143/2500 [4:23:55<5:09:44, 13.70s/it] 46%|████▌ | 1144/2500 [4:24:09<5:08:17, 13.64s/it] {'loss': 0.0005, 'grad_norm': 0.07386561572806077, 'learning_rate': 5.424e-07, 'completion_length': 52.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01361083984375, 'epoch': 0.46} 46%|████▌ | 1144/2500 [4:24:09<5:08:17, 13.64s/it] 46%|████▌ | 1145/2500 [4:24:23<5:08:40, 13.67s/it] {'loss': 0.0008, 'grad_norm': 1.3888007398748632, 'learning_rate': 5.420000000000001e-07, 'completion_length': 50.55357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0203857421875, 'epoch': 0.46} 46%|████▌ | 1145/2500 [4:24:23<5:08:40, 13.67s/it] 46%|████▌ | 1146/2500 [4:24:36<5:06:05, 13.56s/it] {'loss': 0.0012, 'grad_norm': 0.07931553343309475, 'learning_rate': 5.415999999999999e-07, 'completion_length': 52.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0306396484375, 'epoch': 0.46} 46%|████▌ | 1146/2500 [4:24:36<5:06:05, 13.56s/it] 46%|████▌ | 1147/2500 [4:24:49<5:03:41, 13.47s/it] {'loss': 0.0016, 'grad_norm': 0.11553806512862298, 'learning_rate': 5.412e-07, 'completion_length': 52.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04052734375, 'epoch': 0.46} 46%|████▌ | 1147/2500 [4:24:49<5:03:41, 13.47s/it] 46%|████▌ | 1148/2500 [4:25:03<5:07:49, 13.66s/it] {'loss': 0.001, 'grad_norm': 0.14426646201665724, 'learning_rate': 5.408e-07, 'completion_length': 52.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.46} 46%|████▌ | 1148/2500 [4:25:03<5:07:49, 13.66s/it] 
46%|████▌ | 1149/2500 [4:25:18<5:11:19, 13.83s/it] {'loss': 0.0012, 'grad_norm': 0.09647619672642367, 'learning_rate': 5.403999999999999e-07, 'completion_length': 61.82143020629883, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0294189453125, 'epoch': 0.46} 46%|████▌ | 1149/2500 [4:25:18<5:11:19, 13.83s/it] 46%|████▌ | 1150/2500 [4:25:31<5:10:37, 13.81s/it] {'loss': 0.0012, 'grad_norm': 0.08459063869375337, 'learning_rate': 5.4e-07, 'completion_length': 57.00000190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03009033203125, 'epoch': 0.46} 46%|████▌ | 1150/2500 [4:25:31<5:10:37, 13.81s/it] 46%|████▌ | 1151/2500 [4:25:45<5:12:13, 13.89s/it] {'loss': 0.0008, 'grad_norm': 0.13297095164273193, 'learning_rate': 5.396e-07, 'completion_length': 52.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01995849609375, 'epoch': 0.46} 46%|████▌ | 1151/2500 [4:25:45<5:12:13, 13.89s/it] 46%|████▌ | 1152/2500 [4:25:58<5:05:54, 13.62s/it] {'loss': 0.0008, 'grad_norm': 3.534656982426814, 'learning_rate': 5.392e-07, 'completion_length': 46.50000190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0205078125, 'epoch': 0.46} 46%|████▌ | 1152/2500 [4:25:58<5:05:54, 13.62s/it] 46%|████▌ | 1153/2500 [4:26:13<5:09:08, 13.77s/it] {'loss': 0.0009, 'grad_norm': 0.15024515517588305, 'learning_rate': 5.387999999999999e-07, 'completion_length': 61.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021728515625, 'epoch': 0.46} 46%|████▌ | 1153/2500 [4:26:13<5:09:08, 13.77s/it] 46%|████▌ | 1154/2500 [4:26:25<5:03:37, 13.53s/it] {'loss': 0.001, 'grad_norm': 2.257817568235148, 
'learning_rate': 5.384e-07, 'completion_length': 50.41071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.025909423828125, 'epoch': 0.46} 46%|████▌ | 1154/2500 [4:26:25<5:03:37, 13.53s/it] 46%|████▌ | 1155/2500 [4:26:39<5:02:36, 13.50s/it] {'loss': 0.0006, 'grad_norm': 0.13480066208908845, 'learning_rate': 5.38e-07, 'completion_length': 48.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015716552734375, 'epoch': 0.46} 46%|████▌ | 1155/2500 [4:26:39<5:02:36, 13.50s/it] 46%|████▌ | 1156/2500 [4:26:53<5:06:35, 13.69s/it] {'loss': 0.0018, 'grad_norm': 1.794981014312237, 'learning_rate': 5.375999999999999e-07, 'completion_length': 56.53571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.044189453125, 'epoch': 0.46} 46%|████▌ | 1156/2500 [4:26:53<5:06:35, 13.69s/it] 46%|████▋ | 1157/2500 [4:27:07<5:06:20, 13.69s/it] {'loss': 0.0012, 'grad_norm': 0.08381253693816539, 'learning_rate': 5.372e-07, 'completion_length': 52.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030517578125, 'epoch': 0.46} 46%|████▋ | 1157/2500 [4:27:07<5:06:20, 13.69s/it] 46%|████▋ | 1158/2500 [4:27:22<5:16:09, 14.14s/it] {'loss': 0.0011, 'grad_norm': 0.08608563635616827, 'learning_rate': 5.368e-07, 'completion_length': 63.410715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027587890625, 'epoch': 0.46} 46%|████▋ | 1158/2500 [4:27:22<5:16:09, 14.14s/it] 46%|████▋ | 1159/2500 [4:27:37<5:22:35, 14.43s/it] {'loss': 0.0009, 'grad_norm': 0.1632060308580651, 'learning_rate': 5.364e-07, 'completion_length': 60.19643211364746, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0230712890625, 'epoch': 0.46} 46%|████▋ | 1159/2500 [4:27:37<5:22:35, 14.43s/it] 46%|████▋ | 1160/2500 [4:27:51<5:19:55, 14.32s/it] {'loss': 0.0008, 'grad_norm': 0.2326138165346469, 'learning_rate': 5.36e-07, 'completion_length': 60.91071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019866943359375, 'epoch': 0.46} 46%|████▋ | 1160/2500 [4:27:51<5:19:55, 14.32s/it] 46%|████▋ | 1161/2500 [4:28:05<5:13:44, 14.06s/it] {'loss': 0.0027, 'grad_norm': 0.0680101151912335, 'learning_rate': 5.355999999999999e-07, 'completion_length': 55.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.066650390625, 'epoch': 0.46} 46%|████▋ | 1161/2500 [4:28:05<5:13:44, 14.06s/it] 46%|████▋ | 1162/2500 [4:28:19<5:14:03, 14.08s/it] {'loss': 0.0013, 'grad_norm': 0.07559929874523504, 'learning_rate': 5.352e-07, 'completion_length': 57.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031463623046875, 'epoch': 0.46} 46%|████▋ | 1162/2500 [4:28:19<5:14:03, 14.08s/it] 47%|████▋ | 1163/2500 [4:28:35<5:29:01, 14.77s/it] {'loss': 0.0012, 'grad_norm': 1.7832885456907215, 'learning_rate': 5.348e-07, 'completion_length': 62.30357551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02996826171875, 'epoch': 0.47} 47%|████▋ | 1163/2500 [4:28:35<5:29:01, 14.77s/it] 47%|████▋ | 1164/2500 [4:28:49<5:21:23, 14.43s/it] {'loss': 0.0025, 'grad_norm': 0.07763645369104229, 'learning_rate': 5.343999999999999e-07, 'completion_length': 54.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0623779296875, 'epoch': 0.47} 47%|████▋ | 1164/2500 
[4:28:49<5:21:23, 14.43s/it] 47%|████▋ | 1165/2500 [4:29:04<5:25:39, 14.64s/it] {'loss': 0.0025, 'grad_norm': 0.06484656836521521, 'learning_rate': 5.34e-07, 'completion_length': 59.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0628662109375, 'epoch': 0.47} 47%|████▋ | 1165/2500 [4:29:04<5:25:39, 14.64s/it] 47%|████▋ | 1166/2500 [4:29:18<5:21:41, 14.47s/it] {'loss': 0.0012, 'grad_norm': 0.0767195321196231, 'learning_rate': 5.336e-07, 'completion_length': 50.80357360839844, 'rewards/accuracy_reward': 0.8571429252624512, 'rewards/format_reward': 1.0, 'reward': 1.8571429252624512, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.47} 47%|████▋ | 1166/2500 [4:29:18<5:21:41, 14.47s/it] 47%|████▋ | 1167/2500 [4:29:31<5:14:43, 14.17s/it] {'loss': 0.0016, 'grad_norm': 0.1608030946176387, 'learning_rate': 5.331999999999999e-07, 'completion_length': 53.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0404052734375, 'epoch': 0.47} 47%|████▋ | 1167/2500 [4:29:31<5:14:43, 14.17s/it] 47%|████▋ | 1168/2500 [4:29:45<5:13:49, 14.14s/it] {'loss': 0.0019, 'grad_norm': 0.0900336072956898, 'learning_rate': 5.328e-07, 'completion_length': 55.535715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0484619140625, 'epoch': 0.47} 47%|████▋ | 1168/2500 [4:29:45<5:13:49, 14.14s/it] 47%|████▋ | 1169/2500 [4:29:59<5:12:37, 14.09s/it] {'loss': 0.0014, 'grad_norm': 0.10013069878371011, 'learning_rate': 5.324e-07, 'completion_length': 57.160715103149414, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03436279296875, 'epoch': 0.47} 47%|████▋ | 1169/2500 [4:29:59<5:12:37, 14.09s/it] 47%|████▋ | 1170/2500 [4:30:14<5:12:45, 14.11s/it] {'loss': 0.0008, 'grad_norm': 0.21138133630282815, 'learning_rate': 
5.32e-07, 'completion_length': 57.48214530944824, 'rewards/accuracy_reward': 0.785714328289032, 'rewards/format_reward': 1.0, 'reward': 1.7857143878936768, 'reward_std': 0.0, 'kl': 0.019195556640625, 'epoch': 0.47} 47%|████▋ | 1170/2500 [4:30:14<5:12:45, 14.11s/it] 47%|████▋ | 1171/2500 [4:30:26<5:03:01, 13.68s/it] {'loss': 0.0008, 'grad_norm': 2.0128731021322053, 'learning_rate': 5.315999999999999e-07, 'completion_length': 51.28571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0201416015625, 'epoch': 0.47} 47%|████▋ | 1171/2500 [4:30:26<5:03:01, 13.68s/it] 47%|████▋ | 1172/2500 [4:30:40<5:01:50, 13.64s/it] {'loss': 0.0009, 'grad_norm': 0.1178782146516707, 'learning_rate': 5.312e-07, 'completion_length': 63.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02154541015625, 'epoch': 0.47} 47%|████▋ | 1172/2500 [4:30:40<5:01:50, 13.64s/it] 47%|████▋ | 1173/2500 [4:30:54<5:04:41, 13.78s/it] {'loss': 0.0011, 'grad_norm': 0.1069434523271501, 'learning_rate': 5.308000000000001e-07, 'completion_length': 56.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.47} 47%|████▋ | 1173/2500 [4:30:54<5:04:41, 13.78s/it] 47%|████▋ | 1174/2500 [4:31:07<5:03:25, 13.73s/it] {'loss': 0.0025, 'grad_norm': 0.09725199866167025, 'learning_rate': 5.303999999999999e-07, 'completion_length': 58.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0618896484375, 'epoch': 0.47} 47%|████▋ | 1174/2500 [4:31:07<5:03:25, 13.73s/it] 47%|████▋ | 1175/2500 [4:31:22<5:05:08, 13.82s/it] {'loss': 0.0013, 'grad_norm': 0.06136120064756287, 'learning_rate': 5.3e-07, 'completion_length': 56.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03265380859375, 'epoch': 0.47} 47%|████▋ | 1175/2500 [4:31:22<5:05:08, 13.82s/it] 47%|████▋ | 1176/2500 [4:31:36<5:11:03, 14.10s/it] {'loss': 0.001, 'grad_norm': 0.06923019297516281, 'learning_rate': 5.296e-07, 'completion_length': 60.01785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.025390625, 'epoch': 0.47} 47%|████▋ | 1176/2500 [4:31:36<5:11:03, 14.10s/it] 47%|████▋ | 1177/2500 [4:31:49<5:04:32, 13.81s/it] {'loss': 0.0007, 'grad_norm': 0.09206679465744219, 'learning_rate': 5.292e-07, 'completion_length': 49.80357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.017059326171875, 'epoch': 0.47} 47%|████▋ | 1177/2500 [4:31:49<5:04:32, 13.81s/it] 47%|████▋ | 1178/2500 [4:32:03<4:59:47, 13.61s/it] {'loss': 0.0017, 'grad_norm': 0.10843390949913073, 'learning_rate': 5.288e-07, 'completion_length': 52.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0423583984375, 'epoch': 0.47} 47%|████▋ | 1178/2500 [4:32:03<4:59:47, 13.61s/it] 47%|████▋ | 1179/2500 [4:32:17<5:04:13, 13.82s/it] {'loss': 0.0013, 'grad_norm': 5.019854528801256, 'learning_rate': 5.284e-07, 'completion_length': 57.73214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03271484375, 'epoch': 0.47} 47%|████▋ | 1179/2500 [4:32:17<5:04:13, 13.82s/it] 47%|████▋ | 1180/2500 [4:32:31<5:03:57, 13.82s/it] {'loss': 0.0012, 'grad_norm': 0.08083143584917135, 'learning_rate': 5.28e-07, 'completion_length': 53.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02972412109375, 'epoch': 0.47} 47%|████▋ | 1180/2500 [4:32:31<5:03:57, 13.82s/it] 
47%|████▋ | 1181/2500 [4:32:48<5:25:33, 14.81s/it] {'loss': 0.0007, 'grad_norm': 3.027220985526025, 'learning_rate': 5.275999999999999e-07, 'completion_length': 67.87500381469727, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0185546875, 'epoch': 0.47} 47%|████▋ | 1181/2500 [4:32:48<5:25:33, 14.81s/it] 47%|████▋ | 1182/2500 [4:33:02<5:20:10, 14.58s/it] {'loss': 0.001, 'grad_norm': 0.088289832147401, 'learning_rate': 5.272e-07, 'completion_length': 64.10714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.47} 47%|████▋ | 1182/2500 [4:33:02<5:20:10, 14.58s/it] 47%|████▋ | 1183/2500 [4:33:19<5:37:00, 15.35s/it] {'loss': 0.0012, 'grad_norm': 0.12526708906420117, 'learning_rate': 5.268e-07, 'completion_length': 56.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0296630859375, 'epoch': 0.47} 47%|████▋ | 1183/2500 [4:33:19<5:37:00, 15.35s/it] 47%|████▋ | 1184/2500 [4:33:33<5:30:29, 15.07s/it] {'loss': 0.0011, 'grad_norm': 0.3462095326785716, 'learning_rate': 5.264e-07, 'completion_length': 62.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0263671875, 'epoch': 0.47} 47%|████▋ | 1184/2500 [4:33:33<5:30:29, 15.07s/it] 47%|████▋ | 1185/2500 [4:33:47<5:21:48, 14.68s/it] {'loss': 0.0016, 'grad_norm': 0.6154609678144236, 'learning_rate': 5.26e-07, 'completion_length': 50.98214530944824, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.03985595703125, 'epoch': 0.47} 47%|████▋ | 1185/2500 [4:33:47<5:21:48, 14.68s/it] 47%|████▋ | 1186/2500 [4:34:00<5:12:28, 14.27s/it] {'loss': 0.0009, 'grad_norm': 0.16894942590974515, 'learning_rate': 
5.255999999999999e-07, 'completion_length': 55.92857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.021331787109375, 'epoch': 0.47} 47%|████▋ | 1186/2500 [4:34:00<5:12:28, 14.27s/it] 47%|████▋ | 1187/2500 [4:34:15<5:11:32, 14.24s/it] {'loss': 0.001, 'grad_norm': 0.055103426350011815, 'learning_rate': 5.252e-07, 'completion_length': 61.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0252685546875, 'epoch': 0.47} 47%|████▋ | 1187/2500 [4:34:15<5:11:32, 14.24s/it] 48%|████▊ | 1188/2500 [4:34:28<5:04:38, 13.93s/it] {'loss': 0.0011, 'grad_norm': 0.15687486062494835, 'learning_rate': 5.248e-07, 'completion_length': 57.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02642822265625, 'epoch': 0.48} 48%|████▊ | 1188/2500 [4:34:28<5:04:38, 13.93s/it] 48%|████▊ | 1189/2500 [4:34:43<5:12:35, 14.31s/it] {'loss': 0.0014, 'grad_norm': 0.065472114111981, 'learning_rate': 5.243999999999999e-07, 'completion_length': 66.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0341796875, 'epoch': 0.48} 48%|████▊ | 1189/2500 [4:34:43<5:12:35, 14.31s/it] 48%|████▊ | 1190/2500 [4:34:56<5:05:28, 13.99s/it] {'loss': 0.0008, 'grad_norm': 0.12024849546454426, 'learning_rate': 5.24e-07, 'completion_length': 53.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02081298828125, 'epoch': 0.48} 48%|████▊ | 1190/2500 [4:34:56<5:05:28, 13.99s/it] 48%|████▊ | 1191/2500 [4:35:11<5:10:22, 14.23s/it] {'loss': 0.0014, 'grad_norm': 3.9065794531558673, 'learning_rate': 5.236e-07, 'completion_length': 62.83928680419922, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 
0.04123930633068085, 'kl': 0.0352783203125, 'epoch': 0.48} 48%|████▊ | 1191/2500 [4:35:11<5:10:22, 14.23s/it] 48%|████▊ | 1192/2500 [4:35:26<5:13:04, 14.36s/it] {'loss': 0.0009, 'grad_norm': 0.06900601990275612, 'learning_rate': 5.232e-07, 'completion_length': 62.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02252197265625, 'epoch': 0.48} 48%|████▊ | 1192/2500 [4:35:26<5:13:04, 14.36s/it] 48%|████▊ | 1193/2500 [4:35:41<5:20:49, 14.73s/it] {'loss': 0.0008, 'grad_norm': 0.06990491128504053, 'learning_rate': 5.228e-07, 'completion_length': 69.80357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02105712890625, 'epoch': 0.48} 48%|████▊ | 1193/2500 [4:35:41<5:20:49, 14.73s/it] 48%|████▊ | 1194/2500 [4:35:56<5:17:22, 14.58s/it] {'loss': 0.0006, 'grad_norm': 0.3310967732077769, 'learning_rate': 5.224e-07, 'completion_length': 63.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015472412109375, 'epoch': 0.48} 48%|████▊ | 1194/2500 [4:35:56<5:17:22, 14.58s/it] 48%|████▊ | 1195/2500 [4:36:09<5:08:26, 14.18s/it] {'loss': 0.0016, 'grad_norm': 0.08210995912231037, 'learning_rate': 5.22e-07, 'completion_length': 50.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03924560546875, 'epoch': 0.48} 48%|████▊ | 1195/2500 [4:36:09<5:08:26, 14.18s/it] 48%|████▊ | 1196/2500 [4:36:23<5:11:05, 14.31s/it] {'loss': 0.0008, 'grad_norm': 0.08821663903211016, 'learning_rate': 5.215999999999999e-07, 'completion_length': 64.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019866943359375, 'epoch': 0.48} 48%|████▊ | 1196/2500 [4:36:23<5:11:05, 14.31s/it] 48%|████▊ | 1197/2500 [4:36:38<5:09:56, 14.27s/it] {'loss': 0.0007, 'grad_norm': 
0.07507377743838076, 'learning_rate': 5.212e-07, 'completion_length': 58.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018157958984375, 'epoch': 0.48} 48%|████▊ | 1197/2500 [4:36:38<5:09:56, 14.27s/it] 48%|████▊ | 1198/2500 [4:36:51<5:00:44, 13.86s/it] {'loss': 0.0009, 'grad_norm': 0.07305700927495502, 'learning_rate': 5.208000000000001e-07, 'completion_length': 46.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022613525390625, 'epoch': 0.48} 48%|████▊ | 1198/2500 [4:36:51<5:00:44, 13.86s/it] 48%|████▊ | 1199/2500 [4:37:06<5:10:45, 14.33s/it] {'loss': 0.0011, 'grad_norm': 0.09177055233811346, 'learning_rate': 5.203999999999999e-07, 'completion_length': 67.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.48} 48%|████▊ | 1199/2500 [4:37:06<5:10:45, 14.33s/it] 48%|████▊ | 1200/2500 [4:37:19<5:05:00, 14.08s/it] {'loss': 0.001, 'grad_norm': 0.05520908244213994, 'learning_rate': 5.2e-07, 'completion_length': 56.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.48} 48%|████▊ | 1200/2500 [4:37:19<5:05:00, 14.08s/it] 48%|████▊ | 1201/2500 [4:38:29<11:07:55, 30.85s/it] {'loss': 0.001, 'grad_norm': 0.07712001479312225, 'learning_rate': 5.196e-07, 'completion_length': 56.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02398681640625, 'epoch': 0.48} 48%|████▊ | 1201/2500 [4:38:29<11:07:55, 30.85s/it] 48%|████▊ | 1202/2500 [4:38:43<9:13:43, 25.60s/it] {'loss': 0.0013, 'grad_norm': 0.06898482581919352, 'learning_rate': 5.191999999999999e-07, 'completion_length': 54.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 
'kl': 0.032379150390625, 'epoch': 0.48} 48%|████▊ | 1202/2500 [4:38:43<9:13:43, 25.60s/it] 48%|████▊ | 1203/2500 [4:38:57<8:01:08, 22.26s/it] {'loss': 0.0008, 'grad_norm': 0.0851237232269968, 'learning_rate': 5.188e-07, 'completion_length': 56.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02093505859375, 'epoch': 0.48} 48%|████▊ | 1203/2500 [4:38:57<8:01:08, 22.26s/it] 48%|████▊ | 1204/2500 [4:39:10<7:02:10, 19.54s/it] {'loss': 0.0011, 'grad_norm': 0.062398071880437146, 'learning_rate': 5.184e-07, 'completion_length': 56.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0281982421875, 'epoch': 0.48} 48%|████▊ | 1204/2500 [4:39:10<7:02:10, 19.54s/it] 48%|████▊ | 1205/2500 [4:39:25<6:30:50, 18.11s/it] {'loss': 0.0012, 'grad_norm': 0.1025169678900025, 'learning_rate': 5.18e-07, 'completion_length': 60.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03021240234375, 'epoch': 0.48} 48%|████▊ | 1205/2500 [4:39:25<6:30:50, 18.11s/it] 48%|████▊ | 1206/2500 [4:39:39<6:01:27, 16.76s/it] {'loss': 0.0012, 'grad_norm': 0.06603852992637521, 'learning_rate': 5.175999999999999e-07, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0289306640625, 'epoch': 0.48} 48%|████▊ | 1206/2500 [4:39:39<6:01:27, 16.76s/it] 48%|████▊ | 1207/2500 [4:39:52<5:40:03, 15.78s/it] {'loss': 0.001, 'grad_norm': 0.09064898484467794, 'learning_rate': 5.172e-07, 'completion_length': 51.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0245361328125, 'epoch': 0.48} 48%|████▊ | 1207/2500 [4:39:52<5:40:03, 15.78s/it] 48%|████▊ | 1208/2500 [4:40:07<5:35:14, 15.57s/it] {'loss': 0.0018, 'grad_norm': 0.09635691696739127, 'learning_rate': 
5.168e-07, 'completion_length': 53.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0460205078125, 'epoch': 0.48} 48%|████▊ | 1208/2500 [4:40:07<5:35:14, 15.57s/it] 48%|████▊ | 1209/2500 [4:40:24<5:44:30, 16.01s/it] {'loss': 0.0016, 'grad_norm': 0.08139658581748337, 'learning_rate': 5.163999999999999e-07, 'completion_length': 63.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0401611328125, 'epoch': 0.48} 48%|████▊ | 1209/2500 [4:40:24<5:44:30, 16.01s/it] 48%|████▊ | 1210/2500 [4:40:39<5:35:33, 15.61s/it] {'loss': 0.0009, 'grad_norm': 1.634542431839511, 'learning_rate': 5.16e-07, 'completion_length': 62.26786231994629, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02142333984375, 'epoch': 0.48} 48%|████▊ | 1210/2500 [4:40:39<5:35:33, 15.61s/it] 48%|████▊ | 1211/2500 [4:40:54<5:33:09, 15.51s/it] {'loss': 0.0008, 'grad_norm': 1.8917430427706534, 'learning_rate': 5.155999999999999e-07, 'completion_length': 64.10714721679688, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.020355224609375, 'epoch': 0.48} 48%|████▊ | 1211/2500 [4:40:54<5:33:09, 15.51s/it] 48%|████▊ | 1212/2500 [4:41:08<5:24:02, 15.10s/it] {'loss': 0.0006, 'grad_norm': 0.16385112400819157, 'learning_rate': 5.152e-07, 'completion_length': 60.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01458740234375, 'epoch': 0.48} 48%|████▊ | 1212/2500 [4:41:08<5:24:02, 15.10s/it] 49%|████▊ | 1213/2500 [4:41:22<5:12:08, 14.55s/it] {'loss': 0.001, 'grad_norm': 0.12630208459986697, 'learning_rate': 5.148e-07, 'completion_length': 57.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 
1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.49} 49%|████▊ | 1213/2500 [4:41:22<5:12:08, 14.55s/it] 49%|████▊ | 1214/2500 [4:41:36<5:08:34, 14.40s/it] {'loss': 0.0013, 'grad_norm': 0.06675974070022238, 'learning_rate': 5.143999999999999e-07, 'completion_length': 54.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032958984375, 'epoch': 0.49} 49%|████▊ | 1214/2500 [4:41:36<5:08:34, 14.40s/it] 49%|████▊ | 1215/2500 [4:41:51<5:11:52, 14.56s/it] {'loss': 0.0015, 'grad_norm': 0.22040329023333702, 'learning_rate': 5.14e-07, 'completion_length': 52.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0379638671875, 'epoch': 0.49} 49%|████▊ | 1215/2500 [4:41:51<5:11:52, 14.56s/it] 49%|████▊ | 1216/2500 [4:42:06<5:18:20, 14.88s/it] {'loss': 0.0016, 'grad_norm': 0.08164451093113341, 'learning_rate': 5.135999999999999e-07, 'completion_length': 62.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0396728515625, 'epoch': 0.49} 49%|████▊ | 1216/2500 [4:42:06<5:18:20, 14.88s/it] 49%|████▊ | 1217/2500 [4:42:20<5:08:42, 14.44s/it] {'loss': 0.001, 'grad_norm': 0.0761016668450133, 'learning_rate': 5.132e-07, 'completion_length': 51.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02374267578125, 'epoch': 0.49} 49%|████▊ | 1217/2500 [4:42:20<5:08:42, 14.44s/it] 49%|████▊ | 1218/2500 [4:42:35<5:11:19, 14.57s/it] {'loss': 0.0015, 'grad_norm': 0.059546075249177106, 'learning_rate': 5.128e-07, 'completion_length': 59.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.036865234375, 'epoch': 0.49} 49%|████▊ | 1218/2500 [4:42:35<5:11:19, 14.57s/it] 49%|████▉ | 1219/2500 [4:42:48<5:02:52, 14.19s/it] {'loss': 0.0007, 
'grad_norm': 0.05240382806229155, 'learning_rate': 5.124e-07, 'completion_length': 55.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018707275390625, 'epoch': 0.49} 49%|████▉ | 1219/2500 [4:42:48<5:02:52, 14.19s/it] 49%|████▉ | 1220/2500 [4:43:03<5:06:27, 14.37s/it] {'loss': 0.0012, 'grad_norm': 0.09496025907447732, 'learning_rate': 5.12e-07, 'completion_length': 57.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0289306640625, 'epoch': 0.49} 49%|████▉ | 1220/2500 [4:43:03<5:06:27, 14.37s/it] 49%|████▉ | 1221/2500 [4:43:17<5:03:17, 14.23s/it] {'loss': 0.0012, 'grad_norm': 0.05658260286029252, 'learning_rate': 5.116e-07, 'completion_length': 56.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03057861328125, 'epoch': 0.49} 49%|████▉ | 1221/2500 [4:43:17<5:03:17, 14.23s/it] 49%|████▉ | 1222/2500 [4:43:30<4:59:09, 14.05s/it] {'loss': 0.0005, 'grad_norm': 0.04461136831723344, 'learning_rate': 5.112e-07, 'completion_length': 54.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01214599609375, 'epoch': 0.49} 49%|████▉ | 1222/2500 [4:43:30<4:59:09, 14.05s/it] 49%|████▉ | 1223/2500 [4:43:44<4:57:49, 13.99s/it] {'loss': 0.0009, 'grad_norm': 0.27961853655678665, 'learning_rate': 5.108e-07, 'completion_length': 55.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.49} 49%|████▉ | 1223/2500 [4:43:44<4:57:49, 13.99s/it] 49%|████▉ | 1224/2500 [4:43:58<4:57:18, 13.98s/it] {'loss': 0.0021, 'grad_norm': 0.10321964559828331, 'learning_rate': 5.103999999999999e-07, 'completion_length': 58.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.051513671875, 'epoch': 0.49} 49%|████▉ | 1224/2500 [4:43:58<4:57:18, 13.98s/it] 49%|████▉ | 1225/2500 [4:44:12<4:55:16, 13.89s/it] {'loss': 0.0011, 'grad_norm': 0.16049919969558823, 'learning_rate': 5.1e-07, 'completion_length': 52.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026611328125, 'epoch': 0.49} 49%|████▉ | 1225/2500 [4:44:12<4:55:16, 13.89s/it] 49%|████▉ | 1226/2500 [4:44:26<4:55:06, 13.90s/it] {'loss': 0.002, 'grad_norm': 2.377881543171553, 'learning_rate': 5.096000000000001e-07, 'completion_length': 65.33929061889648, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0509033203125, 'epoch': 0.49} 49%|████▉ | 1226/2500 [4:44:26<4:55:06, 13.90s/it] 49%|████▉ | 1227/2500 [4:44:41<5:05:14, 14.39s/it] {'loss': 0.0013, 'grad_norm': 0.09539688628118084, 'learning_rate': 5.091999999999999e-07, 'completion_length': 67.37500381469727, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0323486328125, 'epoch': 0.49} 49%|████▉ | 1227/2500 [4:44:41<5:05:14, 14.39s/it] 49%|████▉ | 1228/2500 [4:44:54<4:56:01, 13.96s/it] {'loss': 0.0014, 'grad_norm': 0.2778776690675871, 'learning_rate': 5.088e-07, 'completion_length': 49.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03399658203125, 'epoch': 0.49} 49%|████▉ | 1228/2500 [4:44:54<4:56:01, 13.96s/it] 49%|████▉ | 1229/2500 [4:45:09<4:59:43, 14.15s/it] {'loss': 0.0013, 'grad_norm': 0.07411422042337755, 'learning_rate': 5.084e-07, 'completion_length': 60.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033447265625, 'epoch': 0.49} 49%|████▉ | 1229/2500 [4:45:09<4:59:43, 14.15s/it] 49%|████▉ | 1230/2500 [4:45:22<4:55:27, 13.96s/it] 
{'loss': 0.0009, 'grad_norm': 0.08267210981556239, 'learning_rate': 5.079999999999999e-07, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021484375, 'epoch': 0.49} 49%|████▉ | 1230/2500 [4:45:22<4:55:27, 13.96s/it] 49%|████▉ | 1231/2500 [4:45:36<4:53:40, 13.89s/it] {'loss': 0.0019, 'grad_norm': 0.10384490871539812, 'learning_rate': 5.076e-07, 'completion_length': 56.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04632568359375, 'epoch': 0.49} 49%|████▉ | 1231/2500 [4:45:36<4:53:40, 13.89s/it] 49%|████▉ | 1232/2500 [4:45:50<4:57:08, 14.06s/it] {'loss': 0.001, 'grad_norm': 1.454312639611242, 'learning_rate': 5.072e-07, 'completion_length': 65.33928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02581787109375, 'epoch': 0.49} 49%|████▉ | 1232/2500 [4:45:50<4:57:08, 14.06s/it] 49%|████▉ | 1233/2500 [4:46:04<4:56:18, 14.03s/it] {'loss': 0.001, 'grad_norm': 0.07772628335301063, 'learning_rate': 5.068e-07, 'completion_length': 57.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02484130859375, 'epoch': 0.49} 49%|████▉ | 1233/2500 [4:46:04<4:56:18, 14.03s/it] 49%|████▉ | 1234/2500 [4:46:18<4:53:57, 13.93s/it] {'loss': 0.0006, 'grad_norm': 0.07230069691145963, 'learning_rate': 5.063999999999999e-07, 'completion_length': 60.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015838623046875, 'epoch': 0.49} 49%|████▉ | 1234/2500 [4:46:18<4:53:57, 13.93s/it] 49%|████▉ | 1235/2500 [4:46:32<4:50:46, 13.79s/it] {'loss': 0.0008, 'grad_norm': 0.07421048348065654, 'learning_rate': 5.06e-07, 'completion_length': 54.08928871154785, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0203094482421875, 'epoch': 0.49} 49%|████▉ | 1235/2500 [4:46:32<4:50:46, 13.79s/it] 49%|████▉ | 1236/2500 [4:46:46<4:56:01, 14.05s/it] {'loss': 0.0007, 'grad_norm': 0.057747531696582526, 'learning_rate': 5.056e-07, 'completion_length': 56.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0174560546875, 'epoch': 0.49} 49%|████▉ | 1236/2500 [4:46:46<4:56:01, 14.05s/it] 49%|████▉ | 1237/2500 [4:47:01<5:00:29, 14.28s/it] {'loss': 0.0013, 'grad_norm': 2.126153617220244, 'learning_rate': 5.051999999999999e-07, 'completion_length': 51.50000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03350830078125, 'epoch': 0.49} 49%|████▉ | 1237/2500 [4:47:01<5:00:29, 14.28s/it] 50%|████▉ | 1238/2500 [4:47:15<4:57:57, 14.17s/it] {'loss': 0.0017, 'grad_norm': 0.1193513129487796, 'learning_rate': 5.048e-07, 'completion_length': 56.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04266357421875, 'epoch': 0.5} 50%|████▉ | 1238/2500 [4:47:15<4:57:57, 14.17s/it] 50%|████▉ | 1239/2500 [4:47:29<4:57:16, 14.15s/it] {'loss': 0.0014, 'grad_norm': 0.08573795597866789, 'learning_rate': 5.043999999999999e-07, 'completion_length': 61.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03509521484375, 'epoch': 0.5} 50%|████▉ | 1239/2500 [4:47:29<4:57:16, 14.15s/it] 50%|████▉ | 1240/2500 [4:47:43<4:58:31, 14.22s/it] {'loss': 0.001, 'grad_norm': 0.08753404471142066, 'learning_rate': 5.04e-07, 'completion_length': 58.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024505615234375, 'epoch': 0.5} 50%|████▉ | 1240/2500 [4:47:43<4:58:31, 
14.22s/it] 50%|████▉ | 1241/2500 [4:47:58<5:01:48, 14.38s/it] {'loss': 0.0014, 'grad_norm': 0.059670987056316495, 'learning_rate': 5.036e-07, 'completion_length': 63.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03460693359375, 'epoch': 0.5} 50%|████▉ | 1241/2500 [4:47:58<5:01:48, 14.38s/it] 50%|████▉ | 1242/2500 [4:48:13<5:03:24, 14.47s/it] {'loss': 0.0014, 'grad_norm': 0.08486062944157581, 'learning_rate': 5.032e-07, 'completion_length': 57.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0360107421875, 'epoch': 0.5} 50%|████▉ | 1242/2500 [4:48:13<5:03:24, 14.47s/it] 50%|████▉ | 1243/2500 [4:48:28<5:09:43, 14.78s/it] {'loss': 0.0014, 'grad_norm': 0.07450781406413898, 'learning_rate': 5.028e-07, 'completion_length': 62.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03558349609375, 'epoch': 0.5} 50%|████▉ | 1243/2500 [4:48:28<5:09:43, 14.78s/it] 50%|████▉ | 1244/2500 [4:48:44<5:13:18, 14.97s/it] {'loss': 0.0008, 'grad_norm': 0.09722827610598464, 'learning_rate': 5.023999999999999e-07, 'completion_length': 57.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0189208984375, 'epoch': 0.5} 50%|████▉ | 1244/2500 [4:48:44<5:13:18, 14.97s/it] 50%|████▉ | 1245/2500 [4:48:58<5:07:49, 14.72s/it] {'loss': 0.0013, 'grad_norm': 0.06559772109641314, 'learning_rate': 5.02e-07, 'completion_length': 61.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03253173828125, 'epoch': 0.5} 50%|████▉ | 1245/2500 [4:48:58<5:07:49, 14.72s/it] 50%|████▉ | 1246/2500 [4:49:12<5:03:55, 14.54s/it] {'loss': 0.0009, 'grad_norm': 0.06512415670077544, 'learning_rate': 5.016e-07, 'completion_length': 61.17857360839844, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02227783203125, 'epoch': 0.5} 50%|████▉ | 1246/2500 [4:49:12<5:03:55, 14.54s/it] 50%|████▉ | 1247/2500 [4:49:27<5:05:13, 14.62s/it] {'loss': 0.0013, 'grad_norm': 0.05997386189983098, 'learning_rate': 5.012e-07, 'completion_length': 60.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03302001953125, 'epoch': 0.5} 50%|████▉ | 1247/2500 [4:49:27<5:05:13, 14.62s/it] 50%|████▉ | 1248/2500 [4:49:44<5:19:17, 15.30s/it] {'loss': 0.0009, 'grad_norm': 0.062341967607471634, 'learning_rate': 5.008e-07, 'completion_length': 80.875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021942138671875, 'epoch': 0.5} 50%|████▉ | 1248/2500 [4:49:44<5:19:17, 15.30s/it] 50%|████▉ | 1249/2500 [4:49:58<5:09:21, 14.84s/it] {'loss': 0.0011, 'grad_norm': 0.12290909556091553, 'learning_rate': 5.003999999999999e-07, 'completion_length': 57.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0279541015625, 'epoch': 0.5} 50%|████▉ | 1249/2500 [4:49:58<5:09:21, 14.84s/it] 50%|█████ | 1250/2500 [4:50:12<5:05:07, 14.65s/it] {'loss': 0.0008, 'grad_norm': 0.10148436450654078, 'learning_rate': 5e-07, 'completion_length': 56.000003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01934814453125, 'epoch': 0.5} 50%|█████ | 1250/2500 [4:50:12<5:05:07, 14.65s/it] 50%|█████ | 1251/2500 [4:50:26<5:00:38, 14.44s/it] {'loss': 0.0008, 'grad_norm': 0.1119135609077519, 'learning_rate': 4.996e-07, 'completion_length': 54.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02099609375, 'epoch': 0.5} 50%|█████ | 1251/2500 [4:50:26<5:00:38, 14.44s/it] 50%|█████ | 1252/2500 [4:50:41<5:05:15, 14.68s/it] {'loss': 0.002, 
'grad_norm': 0.06506014396428239, 'learning_rate': 4.991999999999999e-07, 'completion_length': 59.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.049072265625, 'epoch': 0.5} 50%|█████ | 1252/2500 [4:50:41<5:05:15, 14.68s/it] 50%|█████ | 1253/2500 [4:50:55<4:59:54, 14.43s/it] {'loss': 0.0011, 'grad_norm': 0.07144196377528385, 'learning_rate': 4.988e-07, 'completion_length': 54.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0267333984375, 'epoch': 0.5} 50%|█████ | 1253/2500 [4:50:55<4:59:54, 14.43s/it] 50%|█████ | 1254/2500 [4:51:09<4:57:23, 14.32s/it] {'loss': 0.0013, 'grad_norm': 0.08170154639846944, 'learning_rate': 4.984e-07, 'completion_length': 61.875003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0323486328125, 'epoch': 0.5} 50%|█████ | 1254/2500 [4:51:09<4:57:23, 14.32s/it] 50%|█████ | 1255/2500 [4:51:23<4:58:39, 14.39s/it] {'loss': 0.0016, 'grad_norm': 0.1359954951995133, 'learning_rate': 4.979999999999999e-07, 'completion_length': 60.53571891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.040283203125, 'epoch': 0.5} 50%|█████ | 1255/2500 [4:51:23<4:58:39, 14.39s/it] 50%|█████ | 1256/2500 [4:51:38<5:02:17, 14.58s/it] {'loss': 0.0008, 'grad_norm': 0.06308614431152386, 'learning_rate': 4.976e-07, 'completion_length': 67.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02056884765625, 'epoch': 0.5} 50%|█████ | 1256/2500 [4:51:38<5:02:17, 14.58s/it] 50%|█████ | 1257/2500 [4:51:53<4:59:53, 14.48s/it] {'loss': 0.0012, 'grad_norm': 0.057697543547825254, 'learning_rate': 4.972e-07, 'completion_length': 61.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 
2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.5} 50%|█████ | 1257/2500 [4:51:53<4:59:53, 14.48s/it] 50%|█████ | 1258/2500 [4:52:10<5:19:33, 15.44s/it] {'loss': 0.0009, 'grad_norm': 0.06397766334102821, 'learning_rate': 4.968e-07, 'completion_length': 69.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.5} 50%|█████ | 1258/2500 [4:52:10<5:19:33, 15.44s/it] 50%|█████ | 1259/2500 [4:52:25<5:13:40, 15.17s/it] {'loss': 0.0022, 'grad_norm': 0.05188454612148169, 'learning_rate': 4.964e-07, 'completion_length': 66.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05419921875, 'epoch': 0.5} 50%|█████ | 1259/2500 [4:52:25<5:13:40, 15.17s/it] 50%|█████ | 1260/2500 [4:52:39<5:07:12, 14.87s/it] {'loss': 0.0017, 'grad_norm': 0.06364013627220128, 'learning_rate': 4.96e-07, 'completion_length': 62.32143211364746, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0426025390625, 'epoch': 0.5} 50%|█████ | 1260/2500 [4:52:39<5:07:12, 14.87s/it] 50%|█████ | 1261/2500 [4:52:54<5:06:01, 14.82s/it] {'loss': 0.0012, 'grad_norm': 0.10104954140781237, 'learning_rate': 4.956e-07, 'completion_length': 55.05357551574707, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02880859375, 'epoch': 0.5} 50%|█████ | 1261/2500 [4:52:54<5:06:01, 14.82s/it] 50%|█████ | 1262/2500 [4:53:07<4:55:11, 14.31s/it] {'loss': 0.0006, 'grad_norm': 0.05828906659228899, 'learning_rate': 4.951999999999999e-07, 'completion_length': 59.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01385498046875, 'epoch': 0.5} 50%|█████ | 1262/2500 [4:53:07<4:55:11, 14.31s/it] 51%|█████ | 1263/2500 [4:53:21<4:55:52, 14.35s/it] 
{'loss': 0.0011, 'grad_norm': 0.1086066895877018, 'learning_rate': 4.948e-07, 'completion_length': 60.232147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02874755859375, 'epoch': 0.51} 51%|█████ | 1263/2500 [4:53:21<4:55:52, 14.35s/it] 51%|█████ | 1264/2500 [4:53:35<4:52:51, 14.22s/it] {'loss': 0.0011, 'grad_norm': 0.0730510840180659, 'learning_rate': 4.944e-07, 'completion_length': 58.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.51} 51%|█████ | 1264/2500 [4:53:35<4:52:51, 14.22s/it] 51%|█████ | 1265/2500 [4:53:49<4:50:07, 14.10s/it] {'loss': 0.0013, 'grad_norm': 0.09108015300564319, 'learning_rate': 4.94e-07, 'completion_length': 51.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03131103515625, 'epoch': 0.51} 51%|█████ | 1265/2500 [4:53:49<4:50:07, 14.10s/it] 51%|█████ | 1266/2500 [4:54:03<4:49:22, 14.07s/it] {'loss': 0.0015, 'grad_norm': 0.07964777173466067, 'learning_rate': 4.935999999999999e-07, 'completion_length': 59.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0367431640625, 'epoch': 0.51} 51%|█████ | 1266/2500 [4:54:03<4:49:22, 14.07s/it] 51%|█████ | 1267/2500 [4:54:17<4:48:59, 14.06s/it] {'loss': 0.0015, 'grad_norm': 0.050262939011315365, 'learning_rate': 4.932e-07, 'completion_length': 55.28571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.51} 51%|█████ | 1267/2500 [4:54:17<4:48:59, 14.06s/it] 51%|█████ | 1268/2500 [4:54:32<4:52:19, 14.24s/it] {'loss': 0.0011, 'grad_norm': 0.06473857497981247, 'learning_rate': 4.928e-07, 'completion_length': 59.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0277099609375, 'epoch': 0.51} 51%|█████ | 1268/2500 [4:54:32<4:52:19, 14.24s/it] 51%|█████ | 1269/2500 [4:54:45<4:47:33, 14.02s/it] {'loss': 0.0011, 'grad_norm': 0.05820804606033818, 'learning_rate': 4.923999999999999e-07, 'completion_length': 56.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027587890625, 'epoch': 0.51} 51%|█████ | 1269/2500 [4:54:45<4:47:33, 14.02s/it] 51%|█████ | 1270/2500 [4:54:59<4:48:05, 14.05s/it] {'loss': 0.0016, 'grad_norm': 0.07686515899114664, 'learning_rate': 4.92e-07, 'completion_length': 48.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041015625, 'epoch': 0.51} 51%|█████ | 1270/2500 [4:54:59<4:48:05, 14.05s/it] 51%|█████ | 1271/2500 [4:55:14<4:50:24, 14.18s/it] {'loss': 0.0011, 'grad_norm': 0.08169127265096225, 'learning_rate': 4.916e-07, 'completion_length': 63.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.51} 51%|█████ | 1271/2500 [4:55:14<4:50:24, 14.18s/it] 51%|█████ | 1272/2500 [4:55:28<4:49:18, 14.14s/it] {'loss': 0.001, 'grad_norm': 0.04684053869216014, 'learning_rate': 4.912e-07, 'completion_length': 56.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02581787109375, 'epoch': 0.51} 51%|█████ | 1272/2500 [4:55:28<4:49:18, 14.14s/it] 51%|█████ | 1273/2500 [4:55:41<4:45:28, 13.96s/it] {'loss': 0.0005, 'grad_norm': 0.3935971574462371, 'learning_rate': 4.908e-07, 'completion_length': 51.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01165771484375, 'epoch': 0.51} 51%|█████ | 1273/2500 [4:55:41<4:45:28, 13.96s/it] 51%|█████ | 1274/2500 [4:55:55<4:41:34, 13.78s/it] {'loss': 0.0011, 'grad_norm': 
0.06864470125756375, 'learning_rate': 4.904e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02764892578125, 'epoch': 0.51} 51%|█████ | 1274/2500 [4:55:55<4:41:34, 13.78s/it] 51%|█████ | 1275/2500 [4:56:08<4:39:48, 13.71s/it] {'loss': 0.0015, 'grad_norm': 0.11871525540639356, 'learning_rate': 4.9e-07, 'completion_length': 58.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03717041015625, 'epoch': 0.51} 51%|█████ | 1275/2500 [4:56:08<4:39:48, 13.71s/it] 51%|█████ | 1276/2500 [4:56:22<4:39:20, 13.69s/it] {'loss': 0.0008, 'grad_norm': 0.054886907598228354, 'learning_rate': 4.895999999999999e-07, 'completion_length': 56.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02020263671875, 'epoch': 0.51} 51%|█████ | 1276/2500 [4:56:22<4:39:20, 13.69s/it] 51%|█████ | 1277/2500 [4:56:36<4:41:51, 13.83s/it] {'loss': 0.0018, 'grad_norm': 1.1291862010229259, 'learning_rate': 4.892e-07, 'completion_length': 58.62500190734863, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.0357142873108387, 'kl': 0.04541015625, 'epoch': 0.51} 51%|█████ | 1277/2500 [4:56:36<4:41:51, 13.83s/it] 51%|█████ | 1278/2500 [4:56:50<4:44:07, 13.95s/it] {'loss': 0.0017, 'grad_norm': 0.0865071710519647, 'learning_rate': 4.888e-07, 'completion_length': 52.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04302978515625, 'epoch': 0.51} 51%|█████ | 1278/2500 [4:56:50<4:44:07, 13.95s/it] 51%|█████ | 1279/2500 [4:57:09<5:12:08, 15.34s/it] {'loss': 0.0003, 'grad_norm': 0.07007732694847515, 'learning_rate': 4.884e-07, 'completion_length': 73.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.0085296630859375, 'epoch': 0.51} 51%|█████ | 1279/2500 [4:57:09<5:12:08, 15.34s/it] 51%|█████ | 1280/2500 [4:57:23<5:04:28, 14.97s/it] {'loss': 0.0016, 'grad_norm': 0.12371690882301792, 'learning_rate': 4.879999999999999e-07, 'completion_length': 58.21428680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0401611328125, 'epoch': 0.51} 51%|█████ | 1280/2500 [4:57:23<5:04:28, 14.97s/it] 51%|█████ | 1281/2500 [4:57:40<5:17:27, 15.63s/it] {'loss': 0.0012, 'grad_norm': 0.08344288585995435, 'learning_rate': 4.876e-07, 'completion_length': 62.39285850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03106689453125, 'epoch': 0.51} 51%|█████ | 1281/2500 [4:57:40<5:17:27, 15.63s/it] 51%|█████▏ | 1282/2500 [4:58:01<5:49:56, 17.24s/it] {'loss': 0.0016, 'grad_norm': 2.149436360522774, 'learning_rate': 4.872e-07, 'completion_length': 67.91071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.946428656578064, 'reward_std': 0.1071428619325161, 'kl': 0.04010009765625, 'epoch': 0.51} 51%|█████▏ | 1282/2500 [4:58:01<5:49:56, 17.24s/it] 51%|█████▏ | 1283/2500 [4:58:17<5:41:36, 16.84s/it] {'loss': 0.0018, 'grad_norm': 0.0940047315797955, 'learning_rate': 4.867999999999999e-07, 'completion_length': 65.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04541015625, 'epoch': 0.51} 51%|█████▏ | 1283/2500 [4:58:17<5:41:36, 16.84s/it] 51%|█████▏ | 1284/2500 [4:58:31<5:21:23, 15.86s/it] {'loss': 0.0007, 'grad_norm': 0.6241949328879449, 'learning_rate': 4.864e-07, 'completion_length': 54.25000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 
0.017578125, 'epoch': 0.51} 51%|█████▏ | 1284/2500 [4:58:31<5:21:23, 15.86s/it] 51%|█████▏ | 1285/2500 [4:58:45<5:10:07, 15.32s/it] {'loss': 0.0011, 'grad_norm': 0.08523210817822945, 'learning_rate': 4.86e-07, 'completion_length': 54.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0274658203125, 'epoch': 0.51} 51%|█████▏ | 1285/2500 [4:58:45<5:10:07, 15.32s/it] 51%|█████▏ | 1286/2500 [4:58:59<5:02:55, 14.97s/it] {'loss': 0.0011, 'grad_norm': 0.07504790192680494, 'learning_rate': 4.856e-07, 'completion_length': 58.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.51} 51%|█████▏ | 1286/2500 [4:58:59<5:02:55, 14.97s/it] 51%|█████▏ | 1287/2500 [4:59:13<4:57:53, 14.74s/it] {'loss': 0.0019, 'grad_norm': 0.20298688401925918, 'learning_rate': 4.852e-07, 'completion_length': 59.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0482177734375, 'epoch': 0.51} 51%|█████▏ | 1287/2500 [4:59:13<4:57:53, 14.74s/it] 52%|█████▏ | 1288/2500 [4:59:27<4:50:55, 14.40s/it] {'loss': 0.0017, 'grad_norm': 0.06185487329489537, 'learning_rate': 4.848e-07, 'completion_length': 59.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04296875, 'epoch': 0.52} 52%|█████▏ | 1288/2500 [4:59:27<4:50:55, 14.40s/it] 52%|█████▏ | 1289/2500 [4:59:40<4:43:34, 14.05s/it] {'loss': 0.0019, 'grad_norm': 0.08477246162861511, 'learning_rate': 4.844e-07, 'completion_length': 56.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.046142578125, 'epoch': 0.52} 52%|█████▏ | 1289/2500 [4:59:40<4:43:34, 14.05s/it] 52%|█████▏ | 1290/2500 [4:59:54<4:42:00, 13.98s/it] {'loss': 0.0013, 'grad_norm': 0.06737727625040588, 'learning_rate': 
4.839999999999999e-07, 'completion_length': 59.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033599853515625, 'epoch': 0.52} 52%|█████▏ | 1290/2500 [4:59:54<4:42:00, 13.98s/it] 52%|█████▏ | 1291/2500 [5:00:08<4:41:23, 13.97s/it] {'loss': 0.0015, 'grad_norm': 0.25661309837324703, 'learning_rate': 4.835999999999999e-07, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0374755859375, 'epoch': 0.52} 52%|█████▏ | 1291/2500 [5:00:08<4:41:23, 13.97s/it] 52%|█████▏ | 1292/2500 [5:00:24<4:54:50, 14.64s/it] {'loss': 0.0014, 'grad_norm': 0.07521841876926806, 'learning_rate': 4.832e-07, 'completion_length': 62.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0350341796875, 'epoch': 0.52} 52%|█████▏ | 1292/2500 [5:00:24<4:54:50, 14.64s/it] 52%|█████▏ | 1293/2500 [5:00:38<4:51:14, 14.48s/it] {'loss': 0.0012, 'grad_norm': 0.10110370595010419, 'learning_rate': 4.828e-07, 'completion_length': 56.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02880859375, 'epoch': 0.52} 52%|█████▏ | 1293/2500 [5:00:38<4:51:14, 14.48s/it] 52%|█████▏ | 1294/2500 [5:00:52<4:47:38, 14.31s/it] {'loss': 0.0016, 'grad_norm': 0.07223508441769673, 'learning_rate': 4.823999999999999e-07, 'completion_length': 60.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03997802734375, 'epoch': 0.52} 52%|█████▏ | 1294/2500 [5:00:52<4:47:38, 14.31s/it] 52%|█████▏ | 1295/2500 [5:01:11<5:14:05, 15.64s/it] {'loss': 0.0017, 'grad_norm': 0.5649710831478465, 'learning_rate': 4.82e-07, 'completion_length': 62.92857551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 
'reward_std': 0.0714285746216774, 'kl': 0.04229736328125, 'epoch': 0.52} 52%|█████▏ | 1295/2500 [5:01:11<5:14:05, 15.64s/it] 52%|█████▏ | 1296/2500 [5:01:25<5:03:27, 15.12s/it] {'loss': 0.0011, 'grad_norm': 0.2341961306127726, 'learning_rate': 4.816e-07, 'completion_length': 52.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027008056640625, 'epoch': 0.52} 52%|█████▏ | 1296/2500 [5:01:25<5:03:27, 15.12s/it] 52%|█████▏ | 1297/2500 [5:01:38<4:54:20, 14.68s/it] {'loss': 0.0011, 'grad_norm': 0.10142822524210023, 'learning_rate': 4.812e-07, 'completion_length': 49.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027313232421875, 'epoch': 0.52} 52%|█████▏ | 1297/2500 [5:01:38<4:54:20, 14.68s/it] 52%|█████▏ | 1298/2500 [5:01:52<4:47:43, 14.36s/it] {'loss': 0.0003, 'grad_norm': 0.08105894037818982, 'learning_rate': 4.808e-07, 'completion_length': 58.10714530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0083770751953125, 'epoch': 0.52} 52%|█████▏ | 1298/2500 [5:01:52<4:47:43, 14.36s/it] 52%|█████▏ | 1299/2500 [5:02:06<4:43:27, 14.16s/it] {'loss': 0.0008, 'grad_norm': 0.07995738856937906, 'learning_rate': 4.804e-07, 'completion_length': 51.91071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0196533203125, 'epoch': 0.52} 52%|█████▏ | 1299/2500 [5:02:06<4:43:27, 14.16s/it] 52%|█████▏ | 1300/2500 [5:02:20<4:45:17, 14.26s/it] {'loss': 0.0013, 'grad_norm': 0.08294850538241258, 'learning_rate': 4.8e-07, 'completion_length': 57.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.52} 52%|█████▏ | 1300/2500 [5:02:20<4:45:17, 14.26s/it] 52%|█████▏ | 1301/2500 
[5:03:28<10:08:51, 30.47s/it] {'loss': 0.0009, 'grad_norm': 0.07120367015385819, 'learning_rate': 4.796e-07, 'completion_length': 51.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0225830078125, 'epoch': 0.52} 52%|█████▏ | 1301/2500 [5:03:28<10:08:51, 30.47s/it] 52%|█████▏ | 1302/2500 [5:03:43<8:35:25, 25.81s/it] {'loss': 0.0008, 'grad_norm': 0.05450004568440091, 'learning_rate': 4.792e-07, 'completion_length': 59.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02001953125, 'epoch': 0.52} 52%|█████▏ | 1302/2500 [5:03:43<8:35:25, 25.81s/it] 52%|█████▏ | 1303/2500 [5:03:57<7:21:19, 22.12s/it] {'loss': 0.0014, 'grad_norm': 0.07515084135944376, 'learning_rate': 4.788e-07, 'completion_length': 53.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0361328125, 'epoch': 0.52} 52%|█████▏ | 1303/2500 [5:03:57<7:21:19, 22.12s/it] 52%|█████▏ | 1304/2500 [5:04:11<6:33:26, 19.74s/it] {'loss': 0.0013, 'grad_norm': 0.12984334430473485, 'learning_rate': 4.783999999999999e-07, 'completion_length': 61.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0330810546875, 'epoch': 0.52} 52%|█████▏ | 1304/2500 [5:04:11<6:33:26, 19.74s/it] 52%|█████▏ | 1305/2500 [5:04:25<6:00:02, 18.08s/it] {'loss': 0.0009, 'grad_norm': 0.09992092114381086, 'learning_rate': 4.779999999999999e-07, 'completion_length': 59.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022003173828125, 'epoch': 0.52} 52%|█████▏ | 1305/2500 [5:04:25<6:00:02, 18.08s/it] 52%|█████▏ | 1306/2500 [5:04:39<5:33:01, 16.74s/it] {'loss': 0.0011, 'grad_norm': 0.08595094023576619, 'learning_rate': 4.776e-07, 'completion_length': 56.55357360839844, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02783203125, 'epoch': 0.52} 52%|█████▏ | 1306/2500 [5:04:39<5:33:01, 16.74s/it] 52%|█████▏ | 1307/2500 [5:04:52<5:13:07, 15.75s/it] {'loss': 0.0005, 'grad_norm': 0.05172644295076984, 'learning_rate': 4.772e-07, 'completion_length': 53.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0121002197265625, 'epoch': 0.52} 52%|█████▏ | 1307/2500 [5:04:52<5:13:07, 15.75s/it] 52%|█████▏ | 1308/2500 [5:05:07<5:08:54, 15.55s/it] {'loss': 0.0008, 'grad_norm': 0.061403007216573935, 'learning_rate': 4.768e-07, 'completion_length': 62.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0205078125, 'epoch': 0.52} 52%|█████▏ | 1308/2500 [5:05:07<5:08:54, 15.55s/it] 52%|█████▏ | 1309/2500 [5:05:23<5:08:02, 15.52s/it] {'loss': 0.001, 'grad_norm': 2.3213959894304366, 'learning_rate': 4.7639999999999995e-07, 'completion_length': 60.46428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02545166015625, 'epoch': 0.52} 52%|█████▏ | 1309/2500 [5:05:23<5:08:02, 15.52s/it] 52%|█████▏ | 1310/2500 [5:05:37<5:01:41, 15.21s/it] {'loss': 0.0005, 'grad_norm': 0.10488334392967635, 'learning_rate': 4.76e-07, 'completion_length': 56.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.011810302734375, 'epoch': 0.52} 52%|█████▏ | 1310/2500 [5:05:37<5:01:41, 15.21s/it] 52%|█████▏ | 1311/2500 [5:05:51<4:50:38, 14.67s/it] {'loss': 0.0009, 'grad_norm': 0.16634100643305777, 'learning_rate': 4.756e-07, 'completion_length': 51.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021484375, 'epoch': 0.52} 52%|█████▏ | 1311/2500 [5:05:51<4:50:38, 14.67s/it] 
52%|█████▏ | 1312/2500 [5:06:04<4:44:32, 14.37s/it] {'loss': 0.0011, 'grad_norm': 0.05109812267885866, 'learning_rate': 4.7519999999999997e-07, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02642822265625, 'epoch': 0.52} 52%|█████▏ | 1312/2500 [5:06:04<4:44:32, 14.37s/it] 53%|█████▎ | 1313/2500 [5:06:19<4:43:39, 14.34s/it] {'loss': 0.001, 'grad_norm': 0.055874379770244294, 'learning_rate': 4.748e-07, 'completion_length': 60.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02532958984375, 'epoch': 0.53} 53%|█████▎ | 1313/2500 [5:06:19<4:43:39, 14.34s/it] 53%|█████▎ | 1314/2500 [5:06:32<4:38:56, 14.11s/it] {'loss': 0.0012, 'grad_norm': 0.05804969852972209, 'learning_rate': 4.7439999999999996e-07, 'completion_length': 52.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0291748046875, 'epoch': 0.53} 53%|█████▎ | 1314/2500 [5:06:32<4:38:56, 14.11s/it] 53%|█████▎ | 1315/2500 [5:06:46<4:36:03, 13.98s/it] {'loss': 0.0009, 'grad_norm': 0.05774953909727996, 'learning_rate': 4.7399999999999993e-07, 'completion_length': 54.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0233154296875, 'epoch': 0.53} 53%|█████▎ | 1315/2500 [5:06:46<4:36:03, 13.98s/it] 53%|█████▎ | 1316/2500 [5:07:00<4:37:17, 14.05s/it] {'loss': 0.002, 'grad_norm': 0.08950181788845653, 'learning_rate': 4.736e-07, 'completion_length': 57.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0496826171875, 'epoch': 0.53} 53%|█████▎ | 1316/2500 [5:07:00<4:37:17, 14.05s/it] 53%|█████▎ | 1317/2500 [5:07:14<4:37:44, 14.09s/it] {'loss': 0.0011, 'grad_norm': 15.183377916401884, 'learning_rate': 4.732e-07, 'completion_length': 56.750003814697266, 
'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.02716064453125, 'epoch': 0.53} 53%|█████▎ | 1317/2500 [5:07:14<4:37:44, 14.09s/it] 53%|█████▎ | 1318/2500 [5:07:29<4:44:14, 14.43s/it] {'loss': 0.0015, 'grad_norm': 0.11367924039224715, 'learning_rate': 4.728e-07, 'completion_length': 60.62500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03668212890625, 'epoch': 0.53} 53%|█████▎ | 1318/2500 [5:07:29<4:44:14, 14.43s/it] 53%|█████▎ | 1319/2500 [5:07:42<4:34:10, 13.93s/it] {'loss': 0.0009, 'grad_norm': 0.06656733955152336, 'learning_rate': 4.7239999999999997e-07, 'completion_length': 50.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.53} 53%|█████▎ | 1319/2500 [5:07:42<4:34:10, 13.93s/it] 53%|█████▎ | 1320/2500 [5:07:56<4:34:10, 13.94s/it] {'loss': 0.0015, 'grad_norm': 0.08378540569203698, 'learning_rate': 4.7199999999999994e-07, 'completion_length': 57.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03643798828125, 'epoch': 0.53} 53%|█████▎ | 1320/2500 [5:07:56<4:34:10, 13.94s/it] 53%|█████▎ | 1321/2500 [5:08:10<4:32:49, 13.88s/it] {'loss': 0.0012, 'grad_norm': 0.053492001257740694, 'learning_rate': 4.716e-07, 'completion_length': 59.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0303955078125, 'epoch': 0.53} 53%|█████▎ | 1321/2500 [5:08:10<4:32:49, 13.88s/it] 53%|█████▎ | 1322/2500 [5:08:26<4:43:26, 14.44s/it] {'loss': 0.0005, 'grad_norm': 0.05109766247910734, 'learning_rate': 4.712e-07, 'completion_length': 64.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.013580322265625, 'epoch': 0.53} 53%|█████▎ | 1322/2500 [5:08:26<4:43:26, 14.44s/it] 53%|█████▎ | 1323/2500 [5:08:41<4:48:19, 14.70s/it] {'loss': 0.0011, 'grad_norm': 0.08810875257166707, 'learning_rate': 4.7079999999999995e-07, 'completion_length': 64.89286231994629, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02777099609375, 'epoch': 0.53} 53%|█████▎ | 1323/2500 [5:08:41<4:48:19, 14.70s/it] 53%|█████▎ | 1324/2500 [5:08:55<4:44:40, 14.52s/it] {'loss': 0.0011, 'grad_norm': 0.05624013402339198, 'learning_rate': 4.704e-07, 'completion_length': 55.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0262603759765625, 'epoch': 0.53} 53%|█████▎ | 1324/2500 [5:08:55<4:44:40, 14.52s/it] 53%|█████▎ | 1325/2500 [5:09:10<4:45:41, 14.59s/it] {'loss': 0.0009, 'grad_norm': 0.05530113759737563, 'learning_rate': 4.6999999999999995e-07, 'completion_length': 57.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02313232421875, 'epoch': 0.53} 53%|█████▎ | 1325/2500 [5:09:10<4:45:41, 14.59s/it] 53%|█████▎ | 1326/2500 [5:09:24<4:42:31, 14.44s/it] {'loss': 0.0009, 'grad_norm': 0.054334735789470685, 'learning_rate': 4.6959999999999997e-07, 'completion_length': 51.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02349853515625, 'epoch': 0.53} 53%|█████▎ | 1326/2500 [5:09:24<4:42:31, 14.44s/it] 53%|█████▎ | 1327/2500 [5:09:38<4:43:12, 14.49s/it] {'loss': 0.0013, 'grad_norm': 0.09093242238873869, 'learning_rate': 4.692e-07, 'completion_length': 66.00000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0316162109375, 'epoch': 0.53} 53%|█████▎ | 1327/2500 [5:09:38<4:43:12, 14.49s/it] 53%|█████▎ | 1328/2500 [5:09:53<4:42:30, 14.46s/it] {'loss': 0.0016, 'grad_norm': 
0.09551025734471486, 'learning_rate': 4.6879999999999996e-07, 'completion_length': 59.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0406494140625, 'epoch': 0.53} 53%|█████▎ | 1328/2500 [5:09:53<4:42:30, 14.46s/it] 53%|█████▎ | 1329/2500 [5:10:07<4:39:14, 14.31s/it] {'loss': 0.0006, 'grad_norm': 0.05459028717226958, 'learning_rate': 4.684e-07, 'completion_length': 54.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01483154296875, 'epoch': 0.53} 53%|█████▎ | 1329/2500 [5:10:07<4:39:14, 14.31s/it] 53%|█████▎ | 1330/2500 [5:10:21<4:36:13, 14.17s/it] {'loss': 0.0017, 'grad_norm': 0.23789050730256148, 'learning_rate': 4.68e-07, 'completion_length': 55.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04296875, 'epoch': 0.53} 53%|█████▎ | 1330/2500 [5:10:21<4:36:13, 14.17s/it] 53%|█████▎ | 1331/2500 [5:10:35<4:38:14, 14.28s/it] {'loss': 0.0016, 'grad_norm': 1.4020682318143005, 'learning_rate': 4.676e-07, 'completion_length': 64.23214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03912353515625, 'epoch': 0.53} 53%|█████▎ | 1331/2500 [5:10:35<4:38:14, 14.28s/it] 53%|█████▎ | 1332/2500 [5:10:48<4:31:29, 13.95s/it] {'loss': 0.0011, 'grad_norm': 0.05829777387306187, 'learning_rate': 4.672e-07, 'completion_length': 51.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028167724609375, 'epoch': 0.53} 53%|█████▎ | 1332/2500 [5:10:48<4:31:29, 13.95s/it] 53%|█████▎ | 1333/2500 [5:11:06<4:54:13, 15.13s/it] {'loss': 0.0009, 'grad_norm': 1.1717300473101189, 'learning_rate': 4.6679999999999997e-07, 'completion_length': 55.57143211364746, 'rewards/accuracy_reward': 0.9642857313156128, 
'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.023193359375, 'epoch': 0.53} 53%|█████▎ | 1333/2500 [5:11:06<4:54:13, 15.13s/it] 53%|█████▎ | 1334/2500 [5:11:20<4:43:11, 14.57s/it] {'loss': 0.001, 'grad_norm': 0.15076766099169595, 'learning_rate': 4.6639999999999994e-07, 'completion_length': 60.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02545166015625, 'epoch': 0.53} 53%|█████▎ | 1334/2500 [5:11:20<4:43:11, 14.57s/it] 53%|█████▎ | 1335/2500 [5:11:33<4:38:30, 14.34s/it] {'loss': 0.001, 'grad_norm': 4.412104364645674, 'learning_rate': 4.66e-07, 'completion_length': 51.17857551574707, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.04123930633068085, 'kl': 0.0253143310546875, 'epoch': 0.53} 53%|█████▎ | 1335/2500 [5:11:33<4:38:30, 14.34s/it] 53%|█████▎ | 1336/2500 [5:11:47<4:35:16, 14.19s/it] {'loss': 0.0009, 'grad_norm': 0.06458636624594416, 'learning_rate': 4.656e-07, 'completion_length': 61.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0216064453125, 'epoch': 0.53} 53%|█████▎ | 1336/2500 [5:11:47<4:35:16, 14.19s/it] 53%|█████▎ | 1337/2500 [5:12:02<4:38:35, 14.37s/it] {'loss': 0.0019, 'grad_norm': 0.0702075578662613, 'learning_rate': 4.6519999999999996e-07, 'completion_length': 60.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.048583984375, 'epoch': 0.53} 53%|█████▎ | 1337/2500 [5:12:02<4:38:35, 14.37s/it] 54%|█████▎ | 1338/2500 [5:12:16<4:34:35, 14.18s/it] {'loss': 0.0007, 'grad_norm': 0.07336242204614933, 'learning_rate': 4.648e-07, 'completion_length': 56.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018585205078125, 'epoch': 0.54} 54%|█████▎ | 
1338/2500 [5:12:16<4:34:35, 14.18s/it] 54%|█████▎ | 1339/2500 [5:12:29<4:29:12, 13.91s/it] {'loss': 0.0008, 'grad_norm': 3.4244049829643717, 'learning_rate': 4.6439999999999995e-07, 'completion_length': 46.69643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.018951416015625, 'epoch': 0.54} 54%|█████▎ | 1339/2500 [5:12:29<4:29:12, 13.91s/it] 54%|█████▎ | 1340/2500 [5:12:42<4:25:36, 13.74s/it] {'loss': 0.0007, 'grad_norm': 0.1369065297734535, 'learning_rate': 4.64e-07, 'completion_length': 51.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018585205078125, 'epoch': 0.54} 54%|█████▎ | 1340/2500 [5:12:42<4:25:36, 13.74s/it] 54%|█████▎ | 1341/2500 [5:12:59<4:42:18, 14.61s/it] {'loss': 0.002, 'grad_norm': 0.8226406930286598, 'learning_rate': 4.636e-07, 'completion_length': 62.21428680419922, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0494384765625, 'epoch': 0.54} 54%|█████▎ | 1341/2500 [5:12:59<4:42:18, 14.61s/it] 54%|█████▎ | 1342/2500 [5:13:12<4:34:33, 14.23s/it] {'loss': 0.0015, 'grad_norm': 2.141246815733051, 'learning_rate': 4.6319999999999997e-07, 'completion_length': 53.82143020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03643798828125, 'epoch': 0.54} 54%|█████▎ | 1342/2500 [5:13:12<4:34:33, 14.23s/it] 54%|█████▎ | 1343/2500 [5:13:26<4:29:16, 13.96s/it] {'loss': 0.0008, 'grad_norm': 0.05869317529768475, 'learning_rate': 4.628e-07, 'completion_length': 54.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02069091796875, 'epoch': 0.54} 54%|█████▎ | 1343/2500 [5:13:26<4:29:16, 13.96s/it] 54%|█████▍ | 
1344/2500 [5:13:40<4:30:56, 14.06s/it] {'loss': 0.0017, 'grad_norm': 0.06810353077679661, 'learning_rate': 4.6239999999999996e-07, 'completion_length': 64.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04248046875, 'epoch': 0.54} 54%|█████▍ | 1344/2500 [5:13:40<4:30:56, 14.06s/it] 54%|█████▍ | 1345/2500 [5:13:56<4:40:43, 14.58s/it] {'loss': 0.0007, 'grad_norm': 0.07517292512582135, 'learning_rate': 4.62e-07, 'completion_length': 57.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01641845703125, 'epoch': 0.54} 54%|█████▍ | 1345/2500 [5:13:56<4:40:43, 14.58s/it] 54%|█████▍ | 1346/2500 [5:14:09<4:33:33, 14.22s/it] {'loss': 0.0007, 'grad_norm': 0.06942487619970199, 'learning_rate': 4.616e-07, 'completion_length': 48.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01708984375, 'epoch': 0.54} 54%|█████▍ | 1346/2500 [5:14:09<4:33:33, 14.22s/it] 54%|█████▍ | 1347/2500 [5:14:23<4:32:28, 14.18s/it] {'loss': 0.0016, 'grad_norm': 0.066603186988133, 'learning_rate': 4.612e-07, 'completion_length': 53.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.040679931640625, 'epoch': 0.54} 54%|█████▍ | 1347/2500 [5:14:23<4:32:28, 14.18s/it] 54%|█████▍ | 1348/2500 [5:14:37<4:29:47, 14.05s/it] {'loss': 0.001, 'grad_norm': 0.07745240670876272, 'learning_rate': 4.6079999999999994e-07, 'completion_length': 58.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0255126953125, 'epoch': 0.54} 54%|█████▍ | 1348/2500 [5:14:37<4:29:47, 14.05s/it] 54%|█████▍ | 1349/2500 [5:14:52<4:33:07, 14.24s/it] {'loss': 0.0008, 'grad_norm': 0.0446493033920605, 'learning_rate': 4.6039999999999997e-07, 'completion_length': 60.03571701049805, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02105712890625, 'epoch': 0.54} 54%|█████▍ | 1349/2500 [5:14:52<4:33:07, 14.24s/it] 54%|█████▍ | 1350/2500 [5:15:06<4:32:43, 14.23s/it] {'loss': 0.0013, 'grad_norm': 0.07765435146453278, 'learning_rate': 4.6e-07, 'completion_length': 54.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0321044921875, 'epoch': 0.54} 54%|█████▍ | 1350/2500 [5:15:06<4:32:43, 14.23s/it] 54%|█████▍ | 1351/2500 [5:15:19<4:27:59, 13.99s/it] {'loss': 0.0008, 'grad_norm': 0.0653199555352885, 'learning_rate': 4.596e-07, 'completion_length': 60.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02001953125, 'epoch': 0.54} 54%|█████▍ | 1351/2500 [5:15:19<4:27:59, 13.99s/it] 54%|█████▍ | 1352/2500 [5:15:34<4:32:02, 14.22s/it] {'loss': 0.0007, 'grad_norm': 0.07451670410493101, 'learning_rate': 4.592e-07, 'completion_length': 58.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0174560546875, 'epoch': 0.54} 54%|█████▍ | 1352/2500 [5:15:34<4:32:02, 14.22s/it] 54%|█████▍ | 1353/2500 [5:15:49<4:37:52, 14.54s/it] {'loss': 0.001, 'grad_norm': 0.09288897753123895, 'learning_rate': 4.5879999999999995e-07, 'completion_length': 65.75000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024139404296875, 'epoch': 0.54} 54%|█████▍ | 1353/2500 [5:15:49<4:37:52, 14.54s/it] 54%|█████▍ | 1354/2500 [5:16:03<4:35:16, 14.41s/it] {'loss': 0.0003, 'grad_norm': 0.08854238584438426, 'learning_rate': 4.584e-07, 'completion_length': 52.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.008453369140625, 'epoch': 0.54} 54%|█████▍ | 1354/2500 [5:16:03<4:35:16, 14.41s/it] 54%|█████▍ | 
1355/2500 [5:16:17<4:28:36, 14.08s/it] {'loss': 0.0011, 'grad_norm': 0.06481773985137368, 'learning_rate': 4.58e-07, 'completion_length': 52.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027435302734375, 'epoch': 0.54} 54%|█████▍ | 1355/2500 [5:16:17<4:28:36, 14.08s/it] 54%|█████▍ | 1356/2500 [5:16:30<4:25:16, 13.91s/it] {'loss': 0.0009, 'grad_norm': 0.08193825334666512, 'learning_rate': 4.5759999999999997e-07, 'completion_length': 56.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02362060546875, 'epoch': 0.54} 54%|█████▍ | 1356/2500 [5:16:30<4:25:16, 13.91s/it] 54%|█████▍ | 1357/2500 [5:16:45<4:29:36, 14.15s/it] {'loss': 0.0007, 'grad_norm': 0.06939866492390218, 'learning_rate': 4.572e-07, 'completion_length': 65.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016448974609375, 'epoch': 0.54} 54%|█████▍ | 1357/2500 [5:16:45<4:29:36, 14.15s/it] 54%|█████▍ | 1358/2500 [5:17:01<4:38:26, 14.63s/it] {'loss': 0.0006, 'grad_norm': 0.048961402116377895, 'learning_rate': 4.5679999999999996e-07, 'completion_length': 61.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015045166015625, 'epoch': 0.54} 54%|█████▍ | 1358/2500 [5:17:01<4:38:26, 14.63s/it] 54%|█████▍ | 1359/2500 [5:17:15<4:36:30, 14.54s/it] {'loss': 0.001, 'grad_norm': 0.05840950300141294, 'learning_rate': 4.5639999999999993e-07, 'completion_length': 65.6785774230957, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02587890625, 'epoch': 0.54} 54%|█████▍ | 1359/2500 [5:17:15<4:36:30, 14.54s/it] 54%|█████▍ | 1360/2500 [5:17:29<4:33:56, 14.42s/it] {'loss': 0.0007, 'grad_norm': 0.07419733015340015, 'learning_rate': 4.56e-07, 'completion_length': 53.875003814697266, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01690673828125, 'epoch': 0.54} 54%|█████▍ | 1360/2500 [5:17:29<4:33:56, 14.42s/it] 54%|█████▍ | 1361/2500 [5:17:43<4:27:43, 14.10s/it] {'loss': 0.001, 'grad_norm': 0.09767749412884161, 'learning_rate': 4.556e-07, 'completion_length': 57.928571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0238037109375, 'epoch': 0.54} 54%|█████▍ | 1361/2500 [5:17:43<4:27:43, 14.10s/it] 54%|█████▍ | 1362/2500 [5:17:59<4:38:33, 14.69s/it] {'loss': 0.0016, 'grad_norm': 0.16165680435121546, 'learning_rate': 4.5519999999999995e-07, 'completion_length': 68.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0401611328125, 'epoch': 0.54} 54%|█████▍ | 1362/2500 [5:17:59<4:38:33, 14.69s/it] 55%|█████▍ | 1363/2500 [5:18:12<4:32:26, 14.38s/it] {'loss': 0.0014, 'grad_norm': 0.11534055808999442, 'learning_rate': 4.5479999999999997e-07, 'completion_length': 53.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03607177734375, 'epoch': 0.55} 55%|█████▍ | 1363/2500 [5:18:12<4:32:26, 14.38s/it] 55%|█████▍ | 1364/2500 [5:18:27<4:33:54, 14.47s/it] {'loss': 0.0016, 'grad_norm': 0.06334045365532816, 'learning_rate': 4.544e-07, 'completion_length': 64.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03900146484375, 'epoch': 0.55} 55%|█████▍ | 1364/2500 [5:18:27<4:33:54, 14.47s/it] 55%|█████▍ | 1365/2500 [5:18:41<4:32:07, 14.39s/it] {'loss': 0.0009, 'grad_norm': 0.2134770168706067, 'learning_rate': 4.54e-07, 'completion_length': 52.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.55} 55%|█████▍ | 1365/2500 
[5:18:41<4:32:07, 14.39s/it] 55%|█████▍ | 1366/2500 [5:18:55<4:26:58, 14.13s/it] {'loss': 0.0011, 'grad_norm': 0.08391782265714749, 'learning_rate': 4.536e-07, 'completion_length': 57.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0272216796875, 'epoch': 0.55} 55%|█████▍ | 1366/2500 [5:18:55<4:26:58, 14.13s/it] 55%|█████▍ | 1367/2500 [5:19:08<4:23:57, 13.98s/it] {'loss': 0.0012, 'grad_norm': 0.04674939816923399, 'learning_rate': 4.5319999999999996e-07, 'completion_length': 57.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0303955078125, 'epoch': 0.55} 55%|█████▍ | 1367/2500 [5:19:08<4:23:57, 13.98s/it] 55%|█████▍ | 1368/2500 [5:19:22<4:21:44, 13.87s/it] {'loss': 0.0017, 'grad_norm': 0.08412666463653455, 'learning_rate': 4.528e-07, 'completion_length': 53.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04241943359375, 'epoch': 0.55} 55%|█████▍ | 1368/2500 [5:19:22<4:21:44, 13.87s/it] 55%|█████▍ | 1369/2500 [5:19:36<4:22:21, 13.92s/it] {'loss': 0.0005, 'grad_norm': 0.04589749165140556, 'learning_rate': 4.524e-07, 'completion_length': 53.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01251220703125, 'epoch': 0.55} 55%|█████▍ | 1369/2500 [5:19:36<4:22:21, 13.92s/it] 55%|█████▍ | 1370/2500 [5:19:50<4:20:27, 13.83s/it] {'loss': 0.0012, 'grad_norm': 0.055178069663571847, 'learning_rate': 4.5199999999999997e-07, 'completion_length': 51.80357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02947998046875, 'epoch': 0.55} 55%|█████▍ | 1370/2500 [5:19:50<4:20:27, 13.83s/it] 55%|█████▍ | 1371/2500 [5:20:04<4:26:29, 14.16s/it] {'loss': 0.0005, 'grad_norm': 0.055836129643514525, 'learning_rate': 4.516e-07, 'completion_length': 
57.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013580322265625, 'epoch': 0.55} 55%|█████▍ | 1371/2500 [5:20:04<4:26:29, 14.16s/it] 55%|█████▍ | 1372/2500 [5:20:17<4:18:18, 13.74s/it] {'loss': 0.0011, 'grad_norm': 0.06635527364612288, 'learning_rate': 4.5119999999999996e-07, 'completion_length': 48.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0283203125, 'epoch': 0.55} 55%|█████▍ | 1372/2500 [5:20:17<4:18:18, 13.74s/it] 55%|█████▍ | 1373/2500 [5:20:31<4:16:49, 13.67s/it] {'loss': 0.0009, 'grad_norm': 0.048744180097856964, 'learning_rate': 4.5079999999999993e-07, 'completion_length': 51.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02264404296875, 'epoch': 0.55} 55%|█████▍ | 1373/2500 [5:20:31<4:16:49, 13.67s/it] 55%|█████▍ | 1374/2500 [5:20:45<4:21:56, 13.96s/it] {'loss': 0.0011, 'grad_norm': 1.8791278869692685, 'learning_rate': 4.504e-07, 'completion_length': 56.33928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02850341796875, 'epoch': 0.55} 55%|█████▍ | 1374/2500 [5:20:45<4:21:56, 13.96s/it] 55%|█████▌ | 1375/2500 [5:21:02<4:39:03, 14.88s/it] {'loss': 0.001, 'grad_norm': 0.05419581063401521, 'learning_rate': 4.5e-07, 'completion_length': 60.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0238037109375, 'epoch': 0.55} 55%|█████▌ | 1375/2500 [5:21:02<4:39:03, 14.88s/it] 55%|█████▌ | 1376/2500 [5:21:16<4:32:43, 14.56s/it] {'loss': 0.0017, 'grad_norm': 3.8680325890589407, 'learning_rate': 4.496e-07, 'completion_length': 58.16071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 
0.04123930633068085, 'kl': 0.04180908203125, 'epoch': 0.55} 55%|█████▌ | 1376/2500 [5:21:16<4:32:43, 14.56s/it] 55%|█████▌ | 1377/2500 [5:21:29<4:24:49, 14.15s/it] {'loss': 0.0024, 'grad_norm': 0.06189693537284663, 'learning_rate': 4.4919999999999997e-07, 'completion_length': 49.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0601806640625, 'epoch': 0.55} 55%|█████▌ | 1377/2500 [5:21:29<4:24:49, 14.15s/it] 55%|█████▌ | 1378/2500 [5:21:44<4:25:06, 14.18s/it] {'loss': 0.0006, 'grad_norm': 0.3335095812669603, 'learning_rate': 4.4879999999999994e-07, 'completion_length': 56.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014129638671875, 'epoch': 0.55} 55%|█████▌ | 1378/2500 [5:21:44<4:25:06, 14.18s/it] 55%|█████▌ | 1379/2500 [5:21:57<4:20:14, 13.93s/it] {'loss': 0.0011, 'grad_norm': 1.1500653341673732, 'learning_rate': 4.484e-07, 'completion_length': 53.57143211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.028564453125, 'epoch': 0.55} 55%|█████▌ | 1379/2500 [5:21:57<4:20:14, 13.93s/it] 55%|█████▌ | 1380/2500 [5:22:11<4:22:02, 14.04s/it] {'loss': 0.0012, 'grad_norm': 0.08142376542767774, 'learning_rate': 4.48e-07, 'completion_length': 61.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.55} 55%|█████▌ | 1380/2500 [5:22:11<4:22:02, 14.04s/it] 55%|█████▌ | 1381/2500 [5:22:26<4:27:36, 14.35s/it] {'loss': 0.0009, 'grad_norm': 0.07090670049513302, 'learning_rate': 4.4759999999999996e-07, 'completion_length': 55.30357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022216796875, 'epoch': 0.55} 55%|█████▌ | 1381/2500 [5:22:26<4:27:36, 14.35s/it] 55%|█████▌ | 
1382/2500 [5:22:41<4:30:13, 14.50s/it] {'loss': 0.0011, 'grad_norm': 0.08173413367071854, 'learning_rate': 4.472e-07, 'completion_length': 58.732147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027252197265625, 'epoch': 0.55} 55%|█████▌ | 1382/2500 [5:22:41<4:30:13, 14.50s/it] 55%|█████▌ | 1383/2500 [5:22:55<4:24:32, 14.21s/it] {'loss': 0.0012, 'grad_norm': 0.08827635340627345, 'learning_rate': 4.4679999999999995e-07, 'completion_length': 54.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0306396484375, 'epoch': 0.55} 55%|█████▌ | 1383/2500 [5:22:55<4:24:32, 14.21s/it] 55%|█████▌ | 1384/2500 [5:23:09<4:24:06, 14.20s/it] {'loss': 0.0012, 'grad_norm': 0.07662992592441821, 'learning_rate': 4.464e-07, 'completion_length': 57.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02899169921875, 'epoch': 0.55} 55%|█████▌ | 1384/2500 [5:23:09<4:24:06, 14.20s/it] 55%|█████▌ | 1385/2500 [5:23:22<4:19:16, 13.95s/it] {'loss': 0.0017, 'grad_norm': 0.08546661776852503, 'learning_rate': 4.46e-07, 'completion_length': 54.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04327392578125, 'epoch': 0.55} 55%|█████▌ | 1385/2500 [5:23:22<4:19:16, 13.95s/it] 55%|█████▌ | 1386/2500 [5:23:36<4:19:33, 13.98s/it] {'loss': 0.002, 'grad_norm': 0.06351201328537326, 'learning_rate': 4.4559999999999997e-07, 'completion_length': 57.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04931640625, 'epoch': 0.55} 55%|█████▌ | 1386/2500 [5:23:36<4:19:33, 13.98s/it] 55%|█████▌ | 1387/2500 [5:23:50<4:16:13, 13.81s/it] {'loss': 0.0018, 'grad_norm': 0.0801415951514552, 'learning_rate': 4.452e-07, 'completion_length': 56.17857360839844, 'rewards/accuracy_reward': 
1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0438232421875, 'epoch': 0.55} 55%|█████▌ | 1387/2500 [5:23:50<4:16:13, 13.81s/it] 56%|█████▌ | 1388/2500 [5:24:04<4:17:17, 13.88s/it] {'loss': 0.0014, 'grad_norm': 0.07717353551296914, 'learning_rate': 4.4479999999999996e-07, 'completion_length': 63.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.036285400390625, 'epoch': 0.56} 56%|█████▌ | 1388/2500 [5:24:04<4:17:17, 13.88s/it] 56%|█████▌ | 1389/2500 [5:24:18<4:18:49, 13.98s/it] {'loss': 0.0014, 'grad_norm': 0.09811820365553608, 'learning_rate': 4.444e-07, 'completion_length': 61.80357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03436279296875, 'epoch': 0.56} 56%|█████▌ | 1389/2500 [5:24:18<4:18:49, 13.98s/it] 56%|█████▌ | 1390/2500 [5:24:32<4:16:57, 13.89s/it] {'loss': 0.0006, 'grad_norm': 0.09241410319982718, 'learning_rate': 4.44e-07, 'completion_length': 55.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014129638671875, 'epoch': 0.56} 56%|█████▌ | 1390/2500 [5:24:32<4:16:57, 13.89s/it] 56%|█████▌ | 1391/2500 [5:24:46<4:19:03, 14.02s/it] {'loss': 0.0007, 'grad_norm': 0.08518561028357134, 'learning_rate': 4.436e-07, 'completion_length': 61.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017333984375, 'epoch': 0.56} 56%|█████▌ | 1391/2500 [5:24:46<4:19:03, 14.02s/it] 56%|█████▌ | 1392/2500 [5:25:01<4:24:30, 14.32s/it] {'loss': 0.0013, 'grad_norm': 0.094656424819806, 'learning_rate': 4.4319999999999995e-07, 'completion_length': 60.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03179931640625, 'epoch': 0.56} 56%|█████▌ | 1392/2500 [5:25:01<4:24:30, 
14.32s/it] 56%|█████▌ | 1393/2500 [5:25:16<4:28:59, 14.58s/it] {'loss': 0.0009, 'grad_norm': 0.05576166686704615, 'learning_rate': 4.428e-07, 'completion_length': 62.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.56} 56%|█████▌ | 1393/2500 [5:25:16<4:28:59, 14.58s/it] 56%|█████▌ | 1394/2500 [5:25:30<4:26:25, 14.45s/it] {'loss': 0.0013, 'grad_norm': 2.0286422397492814, 'learning_rate': 4.424e-07, 'completion_length': 56.50000190734863, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.07695358991622925, 'kl': 0.03271484375, 'epoch': 0.56} 56%|█████▌ | 1394/2500 [5:25:30<4:26:25, 14.45s/it] 56%|█████▌ | 1395/2500 [5:25:44<4:23:06, 14.29s/it] {'loss': 0.0009, 'grad_norm': 0.0881626778563641, 'learning_rate': 4.4199999999999996e-07, 'completion_length': 55.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022613525390625, 'epoch': 0.56} 56%|█████▌ | 1395/2500 [5:25:44<4:23:06, 14.29s/it] 56%|█████▌ | 1396/2500 [5:25:58<4:21:58, 14.24s/it] {'loss': 0.0013, 'grad_norm': 1.3255969882825338, 'learning_rate': 4.416e-07, 'completion_length': 55.267860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03167724609375, 'epoch': 0.56} 56%|█████▌ | 1396/2500 [5:25:58<4:21:58, 14.24s/it] 56%|█████▌ | 1397/2500 [5:26:13<4:23:14, 14.32s/it] {'loss': 0.0014, 'grad_norm': 1.2779071138141778, 'learning_rate': 4.4119999999999995e-07, 'completion_length': 54.03571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.034912109375, 'epoch': 0.56} 56%|█████▌ | 1397/2500 [5:26:13<4:23:14, 14.32s/it] 56%|█████▌ | 1398/2500 [5:26:28<4:25:37, 
14.46s/it] {'loss': 0.0018, 'grad_norm': 0.055405762726318454, 'learning_rate': 4.4080000000000003e-07, 'completion_length': 57.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0440673828125, 'epoch': 0.56} 56%|█████▌ | 1398/2500 [5:26:28<4:25:37, 14.46s/it] 56%|█████▌ | 1399/2500 [5:26:42<4:23:07, 14.34s/it] {'loss': 0.0007, 'grad_norm': 0.05620355471670063, 'learning_rate': 4.404e-07, 'completion_length': 64.14286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016845703125, 'epoch': 0.56} 56%|█████▌ | 1399/2500 [5:26:42<4:23:07, 14.34s/it] 56%|█████▌ | 1400/2500 [5:26:55<4:15:19, 13.93s/it] {'loss': 0.0012, 'grad_norm': 0.16246310161483674, 'learning_rate': 4.3999999999999997e-07, 'completion_length': 49.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031005859375, 'epoch': 0.56} 56%|█████▌ | 1400/2500 [5:26:55<4:15:19, 13.93s/it] 56%|█████▌ | 1401/2500 [5:28:02<9:07:56, 29.91s/it] {'loss': 0.0016, 'grad_norm': 0.18006159139597555, 'learning_rate': 4.396e-07, 'completion_length': 52.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04034423828125, 'epoch': 0.56} 56%|█████▌ | 1401/2500 [5:28:02<9:07:56, 29.91s/it] 56%|█████▌ | 1402/2500 [5:28:17<7:47:38, 25.55s/it] {'loss': 0.0008, 'grad_norm': 0.06607640417587904, 'learning_rate': 4.3919999999999996e-07, 'completion_length': 64.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020233154296875, 'epoch': 0.56} 56%|█████▌ | 1402/2500 [5:28:17<7:47:38, 25.55s/it] 56%|█████▌ | 1403/2500 [5:28:31<6:40:47, 21.92s/it] {'loss': 0.0009, 'grad_norm': 0.06491928825341922, 'learning_rate': 4.388e-07, 'completion_length': 51.92857360839844, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023101806640625, 'epoch': 0.56} 56%|█████▌ | 1403/2500 [5:28:31<6:40:47, 21.92s/it] 56%|█████▌ | 1404/2500 [5:28:45<5:56:44, 19.53s/it] {'loss': 0.0007, 'grad_norm': 0.06659395265566907, 'learning_rate': 4.384e-07, 'completion_length': 53.23214530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.01849365234375, 'epoch': 0.56} 56%|█████▌ | 1404/2500 [5:28:45<5:56:44, 19.53s/it] 56%|█████▌ | 1405/2500 [5:28:58<5:22:54, 17.69s/it] {'loss': 0.0005, 'grad_norm': 0.0831768453682371, 'learning_rate': 4.38e-07, 'completion_length': 51.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012725830078125, 'epoch': 0.56} 56%|█████▌ | 1405/2500 [5:28:58<5:22:54, 17.69s/it] 56%|█████▌ | 1406/2500 [5:29:13<5:05:03, 16.73s/it] {'loss': 0.001, 'grad_norm': 1.2938859152080218, 'learning_rate': 4.3759999999999995e-07, 'completion_length': 59.14285850524902, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02606201171875, 'epoch': 0.56} 56%|█████▌ | 1406/2500 [5:29:13<5:05:03, 16.73s/it] 56%|█████▋ | 1407/2500 [5:29:27<4:52:56, 16.08s/it] {'loss': 0.001, 'grad_norm': 0.0769163521531858, 'learning_rate': 4.3719999999999997e-07, 'completion_length': 59.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025634765625, 'epoch': 0.56} 56%|█████▋ | 1407/2500 [5:29:27<4:52:56, 16.08s/it] 56%|█████▋ | 1408/2500 [5:29:42<4:44:18, 15.62s/it] {'loss': 0.0014, 'grad_norm': 0.07113578876823569, 'learning_rate': 4.368e-07, 'completion_length': 62.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0360107421875, 'epoch': 0.56} 56%|█████▋ 
| 1408/2500 [5:29:42<4:44:18, 15.62s/it] 56%|█████▋ | 1409/2500 [5:29:55<4:33:37, 15.05s/it] {'loss': 0.0013, 'grad_norm': 0.2024090417358403, 'learning_rate': 4.364e-07, 'completion_length': 53.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.56} 56%|█████▋ | 1409/2500 [5:29:55<4:33:37, 15.05s/it] 56%|█████▋ | 1410/2500 [5:30:10<4:31:36, 14.95s/it] {'loss': 0.0015, 'grad_norm': 2.8649643208974913, 'learning_rate': 4.36e-07, 'completion_length': 59.42857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0384521484375, 'epoch': 0.56} 56%|█████▋ | 1410/2500 [5:30:10<4:31:36, 14.95s/it] 56%|█████▋ | 1411/2500 [5:30:24<4:23:38, 14.53s/it] {'loss': 0.0013, 'grad_norm': 0.14081619847158428, 'learning_rate': 4.3559999999999996e-07, 'completion_length': 57.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03363037109375, 'epoch': 0.56} 56%|█████▋ | 1411/2500 [5:30:24<4:23:38, 14.53s/it] 56%|█████▋ | 1412/2500 [5:30:39<4:25:59, 14.67s/it] {'loss': 0.0012, 'grad_norm': 1.2719539230964918, 'learning_rate': 4.352e-07, 'completion_length': 53.392860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0311279296875, 'epoch': 0.56} 56%|█████▋ | 1412/2500 [5:30:39<4:25:59, 14.67s/it] 57%|█████▋ | 1413/2500 [5:30:53<4:25:10, 14.64s/it] {'loss': 0.0008, 'grad_norm': 0.09155804073403101, 'learning_rate': 4.348e-07, 'completion_length': 62.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01947021484375, 'epoch': 0.57} 57%|█████▋ | 1413/2500 [5:30:53<4:25:10, 14.64s/it] 57%|█████▋ | 1414/2500 [5:31:07<4:17:23, 14.22s/it] {'loss': 0.0007, 
'grad_norm': 0.9861230462285363, 'learning_rate': 4.3439999999999997e-07, 'completion_length': 50.267860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.018524169921875, 'epoch': 0.57} 57%|█████▋ | 1414/2500 [5:31:07<4:17:23, 14.22s/it] 57%|█████▋ | 1415/2500 [5:31:21<4:16:43, 14.20s/it] {'loss': 0.0016, 'grad_norm': 0.09059078232019033, 'learning_rate': 4.34e-07, 'completion_length': 58.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039794921875, 'epoch': 0.57} 57%|█████▋ | 1415/2500 [5:31:21<4:16:43, 14.20s/it] 57%|█████▋ | 1416/2500 [5:31:35<4:16:06, 14.18s/it] {'loss': 0.0008, 'grad_norm': 0.0743700706809682, 'learning_rate': 4.3359999999999997e-07, 'completion_length': 55.96428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02093505859375, 'epoch': 0.57} 57%|█████▋ | 1416/2500 [5:31:35<4:16:06, 14.18s/it] 57%|█████▋ | 1417/2500 [5:31:50<4:23:17, 14.59s/it] {'loss': 0.0014, 'grad_norm': 0.13148344335860246, 'learning_rate': 4.3319999999999994e-07, 'completion_length': 62.946434020996094, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03521728515625, 'epoch': 0.57} 57%|█████▋ | 1417/2500 [5:31:50<4:23:17, 14.59s/it] 57%|█████▋ | 1418/2500 [5:32:05<4:25:08, 14.70s/it] {'loss': 0.001, 'grad_norm': 1.7462466177578644, 'learning_rate': 4.328e-07, 'completion_length': 59.767860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0824786126613617, 'kl': 0.0242919921875, 'epoch': 0.57} 57%|█████▋ | 1418/2500 [5:32:05<4:25:08, 14.70s/it] 57%|█████▋ | 1419/2500 [5:32:19<4:20:36, 14.46s/it] {'loss': 0.0013, 'grad_norm': 0.12857911328438776, 'learning_rate': 4.324e-07, 'completion_length': 
47.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.57} 57%|█████▋ | 1419/2500 [5:32:19<4:20:36, 14.46s/it] 57%|█████▋ | 1420/2500 [5:32:33<4:14:22, 14.13s/it] {'loss': 0.0013, 'grad_norm': 0.09206270336110844, 'learning_rate': 4.3199999999999995e-07, 'completion_length': 56.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03350830078125, 'epoch': 0.57} 57%|█████▋ | 1420/2500 [5:32:33<4:14:22, 14.13s/it] 57%|█████▋ | 1421/2500 [5:32:46<4:09:40, 13.88s/it] {'loss': 0.0023, 'grad_norm': 4.04824431041305, 'learning_rate': 4.316e-07, 'completion_length': 48.32143020629883, 'rewards/accuracy_reward': 0.9107142984867096, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.0357142873108387, 'kl': 0.05755615234375, 'epoch': 0.57} 57%|█████▋ | 1421/2500 [5:32:46<4:09:40, 13.88s/it] 57%|█████▋ | 1422/2500 [5:33:00<4:12:09, 14.03s/it] {'loss': 0.0014, 'grad_norm': 0.17503520697038946, 'learning_rate': 4.312e-07, 'completion_length': 55.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 0.57} 57%|█████▋ | 1422/2500 [5:33:00<4:12:09, 14.03s/it] 57%|█████▋ | 1423/2500 [5:33:14<4:12:33, 14.07s/it] {'loss': 0.0012, 'grad_norm': 0.07337351905219221, 'learning_rate': 4.308e-07, 'completion_length': 56.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029052734375, 'epoch': 0.57} 57%|█████▋ | 1423/2500 [5:33:14<4:12:33, 14.07s/it] 57%|█████▋ | 1424/2500 [5:33:28<4:09:33, 13.92s/it] {'loss': 0.001, 'grad_norm': 1.4276496367603382, 'learning_rate': 4.304e-07, 'completion_length': 57.73214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 
'kl': 0.02374267578125, 'epoch': 0.57} 57%|█████▋ | 1424/2500 [5:33:28<4:09:33, 13.92s/it] 57%|█████▋ | 1425/2500 [5:33:42<4:08:59, 13.90s/it] {'loss': 0.0007, 'grad_norm': 0.08738031578746523, 'learning_rate': 4.2999999999999996e-07, 'completion_length': 52.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01690673828125, 'epoch': 0.57} 57%|█████▋ | 1425/2500 [5:33:42<4:08:59, 13.90s/it] 57%|█████▋ | 1426/2500 [5:33:56<4:09:34, 13.94s/it] {'loss': 0.0014, 'grad_norm': 0.11326780987447589, 'learning_rate': 4.296e-07, 'completion_length': 51.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0345458984375, 'epoch': 0.57} 57%|█████▋ | 1426/2500 [5:33:56<4:09:34, 13.94s/it] 57%|█████▋ | 1427/2500 [5:34:10<4:08:48, 13.91s/it] {'loss': 0.0012, 'grad_norm': 0.09994794684189912, 'learning_rate': 4.292e-07, 'completion_length': 54.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03118896484375, 'epoch': 0.57} 57%|█████▋ | 1427/2500 [5:34:10<4:08:48, 13.91s/it] 57%|█████▋ | 1428/2500 [5:34:23<4:04:43, 13.70s/it] {'loss': 0.0008, 'grad_norm': 0.08725769820361129, 'learning_rate': 4.288e-07, 'completion_length': 49.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.57} 57%|█████▋ | 1428/2500 [5:34:23<4:04:43, 13.70s/it] 57%|█████▋ | 1429/2500 [5:34:37<4:05:58, 13.78s/it] {'loss': 0.0016, 'grad_norm': 0.11308050217273923, 'learning_rate': 4.284e-07, 'completion_length': 53.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04095458984375, 'epoch': 0.57} 57%|█████▋ | 1429/2500 [5:34:37<4:05:58, 13.78s/it] 57%|█████▋ | 1430/2500 [5:34:50<4:02:13, 13.58s/it] {'loss': 0.0007, 'grad_norm': 0.07985071228413505, 
'learning_rate': 4.2799999999999997e-07, 'completion_length': 49.16071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0185546875, 'epoch': 0.57} 57%|█████▋ | 1430/2500 [5:34:50<4:02:13, 13.58s/it] 57%|█████▋ | 1431/2500 [5:35:04<4:06:40, 13.85s/it] {'loss': 0.0017, 'grad_norm': 3.054930221873875, 'learning_rate': 4.2759999999999994e-07, 'completion_length': 53.142860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0416259765625, 'epoch': 0.57} 57%|█████▋ | 1431/2500 [5:35:04<4:06:40, 13.85s/it] 57%|█████▋ | 1432/2500 [5:35:18<4:05:57, 13.82s/it] {'loss': 0.0014, 'grad_norm': 0.09262916505269432, 'learning_rate': 4.272e-07, 'completion_length': 54.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03375244140625, 'epoch': 0.57} 57%|█████▋ | 1432/2500 [5:35:18<4:05:57, 13.82s/it] 57%|█████▋ | 1433/2500 [5:35:32<4:03:51, 13.71s/it] {'loss': 0.0014, 'grad_norm': 0.08425473678231894, 'learning_rate': 4.268e-07, 'completion_length': 53.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0338134765625, 'epoch': 0.57} 57%|█████▋ | 1433/2500 [5:35:32<4:03:51, 13.71s/it] 57%|█████▋ | 1434/2500 [5:35:46<4:07:41, 13.94s/it] {'loss': 0.0011, 'grad_norm': 0.13283161560109005, 'learning_rate': 4.264e-07, 'completion_length': 53.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0284423828125, 'epoch': 0.57} 57%|█████▋ | 1434/2500 [5:35:46<4:07:41, 13.94s/it] 57%|█████▋ | 1435/2500 [5:36:01<4:11:55, 14.19s/it] {'loss': 0.0023, 'grad_norm': 0.13655370469793351, 'learning_rate': 4.26e-07, 'completion_length': 65.67857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 
'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0562744140625, 'epoch': 0.57} 57%|█████▋ | 1435/2500 [5:36:01<4:11:55, 14.19s/it] 57%|█████▋ | 1436/2500 [5:36:15<4:10:31, 14.13s/it] {'loss': 0.0026, 'grad_norm': 0.20735728346806412, 'learning_rate': 4.2559999999999995e-07, 'completion_length': 53.66071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0655517578125, 'epoch': 0.57} 57%|█████▋ | 1436/2500 [5:36:15<4:10:31, 14.13s/it] 57%|█████▋ | 1437/2500 [5:36:30<4:14:10, 14.35s/it] {'loss': 0.001, 'grad_norm': 0.6338272517891724, 'learning_rate': 4.252e-07, 'completion_length': 64.30357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0250244140625, 'epoch': 0.57} 57%|█████▋ | 1437/2500 [5:36:30<4:14:10, 14.35s/it] 58%|█████▊ | 1438/2500 [5:36:45<4:18:52, 14.63s/it] {'loss': 0.0007, 'grad_norm': 0.15960426643249306, 'learning_rate': 4.248e-07, 'completion_length': 58.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0184326171875, 'epoch': 0.58} 58%|█████▊ | 1438/2500 [5:36:45<4:18:52, 14.63s/it] 58%|█████▊ | 1439/2500 [5:36:59<4:13:35, 14.34s/it] {'loss': 0.0024, 'grad_norm': 1.5111239852322274, 'learning_rate': 4.2439999999999996e-07, 'completion_length': 59.98214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.060546875, 'epoch': 0.58} 58%|█████▊ | 1439/2500 [5:36:59<4:13:35, 14.34s/it] 58%|█████▊ | 1440/2500 [5:37:12<4:09:55, 14.15s/it] {'loss': 0.0018, 'grad_norm': 0.09077612502394222, 'learning_rate': 4.24e-07, 'completion_length': 51.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0438232421875, 'epoch': 0.58} 
58%|█████▊ | 1440/2500 [5:37:12<4:09:55, 14.15s/it] 58%|█████▊ | 1441/2500 [5:37:27<4:13:31, 14.36s/it] {'loss': 0.0019, 'grad_norm': 0.07940502271230283, 'learning_rate': 4.2359999999999995e-07, 'completion_length': 65.46429061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04736328125, 'epoch': 0.58} 58%|█████▊ | 1441/2500 [5:37:27<4:13:31, 14.36s/it] 58%|█████▊ | 1442/2500 [5:37:41<4:12:10, 14.30s/it] {'loss': 0.0011, 'grad_norm': 0.08620936071747642, 'learning_rate': 4.232e-07, 'completion_length': 64.10714721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0263671875, 'epoch': 0.58} 58%|█████▊ | 1442/2500 [5:37:41<4:12:10, 14.30s/it] 58%|█████▊ | 1443/2500 [5:37:54<4:04:51, 13.90s/it] {'loss': 0.0013, 'grad_norm': 0.06087845876443215, 'learning_rate': 4.228e-07, 'completion_length': 56.375003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0325927734375, 'epoch': 0.58} 58%|█████▊ | 1443/2500 [5:37:54<4:04:51, 13.90s/it] 58%|█████▊ | 1444/2500 [5:38:08<4:05:26, 13.95s/it] {'loss': 0.0005, 'grad_norm': 0.05843890072662839, 'learning_rate': 4.2239999999999997e-07, 'completion_length': 51.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012939453125, 'epoch': 0.58} 58%|█████▊ | 1444/2500 [5:38:08<4:05:26, 13.95s/it] 58%|█████▊ | 1445/2500 [5:38:23<4:06:13, 14.00s/it] {'loss': 0.0014, 'grad_norm': 0.10709183634495394, 'learning_rate': 4.2199999999999994e-07, 'completion_length': 57.66071891784668, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03594970703125, 'epoch': 0.58} 58%|█████▊ | 1445/2500 [5:38:23<4:06:13, 14.00s/it] 58%|█████▊ | 1446/2500 [5:38:37<4:09:40, 14.21s/it] {'loss': 
0.0016, 'grad_norm': 0.07255973791551727, 'learning_rate': 4.2159999999999996e-07, 'completion_length': 56.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0391845703125, 'epoch': 0.58} 58%|█████▊ | 1446/2500 [5:38:37<4:09:40, 14.21s/it] 58%|█████▊ | 1447/2500 [5:38:51<4:07:00, 14.07s/it] {'loss': 0.0015, 'grad_norm': 5.822449967775768, 'learning_rate': 4.212e-07, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 0.8750000596046448, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.0357142873108387, 'kl': 0.03656005859375, 'epoch': 0.58} 58%|█████▊ | 1447/2500 [5:38:51<4:07:00, 14.07s/it] 58%|█████▊ | 1448/2500 [5:39:06<4:11:03, 14.32s/it] {'loss': 0.0013, 'grad_norm': 0.1272350349391161, 'learning_rate': 4.208e-07, 'completion_length': 59.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03192138671875, 'epoch': 0.58} 58%|█████▊ | 1448/2500 [5:39:06<4:11:03, 14.32s/it] 58%|█████▊ | 1449/2500 [5:39:20<4:11:33, 14.36s/it] {'loss': 0.002, 'grad_norm': 0.10705163806801057, 'learning_rate': 4.204e-07, 'completion_length': 57.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05126953125, 'epoch': 0.58} 58%|█████▊ | 1449/2500 [5:39:20<4:11:33, 14.36s/it] 58%|█████▊ | 1450/2500 [5:39:35<4:10:07, 14.29s/it] {'loss': 0.0014, 'grad_norm': 0.08081469762970307, 'learning_rate': 4.1999999999999995e-07, 'completion_length': 61.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03424072265625, 'epoch': 0.58} 58%|█████▊ | 1450/2500 [5:39:35<4:10:07, 14.29s/it] 58%|█████▊ | 1451/2500 [5:39:48<4:06:51, 14.12s/it] {'loss': 0.0015, 'grad_norm': 0.08115116875538349, 'learning_rate': 4.1959999999999997e-07, 'completion_length': 61.39285850524902, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03851318359375, 'epoch': 0.58} 58%|█████▊ | 1451/2500 [5:39:48<4:06:51, 14.12s/it] 58%|█████▊ | 1452/2500 [5:40:02<4:05:46, 14.07s/it] {'loss': 0.0005, 'grad_norm': 0.057286198952649275, 'learning_rate': 4.192e-07, 'completion_length': 63.03571891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0118408203125, 'epoch': 0.58} 58%|█████▊ | 1452/2500 [5:40:02<4:05:46, 14.07s/it] 58%|█████▊ | 1453/2500 [5:40:17<4:09:47, 14.31s/it] {'loss': 0.0018, 'grad_norm': 0.07859493836045518, 'learning_rate': 4.1879999999999996e-07, 'completion_length': 63.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044189453125, 'epoch': 0.58} 58%|█████▊ | 1453/2500 [5:40:17<4:09:47, 14.31s/it] 58%|█████▊ | 1454/2500 [5:40:31<4:07:14, 14.18s/it] {'loss': 0.0009, 'grad_norm': 0.09944255412423435, 'learning_rate': 4.184e-07, 'completion_length': 58.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022979736328125, 'epoch': 0.58} 58%|█████▊ | 1454/2500 [5:40:31<4:07:14, 14.18s/it] 58%|█████▊ | 1455/2500 [5:40:44<4:00:16, 13.80s/it] {'loss': 0.0012, 'grad_norm': 1.3923280187009963, 'learning_rate': 4.1799999999999996e-07, 'completion_length': 55.08928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03106689453125, 'epoch': 0.58} 58%|█████▊ | 1455/2500 [5:40:44<4:00:16, 13.80s/it] 58%|█████▊ | 1456/2500 [5:40:57<3:58:21, 13.70s/it] {'loss': 0.0017, 'grad_norm': 1.047497720485999, 'learning_rate': 4.1760000000000003e-07, 'completion_length': 58.66071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 
0.0357142873108387, 'kl': 0.0419921875, 'epoch': 0.58} 58%|█████▊ | 1456/2500 [5:40:57<3:58:21, 13.70s/it] 58%|█████▊ | 1457/2500 [5:41:11<3:56:31, 13.61s/it] {'loss': 0.0009, 'grad_norm': 0.07415623392930894, 'learning_rate': 4.172e-07, 'completion_length': 56.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0213623046875, 'epoch': 0.58} 58%|█████▊ | 1457/2500 [5:41:11<3:56:31, 13.61s/it] 58%|█████▊ | 1458/2500 [5:41:24<3:56:45, 13.63s/it] {'loss': 0.0006, 'grad_norm': 0.06628063819748385, 'learning_rate': 4.1679999999999997e-07, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014251708984375, 'epoch': 0.58} 58%|█████▊ | 1458/2500 [5:41:24<3:56:45, 13.63s/it] 58%|█████▊ | 1459/2500 [5:41:38<3:57:17, 13.68s/it] {'loss': 0.002, 'grad_norm': 2.127719165110084, 'learning_rate': 4.164e-07, 'completion_length': 57.98214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0491943359375, 'epoch': 0.58} 58%|█████▊ | 1459/2500 [5:41:38<3:57:17, 13.68s/it] 58%|█████▊ | 1460/2500 [5:41:52<3:59:27, 13.81s/it] {'loss': 0.0011, 'grad_norm': 2.260260085880246, 'learning_rate': 4.1599999999999997e-07, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0274658203125, 'epoch': 0.58} 58%|█████▊ | 1460/2500 [5:41:52<3:59:27, 13.81s/it] 58%|█████▊ | 1461/2500 [5:42:06<3:57:22, 13.71s/it] {'loss': 0.0011, 'grad_norm': 0.2607128674013655, 'learning_rate': 4.156e-07, 'completion_length': 55.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.58} 58%|█████▊ | 1461/2500 [5:42:06<3:57:22, 
13.71s/it] 58%|█████▊ | 1462/2500 [5:42:20<3:57:22, 13.72s/it] {'loss': 0.0013, 'grad_norm': 0.09811761756980888, 'learning_rate': 4.152e-07, 'completion_length': 55.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.58} 58%|█████▊ | 1462/2500 [5:42:20<3:57:22, 13.72s/it] 59%|█████▊ | 1463/2500 [5:42:37<4:15:25, 14.78s/it] {'loss': 0.0011, 'grad_norm': 0.09216690044683827, 'learning_rate': 4.148e-07, 'completion_length': 57.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02862548828125, 'epoch': 0.59} 59%|█████▊ | 1463/2500 [5:42:37<4:15:25, 14.78s/it] 59%|█████▊ | 1464/2500 [5:42:50<4:06:29, 14.28s/it] {'loss': 0.0013, 'grad_norm': 0.07819645767529337, 'learning_rate': 4.1439999999999995e-07, 'completion_length': 53.125003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03265380859375, 'epoch': 0.59} 59%|█████▊ | 1464/2500 [5:42:50<4:06:29, 14.28s/it] 59%|█████▊ | 1465/2500 [5:43:05<4:09:52, 14.49s/it] {'loss': 0.0008, 'grad_norm': 0.09712533170835673, 'learning_rate': 4.14e-07, 'completion_length': 55.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01971435546875, 'epoch': 0.59} 59%|█████▊ | 1465/2500 [5:43:05<4:09:52, 14.49s/it] 59%|█████▊ | 1466/2500 [5:43:18<4:04:39, 14.20s/it] {'loss': 0.0015, 'grad_norm': 0.09789727035999557, 'learning_rate': 4.136e-07, 'completion_length': 49.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.59} 59%|█████▊ | 1466/2500 [5:43:18<4:04:39, 14.20s/it] 59%|█████▊ | 1467/2500 [5:43:32<4:01:27, 14.02s/it] {'loss': 0.0008, 'grad_norm': 0.07716888864917719, 'learning_rate': 4.1319999999999997e-07, 
'completion_length': 55.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019622802734375, 'epoch': 0.59} 59%|█████▊ | 1467/2500 [5:43:32<4:01:27, 14.02s/it] 59%|█████▊ | 1468/2500 [5:43:47<4:07:50, 14.41s/it] {'loss': 0.001, 'grad_norm': 0.12882054497553513, 'learning_rate': 4.128e-07, 'completion_length': 61.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.59} 59%|█████▊ | 1468/2500 [5:43:47<4:07:50, 14.41s/it] 59%|█████▉ | 1469/2500 [5:44:06<4:30:10, 15.72s/it] {'loss': 0.001, 'grad_norm': 0.5370740186120169, 'learning_rate': 4.1239999999999996e-07, 'completion_length': 69.51785850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0250244140625, 'epoch': 0.59} 59%|█████▉ | 1469/2500 [5:44:06<4:30:10, 15.72s/it] 59%|█████▉ | 1470/2500 [5:44:19<4:17:11, 14.98s/it] {'loss': 0.0007, 'grad_norm': 0.07610206878273239, 'learning_rate': 4.12e-07, 'completion_length': 47.23214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0169677734375, 'epoch': 0.59} 59%|█████▉ | 1470/2500 [5:44:19<4:17:11, 14.98s/it] 59%|█████▉ | 1471/2500 [5:44:33<4:10:30, 14.61s/it] {'loss': 0.0011, 'grad_norm': 0.07858544440165849, 'learning_rate': 4.116e-07, 'completion_length': 55.875003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0274658203125, 'epoch': 0.59} 59%|█████▉ | 1471/2500 [5:44:33<4:10:30, 14.61s/it] 59%|█████▉ | 1472/2500 [5:44:46<4:03:23, 14.21s/it] {'loss': 0.0019, 'grad_norm': 0.07386752807388583, 'learning_rate': 4.112e-07, 'completion_length': 53.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 
2.0, 'reward_std': 0.0, 'kl': 0.047210693359375, 'epoch': 0.59} 59%|█████▉ | 1472/2500 [5:44:46<4:03:23, 14.21s/it] 59%|█████▉ | 1473/2500 [5:45:01<4:04:59, 14.31s/it] {'loss': 0.0016, 'grad_norm': 3.3723479074562572, 'learning_rate': 4.108e-07, 'completion_length': 58.00000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.040283203125, 'epoch': 0.59} 59%|█████▉ | 1473/2500 [5:45:01<4:04:59, 14.31s/it] 59%|█████▉ | 1474/2500 [5:45:14<3:59:59, 14.03s/it] {'loss': 0.0011, 'grad_norm': 0.060149090640588064, 'learning_rate': 4.1039999999999997e-07, 'completion_length': 57.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028594970703125, 'epoch': 0.59} 59%|█████▉ | 1474/2500 [5:45:14<3:59:59, 14.03s/it] 59%|█████▉ | 1475/2500 [5:45:28<3:58:47, 13.98s/it] {'loss': 0.001, 'grad_norm': 0.07705260876859085, 'learning_rate': 4.0999999999999994e-07, 'completion_length': 56.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0238037109375, 'epoch': 0.59} 59%|█████▉ | 1475/2500 [5:45:28<3:58:47, 13.98s/it] 59%|█████▉ | 1476/2500 [5:45:42<3:59:31, 14.03s/it] {'loss': 0.0016, 'grad_norm': 0.0965136216644731, 'learning_rate': 4.096e-07, 'completion_length': 60.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0396728515625, 'epoch': 0.59} 59%|█████▉ | 1476/2500 [5:45:42<3:59:31, 14.03s/it] 59%|█████▉ | 1477/2500 [5:45:56<3:56:47, 13.89s/it] {'loss': 0.0008, 'grad_norm': 0.3853032299385439, 'learning_rate': 4.092e-07, 'completion_length': 57.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019989013671875, 'epoch': 0.59} 59%|█████▉ | 1477/2500 [5:45:56<3:56:47, 13.89s/it] 59%|█████▉ | 1478/2500 
[5:46:11<4:00:29, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.2967365690117253, 'learning_rate': 4.0879999999999995e-07, 'completion_length': 62.03571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0213623046875, 'epoch': 0.59} 59%|█████▉ | 1478/2500 [5:46:11<4:00:29, 14.12s/it] 59%|█████▉ | 1479/2500 [5:46:24<3:58:36, 14.02s/it] {'loss': 0.0011, 'grad_norm': 0.07116338460622212, 'learning_rate': 4.084e-07, 'completion_length': 60.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0281982421875, 'epoch': 0.59} 59%|█████▉ | 1479/2500 [5:46:24<3:58:36, 14.02s/it] 59%|█████▉ | 1480/2500 [5:46:38<3:57:08, 13.95s/it] {'loss': 0.0016, 'grad_norm': 0.07078660971024565, 'learning_rate': 4.0799999999999995e-07, 'completion_length': 57.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0411376953125, 'epoch': 0.59} 59%|█████▉ | 1480/2500 [5:46:38<3:57:08, 13.95s/it] 59%|█████▉ | 1481/2500 [5:46:52<3:55:28, 13.87s/it] {'loss': 0.0006, 'grad_norm': 0.0649156266449153, 'learning_rate': 4.076e-07, 'completion_length': 55.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015777587890625, 'epoch': 0.59} 59%|█████▉ | 1481/2500 [5:46:52<3:55:28, 13.87s/it] 59%|█████▉ | 1482/2500 [5:47:05<3:53:04, 13.74s/it] {'loss': 0.0026, 'grad_norm': 2.9496120778386983, 'learning_rate': 4.072e-07, 'completion_length': 58.94643020629883, 'rewards/accuracy_reward': 0.8750000298023224, 'rewards/format_reward': 1.0, 'reward': 1.8750000596046448, 'reward_std': 0.07695358991622925, 'kl': 0.06494140625, 'epoch': 0.59} 59%|█████▉ | 1482/2500 [5:47:05<3:53:04, 13.74s/it] 59%|█████▉ | 1483/2500 [5:47:19<3:53:41, 13.79s/it] {'loss': 0.0009, 'grad_norm': 0.0499427518661103, 'learning_rate': 
4.0679999999999996e-07, 'completion_length': 59.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0216064453125, 'epoch': 0.59} 59%|█████▉ | 1483/2500 [5:47:19<3:53:41, 13.79s/it] 59%|█████▉ | 1484/2500 [5:47:34<3:57:13, 14.01s/it] {'loss': 0.0008, 'grad_norm': 0.05422964779468955, 'learning_rate': 4.064e-07, 'completion_length': 56.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019927978515625, 'epoch': 0.59} 59%|█████▉ | 1484/2500 [5:47:34<3:57:13, 14.01s/it] 59%|█████▉ | 1485/2500 [5:47:47<3:51:27, 13.68s/it] {'loss': 0.0012, 'grad_norm': 0.2901766992888509, 'learning_rate': 4.06e-07, 'completion_length': 48.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02911376953125, 'epoch': 0.59} 59%|█████▉ | 1485/2500 [5:47:47<3:51:27, 13.68s/it] 59%|█████▉ | 1486/2500 [5:48:01<3:54:24, 13.87s/it] {'loss': 0.0009, 'grad_norm': 0.07206834864950201, 'learning_rate': 4.056e-07, 'completion_length': 60.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0216064453125, 'epoch': 0.59} 59%|█████▉ | 1486/2500 [5:48:01<3:54:24, 13.87s/it] 59%|█████▉ | 1487/2500 [5:48:16<4:00:55, 14.27s/it] {'loss': 0.0007, 'grad_norm': 0.09435006077557605, 'learning_rate': 4.052e-07, 'completion_length': 59.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0186767578125, 'epoch': 0.59} 59%|█████▉ | 1487/2500 [5:48:16<4:00:55, 14.27s/it] 60%|█████▉ | 1488/2500 [5:48:31<4:01:49, 14.34s/it] {'loss': 0.0011, 'grad_norm': 0.06183644334417744, 'learning_rate': 4.0479999999999997e-07, 'completion_length': 60.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0263671875, 'epoch': 0.6} 
60%|█████▉ | 1488/2500 [5:48:31<4:01:49, 14.34s/it] 60%|█████▉ | 1489/2500 [5:48:44<3:57:34, 14.10s/it] {'loss': 0.0008, 'grad_norm': 0.09355719828724145, 'learning_rate': 4.0439999999999994e-07, 'completion_length': 53.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019805908203125, 'epoch': 0.6} 60%|█████▉ | 1489/2500 [5:48:44<3:57:34, 14.10s/it] 60%|█████▉ | 1490/2500 [5:48:58<3:55:47, 14.01s/it] {'loss': 0.0013, 'grad_norm': 0.10751744362177322, 'learning_rate': 4.04e-07, 'completion_length': 55.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031982421875, 'epoch': 0.6} 60%|█████▉ | 1490/2500 [5:48:58<3:55:47, 14.01s/it] 60%|█████▉ | 1491/2500 [5:49:12<3:55:11, 13.99s/it] {'loss': 0.0012, 'grad_norm': 0.2665773784151654, 'learning_rate': 4.036e-07, 'completion_length': 59.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03021240234375, 'epoch': 0.6} 60%|█████▉ | 1491/2500 [5:49:12<3:55:11, 13.99s/it] 60%|█████▉ | 1492/2500 [5:49:26<3:55:40, 14.03s/it] {'loss': 0.0013, 'grad_norm': 0.9111188842172232, 'learning_rate': 4.032e-07, 'completion_length': 57.05357360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03216552734375, 'epoch': 0.6} 60%|█████▉ | 1492/2500 [5:49:26<3:55:40, 14.03s/it] 60%|█████▉ | 1493/2500 [5:49:39<3:52:27, 13.85s/it] {'loss': 0.0008, 'grad_norm': 0.06521143165346831, 'learning_rate': 4.028e-07, 'completion_length': 54.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02099609375, 'epoch': 0.6} 60%|█████▉ | 1493/2500 [5:49:39<3:52:27, 13.85s/it] 60%|█████▉ | 1494/2500 [5:49:53<3:50:19, 13.74s/it] {'loss': 0.0013, 'grad_norm': 0.44191198783134866, 
'learning_rate': 4.0239999999999995e-07, 'completion_length': 55.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0318603515625, 'epoch': 0.6} 60%|█████▉ | 1494/2500 [5:49:53<3:50:19, 13.74s/it] 60%|█████▉ | 1495/2500 [5:50:07<3:50:11, 13.74s/it] {'loss': 0.0008, 'grad_norm': 0.09189009645503372, 'learning_rate': 4.02e-07, 'completion_length': 61.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021270751953125, 'epoch': 0.6} 60%|█████▉ | 1495/2500 [5:50:07<3:50:11, 13.74s/it] 60%|█████▉ | 1496/2500 [5:50:21<3:51:44, 13.85s/it] {'loss': 0.0007, 'grad_norm': 0.09278222805636939, 'learning_rate': 4.016e-07, 'completion_length': 57.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016937255859375, 'epoch': 0.6} 60%|█████▉ | 1496/2500 [5:50:21<3:51:44, 13.85s/it] 60%|█████▉ | 1497/2500 [5:50:34<3:46:16, 13.54s/it] {'loss': 0.0009, 'grad_norm': 0.04850166868483959, 'learning_rate': 4.0119999999999997e-07, 'completion_length': 51.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02264404296875, 'epoch': 0.6} 60%|█████▉ | 1497/2500 [5:50:34<3:46:16, 13.54s/it] 60%|█████▉ | 1498/2500 [5:50:47<3:46:49, 13.58s/it] {'loss': 0.0014, 'grad_norm': 0.06012863966338161, 'learning_rate': 4.008e-07, 'completion_length': 62.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03411865234375, 'epoch': 0.6} 60%|█████▉ | 1498/2500 [5:50:47<3:46:49, 13.58s/it] 60%|█████▉ | 1499/2500 [5:51:00<3:43:40, 13.41s/it] {'loss': 0.0007, 'grad_norm': 0.11224713963389539, 'learning_rate': 4.0039999999999996e-07, 'completion_length': 53.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.016571044921875, 'epoch': 0.6} 60%|█████▉ | 1499/2500 [5:51:00<3:43:40, 13.41s/it] 60%|██████ | 1500/2500 [5:51:15<3:50:09, 13.81s/it] {'loss': 0.001, 'grad_norm': 1.6182071987177356, 'learning_rate': 4e-07, 'completion_length': 59.53571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.024444580078125, 'epoch': 0.6} 60%|██████ | 1500/2500 [5:51:15<3:50:09, 13.81s/it] 60%|██████ | 1501/2500 [5:52:19<7:58:36, 28.75s/it] {'loss': 0.0012, 'grad_norm': 0.06775466285279483, 'learning_rate': 3.996e-07, 'completion_length': 52.160715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028900146484375, 'epoch': 0.6} 60%|██████ | 1501/2500 [5:52:19<7:58:36, 28.75s/it] 60%|██████ | 1502/2500 [5:52:28<6:21:51, 22.96s/it] {'loss': 0.001, 'grad_norm': 0.06949831035963955, 'learning_rate': 3.992e-07, 'completion_length': 55.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024139404296875, 'epoch': 0.6} 60%|██████ | 1502/2500 [5:52:28<6:21:51, 22.96s/it] 60%|██████ | 1503/2500 [5:52:36<5:07:24, 18.50s/it] {'loss': 0.0014, 'grad_norm': 0.09057111882616577, 'learning_rate': 3.9879999999999994e-07, 'completion_length': 55.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03411865234375, 'epoch': 0.6} 60%|██████ | 1503/2500 [5:52:36<5:07:24, 18.50s/it] 60%|██████ | 1504/2500 [5:52:44<4:14:59, 15.36s/it] {'loss': 0.0014, 'grad_norm': 0.07050750596736981, 'learning_rate': 3.9839999999999997e-07, 'completion_length': 52.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03387451171875, 'epoch': 0.6} 60%|██████ | 1504/2500 [5:52:44<4:14:59, 15.36s/it] 60%|██████ | 1505/2500 [5:52:52<3:38:32, 13.18s/it] {'loss': 
0.001, 'grad_norm': 0.07578563436583534, 'learning_rate': 3.98e-07, 'completion_length': 51.67857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02508544921875, 'epoch': 0.6} 60%|██████ | 1505/2500 [5:52:52<3:38:32, 13.18s/it] 60%|██████ | 1506/2500 [5:53:00<3:12:41, 11.63s/it] {'loss': 0.0008, 'grad_norm': 0.06044681222510805, 'learning_rate': 3.976e-07, 'completion_length': 56.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02130126953125, 'epoch': 0.6} 60%|██████ | 1506/2500 [5:53:00<3:12:41, 11.63s/it] 60%|██████ | 1507/2500 [5:53:09<2:58:37, 10.79s/it] {'loss': 0.0005, 'grad_norm': 0.09438120933936131, 'learning_rate': 3.972e-07, 'completion_length': 56.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01373291015625, 'epoch': 0.6} 60%|██████ | 1507/2500 [5:53:09<2:58:37, 10.79s/it] 60%|██████ | 1508/2500 [5:53:17<2:42:11, 9.81s/it] {'loss': 0.0012, 'grad_norm': 0.07762757724481154, 'learning_rate': 3.9679999999999995e-07, 'completion_length': 49.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.6} 60%|██████ | 1508/2500 [5:53:17<2:42:11, 9.81s/it] 60%|██████ | 1509/2500 [5:53:25<2:34:19, 9.34s/it] {'loss': 0.0011, 'grad_norm': 0.06433537098171566, 'learning_rate': 3.964e-07, 'completion_length': 54.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0284423828125, 'epoch': 0.6} 60%|██████ | 1509/2500 [5:53:25<2:34:19, 9.34s/it] 60%|██████ | 1510/2500 [5:53:34<2:32:48, 9.26s/it] {'loss': 0.0013, 'grad_norm': 0.08396452672990183, 'learning_rate': 3.96e-07, 'completion_length': 61.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03277587890625, 'epoch': 0.6} 60%|██████ | 1510/2500 [5:53:34<2:32:48, 9.26s/it] 60%|██████ | 1511/2500 [5:53:43<2:30:27, 9.13s/it] {'loss': 0.0008, 'grad_norm': 0.054043202171407206, 'learning_rate': 3.9559999999999997e-07, 'completion_length': 49.96428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019775390625, 'epoch': 0.6} 60%|██████ | 1511/2500 [5:53:43<2:30:27, 9.13s/it] 60%|██████ | 1512/2500 [5:53:51<2:26:04, 8.87s/it] {'loss': 0.0017, 'grad_norm': 0.6860643779942822, 'learning_rate': 3.952e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.0426025390625, 'epoch': 0.6} 60%|██████ | 1512/2500 [5:53:51<2:26:04, 8.87s/it] 61%|██████ | 1513/2500 [5:53:59<2:21:21, 8.59s/it] {'loss': 0.0018, 'grad_norm': 0.2276444068505415, 'learning_rate': 3.9479999999999996e-07, 'completion_length': 55.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044189453125, 'epoch': 0.61} 61%|██████ | 1513/2500 [5:53:59<2:21:21, 8.59s/it] 61%|██████ | 1514/2500 [5:54:07<2:18:54, 8.45s/it] {'loss': 0.0011, 'grad_norm': 0.054811269683584914, 'learning_rate': 3.9439999999999993e-07, 'completion_length': 51.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027801513671875, 'epoch': 0.61} 61%|██████ | 1514/2500 [5:54:07<2:18:54, 8.45s/it] 61%|██████ | 1515/2500 [5:54:16<2:19:58, 8.53s/it] {'loss': 0.0013, 'grad_norm': 0.07404268694289141, 'learning_rate': 3.94e-07, 'completion_length': 58.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03363037109375, 'epoch': 0.61} 61%|██████ | 1515/2500 [5:54:16<2:19:58, 8.53s/it] 
61%|██████ | 1516/2500 [5:54:25<2:23:32, 8.75s/it] {'loss': 0.0006, 'grad_norm': 0.058231669409595156, 'learning_rate': 3.936e-07, 'completion_length': 56.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01519775390625, 'epoch': 0.61} 61%|██████ | 1516/2500 [5:54:25<2:23:32, 8.75s/it] 61%|██████ | 1517/2500 [5:54:35<2:29:46, 9.14s/it] {'loss': 0.001, 'grad_norm': 0.08355858705584365, 'learning_rate': 3.932e-07, 'completion_length': 60.91071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0242919921875, 'epoch': 0.61} 61%|██████ | 1517/2500 [5:54:35<2:29:46, 9.14s/it] 61%|██████ | 1518/2500 [5:54:44<2:27:28, 9.01s/it] {'loss': 0.0011, 'grad_norm': 0.1102070168056448, 'learning_rate': 3.9279999999999997e-07, 'completion_length': 56.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02740478515625, 'epoch': 0.61} 61%|██████ | 1518/2500 [5:54:44<2:27:28, 9.01s/it] 61%|██████ | 1519/2500 [5:54:52<2:23:06, 8.75s/it] {'loss': 0.001, 'grad_norm': 0.08800405203737682, 'learning_rate': 3.924e-07, 'completion_length': 60.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025634765625, 'epoch': 0.61} 61%|██████ | 1519/2500 [5:54:52<2:23:06, 8.75s/it] 61%|██████ | 1520/2500 [5:55:00<2:19:33, 8.54s/it] {'loss': 0.0014, 'grad_norm': 0.07714102189845802, 'learning_rate': 3.92e-07, 'completion_length': 54.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.61} 61%|██████ | 1520/2500 [5:55:00<2:19:33, 8.54s/it] 61%|██████ | 1521/2500 [5:55:09<2:19:20, 8.54s/it] {'loss': 0.0007, 'grad_norm': 0.13095927107208222, 'learning_rate': 3.916e-07, 'completion_length': 54.750003814697266, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016998291015625, 'epoch': 0.61} 61%|██████ | 1521/2500 [5:55:09<2:19:20, 8.54s/it] 61%|██████ | 1522/2500 [5:55:17<2:18:31, 8.50s/it] {'loss': 0.0003, 'grad_norm': 0.08498759839673191, 'learning_rate': 3.9119999999999996e-07, 'completion_length': 57.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.00726318359375, 'epoch': 0.61} 61%|██████ | 1522/2500 [5:55:17<2:18:31, 8.50s/it] 61%|██████ | 1523/2500 [5:55:26<2:22:32, 8.75s/it] {'loss': 0.001, 'grad_norm': 0.06863837215835236, 'learning_rate': 3.908e-07, 'completion_length': 62.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.61} 61%|██████ | 1523/2500 [5:55:26<2:22:32, 8.75s/it] 61%|██████ | 1524/2500 [5:55:37<2:30:21, 9.24s/it] {'loss': 0.0018, 'grad_norm': 0.08860814430831379, 'learning_rate': 3.904e-07, 'completion_length': 67.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0450439453125, 'epoch': 0.61} 61%|██████ | 1524/2500 [5:55:37<2:30:21, 9.24s/it] 61%|██████ | 1525/2500 [5:55:45<2:27:04, 9.05s/it] {'loss': 0.0009, 'grad_norm': 0.7726756370442287, 'learning_rate': 3.8999999999999997e-07, 'completion_length': 53.96428871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02252197265625, 'epoch': 0.61} 61%|██████ | 1525/2500 [5:55:45<2:27:04, 9.05s/it] 61%|██████ | 1526/2500 [5:55:54<2:26:59, 9.05s/it] {'loss': 0.0009, 'grad_norm': 0.06718033378703708, 'learning_rate': 3.896e-07, 'completion_length': 64.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02276611328125, 'epoch': 0.61} 61%|██████ | 1526/2500 [5:55:54<2:26:59, 
9.05s/it] 61%|██████ | 1527/2500 [5:56:07<2:43:28, 10.08s/it] {'loss': 0.0012, 'grad_norm': 0.08256569288743089, 'learning_rate': 3.8919999999999996e-07, 'completion_length': 58.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03045654296875, 'epoch': 0.61} 61%|██████ | 1527/2500 [5:56:07<2:43:28, 10.08s/it] 61%|██████ | 1528/2500 [5:56:23<3:11:42, 11.83s/it] {'loss': 0.0016, 'grad_norm': 2.120061072945256, 'learning_rate': 3.888e-07, 'completion_length': 59.339290618896484, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.07695358991622925, 'kl': 0.0399169921875, 'epoch': 0.61} 61%|██████ | 1528/2500 [5:56:23<3:11:42, 11.83s/it] 61%|██████ | 1529/2500 [5:56:36<3:18:24, 12.26s/it] {'loss': 0.0011, 'grad_norm': 0.06185471808289634, 'learning_rate': 3.884e-07, 'completion_length': 57.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0264892578125, 'epoch': 0.61} 61%|██████ | 1529/2500 [5:56:36<3:18:24, 12.26s/it] 61%|██████ | 1530/2500 [5:56:49<3:22:51, 12.55s/it] {'loss': 0.0007, 'grad_norm': 0.07071855685403544, 'learning_rate': 3.88e-07, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01861572265625, 'epoch': 0.61} 61%|██████ | 1530/2500 [5:56:49<3:22:51, 12.55s/it] 61%|██████ | 1531/2500 [5:57:02<3:23:04, 12.57s/it] {'loss': 0.0015, 'grad_norm': 0.09829268879452713, 'learning_rate': 3.876e-07, 'completion_length': 51.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.61} 61%|██████ | 1531/2500 [5:57:02<3:23:04, 12.57s/it] 61%|██████▏ | 1532/2500 [5:57:15<3:27:40, 12.87s/it] {'loss': 0.001, 'grad_norm': 1.5092397259426293, 'learning_rate': 3.8719999999999997e-07, 
'completion_length': 52.160715103149414, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0250396728515625, 'epoch': 0.61} 61%|██████▏ | 1532/2500 [5:57:15<3:27:40, 12.87s/it] 61%|██████▏ | 1533/2500 [5:57:31<3:37:57, 13.52s/it] {'loss': 0.001, 'grad_norm': 0.42353848992751403, 'learning_rate': 3.8679999999999994e-07, 'completion_length': 63.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02386474609375, 'epoch': 0.61} 61%|██████▏ | 1533/2500 [5:57:31<3:37:57, 13.52s/it] 61%|██████▏ | 1534/2500 [5:57:43<3:34:52, 13.35s/it] {'loss': 0.0009, 'grad_norm': 0.07107827561442949, 'learning_rate': 3.864e-07, 'completion_length': 51.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02142333984375, 'epoch': 0.61} 61%|██████▏ | 1534/2500 [5:57:43<3:34:52, 13.35s/it] 61%|██████▏ | 1535/2500 [5:57:57<3:37:03, 13.50s/it] {'loss': 0.0019, 'grad_norm': 2.3980134521376453, 'learning_rate': 3.86e-07, 'completion_length': 49.87500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.046875, 'epoch': 0.61} 61%|██████▏ | 1535/2500 [5:57:57<3:37:03, 13.50s/it] 61%|██████▏ | 1536/2500 [5:58:11<3:36:55, 13.50s/it] {'loss': 0.0015, 'grad_norm': 0.08894295042141322, 'learning_rate': 3.8559999999999996e-07, 'completion_length': 56.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.61} 61%|██████▏ | 1536/2500 [5:58:11<3:36:55, 13.50s/it] 61%|██████▏ | 1537/2500 [5:58:26<3:42:38, 13.87s/it] {'loss': 0.001, 'grad_norm': 0.0922673893464189, 'learning_rate': 3.852e-07, 'completion_length': 63.58928871154785, 'rewards/accuracy_reward': 0.9285714626312256, 
'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02435302734375, 'epoch': 0.61} 61%|██████▏ | 1537/2500 [5:58:26<3:42:38, 13.87s/it] 62%|██████▏ | 1538/2500 [5:58:39<3:40:46, 13.77s/it] {'loss': 0.0014, 'grad_norm': 1.374157026947227, 'learning_rate': 3.8479999999999995e-07, 'completion_length': 57.892860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03564453125, 'epoch': 0.62} 62%|██████▏ | 1538/2500 [5:58:39<3:40:46, 13.77s/it] 62%|██████▏ | 1539/2500 [5:58:53<3:40:11, 13.75s/it] {'loss': 0.0025, 'grad_norm': 5.949116229779386, 'learning_rate': 3.8440000000000003e-07, 'completion_length': 51.83928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.061798095703125, 'epoch': 0.62} 62%|██████▏ | 1539/2500 [5:58:53<3:40:11, 13.75s/it] 62%|██████▏ | 1540/2500 [5:59:07<3:42:03, 13.88s/it] {'loss': 0.0011, 'grad_norm': 0.08776889939191557, 'learning_rate': 3.84e-07, 'completion_length': 65.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02801513671875, 'epoch': 0.62} 62%|██████▏ | 1540/2500 [5:59:07<3:42:03, 13.88s/it] 62%|██████▏ | 1541/2500 [5:59:20<3:37:54, 13.63s/it] {'loss': 0.001, 'grad_norm': 1.042976436896199, 'learning_rate': 3.8359999999999997e-07, 'completion_length': 51.66071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02508544921875, 'epoch': 0.62} 62%|██████▏ | 1541/2500 [5:59:20<3:37:54, 13.63s/it] 62%|██████▏ | 1542/2500 [5:59:34<3:38:21, 13.68s/it] {'loss': 0.0014, 'grad_norm': 0.07687727195186803, 'learning_rate': 3.832e-07, 'completion_length': 53.69643020629883, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03466796875, 'epoch': 0.62} 62%|██████▏ | 1542/2500 [5:59:34<3:38:21, 13.68s/it] 62%|██████▏ | 1543/2500 [5:59:47<3:35:40, 13.52s/it] {'loss': 0.0011, 'grad_norm': 0.07761766558560144, 'learning_rate': 3.8279999999999996e-07, 'completion_length': 48.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.62} 62%|██████▏ | 1543/2500 [5:59:47<3:35:40, 13.52s/it] 62%|██████▏ | 1544/2500 [6:00:01<3:36:53, 13.61s/it] {'loss': 0.0009, 'grad_norm': 0.08396189728847313, 'learning_rate': 3.824e-07, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021484375, 'epoch': 0.62} 62%|██████▏ | 1544/2500 [6:00:01<3:36:53, 13.61s/it] 62%|██████▏ | 1545/2500 [6:00:15<3:38:13, 13.71s/it] {'loss': 0.0014, 'grad_norm': 0.0746572107780824, 'learning_rate': 3.82e-07, 'completion_length': 56.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03460693359375, 'epoch': 0.62} 62%|██████▏ | 1545/2500 [6:00:15<3:38:13, 13.71s/it] 62%|██████▏ | 1546/2500 [6:00:28<3:35:55, 13.58s/it] {'loss': 0.0022, 'grad_norm': 0.11393758432460446, 'learning_rate': 3.816e-07, 'completion_length': 51.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05419921875, 'epoch': 0.62} 62%|██████▏ | 1546/2500 [6:00:28<3:35:55, 13.58s/it] 62%|██████▏ | 1547/2500 [6:00:42<3:38:18, 13.74s/it] {'loss': 0.0012, 'grad_norm': 1.8027844500956058, 'learning_rate': 3.8119999999999995e-07, 'completion_length': 60.25000190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285715222358704, 'reward_std': 0.0714285746216774, 'kl': 0.03118896484375, 'epoch': 0.62} 62%|██████▏ | 1547/2500 
[6:00:42<3:38:18, 13.74s/it] 62%|██████▏ | 1548/2500 [6:00:55<3:35:51, 13.60s/it] {'loss': 0.001, 'grad_norm': 0.06396109070303015, 'learning_rate': 3.808e-07, 'completion_length': 51.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025390625, 'epoch': 0.62} 62%|██████▏ | 1548/2500 [6:00:55<3:35:51, 13.60s/it] 62%|██████▏ | 1549/2500 [6:01:09<3:33:40, 13.48s/it] {'loss': 0.0013, 'grad_norm': 0.3071312726391362, 'learning_rate': 3.804e-07, 'completion_length': 53.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031982421875, 'epoch': 0.62} 62%|██████▏ | 1549/2500 [6:01:09<3:33:40, 13.48s/it] 62%|██████▏ | 1550/2500 [6:01:23<3:37:51, 13.76s/it] {'loss': 0.0004, 'grad_norm': 0.06471186458527445, 'learning_rate': 3.7999999999999996e-07, 'completion_length': 51.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0091552734375, 'epoch': 0.62} 62%|██████▏ | 1550/2500 [6:01:23<3:37:51, 13.76s/it] 62%|██████▏ | 1551/2500 [6:01:39<3:48:20, 14.44s/it] {'loss': 0.0018, 'grad_norm': 1.3965342489578656, 'learning_rate': 3.796e-07, 'completion_length': 63.28571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0457763671875, 'epoch': 0.62} 62%|██████▏ | 1551/2500 [6:01:39<3:48:20, 14.44s/it] 62%|██████▏ | 1552/2500 [6:01:54<3:49:00, 14.49s/it] {'loss': 0.0023, 'grad_norm': 0.22159818772313017, 'learning_rate': 3.7919999999999995e-07, 'completion_length': 61.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0584716796875, 'epoch': 0.62} 62%|██████▏ | 1552/2500 [6:01:54<3:49:00, 14.49s/it] 62%|██████▏ | 1553/2500 [6:02:08<3:48:32, 14.48s/it] {'loss': 0.0014, 'grad_norm': 0.07629934440804026, 
'learning_rate': 3.7880000000000003e-07, 'completion_length': 60.392860412597656, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.034912109375, 'epoch': 0.62} 62%|██████▏ | 1553/2500 [6:02:08<3:48:32, 14.48s/it] 62%|██████▏ | 1554/2500 [6:02:21<3:41:18, 14.04s/it] {'loss': 0.0016, 'grad_norm': 0.09829691581041915, 'learning_rate': 3.784e-07, 'completion_length': 51.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0399169921875, 'epoch': 0.62} 62%|██████▏ | 1554/2500 [6:02:21<3:41:18, 14.04s/it] 62%|██████▏ | 1555/2500 [6:02:35<3:41:05, 14.04s/it] {'loss': 0.0016, 'grad_norm': 2.2423054903061264, 'learning_rate': 3.7799999999999997e-07, 'completion_length': 61.464290618896484, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0406494140625, 'epoch': 0.62} 62%|██████▏ | 1555/2500 [6:02:35<3:41:05, 14.04s/it] 62%|██████▏ | 1556/2500 [6:02:50<3:42:50, 14.16s/it] {'loss': 0.0014, 'grad_norm': 0.1742669782873131, 'learning_rate': 3.776e-07, 'completion_length': 55.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03411865234375, 'epoch': 0.62} 62%|██████▏ | 1556/2500 [6:02:50<3:42:50, 14.16s/it] 62%|██████▏ | 1557/2500 [6:03:03<3:41:21, 14.08s/it] {'loss': 0.0012, 'grad_norm': 0.15621298674524595, 'learning_rate': 3.7719999999999996e-07, 'completion_length': 62.41071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029052734375, 'epoch': 0.62} 62%|██████▏ | 1557/2500 [6:03:03<3:41:21, 14.08s/it] 62%|██████▏ | 1558/2500 [6:03:19<3:45:30, 14.36s/it] {'loss': 0.0009, 'grad_norm': 0.0950755013813083, 'learning_rate': 3.768e-07, 'completion_length': 60.37500190734863, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02197265625, 'epoch': 0.62} 62%|██████▏ | 1558/2500 [6:03:19<3:45:30, 14.36s/it] 62%|██████▏ | 1559/2500 [6:03:32<3:42:47, 14.21s/it] {'loss': 0.0008, 'grad_norm': 0.055579470940276564, 'learning_rate': 3.764e-07, 'completion_length': 55.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018890380859375, 'epoch': 0.62} 62%|██████▏ | 1559/2500 [6:03:32<3:42:47, 14.21s/it] 62%|██████▏ | 1560/2500 [6:03:46<3:39:02, 13.98s/it] {'loss': 0.0009, 'grad_norm': 0.2127380539465332, 'learning_rate': 3.76e-07, 'completion_length': 60.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0223388671875, 'epoch': 0.62} 62%|██████▏ | 1560/2500 [6:03:46<3:39:02, 13.98s/it] 62%|██████▏ | 1561/2500 [6:04:00<3:38:09, 13.94s/it] {'loss': 0.0015, 'grad_norm': 0.07384009563526071, 'learning_rate': 3.7559999999999995e-07, 'completion_length': 63.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03790283203125, 'epoch': 0.62} 62%|██████▏ | 1561/2500 [6:04:00<3:38:09, 13.94s/it] 62%|██████▏ | 1562/2500 [6:04:19<4:04:35, 15.65s/it] {'loss': 0.0005, 'grad_norm': 0.44572473150662295, 'learning_rate': 3.7519999999999997e-07, 'completion_length': 66.48214721679688, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.01141357421875, 'epoch': 0.62} 62%|██████▏ | 1562/2500 [6:04:19<4:04:35, 15.65s/it] 63%|██████▎ | 1563/2500 [6:04:37<4:13:39, 16.24s/it] {'loss': 0.001, 'grad_norm': 0.0789388904562356, 'learning_rate': 3.748e-07, 'completion_length': 59.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0255126953125, 
'epoch': 0.63} 63%|██████▎ | 1563/2500 [6:04:37<4:13:39, 16.24s/it] 63%|██████▎ | 1564/2500 [6:04:51<4:01:21, 15.47s/it] {'loss': 0.0016, 'grad_norm': 1.5476604865736097, 'learning_rate': 3.744e-07, 'completion_length': 56.25000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.039886474609375, 'epoch': 0.63} 63%|██████▎ | 1564/2500 [6:04:51<4:01:21, 15.47s/it] 63%|██████▎ | 1565/2500 [6:05:04<3:51:54, 14.88s/it] {'loss': 0.0015, 'grad_norm': 0.044484064483239265, 'learning_rate': 3.74e-07, 'completion_length': 50.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0377197265625, 'epoch': 0.63} 63%|██████▎ | 1565/2500 [6:05:04<3:51:54, 14.88s/it] 63%|██████▎ | 1566/2500 [6:05:20<3:56:46, 15.21s/it] {'loss': 0.0017, 'grad_norm': 0.11790026178696854, 'learning_rate': 3.7359999999999996e-07, 'completion_length': 69.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0433349609375, 'epoch': 0.63} 63%|██████▎ | 1566/2500 [6:05:20<3:56:46, 15.21s/it] 63%|██████▎ | 1567/2500 [6:05:34<3:49:59, 14.79s/it] {'loss': 0.0017, 'grad_norm': 0.06783559885456977, 'learning_rate': 3.732e-07, 'completion_length': 57.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0423583984375, 'epoch': 0.63} 63%|██████▎ | 1567/2500 [6:05:34<3:49:59, 14.79s/it] 63%|██████▎ | 1568/2500 [6:05:49<3:50:25, 14.83s/it] {'loss': 0.0007, 'grad_norm': 0.07514059345559225, 'learning_rate': 3.728e-07, 'completion_length': 63.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01800537109375, 'epoch': 0.63} 63%|██████▎ | 1568/2500 [6:05:49<3:50:25, 14.83s/it] 63%|██████▎ | 1569/2500 [6:06:04<3:50:01, 14.82s/it] {'loss': 0.0012, 
'grad_norm': 0.07381798348188862, 'learning_rate': 3.7239999999999997e-07, 'completion_length': 62.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0291748046875, 'epoch': 0.63} 63%|██████▎ | 1569/2500 [6:06:04<3:50:01, 14.82s/it] 63%|██████▎ | 1570/2500 [6:06:19<3:50:43, 14.89s/it] {'loss': 0.0017, 'grad_norm': 0.08217724655656057, 'learning_rate': 3.72e-07, 'completion_length': 63.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04376220703125, 'epoch': 0.63} 63%|██████▎ | 1570/2500 [6:06:19<3:50:43, 14.89s/it] 63%|██████▎ | 1571/2500 [6:06:32<3:45:24, 14.56s/it] {'loss': 0.0008, 'grad_norm': 0.966490137039191, 'learning_rate': 3.7159999999999997e-07, 'completion_length': 55.48214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.019073486328125, 'epoch': 0.63} 63%|██████▎ | 1571/2500 [6:06:32<3:45:24, 14.56s/it] 63%|██████▎ | 1572/2500 [6:06:48<3:49:59, 14.87s/it] {'loss': 0.0015, 'grad_norm': 0.07847424520564866, 'learning_rate': 3.7119999999999994e-07, 'completion_length': 53.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.036376953125, 'epoch': 0.63} 63%|██████▎ | 1572/2500 [6:06:48<3:49:59, 14.87s/it] 63%|██████▎ | 1573/2500 [6:07:02<3:43:58, 14.50s/it] {'loss': 0.0015, 'grad_norm': 0.08575746942726245, 'learning_rate': 3.708e-07, 'completion_length': 59.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0367431640625, 'epoch': 0.63} 63%|██████▎ | 1573/2500 [6:07:02<3:43:58, 14.50s/it] 63%|██████▎ | 1574/2500 [6:07:16<3:41:01, 14.32s/it] {'loss': 0.0015, 'grad_norm': 2.0159724402243246, 'learning_rate': 3.704e-07, 'completion_length': 57.750003814697266, 
'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03802490234375, 'epoch': 0.63} 63%|██████▎ | 1574/2500 [6:07:16<3:41:01, 14.32s/it] 63%|██████▎ | 1575/2500 [6:07:31<3:43:42, 14.51s/it] {'loss': 0.0011, 'grad_norm': 0.0681146384480646, 'learning_rate': 3.7e-07, 'completion_length': 63.60714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02801513671875, 'epoch': 0.63} 63%|██████▎ | 1575/2500 [6:07:31<3:43:42, 14.51s/it] 63%|██████▎ | 1576/2500 [6:07:44<3:36:57, 14.09s/it] {'loss': 0.0019, 'grad_norm': 0.08037262022494092, 'learning_rate': 3.696e-07, 'completion_length': 47.78571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.048095703125, 'epoch': 0.63} 63%|██████▎ | 1576/2500 [6:07:44<3:36:57, 14.09s/it] 63%|██████▎ | 1577/2500 [6:07:57<3:32:27, 13.81s/it] {'loss': 0.0004, 'grad_norm': 0.07288596313050866, 'learning_rate': 3.6919999999999994e-07, 'completion_length': 47.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.010498046875, 'epoch': 0.63} 63%|██████▎ | 1577/2500 [6:07:57<3:32:27, 13.81s/it] 63%|██████▎ | 1578/2500 [6:08:10<3:30:16, 13.68s/it] {'loss': 0.0008, 'grad_norm': 0.08112287524033818, 'learning_rate': 3.688e-07, 'completion_length': 53.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02020263671875, 'epoch': 0.63} 63%|██████▎ | 1578/2500 [6:08:10<3:30:16, 13.68s/it] 63%|██████▎ | 1579/2500 [6:08:23<3:27:58, 13.55s/it] {'loss': 0.0008, 'grad_norm': 1.0154605146176088, 'learning_rate': 3.684e-07, 'completion_length': 55.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.0204925537109375, 'epoch': 0.63} 63%|██████▎ | 1579/2500 [6:08:23<3:27:58, 13.55s/it] 63%|██████▎ | 1580/2500 [6:08:38<3:33:42, 13.94s/it] {'loss': 0.0013, 'grad_norm': 0.10430923209220351, 'learning_rate': 3.6799999999999996e-07, 'completion_length': 58.85714530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.63} 63%|██████▎ | 1580/2500 [6:08:38<3:33:42, 13.94s/it] 63%|██████▎ | 1581/2500 [6:08:53<3:36:19, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.8824083213992165, 'learning_rate': 3.676e-07, 'completion_length': 55.33928871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02276611328125, 'epoch': 0.63} 63%|██████▎ | 1581/2500 [6:08:53<3:36:19, 14.12s/it] 63%|██████▎ | 1582/2500 [6:09:06<3:33:02, 13.92s/it] {'loss': 0.0008, 'grad_norm': 0.08457666718983435, 'learning_rate': 3.672e-07, 'completion_length': 57.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0200653076171875, 'epoch': 0.63} 63%|██████▎ | 1582/2500 [6:09:06<3:33:02, 13.92s/it] 63%|██████▎ | 1583/2500 [6:09:20<3:33:25, 13.96s/it] {'loss': 0.0012, 'grad_norm': 0.0714390088672799, 'learning_rate': 3.668e-07, 'completion_length': 63.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.63} 63%|██████▎ | 1583/2500 [6:09:20<3:33:25, 13.96s/it] 63%|██████▎ | 1584/2500 [6:09:35<3:35:19, 14.10s/it] {'loss': 0.0017, 'grad_norm': 0.08059791214185245, 'learning_rate': 3.664e-07, 'completion_length': 57.37500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.04345703125, 'epoch': 0.63} 63%|██████▎ | 1584/2500 [6:09:35<3:35:19, 
14.10s/it] 63%|██████▎ | 1585/2500 [6:09:48<3:32:53, 13.96s/it] {'loss': 0.0009, 'grad_norm': 0.09923609665819419, 'learning_rate': 3.6599999999999997e-07, 'completion_length': 63.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022186279296875, 'epoch': 0.63} 63%|██████▎ | 1585/2500 [6:09:48<3:32:53, 13.96s/it] 63%|██████▎ | 1586/2500 [6:10:03<3:33:59, 14.05s/it] {'loss': 0.0009, 'grad_norm': 0.137680721682341, 'learning_rate': 3.6559999999999994e-07, 'completion_length': 55.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02252197265625, 'epoch': 0.63} 63%|██████▎ | 1586/2500 [6:10:03<3:33:59, 14.05s/it] 63%|██████▎ | 1587/2500 [6:10:17<3:36:45, 14.25s/it] {'loss': 0.0016, 'grad_norm': 0.9779423573204208, 'learning_rate': 3.652e-07, 'completion_length': 57.64286231994629, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0401611328125, 'epoch': 0.63} 63%|██████▎ | 1587/2500 [6:10:17<3:36:45, 14.25s/it] 64%|██████▎ | 1588/2500 [6:10:33<3:42:44, 14.65s/it] {'loss': 0.0012, 'grad_norm': 0.08968656079160779, 'learning_rate': 3.648e-07, 'completion_length': 60.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0301513671875, 'epoch': 0.64} 64%|██████▎ | 1588/2500 [6:10:33<3:42:44, 14.65s/it] 64%|██████▎ | 1589/2500 [6:10:46<3:35:24, 14.19s/it] {'loss': 0.0008, 'grad_norm': 0.1328021992496348, 'learning_rate': 3.644e-07, 'completion_length': 51.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02020263671875, 'epoch': 0.64} 64%|██████▎ | 1589/2500 [6:10:46<3:35:24, 14.19s/it] 64%|██████▎ | 1590/2500 [6:11:00<3:34:13, 14.12s/it] {'loss': 0.0015, 'grad_norm': 0.08743673960293755, 'learning_rate': 
3.64e-07, 'completion_length': 55.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0374755859375, 'epoch': 0.64} 64%|██████▎ | 1590/2500 [6:11:00<3:34:13, 14.12s/it] 64%|██████▎ | 1591/2500 [6:11:14<3:32:08, 14.00s/it] {'loss': 0.0005, 'grad_norm': 0.05936744438569406, 'learning_rate': 3.6359999999999995e-07, 'completion_length': 54.96428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01373291015625, 'epoch': 0.64} 64%|██████▎ | 1591/2500 [6:11:14<3:32:08, 14.00s/it] 64%|██████▎ | 1592/2500 [6:11:29<3:37:34, 14.38s/it] {'loss': 0.0017, 'grad_norm': 0.5614311914275937, 'learning_rate': 3.632e-07, 'completion_length': 65.3214340209961, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0421142578125, 'epoch': 0.64} 64%|██████▎ | 1592/2500 [6:11:29<3:37:34, 14.38s/it] 64%|██████▎ | 1593/2500 [6:11:43<3:36:20, 14.31s/it] {'loss': 0.0008, 'grad_norm': 0.0795636257696657, 'learning_rate': 3.628e-07, 'completion_length': 58.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019439697265625, 'epoch': 0.64} 64%|██████▎ | 1593/2500 [6:11:43<3:36:20, 14.31s/it] 64%|██████▍ | 1594/2500 [6:11:57<3:33:48, 14.16s/it] {'loss': 0.0016, 'grad_norm': 0.06039971542995471, 'learning_rate': 3.6239999999999996e-07, 'completion_length': 56.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04052734375, 'epoch': 0.64} 64%|██████▍ | 1594/2500 [6:11:57<3:33:48, 14.16s/it] 64%|██████▍ | 1595/2500 [6:12:12<3:36:57, 14.38s/it] {'loss': 0.0017, 'grad_norm': 0.08434321921156987, 'learning_rate': 3.62e-07, 'completion_length': 61.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.04290771484375, 'epoch': 0.64} 64%|██████▍ | 1595/2500 [6:12:12<3:36:57, 14.38s/it] 64%|██████▍ | 1596/2500 [6:12:25<3:31:27, 14.03s/it] {'loss': 0.0014, 'grad_norm': 0.08286736085114643, 'learning_rate': 3.6159999999999996e-07, 'completion_length': 54.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0347900390625, 'epoch': 0.64} 64%|██████▍ | 1596/2500 [6:12:25<3:31:27, 14.03s/it] 64%|██████▍ | 1597/2500 [6:12:39<3:30:15, 13.97s/it] {'loss': 0.0012, 'grad_norm': 0.07854679880860554, 'learning_rate': 3.612e-07, 'completion_length': 58.05357551574707, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0289306640625, 'epoch': 0.64} 64%|██████▍ | 1597/2500 [6:12:39<3:30:15, 13.97s/it] 64%|██████▍ | 1598/2500 [6:12:52<3:26:57, 13.77s/it] {'loss': 0.0009, 'grad_norm': 0.11134446105022965, 'learning_rate': 3.608e-07, 'completion_length': 50.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02264404296875, 'epoch': 0.64} 64%|██████▍ | 1598/2500 [6:12:52<3:26:57, 13.77s/it] 64%|██████▍ | 1599/2500 [6:13:06<3:26:33, 13.75s/it] {'loss': 0.001, 'grad_norm': 0.07092064154760648, 'learning_rate': 3.6039999999999997e-07, 'completion_length': 52.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.64} 64%|██████▍ | 1599/2500 [6:13:06<3:26:33, 13.75s/it] 64%|██████▍ | 1600/2500 [6:13:20<3:28:08, 13.88s/it] {'loss': 0.001, 'grad_norm': 0.08951755877622895, 'learning_rate': 3.6e-07, 'completion_length': 54.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02490234375, 'epoch': 0.64} 64%|██████▍ | 1600/2500 [6:13:20<3:28:08, 13.88s/it] 64%|██████▍ 
| 1601/2500 [6:14:30<7:41:18, 30.79s/it] {'loss': 0.0009, 'grad_norm': 0.07702477247484897, 'learning_rate': 3.5959999999999996e-07, 'completion_length': 61.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.64} 64%|██████▍ | 1601/2500 [6:14:30<7:41:18, 30.79s/it] 64%|██████▍ | 1602/2500 [6:14:44<6:23:13, 25.61s/it] {'loss': 0.001, 'grad_norm': 0.06297958223523521, 'learning_rate': 3.592e-07, 'completion_length': 50.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02545166015625, 'epoch': 0.64} 64%|██████▍ | 1602/2500 [6:14:44<6:23:13, 25.61s/it] 64%|██████▍ | 1603/2500 [6:14:58<5:31:01, 22.14s/it] {'loss': 0.0007, 'grad_norm': 0.07203927115584773, 'learning_rate': 3.588e-07, 'completion_length': 55.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017608642578125, 'epoch': 0.64} 64%|██████▍ | 1603/2500 [6:14:58<5:31:01, 22.14s/it] 64%|██████▍ | 1604/2500 [6:15:12<4:53:00, 19.62s/it] {'loss': 0.0009, 'grad_norm': 0.0751251421152314, 'learning_rate': 3.584e-07, 'completion_length': 52.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02252197265625, 'epoch': 0.64} 64%|██████▍ | 1604/2500 [6:15:12<4:53:00, 19.62s/it] 64%|██████▍ | 1605/2500 [6:15:27<4:31:39, 18.21s/it] {'loss': 0.0012, 'grad_norm': 0.07195817565145379, 'learning_rate': 3.5799999999999995e-07, 'completion_length': 57.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0311279296875, 'epoch': 0.64} 64%|██████▍ | 1605/2500 [6:15:27<4:31:39, 18.21s/it] 64%|██████▍ | 1606/2500 [6:15:41<4:14:40, 17.09s/it] {'loss': 0.0012, 'grad_norm': 0.0819584043497545, 'learning_rate': 3.5759999999999997e-07, 'completion_length': 53.57143211364746, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029052734375, 'epoch': 0.64} 64%|██████▍ | 1606/2500 [6:15:41<4:14:40, 17.09s/it] 64%|██████▍ | 1607/2500 [6:15:55<3:58:49, 16.05s/it] {'loss': 0.001, 'grad_norm': 0.10732336830308356, 'learning_rate': 3.572e-07, 'completion_length': 54.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025421142578125, 'epoch': 0.64} 64%|██████▍ | 1607/2500 [6:15:55<3:58:49, 16.05s/it] 64%|██████▍ | 1608/2500 [6:16:10<3:54:06, 15.75s/it] {'loss': 0.0015, 'grad_norm': 0.11567663808573382, 'learning_rate': 3.5679999999999997e-07, 'completion_length': 56.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038330078125, 'epoch': 0.64} 64%|██████▍ | 1608/2500 [6:16:10<3:54:06, 15.75s/it] 64%|██████▍ | 1609/2500 [6:16:24<3:48:14, 15.37s/it] {'loss': 0.0011, 'grad_norm': 0.07395573732345466, 'learning_rate': 3.564e-07, 'completion_length': 63.607147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0262451171875, 'epoch': 0.64} 64%|██████▍ | 1609/2500 [6:16:24<3:48:14, 15.37s/it] 64%|██████▍ | 1610/2500 [6:16:38<3:42:02, 14.97s/it] {'loss': 0.0012, 'grad_norm': 0.05704167084300882, 'learning_rate': 3.5599999999999996e-07, 'completion_length': 57.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031005859375, 'epoch': 0.64} 64%|██████▍ | 1610/2500 [6:16:38<3:42:02, 14.97s/it] 64%|██████▍ | 1611/2500 [6:16:52<3:36:45, 14.63s/it] {'loss': 0.0008, 'grad_norm': 0.07296235511656614, 'learning_rate': 3.5560000000000003e-07, 'completion_length': 56.000003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020477294921875, 'epoch': 0.64} 64%|██████▍ | 1611/2500 
[6:16:52<3:36:45, 14.63s/it] 64%|██████▍ | 1612/2500 [6:17:07<3:35:46, 14.58s/it] {'loss': 0.0008, 'grad_norm': 0.046487930599168385, 'learning_rate': 3.552e-07, 'completion_length': 54.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0196533203125, 'epoch': 0.64} 64%|██████▍ | 1612/2500 [6:17:07<3:35:46, 14.58s/it] 65%|██████▍ | 1613/2500 [6:17:21<3:33:43, 14.46s/it] {'loss': 0.001, 'grad_norm': 0.17458370641539123, 'learning_rate': 3.548e-07, 'completion_length': 57.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024871826171875, 'epoch': 0.65} 65%|██████▍ | 1613/2500 [6:17:21<3:33:43, 14.46s/it] 65%|██████▍ | 1614/2500 [6:17:35<3:32:01, 14.36s/it] {'loss': 0.0013, 'grad_norm': 0.09446818319736992, 'learning_rate': 3.544e-07, 'completion_length': 54.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.65} 65%|██████▍ | 1614/2500 [6:17:35<3:32:01, 14.36s/it] 65%|██████▍ | 1615/2500 [6:17:52<3:42:18, 15.07s/it] {'loss': 0.0006, 'grad_norm': 0.05674367411262685, 'learning_rate': 3.5399999999999997e-07, 'completion_length': 69.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01483154296875, 'epoch': 0.65} 65%|██████▍ | 1615/2500 [6:17:52<3:42:18, 15.07s/it] 65%|██████▍ | 1616/2500 [6:18:07<3:44:51, 15.26s/it] {'loss': 0.0011, 'grad_norm': 0.13610974918979274, 'learning_rate': 3.536e-07, 'completion_length': 60.607147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028564453125, 'epoch': 0.65} 65%|██████▍ | 1616/2500 [6:18:07<3:44:51, 15.26s/it] 65%|██████▍ | 1617/2500 [6:18:27<4:04:56, 16.64s/it] {'loss': 0.0013, 'grad_norm': 0.4447476999467851, 'learning_rate': 3.532e-07, 'completion_length': 
74.78571891784668, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0313720703125, 'epoch': 0.65} 65%|██████▍ | 1617/2500 [6:18:27<4:04:56, 16.64s/it] 65%|██████▍ | 1618/2500 [6:18:41<3:53:33, 15.89s/it] {'loss': 0.001, 'grad_norm': 0.18902828158300533, 'learning_rate': 3.528e-07, 'completion_length': 57.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.65} 65%|██████▍ | 1618/2500 [6:18:41<3:53:33, 15.89s/it] 65%|██████▍ | 1619/2500 [6:18:54<3:40:49, 15.04s/it] {'loss': 0.0013, 'grad_norm': 1.1147128233805332, 'learning_rate': 3.5239999999999995e-07, 'completion_length': 51.87500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03173828125, 'epoch': 0.65} 65%|██████▍ | 1619/2500 [6:18:54<3:40:49, 15.04s/it] 65%|██████▍ | 1620/2500 [6:19:08<3:34:45, 14.64s/it] {'loss': 0.0006, 'grad_norm': 0.05897267380104163, 'learning_rate': 3.52e-07, 'completion_length': 50.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0149993896484375, 'epoch': 0.65} 65%|██████▍ | 1620/2500 [6:19:08<3:34:45, 14.64s/it] 65%|██████▍ | 1621/2500 [6:19:22<3:31:08, 14.41s/it] {'loss': 0.0011, 'grad_norm': 0.05993328675921242, 'learning_rate': 3.516e-07, 'completion_length': 61.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0286865234375, 'epoch': 0.65} 65%|██████▍ | 1621/2500 [6:19:22<3:31:08, 14.41s/it] 65%|██████▍ | 1622/2500 [6:19:36<3:29:21, 14.31s/it] {'loss': 0.0013, 'grad_norm': 0.12861640407310176, 'learning_rate': 3.512e-07, 'completion_length': 59.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0313720703125, 'epoch': 0.65} 65%|██████▍ | 1622/2500 [6:19:36<3:29:21, 14.31s/it] 65%|██████▍ | 1623/2500 [6:19:50<3:26:50, 14.15s/it] {'loss': 0.0022, 'grad_norm': 0.07145144001403435, 'learning_rate': 3.508e-07, 'completion_length': 52.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.054351806640625, 'epoch': 0.65} 65%|██████▍ | 1623/2500 [6:19:50<3:26:50, 14.15s/it] 65%|██████▍ | 1624/2500 [6:20:05<3:31:15, 14.47s/it] {'loss': 0.0012, 'grad_norm': 0.08949816298791625, 'learning_rate': 3.5039999999999996e-07, 'completion_length': 68.26786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.65} 65%|██████▍ | 1624/2500 [6:20:05<3:31:15, 14.47s/it] 65%|██████▌ | 1625/2500 [6:20:21<3:35:43, 14.79s/it] {'loss': 0.0011, 'grad_norm': 0.07739825242825983, 'learning_rate': 3.5e-07, 'completion_length': 59.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02777099609375, 'epoch': 0.65} 65%|██████▌ | 1625/2500 [6:20:21<3:35:43, 14.79s/it] 65%|██████▌ | 1626/2500 [6:20:34<3:30:35, 14.46s/it] {'loss': 0.0014, 'grad_norm': 0.7017646616198975, 'learning_rate': 3.496e-07, 'completion_length': 57.46428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03570556640625, 'epoch': 0.65} 65%|██████▌ | 1626/2500 [6:20:34<3:30:35, 14.46s/it] 65%|██████▌ | 1627/2500 [6:20:49<3:32:30, 14.61s/it] {'loss': 0.0015, 'grad_norm': 0.0711970012072918, 'learning_rate': 3.492e-07, 'completion_length': 58.10714530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.037109375, 'epoch': 0.65} 65%|██████▌ | 1627/2500 [6:20:49<3:32:30, 
14.61s/it] 65%|██████▌ | 1628/2500 [6:21:03<3:28:55, 14.38s/it] {'loss': 0.0012, 'grad_norm': 0.07562970887741025, 'learning_rate': 3.488e-07, 'completion_length': 59.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03076171875, 'epoch': 0.65} 65%|██████▌ | 1628/2500 [6:21:03<3:28:55, 14.38s/it] 65%|██████▌ | 1629/2500 [6:21:16<3:23:33, 14.02s/it] {'loss': 0.0008, 'grad_norm': 2.066253592617458, 'learning_rate': 3.4839999999999997e-07, 'completion_length': 52.21428871154785, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.019287109375, 'epoch': 0.65} 65%|██████▌ | 1629/2500 [6:21:16<3:23:33, 14.02s/it] 65%|██████▌ | 1630/2500 [6:21:30<3:22:48, 13.99s/it] {'loss': 0.0016, 'grad_norm': 0.09643858100832373, 'learning_rate': 3.4799999999999994e-07, 'completion_length': 52.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0396728515625, 'epoch': 0.65} 65%|██████▌ | 1630/2500 [6:21:30<3:22:48, 13.99s/it] 65%|██████▌ | 1631/2500 [6:21:43<3:18:04, 13.68s/it] {'loss': 0.0017, 'grad_norm': 1.5922549604679168, 'learning_rate': 3.476e-07, 'completion_length': 51.69643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.04150390625, 'epoch': 0.65} 65%|██████▌ | 1631/2500 [6:21:43<3:18:04, 13.68s/it] 65%|██████▌ | 1632/2500 [6:21:58<3:23:49, 14.09s/it] {'loss': 0.0007, 'grad_norm': 0.06250055658140874, 'learning_rate': 3.472e-07, 'completion_length': 56.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01806640625, 'epoch': 0.65} 65%|██████▌ | 1632/2500 [6:21:58<3:23:49, 14.09s/it] 65%|██████▌ | 1633/2500 [6:22:12<3:23:14, 14.06s/it] {'loss': 0.0014, 'grad_norm': 
0.06698112789814796, 'learning_rate': 3.4679999999999996e-07, 'completion_length': 56.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03424072265625, 'epoch': 0.65} 65%|██████▌ | 1633/2500 [6:22:12<3:23:14, 14.06s/it] 65%|██████▌ | 1634/2500 [6:22:25<3:19:00, 13.79s/it] {'loss': 0.0018, 'grad_norm': 0.0899911755310899, 'learning_rate': 3.464e-07, 'completion_length': 51.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04412841796875, 'epoch': 0.65} 65%|██████▌ | 1634/2500 [6:22:25<3:19:00, 13.79s/it] 65%|██████▌ | 1635/2500 [6:22:39<3:19:31, 13.84s/it] {'loss': 0.0007, 'grad_norm': 0.11274721853846485, 'learning_rate': 3.4599999999999995e-07, 'completion_length': 49.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01776123046875, 'epoch': 0.65} 65%|██████▌ | 1635/2500 [6:22:39<3:19:31, 13.84s/it] 65%|██████▌ | 1636/2500 [6:22:55<3:26:30, 14.34s/it] {'loss': 0.0005, 'grad_norm': 0.17157917467602182, 'learning_rate': 3.456e-07, 'completion_length': 60.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0125732421875, 'epoch': 0.65} 65%|██████▌ | 1636/2500 [6:22:55<3:26:30, 14.34s/it] 65%|██████▌ | 1637/2500 [6:23:09<3:27:09, 14.40s/it] {'loss': 0.0007, 'grad_norm': 0.06539360227434665, 'learning_rate': 3.452e-07, 'completion_length': 57.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017242431640625, 'epoch': 0.65} 65%|██████▌ | 1637/2500 [6:23:09<3:27:09, 14.40s/it] 66%|██████▌ | 1638/2500 [6:23:23<3:23:45, 14.18s/it] {'loss': 0.0013, 'grad_norm': 0.2248358000049308, 'learning_rate': 3.4479999999999996e-07, 'completion_length': 61.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 
2.0, 'reward_std': 0.0, 'kl': 0.03131103515625, 'epoch': 0.66} 66%|██████▌ | 1638/2500 [6:23:23<3:23:45, 14.18s/it] 66%|██████▌ | 1639/2500 [6:23:38<3:25:43, 14.34s/it] {'loss': 0.0021, 'grad_norm': 0.11553084801276901, 'learning_rate': 3.444e-07, 'completion_length': 55.910715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.052978515625, 'epoch': 0.66} 66%|██████▌ | 1639/2500 [6:23:38<3:25:43, 14.34s/it] 66%|██████▌ | 1640/2500 [6:23:51<3:20:17, 13.97s/it] {'loss': 0.0021, 'grad_norm': 0.05753393146441075, 'learning_rate': 3.4399999999999996e-07, 'completion_length': 52.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0535888671875, 'epoch': 0.66} 66%|██████▌ | 1640/2500 [6:23:51<3:20:17, 13.97s/it] 66%|██████▌ | 1641/2500 [6:24:06<3:26:04, 14.39s/it] {'loss': 0.0014, 'grad_norm': 0.08051176341833642, 'learning_rate': 3.436e-07, 'completion_length': 57.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035919189453125, 'epoch': 0.66} 66%|██████▌ | 1641/2500 [6:24:06<3:26:04, 14.39s/it] 66%|██████▌ | 1642/2500 [6:24:20<3:22:28, 14.16s/it] {'loss': 0.001, 'grad_norm': 2.055918432724441, 'learning_rate': 3.432e-07, 'completion_length': 56.392860412597656, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02471923828125, 'epoch': 0.66} 66%|██████▌ | 1642/2500 [6:24:20<3:22:28, 14.16s/it] 66%|██████▌ | 1643/2500 [6:24:37<3:34:36, 15.02s/it] {'loss': 0.0014, 'grad_norm': 0.15195551922436307, 'learning_rate': 3.4279999999999997e-07, 'completion_length': 61.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0340576171875, 'epoch': 0.66} 66%|██████▌ | 1643/2500 [6:24:37<3:34:36, 15.02s/it] 66%|██████▌ | 
1644/2500 [6:24:51<3:30:40, 14.77s/it] {'loss': 0.0011, 'grad_norm': 0.06657279707162805, 'learning_rate': 3.4239999999999994e-07, 'completion_length': 57.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02667236328125, 'epoch': 0.66} 66%|██████▌ | 1644/2500 [6:24:51<3:30:40, 14.77s/it] 66%|██████▌ | 1645/2500 [6:25:04<3:23:37, 14.29s/it] {'loss': 0.0011, 'grad_norm': 0.09083700028732387, 'learning_rate': 3.42e-07, 'completion_length': 51.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028045654296875, 'epoch': 0.66} 66%|██████▌ | 1645/2500 [6:25:04<3:23:37, 14.29s/it] 66%|██████▌ | 1646/2500 [6:25:21<3:32:20, 14.92s/it] {'loss': 0.0011, 'grad_norm': 0.07675110311575592, 'learning_rate': 3.416e-07, 'completion_length': 59.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02703857421875, 'epoch': 0.66} 66%|██████▌ | 1646/2500 [6:25:21<3:32:20, 14.92s/it] 66%|██████▌ | 1647/2500 [6:25:34<3:23:57, 14.35s/it] {'loss': 0.0006, 'grad_norm': 0.06364837167108531, 'learning_rate': 3.412e-07, 'completion_length': 56.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015960693359375, 'epoch': 0.66} 66%|██████▌ | 1647/2500 [6:25:34<3:23:57, 14.35s/it] 66%|██████▌ | 1648/2500 [6:25:47<3:18:35, 13.99s/it] {'loss': 0.0018, 'grad_norm': 0.06935635644501265, 'learning_rate': 3.408e-07, 'completion_length': 54.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0447998046875, 'epoch': 0.66} 66%|██████▌ | 1648/2500 [6:25:47<3:18:35, 13.99s/it] 66%|██████▌ | 1649/2500 [6:26:00<3:14:35, 13.72s/it] {'loss': 0.001, 'grad_norm': 0.08142743905607976, 'learning_rate': 3.4039999999999995e-07, 'completion_length': 55.12500190734863, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025848388671875, 'epoch': 0.66} 66%|██████▌ | 1649/2500 [6:26:00<3:14:35, 13.72s/it] 66%|██████▌ | 1650/2500 [6:26:13<3:13:32, 13.66s/it] {'loss': 0.0008, 'grad_norm': 0.05359611815954606, 'learning_rate': 3.4000000000000003e-07, 'completion_length': 55.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019989013671875, 'epoch': 0.66} 66%|██████▌ | 1650/2500 [6:26:13<3:13:32, 13.66s/it] 66%|██████▌ | 1651/2500 [6:26:27<3:14:03, 13.71s/it] {'loss': 0.0016, 'grad_norm': 3.3089539067522455, 'learning_rate': 3.396e-07, 'completion_length': 58.94643211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.04083251953125, 'epoch': 0.66} 66%|██████▌ | 1651/2500 [6:26:27<3:14:03, 13.71s/it] 66%|██████▌ | 1652/2500 [6:26:41<3:15:00, 13.80s/it] {'loss': 0.0011, 'grad_norm': 0.1907016096253901, 'learning_rate': 3.3919999999999997e-07, 'completion_length': 50.69643211364746, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02789306640625, 'epoch': 0.66} 66%|██████▌ | 1652/2500 [6:26:41<3:15:00, 13.80s/it] 66%|██████▌ | 1653/2500 [6:26:55<3:14:19, 13.77s/it] {'loss': 0.0006, 'grad_norm': 0.05852347387820505, 'learning_rate': 3.388e-07, 'completion_length': 59.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015380859375, 'epoch': 0.66} 66%|██████▌ | 1653/2500 [6:26:55<3:14:19, 13.77s/it] 66%|██████▌ | 1654/2500 [6:27:08<3:11:59, 13.62s/it] {'loss': 0.0013, 'grad_norm': 1.3095308919698858, 'learning_rate': 3.3839999999999996e-07, 'completion_length': 59.32143020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 
1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.032470703125, 'epoch': 0.66} 66%|██████▌ | 1654/2500 [6:27:08<3:11:59, 13.62s/it] 66%|██████▌ | 1655/2500 [6:27:21<3:09:30, 13.46s/it] {'loss': 0.0011, 'grad_norm': 0.08934543961197984, 'learning_rate': 3.38e-07, 'completion_length': 52.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.66} 66%|██████▌ | 1655/2500 [6:27:21<3:09:30, 13.46s/it] 66%|██████▌ | 1656/2500 [6:27:36<3:16:13, 13.95s/it] {'loss': 0.0006, 'grad_norm': 0.06120969011691117, 'learning_rate': 3.376e-07, 'completion_length': 59.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014068603515625, 'epoch': 0.66} 66%|██████▌ | 1656/2500 [6:27:36<3:16:13, 13.95s/it] 66%|██████▋ | 1657/2500 [6:27:50<3:13:40, 13.79s/it] {'loss': 0.0015, 'grad_norm': 0.10933649049952635, 'learning_rate': 3.372e-07, 'completion_length': 52.26785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.036376953125, 'epoch': 0.66} 66%|██████▋ | 1657/2500 [6:27:50<3:13:40, 13.79s/it] 66%|██████▋ | 1658/2500 [6:28:03<3:13:03, 13.76s/it] {'loss': 0.0014, 'grad_norm': 0.1607156529876418, 'learning_rate': 3.368e-07, 'completion_length': 55.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03533935546875, 'epoch': 0.66} 66%|██████▋ | 1658/2500 [6:28:03<3:13:03, 13.76s/it] 66%|██████▋ | 1659/2500 [6:28:17<3:14:09, 13.85s/it] {'loss': 0.0012, 'grad_norm': 0.09969322668875753, 'learning_rate': 3.3639999999999997e-07, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03082275390625, 'epoch': 0.66} 66%|██████▋ | 1659/2500 [6:28:17<3:14:09, 13.85s/it] 66%|██████▋ | 
1660/2500 [6:28:32<3:15:54, 13.99s/it] {'loss': 0.0011, 'grad_norm': 0.06939797889956942, 'learning_rate': 3.36e-07, 'completion_length': 57.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02642822265625, 'epoch': 0.66} 66%|██████▋ | 1660/2500 [6:28:32<3:15:54, 13.99s/it] 66%|██████▋ | 1661/2500 [6:28:45<3:14:15, 13.89s/it] {'loss': 0.0018, 'grad_norm': 0.08147378896552072, 'learning_rate': 3.356e-07, 'completion_length': 56.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0440673828125, 'epoch': 0.66} 66%|██████▋ | 1661/2500 [6:28:45<3:14:15, 13.89s/it] 66%|██████▋ | 1662/2500 [6:29:00<3:17:23, 14.13s/it] {'loss': 0.0013, 'grad_norm': 0.08904836039347816, 'learning_rate': 3.352e-07, 'completion_length': 54.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.66} 66%|██████▋ | 1662/2500 [6:29:00<3:17:23, 14.13s/it] 67%|██████▋ | 1663/2500 [6:29:14<3:16:19, 14.07s/it] {'loss': 0.0007, 'grad_norm': 0.08032553916299282, 'learning_rate': 3.3479999999999995e-07, 'completion_length': 59.16071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01824951171875, 'epoch': 0.67} 67%|██████▋ | 1663/2500 [6:29:14<3:16:19, 14.07s/it] 67%|██████▋ | 1664/2500 [6:29:28<3:13:56, 13.92s/it] {'loss': 0.0007, 'grad_norm': 0.06313712586277177, 'learning_rate': 3.344e-07, 'completion_length': 54.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018585205078125, 'epoch': 0.67} 67%|██████▋ | 1664/2500 [6:29:28<3:13:56, 13.92s/it] 67%|██████▋ | 1665/2500 [6:29:41<3:13:20, 13.89s/it] {'loss': 0.001, 'grad_norm': 0.06274607030918351, 'learning_rate': 3.34e-07, 'completion_length': 51.51785850524902, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0247955322265625, 'epoch': 0.67} 67%|██████▋ | 1665/2500 [6:29:41<3:13:20, 13.89s/it] 67%|██████▋ | 1666/2500 [6:29:55<3:10:47, 13.73s/it] {'loss': 0.0009, 'grad_norm': 0.0618496528802627, 'learning_rate': 3.3359999999999997e-07, 'completion_length': 56.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.67} 67%|██████▋ | 1666/2500 [6:29:55<3:10:47, 13.73s/it] 67%|██████▋ | 1667/2500 [6:30:08<3:09:35, 13.66s/it] {'loss': 0.0007, 'grad_norm': 0.08706076465964914, 'learning_rate': 3.332e-07, 'completion_length': 51.160715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018798828125, 'epoch': 0.67} 67%|██████▋ | 1667/2500 [6:30:08<3:09:35, 13.66s/it] 67%|██████▋ | 1668/2500 [6:30:21<3:06:41, 13.46s/it] {'loss': 0.0012, 'grad_norm': 2.7679280889451934, 'learning_rate': 3.3279999999999996e-07, 'completion_length': 52.00000190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03118896484375, 'epoch': 0.67} 67%|██████▋ | 1668/2500 [6:30:21<3:06:41, 13.46s/it] 67%|██████▋ | 1669/2500 [6:30:35<3:07:46, 13.56s/it] {'loss': 0.0009, 'grad_norm': 0.06541309624411064, 'learning_rate': 3.3239999999999993e-07, 'completion_length': 56.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022125244140625, 'epoch': 0.67} 67%|██████▋ | 1669/2500 [6:30:35<3:07:46, 13.56s/it] 67%|██████▋ | 1670/2500 [6:30:48<3:06:18, 13.47s/it] {'loss': 0.0014, 'grad_norm': 0.12191101297047048, 'learning_rate': 3.32e-07, 'completion_length': 56.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0338134765625, 'epoch': 0.67} 67%|██████▋ | 
1670/2500 [6:30:48<3:06:18, 13.47s/it] 67%|██████▋ | 1671/2500 [6:31:03<3:10:37, 13.80s/it] {'loss': 0.001, 'grad_norm': 1.527049414358006, 'learning_rate': 3.316e-07, 'completion_length': 57.10714530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02386474609375, 'epoch': 0.67} 67%|██████▋ | 1671/2500 [6:31:03<3:10:37, 13.80s/it] 67%|██████▋ | 1672/2500 [6:31:18<3:17:51, 14.34s/it] {'loss': 0.0012, 'grad_norm': 0.05999127100112939, 'learning_rate': 3.312e-07, 'completion_length': 63.910715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02886962890625, 'epoch': 0.67} 67%|██████▋ | 1672/2500 [6:31:18<3:17:51, 14.34s/it] 67%|██████▋ | 1673/2500 [6:31:33<3:17:53, 14.36s/it] {'loss': 0.0008, 'grad_norm': 0.05244309991329404, 'learning_rate': 3.3079999999999997e-07, 'completion_length': 55.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0208740234375, 'epoch': 0.67} 67%|██████▋ | 1673/2500 [6:31:33<3:17:53, 14.36s/it] 67%|██████▋ | 1674/2500 [6:31:48<3:21:35, 14.64s/it] {'loss': 0.0007, 'grad_norm': 1.3630472708185666, 'learning_rate': 3.304e-07, 'completion_length': 64.62500381469727, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0181884765625, 'epoch': 0.67} 67%|██████▋ | 1674/2500 [6:31:48<3:21:35, 14.64s/it] 67%|██████▋ | 1675/2500 [6:32:01<3:15:51, 14.24s/it] {'loss': 0.0009, 'grad_norm': 0.07564664238483083, 'learning_rate': 3.3e-07, 'completion_length': 53.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02191162109375, 'epoch': 0.67} 67%|██████▋ | 1675/2500 [6:32:01<3:15:51, 14.24s/it] 67%|██████▋ | 1676/2500 [6:32:15<3:14:34, 14.17s/it] {'loss': 
0.0013, 'grad_norm': 0.06486207577024564, 'learning_rate': 3.296e-07, 'completion_length': 54.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03253173828125, 'epoch': 0.67} 67%|██████▋ | 1676/2500 [6:32:15<3:14:34, 14.17s/it] 67%|██████▋ | 1677/2500 [6:32:29<3:13:31, 14.11s/it] {'loss': 0.0009, 'grad_norm': 0.05855662656488528, 'learning_rate': 3.2919999999999996e-07, 'completion_length': 57.035715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0223388671875, 'epoch': 0.67} 67%|██████▋ | 1677/2500 [6:32:29<3:13:31, 14.11s/it] 67%|██████▋ | 1678/2500 [6:32:44<3:13:51, 14.15s/it] {'loss': 0.0014, 'grad_norm': 0.05468644250293831, 'learning_rate': 3.288e-07, 'completion_length': 51.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03631591796875, 'epoch': 0.67} 67%|██████▋ | 1678/2500 [6:32:44<3:13:51, 14.15s/it] 67%|██████▋ | 1679/2500 [6:32:59<3:17:22, 14.42s/it] {'loss': 0.0017, 'grad_norm': 0.15545862398354598, 'learning_rate': 3.284e-07, 'completion_length': 58.30357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04302978515625, 'epoch': 0.67} 67%|██████▋ | 1679/2500 [6:32:59<3:17:22, 14.42s/it] 67%|██████▋ | 1680/2500 [6:33:13<3:16:39, 14.39s/it] {'loss': 0.0013, 'grad_norm': 0.08374254472719504, 'learning_rate': 3.28e-07, 'completion_length': 54.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03363037109375, 'epoch': 0.67} 67%|██████▋ | 1680/2500 [6:33:13<3:16:39, 14.39s/it] 67%|██████▋ | 1681/2500 [6:33:26<3:11:25, 14.02s/it] {'loss': 0.001, 'grad_norm': 0.058282637884939904, 'learning_rate': 3.276e-07, 'completion_length': 51.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.67} 67%|██████▋ | 1681/2500 [6:33:26<3:11:25, 14.02s/it] 67%|██████▋ | 1682/2500 [6:33:41<3:12:47, 14.14s/it] {'loss': 0.0006, 'grad_norm': 0.06062750042489923, 'learning_rate': 3.2719999999999997e-07, 'completion_length': 55.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014617919921875, 'epoch': 0.67} 67%|██████▋ | 1682/2500 [6:33:41<3:12:47, 14.14s/it] 67%|██████▋ | 1683/2500 [6:33:55<3:12:23, 14.13s/it] {'loss': 0.0018, 'grad_norm': 0.08089806923234989, 'learning_rate': 3.268e-07, 'completion_length': 58.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044677734375, 'epoch': 0.67} 67%|██████▋ | 1683/2500 [6:33:55<3:12:23, 14.13s/it] 67%|██████▋ | 1684/2500 [6:34:10<3:16:09, 14.42s/it] {'loss': 0.0012, 'grad_norm': 0.06915391248589743, 'learning_rate': 3.264e-07, 'completion_length': 60.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02960205078125, 'epoch': 0.67} 67%|██████▋ | 1684/2500 [6:34:10<3:16:09, 14.42s/it] 67%|██████▋ | 1685/2500 [6:34:23<3:11:03, 14.07s/it] {'loss': 0.001, 'grad_norm': 0.1728927637802409, 'learning_rate': 3.26e-07, 'completion_length': 57.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0244140625, 'epoch': 0.67} 67%|██████▋ | 1685/2500 [6:34:23<3:11:03, 14.07s/it] 67%|██████▋ | 1686/2500 [6:34:37<3:10:54, 14.07s/it] {'loss': 0.0009, 'grad_norm': 0.19314063591218353, 'learning_rate': 3.256e-07, 'completion_length': 50.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02142333984375, 'epoch': 0.67} 67%|██████▋ | 1686/2500 [6:34:37<3:10:54, 14.07s/it] 67%|██████▋ | 1687/2500 [6:34:51<3:07:47, 13.86s/it] {'loss': 0.0005, 'grad_norm': 
0.07623115638324231, 'learning_rate': 3.252e-07, 'completion_length': 58.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01214599609375, 'epoch': 0.67} 67%|██████▋ | 1687/2500 [6:34:51<3:07:47, 13.86s/it] 68%|██████▊ | 1688/2500 [6:35:04<3:06:38, 13.79s/it] {'loss': 0.0009, 'grad_norm': 0.11323900344959205, 'learning_rate': 3.2479999999999994e-07, 'completion_length': 57.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02349853515625, 'epoch': 0.68} 68%|██████▊ | 1688/2500 [6:35:04<3:06:38, 13.79s/it] 68%|██████▊ | 1689/2500 [6:35:18<3:06:55, 13.83s/it] {'loss': 0.0014, 'grad_norm': 0.13567803713649917, 'learning_rate': 3.244e-07, 'completion_length': 59.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03375244140625, 'epoch': 0.68} 68%|██████▊ | 1689/2500 [6:35:18<3:06:55, 13.83s/it] 68%|██████▊ | 1690/2500 [6:35:33<3:12:58, 14.29s/it] {'loss': 0.0006, 'grad_norm': 0.05099779722853422, 'learning_rate': 3.24e-07, 'completion_length': 55.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01446533203125, 'epoch': 0.68} 68%|██████▊ | 1690/2500 [6:35:33<3:12:58, 14.29s/it] 68%|██████▊ | 1691/2500 [6:35:48<3:11:37, 14.21s/it] {'loss': 0.0004, 'grad_norm': 0.05670245298481404, 'learning_rate': 3.2359999999999996e-07, 'completion_length': 56.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.010955810546875, 'epoch': 0.68} 68%|██████▊ | 1691/2500 [6:35:48<3:11:37, 14.21s/it] 68%|██████▊ | 1692/2500 [6:36:02<3:13:27, 14.37s/it] {'loss': 0.0014, 'grad_norm': 0.04835108011008286, 'learning_rate': 3.232e-07, 'completion_length': 57.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 0.68} 68%|██████▊ | 1692/2500 [6:36:02<3:13:27, 14.37s/it] 68%|██████▊ | 1693/2500 [6:36:17<3:16:41, 14.62s/it] {'loss': 0.001, 'grad_norm': 0.07096430359984279, 'learning_rate': 3.2279999999999995e-07, 'completion_length': 62.66071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02520751953125, 'epoch': 0.68} 68%|██████▊ | 1693/2500 [6:36:17<3:16:41, 14.62s/it] 68%|██████▊ | 1694/2500 [6:36:33<3:19:46, 14.87s/it] {'loss': 0.0012, 'grad_norm': 0.05811461912571983, 'learning_rate': 3.2240000000000003e-07, 'completion_length': 62.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.68} 68%|██████▊ | 1694/2500 [6:36:33<3:19:46, 14.87s/it] 68%|██████▊ | 1695/2500 [6:36:49<3:25:36, 15.33s/it] {'loss': 0.0012, 'grad_norm': 0.0717929261565919, 'learning_rate': 3.22e-07, 'completion_length': 66.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03082275390625, 'epoch': 0.68} 68%|██████▊ | 1695/2500 [6:36:49<3:25:36, 15.33s/it] 68%|██████▊ | 1696/2500 [6:37:03<3:18:13, 14.79s/it] {'loss': 0.0011, 'grad_norm': 1.8462430990788468, 'learning_rate': 3.2159999999999997e-07, 'completion_length': 50.08928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.026885986328125, 'epoch': 0.68} 68%|██████▊ | 1696/2500 [6:37:03<3:18:13, 14.79s/it] 68%|██████▊ | 1697/2500 [6:37:17<3:14:50, 14.56s/it] {'loss': 0.0012, 'grad_norm': 0.1518327136736006, 'learning_rate': 3.212e-07, 'completion_length': 60.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03094482421875, 'epoch': 0.68} 68%|██████▊ | 1697/2500 [6:37:17<3:14:50, 14.56s/it] 68%|██████▊ 
| 1698/2500 [6:37:33<3:19:11, 14.90s/it] {'loss': 0.0016, 'grad_norm': 0.08196024624878145, 'learning_rate': 3.2079999999999996e-07, 'completion_length': 64.10714721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0399169921875, 'epoch': 0.68} 68%|██████▊ | 1698/2500 [6:37:33<3:19:11, 14.90s/it] 68%|██████▊ | 1699/2500 [6:37:49<3:24:34, 15.32s/it] {'loss': 0.0017, 'grad_norm': 0.08247421454503234, 'learning_rate': 3.204e-07, 'completion_length': 68.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.042236328125, 'epoch': 0.68} 68%|██████▊ | 1699/2500 [6:37:49<3:24:34, 15.32s/it] 68%|██████▊ | 1700/2500 [6:38:03<3:20:48, 15.06s/it] {'loss': 0.0015, 'grad_norm': 0.8603329993015553, 'learning_rate': 3.2e-07, 'completion_length': 54.10714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0377197265625, 'epoch': 0.68} 68%|██████▊ | 1700/2500 [6:38:03<3:20:48, 15.06s/it] 68%|██████▊ | 1701/2500 [6:39:13<7:00:48, 31.60s/it] {'loss': 0.0015, 'grad_norm': 2.1227402407167792, 'learning_rate': 3.196e-07, 'completion_length': 58.232147216796875, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.037841796875, 'epoch': 0.68} 68%|██████▊ | 1701/2500 [6:39:14<7:00:48, 31.60s/it] 68%|██████▊ | 1702/2500 [6:39:28<5:51:23, 26.42s/it] {'loss': 0.0016, 'grad_norm': 0.07143206535134569, 'learning_rate': 3.1919999999999995e-07, 'completion_length': 59.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039794921875, 'epoch': 0.68} 68%|██████▊ | 1702/2500 [6:39:28<5:51:23, 26.42s/it] 68%|██████▊ | 1703/2500 [6:39:43<5:04:37, 22.93s/it] {'loss': 0.0007, 'grad_norm': 0.1059112851247482, 
'learning_rate': 3.1879999999999997e-07, 'completion_length': 69.85714721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017822265625, 'epoch': 0.68} 68%|██████▊ | 1703/2500 [6:39:43<5:04:37, 22.93s/it] 68%|██████▊ | 1704/2500 [6:39:59<4:38:02, 20.96s/it] {'loss': 0.0016, 'grad_norm': 9.795514818942097, 'learning_rate': 3.184e-07, 'completion_length': 64.33928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.03900146484375, 'epoch': 0.68} 68%|██████▊ | 1704/2500 [6:39:59<4:38:02, 20.96s/it] 68%|██████▊ | 1705/2500 [6:40:13<4:10:54, 18.94s/it] {'loss': 0.0011, 'grad_norm': 0.08807377234386445, 'learning_rate': 3.18e-07, 'completion_length': 63.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.68} 68%|██████▊ | 1705/2500 [6:40:13<4:10:54, 18.94s/it] 68%|██████▊ | 1706/2500 [6:40:26<3:46:59, 17.15s/it] {'loss': 0.0006, 'grad_norm': 0.06064413541671472, 'learning_rate': 3.176e-07, 'completion_length': 46.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014434814453125, 'epoch': 0.68} 68%|██████▊ | 1706/2500 [6:40:26<3:46:59, 17.15s/it] 68%|██████▊ | 1707/2500 [6:40:40<3:31:51, 16.03s/it] {'loss': 0.001, 'grad_norm': 0.17843081258639912, 'learning_rate': 3.1719999999999996e-07, 'completion_length': 55.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.68} 68%|██████▊ | 1707/2500 [6:40:40<3:31:51, 16.03s/it] 68%|██████▊ | 1708/2500 [6:40:53<3:21:51, 15.29s/it] {'loss': 0.0009, 'grad_norm': 0.09758003375125385, 'learning_rate': 3.1680000000000003e-07, 'completion_length': 53.62500190734863, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0228271484375, 'epoch': 0.68} 68%|██████▊ | 1708/2500 [6:40:53<3:21:51, 15.29s/it] 68%|██████▊ | 1709/2500 [6:41:07<3:13:52, 14.71s/it] {'loss': 0.001, 'grad_norm': 0.06030454400055572, 'learning_rate': 3.164e-07, 'completion_length': 59.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025909423828125, 'epoch': 0.68} 68%|██████▊ | 1709/2500 [6:41:07<3:13:52, 14.71s/it] 68%|██████▊ | 1710/2500 [6:41:20<3:09:01, 14.36s/it] {'loss': 0.0013, 'grad_norm': 1.3951886890489567, 'learning_rate': 3.1599999999999997e-07, 'completion_length': 59.08928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03314208984375, 'epoch': 0.68} 68%|██████▊ | 1710/2500 [6:41:20<3:09:01, 14.36s/it] 68%|██████▊ | 1711/2500 [6:41:34<3:08:02, 14.30s/it] {'loss': 0.0007, 'grad_norm': 0.05751168435856252, 'learning_rate': 3.156e-07, 'completion_length': 60.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017669677734375, 'epoch': 0.68} 68%|██████▊ | 1711/2500 [6:41:34<3:08:02, 14.30s/it] 68%|██████▊ | 1712/2500 [6:41:49<3:07:57, 14.31s/it] {'loss': 0.0013, 'grad_norm': 0.06356612963685772, 'learning_rate': 3.1519999999999996e-07, 'completion_length': 55.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.68} 68%|██████▊ | 1712/2500 [6:41:49<3:07:57, 14.31s/it] 69%|██████▊ | 1713/2500 [6:42:04<3:10:55, 14.56s/it] {'loss': 0.001, 'grad_norm': 0.057103130185304295, 'learning_rate': 3.148e-07, 'completion_length': 64.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.69} 69%|██████▊ | 1713/2500 
[6:42:04<3:10:55, 14.56s/it] 69%|██████▊ | 1714/2500 [6:42:18<3:09:50, 14.49s/it] {'loss': 0.0016, 'grad_norm': 0.06365894428170361, 'learning_rate': 3.144e-07, 'completion_length': 61.375003814697266, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03924560546875, 'epoch': 0.69} 69%|██████▊ | 1714/2500 [6:42:18<3:09:50, 14.49s/it] 69%|██████▊ | 1715/2500 [6:42:33<3:11:07, 14.61s/it] {'loss': 0.0014, 'grad_norm': 0.060502163476835416, 'learning_rate': 3.14e-07, 'completion_length': 67.76786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03424072265625, 'epoch': 0.69} 69%|██████▊ | 1715/2500 [6:42:33<3:11:07, 14.61s/it] 69%|██████▊ | 1716/2500 [6:42:46<3:05:54, 14.23s/it] {'loss': 0.0005, 'grad_norm': 0.10300030688271984, 'learning_rate': 3.1359999999999995e-07, 'completion_length': 54.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01239013671875, 'epoch': 0.69} 69%|██████▊ | 1716/2500 [6:42:46<3:05:54, 14.23s/it] 69%|██████▊ | 1717/2500 [6:43:01<3:09:23, 14.51s/it] {'loss': 0.001, 'grad_norm': 1.273581499539058, 'learning_rate': 3.1319999999999997e-07, 'completion_length': 62.23214530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0247802734375, 'epoch': 0.69} 69%|██████▊ | 1717/2500 [6:43:01<3:09:23, 14.51s/it] 69%|██████▊ | 1718/2500 [6:43:15<3:06:00, 14.27s/it] {'loss': 0.0011, 'grad_norm': 0.06447467336575491, 'learning_rate': 3.128e-07, 'completion_length': 54.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02642822265625, 'epoch': 0.69} 69%|██████▊ | 1718/2500 [6:43:15<3:06:00, 14.27s/it] 69%|██████▉ | 1719/2500 [6:43:30<3:07:57, 14.44s/it] {'loss': 0.0004, 
'grad_norm': 0.07672409640229708, 'learning_rate': 3.124e-07, 'completion_length': 57.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0103759765625, 'epoch': 0.69} 69%|██████▉ | 1719/2500 [6:43:30<3:07:57, 14.44s/it] 69%|██████▉ | 1720/2500 [6:43:45<3:08:28, 14.50s/it] {'loss': 0.0012, 'grad_norm': 0.09672211557311905, 'learning_rate': 3.12e-07, 'completion_length': 63.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.69} 69%|██████▉ | 1720/2500 [6:43:45<3:08:28, 14.50s/it] 69%|██████▉ | 1721/2500 [6:43:59<3:07:47, 14.46s/it] {'loss': 0.0018, 'grad_norm': 0.057717819978584314, 'learning_rate': 3.1159999999999996e-07, 'completion_length': 61.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044677734375, 'epoch': 0.69} 69%|██████▉ | 1721/2500 [6:43:59<3:07:47, 14.46s/it] 69%|██████▉ | 1722/2500 [6:44:13<3:04:58, 14.27s/it] {'loss': 0.0013, 'grad_norm': 0.11673588703428797, 'learning_rate': 3.112e-07, 'completion_length': 61.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03289794921875, 'epoch': 0.69} 69%|██████▉ | 1722/2500 [6:44:13<3:04:58, 14.27s/it] 69%|██████▉ | 1723/2500 [6:44:27<3:03:11, 14.15s/it] {'loss': 0.0009, 'grad_norm': 0.07016460645597583, 'learning_rate': 3.108e-07, 'completion_length': 63.821434020996094, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02154541015625, 'epoch': 0.69} 69%|██████▉ | 1723/2500 [6:44:27<3:03:11, 14.15s/it] 69%|██████▉ | 1724/2500 [6:44:42<3:08:54, 14.61s/it] {'loss': 0.0008, 'grad_norm': 0.0913248628665597, 'learning_rate': 3.104e-07, 'completion_length': 59.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.02099609375, 'epoch': 0.69} 69%|██████▉ | 1724/2500 [6:44:42<3:08:54, 14.61s/it] 69%|██████▉ | 1725/2500 [6:44:57<3:07:14, 14.50s/it] {'loss': 0.0011, 'grad_norm': 0.06356973609059216, 'learning_rate': 3.1e-07, 'completion_length': 59.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02850341796875, 'epoch': 0.69} 69%|██████▉ | 1725/2500 [6:44:57<3:07:14, 14.50s/it] 69%|██████▉ | 1726/2500 [6:45:11<3:07:09, 14.51s/it] {'loss': 0.0012, 'grad_norm': 0.05972911902486743, 'learning_rate': 3.0959999999999997e-07, 'completion_length': 64.48214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0288543701171875, 'epoch': 0.69} 69%|██████▉ | 1726/2500 [6:45:11<3:07:09, 14.51s/it] 69%|██████▉ | 1727/2500 [6:45:26<3:08:26, 14.63s/it] {'loss': 0.001, 'grad_norm': 1.9316995750082173, 'learning_rate': 3.0919999999999994e-07, 'completion_length': 64.33929061889648, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02606201171875, 'epoch': 0.69} 69%|██████▉ | 1727/2500 [6:45:26<3:08:26, 14.63s/it] 69%|██████▉ | 1728/2500 [6:45:40<3:06:28, 14.49s/it] {'loss': 0.0017, 'grad_norm': 0.13418203109390106, 'learning_rate': 3.088e-07, 'completion_length': 61.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0428466796875, 'epoch': 0.69} 69%|██████▉ | 1728/2500 [6:45:40<3:06:28, 14.49s/it] 69%|██████▉ | 1729/2500 [6:45:55<3:05:35, 14.44s/it] {'loss': 0.0007, 'grad_norm': 0.07970388569294755, 'learning_rate': 3.084e-07, 'completion_length': 57.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0162353515625, 'epoch': 0.69} 69%|██████▉ | 1729/2500 [6:45:55<3:05:35, 14.44s/it] 69%|██████▉ | 1730/2500 
[6:46:09<3:04:51, 14.40s/it] {'loss': 0.0011, 'grad_norm': 4.832041189481957, 'learning_rate': 3.08e-07, 'completion_length': 55.69643020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.027923583984375, 'epoch': 0.69} 69%|██████▉ | 1730/2500 [6:46:09<3:04:51, 14.40s/it] 69%|██████▉ | 1731/2500 [6:46:23<3:02:10, 14.21s/it] {'loss': 0.0011, 'grad_norm': 0.06279302450785852, 'learning_rate': 3.076e-07, 'completion_length': 55.035715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026458740234375, 'epoch': 0.69} 69%|██████▉ | 1731/2500 [6:46:23<3:02:10, 14.21s/it] 69%|██████▉ | 1732/2500 [6:46:38<3:07:39, 14.66s/it] {'loss': 0.0004, 'grad_norm': 0.05486364741111643, 'learning_rate': 3.0719999999999995e-07, 'completion_length': 58.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0100860595703125, 'epoch': 0.69} 69%|██████▉ | 1732/2500 [6:46:38<3:07:39, 14.66s/it] 69%|██████▉ | 1733/2500 [6:46:53<3:06:36, 14.60s/it] {'loss': 0.0009, 'grad_norm': 0.8552904057726842, 'learning_rate': 3.068e-07, 'completion_length': 54.875003814697266, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02191162109375, 'epoch': 0.69} 69%|██████▉ | 1733/2500 [6:46:53<3:06:36, 14.60s/it] 69%|██████▉ | 1734/2500 [6:47:07<3:04:10, 14.43s/it] {'loss': 0.0019, 'grad_norm': 0.06983575558519257, 'learning_rate': 3.064e-07, 'completion_length': 56.41071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.048583984375, 'epoch': 0.69} 69%|██████▉ | 1734/2500 [6:47:07<3:04:10, 14.43s/it] 69%|██████▉ | 1735/2500 [6:47:20<2:58:49, 14.02s/it] {'loss': 0.001, 'grad_norm': 0.09000162757904073, 
'learning_rate': 3.0599999999999996e-07, 'completion_length': 52.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024566650390625, 'epoch': 0.69} 69%|██████▉ | 1735/2500 [6:47:20<2:58:49, 14.02s/it] 69%|██████▉ | 1736/2500 [6:47:33<2:56:26, 13.86s/it] {'loss': 0.0017, 'grad_norm': 0.06404650936872157, 'learning_rate': 3.056e-07, 'completion_length': 57.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0413818359375, 'epoch': 0.69} 69%|██████▉ | 1736/2500 [6:47:33<2:56:26, 13.86s/it] 69%|██████▉ | 1737/2500 [6:47:47<2:54:19, 13.71s/it] {'loss': 0.0009, 'grad_norm': 0.05341530068641673, 'learning_rate': 3.052e-07, 'completion_length': 51.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02301025390625, 'epoch': 0.69} 69%|██████▉ | 1737/2500 [6:47:47<2:54:19, 13.71s/it] 70%|██████▉ | 1738/2500 [6:48:00<2:52:19, 13.57s/it] {'loss': 0.0008, 'grad_norm': 0.0663167284886191, 'learning_rate': 3.048e-07, 'completion_length': 52.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020263671875, 'epoch': 0.7} 70%|██████▉ | 1738/2500 [6:48:00<2:52:19, 13.57s/it] 70%|██████▉ | 1739/2500 [6:48:14<2:55:11, 13.81s/it] {'loss': 0.0011, 'grad_norm': 0.11101520303758095, 'learning_rate': 3.044e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0263671875, 'epoch': 0.7} 70%|██████▉ | 1739/2500 [6:48:14<2:55:11, 13.81s/it] 70%|██████▉ | 1740/2500 [6:48:29<2:57:48, 14.04s/it] {'loss': 0.0013, 'grad_norm': 0.06202762310043937, 'learning_rate': 3.0399999999999997e-07, 'completion_length': 57.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.03131103515625, 'epoch': 0.7} 70%|██████▉ | 1740/2500 [6:48:29<2:57:48, 14.04s/it] 70%|██████▉ | 1741/2500 [6:48:42<2:55:24, 13.87s/it] {'loss': 0.0009, 'grad_norm': 0.8188180501737765, 'learning_rate': 3.036e-07, 'completion_length': 55.71428871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02215576171875, 'epoch': 0.7} 70%|██████▉ | 1741/2500 [6:48:42<2:55:24, 13.87s/it] 70%|██████▉ | 1742/2500 [6:48:56<2:53:38, 13.75s/it] {'loss': 0.0011, 'grad_norm': 0.0737083787991931, 'learning_rate': 3.032e-07, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028076171875, 'epoch': 0.7} 70%|██████▉ | 1742/2500 [6:48:56<2:53:38, 13.75s/it] 70%|██████▉ | 1743/2500 [6:49:10<2:53:14, 13.73s/it] {'loss': 0.0009, 'grad_norm': 0.06580520393510929, 'learning_rate': 3.028e-07, 'completion_length': 53.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.7} 70%|██████▉ | 1743/2500 [6:49:10<2:53:14, 13.73s/it] 70%|██████▉ | 1744/2500 [6:49:23<2:53:23, 13.76s/it] {'loss': 0.0016, 'grad_norm': 1.9494804676566913, 'learning_rate': 3.024e-07, 'completion_length': 56.05357360839844, 'rewards/accuracy_reward': 0.892857164144516, 'rewards/format_reward': 1.0, 'reward': 1.8928571939468384, 'reward_std': 0.04123930633068085, 'kl': 0.04052734375, 'epoch': 0.7} 70%|██████▉ | 1744/2500 [6:49:23<2:53:23, 13.76s/it] 70%|██████▉ | 1745/2500 [6:49:37<2:52:38, 13.72s/it] {'loss': 0.0011, 'grad_norm': 2.235605212399134, 'learning_rate': 3.02e-07, 'completion_length': 61.16071891784668, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.026824951171875, 'epoch': 0.7} 70%|██████▉ | 1745/2500 [6:49:37<2:52:38, 
13.72s/it] 70%|██████▉ | 1746/2500 [6:49:51<2:51:56, 13.68s/it] {'loss': 0.0007, 'grad_norm': 0.07301474224380228, 'learning_rate': 3.0159999999999995e-07, 'completion_length': 53.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0185546875, 'epoch': 0.7} 70%|██████▉ | 1746/2500 [6:49:51<2:51:56, 13.68s/it] 70%|██████▉ | 1747/2500 [6:50:04<2:52:25, 13.74s/it] {'loss': 0.0012, 'grad_norm': 0.04436573210255907, 'learning_rate': 3.012e-07, 'completion_length': 54.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02923583984375, 'epoch': 0.7} 70%|██████▉ | 1747/2500 [6:50:04<2:52:25, 13.74s/it] 70%|██████▉ | 1748/2500 [6:50:18<2:51:19, 13.67s/it] {'loss': 0.0007, 'grad_norm': 0.062329000629864184, 'learning_rate': 3.008e-07, 'completion_length': 56.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01800537109375, 'epoch': 0.7} 70%|██████▉ | 1748/2500 [6:50:18<2:51:19, 13.67s/it] 70%|██████▉ | 1749/2500 [6:50:31<2:50:27, 13.62s/it] {'loss': 0.0007, 'grad_norm': 0.061177963539173257, 'learning_rate': 3.0039999999999996e-07, 'completion_length': 53.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018310546875, 'epoch': 0.7} 70%|██████▉ | 1749/2500 [6:50:31<2:50:27, 13.62s/it] 70%|███████ | 1750/2500 [6:50:45<2:51:22, 13.71s/it] {'loss': 0.0008, 'grad_norm': 0.09486105847702554, 'learning_rate': 3e-07, 'completion_length': 51.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02099609375, 'epoch': 0.7} 70%|███████ | 1750/2500 [6:50:45<2:51:22, 13.71s/it] 70%|███████ | 1751/2500 [6:50:58<2:48:49, 13.52s/it] {'loss': 0.0011, 'grad_norm': 0.08086784440917909, 'learning_rate': 2.9959999999999996e-07, 'completion_length': 
55.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02734375, 'epoch': 0.7} 70%|███████ | 1751/2500 [6:50:58<2:48:49, 13.52s/it] 70%|███████ | 1752/2500 [6:51:13<2:51:30, 13.76s/it] {'loss': 0.0003, 'grad_norm': 0.05767989999706086, 'learning_rate': 2.9920000000000003e-07, 'completion_length': 56.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0082855224609375, 'epoch': 0.7} 70%|███████ | 1752/2500 [6:51:13<2:51:30, 13.76s/it] 70%|███████ | 1753/2500 [6:51:27<2:51:45, 13.80s/it] {'loss': 0.0008, 'grad_norm': 2.0015956292665202, 'learning_rate': 2.988e-07, 'completion_length': 52.48214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.020721435546875, 'epoch': 0.7} 70%|███████ | 1753/2500 [6:51:27<2:51:45, 13.80s/it] 70%|███████ | 1754/2500 [6:51:42<2:55:35, 14.12s/it] {'loss': 0.0011, 'grad_norm': 1.0068675181093187, 'learning_rate': 2.9839999999999997e-07, 'completion_length': 63.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.7} 70%|███████ | 1754/2500 [6:51:42<2:55:35, 14.12s/it] 70%|███████ | 1755/2500 [6:51:55<2:51:05, 13.78s/it] {'loss': 0.0008, 'grad_norm': 0.08270779416321938, 'learning_rate': 2.98e-07, 'completion_length': 55.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020477294921875, 'epoch': 0.7} 70%|███████ | 1755/2500 [6:51:55<2:51:05, 13.78s/it] 70%|███████ | 1756/2500 [6:52:08<2:49:42, 13.69s/it] {'loss': 0.001, 'grad_norm': 0.053510881580554955, 'learning_rate': 2.9759999999999996e-07, 'completion_length': 52.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.02587890625, 'epoch': 0.7} 70%|███████ | 1756/2500 [6:52:08<2:49:42, 13.69s/it] 70%|███████ | 1757/2500 [6:52:23<2:52:49, 13.96s/it] {'loss': 0.0011, 'grad_norm': 0.07176098315499428, 'learning_rate': 2.972e-07, 'completion_length': 58.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838134765625, 'epoch': 0.7} 70%|███████ | 1757/2500 [6:52:23<2:52:49, 13.96s/it] 70%|███████ | 1758/2500 [6:52:37<2:54:18, 14.10s/it] {'loss': 0.0014, 'grad_norm': 0.2641777209669213, 'learning_rate': 2.968e-07, 'completion_length': 60.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03564453125, 'epoch': 0.7} 70%|███████ | 1758/2500 [6:52:37<2:54:18, 14.10s/it] 70%|███████ | 1759/2500 [6:52:50<2:50:45, 13.83s/it] {'loss': 0.0012, 'grad_norm': 0.9509594258214744, 'learning_rate': 2.964e-07, 'completion_length': 56.107147216796875, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03125, 'epoch': 0.7} 70%|███████ | 1759/2500 [6:52:50<2:50:45, 13.83s/it] 70%|███████ | 1760/2500 [6:53:04<2:50:33, 13.83s/it] {'loss': 0.0009, 'grad_norm': 0.05061987571223316, 'learning_rate': 2.9599999999999995e-07, 'completion_length': 60.01786231994629, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0231475830078125, 'epoch': 0.7} 70%|███████ | 1760/2500 [6:53:04<2:50:33, 13.83s/it] 70%|███████ | 1761/2500 [6:53:17<2:47:56, 13.63s/it] {'loss': 0.0012, 'grad_norm': 0.10898905283159524, 'learning_rate': 2.9559999999999997e-07, 'completion_length': 50.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.7} 70%|███████ | 1761/2500 [6:53:17<2:47:56, 13.63s/it] 70%|███████ | 1762/2500 [6:53:32<2:51:52, 13.97s/it] {'loss': 
0.0012, 'grad_norm': 0.0720377475336045, 'learning_rate': 2.952e-07, 'completion_length': 58.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030242919921875, 'epoch': 0.7} 70%|███████ | 1762/2500 [6:53:32<2:51:52, 13.97s/it] 71%|███████ | 1763/2500 [6:53:45<2:48:43, 13.74s/it] {'loss': 0.0008, 'grad_norm': 0.05960680602039334, 'learning_rate': 2.948e-07, 'completion_length': 53.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02001953125, 'epoch': 0.71} 71%|███████ | 1763/2500 [6:53:45<2:48:43, 13.74s/it] 71%|███████ | 1764/2500 [6:53:59<2:50:07, 13.87s/it] {'loss': 0.0012, 'grad_norm': 0.08140707651006618, 'learning_rate': 2.944e-07, 'completion_length': 54.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.71} 71%|███████ | 1764/2500 [6:53:59<2:50:07, 13.87s/it] 71%|███████ | 1765/2500 [6:54:14<2:51:02, 13.96s/it] {'loss': 0.0008, 'grad_norm': 0.06880830042230422, 'learning_rate': 2.9399999999999996e-07, 'completion_length': 58.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019256591796875, 'epoch': 0.71} 71%|███████ | 1765/2500 [6:54:14<2:51:02, 13.96s/it] 71%|███████ | 1766/2500 [6:54:27<2:49:25, 13.85s/it] {'loss': 0.0009, 'grad_norm': 0.269968984362273, 'learning_rate': 2.9360000000000003e-07, 'completion_length': 54.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021453857421875, 'epoch': 0.71} 71%|███████ | 1766/2500 [6:54:27<2:49:25, 13.85s/it] 71%|███████ | 1767/2500 [6:54:42<2:51:33, 14.04s/it] {'loss': 0.0015, 'grad_norm': 0.10829186711646513, 'learning_rate': 2.932e-07, 'completion_length': 56.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03692626953125, 'epoch': 0.71} 71%|███████ | 1767/2500 [6:54:42<2:51:33, 14.04s/it] 71%|███████ | 1768/2500 [6:54:56<2:51:10, 14.03s/it] {'loss': 0.0007, 'grad_norm': 0.13456315974630154, 'learning_rate': 2.928e-07, 'completion_length': 58.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0186767578125, 'epoch': 0.71} 71%|███████ | 1768/2500 [6:54:56<2:51:10, 14.03s/it] 71%|███████ | 1769/2500 [6:55:09<2:48:17, 13.81s/it] {'loss': 0.0009, 'grad_norm': 0.0869905135491323, 'learning_rate': 2.924e-07, 'completion_length': 56.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02227783203125, 'epoch': 0.71} 71%|███████ | 1769/2500 [6:55:09<2:48:17, 13.81s/it] 71%|███████ | 1770/2500 [6:55:23<2:48:14, 13.83s/it] {'loss': 0.001, 'grad_norm': 0.11403179318402754, 'learning_rate': 2.9199999999999997e-07, 'completion_length': 63.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02581787109375, 'epoch': 0.71} 71%|███████ | 1770/2500 [6:55:23<2:48:14, 13.83s/it] 71%|███████ | 1771/2500 [6:55:37<2:48:09, 13.84s/it] {'loss': 0.0009, 'grad_norm': 0.16817114497835406, 'learning_rate': 2.916e-07, 'completion_length': 59.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0235595703125, 'epoch': 0.71} 71%|███████ | 1771/2500 [6:55:37<2:48:09, 13.84s/it] 71%|███████ | 1772/2500 [6:55:50<2:47:46, 13.83s/it] {'loss': 0.0008, 'grad_norm': 0.13962350266206308, 'learning_rate': 2.912e-07, 'completion_length': 59.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019683837890625, 'epoch': 0.71} 71%|███████ | 1772/2500 [6:55:50<2:47:46, 13.83s/it] 71%|███████ | 1773/2500 [6:56:04<2:48:21, 13.89s/it] {'loss': 
0.0009, 'grad_norm': 0.9561700157527979, 'learning_rate': 2.908e-07, 'completion_length': 57.28571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.022705078125, 'epoch': 0.71} 71%|███████ | 1773/2500 [6:56:04<2:48:21, 13.89s/it] 71%|███████ | 1774/2500 [6:56:19<2:51:29, 14.17s/it] {'loss': 0.0018, 'grad_norm': 0.05371282781137103, 'learning_rate': 2.9039999999999995e-07, 'completion_length': 58.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.044677734375, 'epoch': 0.71} 71%|███████ | 1774/2500 [6:56:19<2:51:29, 14.17s/it] 71%|███████ | 1775/2500 [6:56:34<2:51:54, 14.23s/it] {'loss': 0.0008, 'grad_norm': 1.637403338451325, 'learning_rate': 2.9e-07, 'completion_length': 65.9464340209961, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0191650390625, 'epoch': 0.71} 71%|███████ | 1775/2500 [6:56:34<2:51:54, 14.23s/it] 71%|███████ | 1776/2500 [6:56:47<2:49:20, 14.03s/it] {'loss': 0.0012, 'grad_norm': 0.09656175743531366, 'learning_rate': 2.896e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.71} 71%|███████ | 1776/2500 [6:56:47<2:49:20, 14.03s/it] 71%|███████ | 1777/2500 [6:57:02<2:51:06, 14.20s/it] {'loss': 0.0007, 'grad_norm': 0.147069504433057, 'learning_rate': 2.892e-07, 'completion_length': 58.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016998291015625, 'epoch': 0.71} 71%|███████ | 1777/2500 [6:57:02<2:51:06, 14.20s/it] 71%|███████ | 1778/2500 [6:57:16<2:49:15, 14.07s/it] {'loss': 0.0013, 'grad_norm': 1.0894286379456946, 'learning_rate': 2.888e-07, 'completion_length': 57.39285850524902, 
'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.033447265625, 'epoch': 0.71} 71%|███████ | 1778/2500 [6:57:16<2:49:15, 14.07s/it] 71%|███████ | 1779/2500 [6:57:30<2:49:55, 14.14s/it] {'loss': 0.0015, 'grad_norm': 0.2204387409272094, 'learning_rate': 2.8839999999999996e-07, 'completion_length': 55.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0374755859375, 'epoch': 0.71} 71%|███████ | 1779/2500 [6:57:30<2:49:55, 14.14s/it] 71%|███████ | 1780/2500 [6:57:45<2:51:59, 14.33s/it] {'loss': 0.0016, 'grad_norm': 0.07937493672080716, 'learning_rate': 2.88e-07, 'completion_length': 59.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039794921875, 'epoch': 0.71} 71%|███████ | 1780/2500 [6:57:45<2:51:59, 14.33s/it] 71%|███████ | 1781/2500 [6:57:58<2:49:02, 14.11s/it] {'loss': 0.0013, 'grad_norm': 0.07747661669316219, 'learning_rate': 2.876e-07, 'completion_length': 55.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0330810546875, 'epoch': 0.71} 71%|███████ | 1781/2500 [6:57:58<2:49:02, 14.11s/it] 71%|███████▏ | 1782/2500 [6:58:13<2:49:33, 14.17s/it] {'loss': 0.0018, 'grad_norm': 0.09226383132868175, 'learning_rate': 2.872e-07, 'completion_length': 57.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.71} 71%|███████▏ | 1782/2500 [6:58:13<2:49:33, 14.17s/it] 71%|███████▏ | 1783/2500 [6:58:27<2:48:46, 14.12s/it] {'loss': 0.0014, 'grad_norm': 0.10733783083991767, 'learning_rate': 2.868e-07, 'completion_length': 56.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0352783203125, 'epoch': 0.71} 71%|███████▏ | 
1783/2500 [6:58:27<2:48:46, 14.12s/it] 71%|███████▏ | 1784/2500 [6:58:40<2:46:14, 13.93s/it] {'loss': 0.0018, 'grad_norm': 0.11558907725079329, 'learning_rate': 2.8639999999999997e-07, 'completion_length': 51.910715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0450439453125, 'epoch': 0.71} 71%|███████▏ | 1784/2500 [6:58:40<2:46:14, 13.93s/it] 71%|███████▏ | 1785/2500 [6:58:53<2:42:53, 13.67s/it] {'loss': 0.0012, 'grad_norm': 0.07779919618250467, 'learning_rate': 2.8599999999999994e-07, 'completion_length': 54.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0306396484375, 'epoch': 0.71} 71%|███████▏ | 1785/2500 [6:58:53<2:42:53, 13.67s/it] 71%|███████▏ | 1786/2500 [6:59:08<2:46:51, 14.02s/it] {'loss': 0.0011, 'grad_norm': 0.12453769342596585, 'learning_rate': 2.856e-07, 'completion_length': 59.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02825927734375, 'epoch': 0.71} 71%|███████▏ | 1786/2500 [6:59:08<2:46:51, 14.02s/it] 71%|███████▏ | 1787/2500 [6:59:22<2:45:51, 13.96s/it] {'loss': 0.0017, 'grad_norm': 0.07267440790357085, 'learning_rate': 2.852e-07, 'completion_length': 58.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0413818359375, 'epoch': 0.71} 71%|███████▏ | 1787/2500 [6:59:22<2:45:51, 13.96s/it] 72%|███████▏ | 1788/2500 [6:59:35<2:43:01, 13.74s/it] {'loss': 0.002, 'grad_norm': 0.10971471483552629, 'learning_rate': 2.848e-07, 'completion_length': 53.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0494384765625, 'epoch': 0.72} 72%|███████▏ | 1788/2500 [6:59:35<2:43:01, 13.74s/it] 72%|███████▏ | 1789/2500 [6:59:49<2:45:06, 13.93s/it] {'loss': 0.0009, 'grad_norm': 0.06516254062329904, 'learning_rate': 
2.844e-07, 'completion_length': 59.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02191162109375, 'epoch': 0.72} 72%|███████▏ | 1789/2500 [6:59:49<2:45:06, 13.93s/it] 72%|███████▏ | 1790/2500 [7:00:04<2:47:11, 14.13s/it] {'loss': 0.0012, 'grad_norm': 0.058293678033085376, 'learning_rate': 2.8399999999999995e-07, 'completion_length': 58.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0308837890625, 'epoch': 0.72} 72%|███████▏ | 1790/2500 [7:00:04<2:47:11, 14.13s/it] 72%|███████▏ | 1791/2500 [7:00:18<2:45:23, 14.00s/it] {'loss': 0.0015, 'grad_norm': 0.06749544169925904, 'learning_rate': 2.836e-07, 'completion_length': 48.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0372314453125, 'epoch': 0.72} 72%|███████▏ | 1791/2500 [7:00:18<2:45:23, 14.00s/it] 72%|███████▏ | 1792/2500 [7:00:33<2:48:35, 14.29s/it] {'loss': 0.0014, 'grad_norm': 0.08317051676052022, 'learning_rate': 2.832e-07, 'completion_length': 54.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0361328125, 'epoch': 0.72} 72%|███████▏ | 1792/2500 [7:00:33<2:48:35, 14.29s/it] 72%|███████▏ | 1793/2500 [7:00:47<2:49:42, 14.40s/it] {'loss': 0.0019, 'grad_norm': 0.08914849907407203, 'learning_rate': 2.8279999999999996e-07, 'completion_length': 62.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0474853515625, 'epoch': 0.72} 72%|███████▏ | 1793/2500 [7:00:47<2:49:42, 14.40s/it] 72%|███████▏ | 1794/2500 [7:01:02<2:50:22, 14.48s/it] {'loss': 0.002, 'grad_norm': 1.0145279562253497, 'learning_rate': 2.824e-07, 'completion_length': 62.17857360839844, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 
0.0357142873108387, 'kl': 0.049072265625, 'epoch': 0.72} 72%|███████▏ | 1794/2500 [7:01:02<2:50:22, 14.48s/it] 72%|███████▏ | 1795/2500 [7:01:16<2:50:17, 14.49s/it] {'loss': 0.0018, 'grad_norm': 0.09844219113584272, 'learning_rate': 2.8199999999999996e-07, 'completion_length': 59.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.045654296875, 'epoch': 0.72} 72%|███████▏ | 1795/2500 [7:01:16<2:50:17, 14.49s/it] 72%|███████▏ | 1796/2500 [7:01:31<2:49:46, 14.47s/it] {'loss': 0.0024, 'grad_norm': 0.07713431396552742, 'learning_rate': 2.816e-07, 'completion_length': 59.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0589599609375, 'epoch': 0.72} 72%|███████▏ | 1796/2500 [7:01:31<2:49:46, 14.47s/it] 72%|███████▏ | 1797/2500 [7:01:47<2:53:34, 14.81s/it] {'loss': 0.0013, 'grad_norm': 0.0646231850613464, 'learning_rate': 2.812e-07, 'completion_length': 69.26786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0335693359375, 'epoch': 0.72} 72%|███████▏ | 1797/2500 [7:01:47<2:53:34, 14.81s/it] 72%|███████▏ | 1798/2500 [7:02:00<2:49:13, 14.46s/it] {'loss': 0.001, 'grad_norm': 0.06952385631238268, 'learning_rate': 2.8079999999999997e-07, 'completion_length': 53.08928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025390625, 'epoch': 0.72} 72%|███████▏ | 1798/2500 [7:02:00<2:49:13, 14.46s/it] 72%|███████▏ | 1799/2500 [7:02:14<2:47:43, 14.36s/it] {'loss': 0.0011, 'grad_norm': 0.06080886119476856, 'learning_rate': 2.804e-07, 'completion_length': 52.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0264892578125, 'epoch': 0.72} 72%|███████▏ | 1799/2500 [7:02:14<2:47:43, 14.36s/it] 72%|███████▏ | 1800/2500 [7:02:29<2:47:53, 14.39s/it] {'loss': 
0.0015, 'grad_norm': 0.084502531100841, 'learning_rate': 2.8e-07, 'completion_length': 58.01786231994629, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0369873046875, 'epoch': 0.72} 72%|███████▏ | 1800/2500 [7:02:29<2:47:53, 14.39s/it] 72%|███████▏ | 1801/2500 [7:03:39<6:01:50, 31.06s/it] {'loss': 0.001, 'grad_norm': 0.06638045940328365, 'learning_rate': 2.796e-07, 'completion_length': 53.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02447509765625, 'epoch': 0.72} 72%|███████▏ | 1801/2500 [7:03:39<6:01:50, 31.06s/it] 72%|███████▏ | 1802/2500 [7:03:53<5:03:55, 26.13s/it] {'loss': 0.0009, 'grad_norm': 3.101705603990854, 'learning_rate': 2.792e-07, 'completion_length': 63.35714530944824, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.946428656578064, 'reward_std': 0.07695359364151955, 'kl': 0.02166748046875, 'epoch': 0.72} 72%|███████▏ | 1802/2500 [7:03:53<5:03:55, 26.13s/it] 72%|███████▏ | 1803/2500 [7:04:08<4:22:55, 22.63s/it] {'loss': 0.0009, 'grad_norm': 0.07511717562941483, 'learning_rate': 2.788e-07, 'completion_length': 60.21428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0230712890625, 'epoch': 0.72} 72%|███████▏ | 1803/2500 [7:04:08<4:22:55, 22.63s/it] 72%|███████▏ | 1804/2500 [7:04:21<3:50:07, 19.84s/it] {'loss': 0.0021, 'grad_norm': 0.06883939557546416, 'learning_rate': 2.7839999999999995e-07, 'completion_length': 56.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0513916015625, 'epoch': 0.72} 72%|███████▏ | 1804/2500 [7:04:21<3:50:07, 19.84s/it] 72%|███████▏ | 1805/2500 [7:04:35<3:29:25, 18.08s/it] {'loss': 0.0016, 'grad_norm': 0.16790693206394422, 'learning_rate': 2.7800000000000003e-07, 'completion_length': 
62.964290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.039306640625, 'epoch': 0.72} 72%|███████▏ | 1805/2500 [7:04:35<3:29:25, 18.08s/it] 72%|███████▏ | 1806/2500 [7:04:49<3:14:35, 16.82s/it] {'loss': 0.0013, 'grad_norm': 0.06859222161604836, 'learning_rate': 2.776e-07, 'completion_length': 60.553571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03277587890625, 'epoch': 0.72} 72%|███████▏ | 1806/2500 [7:04:49<3:14:35, 16.82s/it] 72%|███████▏ | 1807/2500 [7:05:02<3:02:38, 15.81s/it] {'loss': 0.0017, 'grad_norm': 0.20040663807157258, 'learning_rate': 2.7719999999999997e-07, 'completion_length': 50.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0419921875, 'epoch': 0.72} 72%|███████▏ | 1807/2500 [7:05:02<3:02:38, 15.81s/it] 72%|███████▏ | 1808/2500 [7:05:18<3:01:47, 15.76s/it] {'loss': 0.0013, 'grad_norm': 1.4884094946557203, 'learning_rate': 2.768e-07, 'completion_length': 62.83928680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03363037109375, 'epoch': 0.72} 72%|███████▏ | 1808/2500 [7:05:18<3:01:47, 15.76s/it] 72%|███████▏ | 1809/2500 [7:05:32<2:54:40, 15.17s/it] {'loss': 0.0013, 'grad_norm': 0.0784321290036288, 'learning_rate': 2.7639999999999996e-07, 'completion_length': 55.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.72} 72%|███████▏ | 1809/2500 [7:05:32<2:54:40, 15.17s/it] 72%|███████▏ | 1810/2500 [7:05:46<2:52:19, 14.98s/it] {'loss': 0.0016, 'grad_norm': 0.10086334832451502, 'learning_rate': 2.7600000000000004e-07, 'completion_length': 60.85714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 
0.0, 'kl': 0.040771484375, 'epoch': 0.72} 72%|███████▏ | 1810/2500 [7:05:46<2:52:19, 14.98s/it] 72%|███████▏ | 1811/2500 [7:06:00<2:46:39, 14.51s/it] {'loss': 0.001, 'grad_norm': 0.07059498154952787, 'learning_rate': 2.756e-07, 'completion_length': 48.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02374267578125, 'epoch': 0.72} 72%|███████▏ | 1811/2500 [7:06:00<2:46:39, 14.51s/it] 72%|███████▏ | 1812/2500 [7:06:15<2:48:51, 14.73s/it] {'loss': 0.0009, 'grad_norm': 0.14261407513371577, 'learning_rate': 2.752e-07, 'completion_length': 52.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02227783203125, 'epoch': 0.72} 72%|███████▏ | 1812/2500 [7:06:15<2:48:51, 14.73s/it] 73%|███████▎ | 1813/2500 [7:06:29<2:46:21, 14.53s/it] {'loss': 0.001, 'grad_norm': 0.05418507890189467, 'learning_rate': 2.748e-07, 'completion_length': 62.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024658203125, 'epoch': 0.73} 73%|███████▎ | 1813/2500 [7:06:29<2:46:21, 14.53s/it] 73%|███████▎ | 1814/2500 [7:06:44<2:47:40, 14.67s/it] {'loss': 0.0015, 'grad_norm': 0.08539347284095475, 'learning_rate': 2.7439999999999997e-07, 'completion_length': 52.67857360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0386962890625, 'epoch': 0.73} 73%|███████▎ | 1814/2500 [7:06:44<2:47:40, 14.67s/it] 73%|███████▎ | 1815/2500 [7:06:59<2:46:47, 14.61s/it] {'loss': 0.0009, 'grad_norm': 1.3689645986545305, 'learning_rate': 2.74e-07, 'completion_length': 56.42857551574707, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02264404296875, 'epoch': 0.73} 73%|███████▎ | 1815/2500 [7:06:59<2:46:47, 14.61s/it] 73%|███████▎ 
| 1816/2500 [7:07:13<2:44:52, 14.46s/it] {'loss': 0.0013, 'grad_norm': 0.07427910360215033, 'learning_rate': 2.736e-07, 'completion_length': 56.160715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031494140625, 'epoch': 0.73} 73%|███████▎ | 1816/2500 [7:07:13<2:44:52, 14.46s/it] 73%|███████▎ | 1817/2500 [7:07:27<2:44:25, 14.44s/it] {'loss': 0.0016, 'grad_norm': 0.15747145932037981, 'learning_rate': 2.732e-07, 'completion_length': 57.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03997802734375, 'epoch': 0.73} 73%|███████▎ | 1817/2500 [7:07:27<2:44:25, 14.44s/it] 73%|███████▎ | 1818/2500 [7:07:41<2:42:47, 14.32s/it] {'loss': 0.0013, 'grad_norm': 0.21314893156344858, 'learning_rate': 2.7279999999999995e-07, 'completion_length': 63.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033447265625, 'epoch': 0.73} 73%|███████▎ | 1818/2500 [7:07:41<2:42:47, 14.32s/it] 73%|███████▎ | 1819/2500 [7:07:55<2:40:34, 14.15s/it] {'loss': 0.0013, 'grad_norm': 0.13865756139159374, 'learning_rate': 2.724e-07, 'completion_length': 54.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03314208984375, 'epoch': 0.73} 73%|███████▎ | 1819/2500 [7:07:55<2:40:34, 14.15s/it] 73%|███████▎ | 1820/2500 [7:08:10<2:43:31, 14.43s/it] {'loss': 0.0007, 'grad_norm': 0.053778189123906174, 'learning_rate': 2.72e-07, 'completion_length': 58.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01837158203125, 'epoch': 0.73} 73%|███████▎ | 1820/2500 [7:08:10<2:43:31, 14.43s/it] 73%|███████▎ | 1821/2500 [7:08:26<2:47:45, 14.82s/it] {'loss': 0.0009, 'grad_norm': 2.0948480012389603, 'learning_rate': 2.7159999999999997e-07, 'completion_length': 61.80357360839844, 
'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.021728515625, 'epoch': 0.73} 73%|███████▎ | 1821/2500 [7:08:26<2:47:45, 14.82s/it] 73%|███████▎ | 1822/2500 [7:08:41<2:49:36, 15.01s/it] {'loss': 0.0017, 'grad_norm': 0.09030306147482425, 'learning_rate': 2.712e-07, 'completion_length': 56.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0435791015625, 'epoch': 0.73} 73%|███████▎ | 1822/2500 [7:08:41<2:49:36, 15.01s/it] 73%|███████▎ | 1823/2500 [7:09:03<3:11:03, 16.93s/it] {'loss': 0.0009, 'grad_norm': 0.4400343216858441, 'learning_rate': 2.7079999999999996e-07, 'completion_length': 70.60714721679688, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0218505859375, 'epoch': 0.73} 73%|███████▎ | 1823/2500 [7:09:03<3:11:03, 16.93s/it] 73%|███████▎ | 1824/2500 [7:09:17<3:04:00, 16.33s/it] {'loss': 0.0007, 'grad_norm': 0.08910159456414904, 'learning_rate': 2.704e-07, 'completion_length': 57.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0174560546875, 'epoch': 0.73} 73%|███████▎ | 1824/2500 [7:09:17<3:04:00, 16.33s/it] 73%|███████▎ | 1825/2500 [7:09:32<2:58:13, 15.84s/it] {'loss': 0.0007, 'grad_norm': 0.12496445405972681, 'learning_rate': 2.7e-07, 'completion_length': 63.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01678466796875, 'epoch': 0.73} 73%|███████▎ | 1825/2500 [7:09:32<2:58:13, 15.84s/it] 73%|███████▎ | 1826/2500 [7:09:46<2:52:15, 15.33s/it] {'loss': 0.0012, 'grad_norm': 1.542187319654479, 'learning_rate': 2.696e-07, 'completion_length': 58.10714530944824, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 
0.9821428656578064, 'reward': 1.9107143878936768, 'reward_std': 0.0357142873108387, 'kl': 0.029876708984375, 'epoch': 0.73} 73%|███████▎ | 1826/2500 [7:09:46<2:52:15, 15.33s/it] 73%|███████▎ | 1827/2500 [7:10:01<2:49:27, 15.11s/it] {'loss': 0.0007, 'grad_norm': 0.08455585325777495, 'learning_rate': 2.692e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0174407958984375, 'epoch': 0.73} 73%|███████▎ | 1827/2500 [7:10:01<2:49:27, 15.11s/it] 73%|███████▎ | 1828/2500 [7:10:15<2:46:26, 14.86s/it] {'loss': 0.0012, 'grad_norm': 0.05116344843247286, 'learning_rate': 2.6879999999999997e-07, 'completion_length': 57.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0303955078125, 'epoch': 0.73} 73%|███████▎ | 1828/2500 [7:10:15<2:46:26, 14.86s/it] 73%|███████▎ | 1829/2500 [7:10:30<2:45:21, 14.79s/it] {'loss': 0.0018, 'grad_norm': 0.08207340808090581, 'learning_rate': 2.684e-07, 'completion_length': 57.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0445556640625, 'epoch': 0.73} 73%|███████▎ | 1829/2500 [7:10:30<2:45:21, 14.79s/it] 73%|███████▎ | 1830/2500 [7:10:46<2:48:29, 15.09s/it] {'loss': 0.0011, 'grad_norm': 0.08138512924089909, 'learning_rate': 2.68e-07, 'completion_length': 71.46429061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02740478515625, 'epoch': 0.73} 73%|███████▎ | 1830/2500 [7:10:46<2:48:29, 15.09s/it] 73%|███████▎ | 1831/2500 [7:11:01<2:48:02, 15.07s/it] {'loss': 0.0016, 'grad_norm': 0.06709054328647358, 'learning_rate': 2.676e-07, 'completion_length': 60.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0401611328125, 'epoch': 0.73} 73%|███████▎ | 1831/2500 [7:11:01<2:48:02, 15.07s/it] 
73%|███████▎ | 1832/2500 [7:11:15<2:43:45, 14.71s/it] {'loss': 0.0008, 'grad_norm': 0.07420276925685294, 'learning_rate': 2.6719999999999996e-07, 'completion_length': 53.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02099609375, 'epoch': 0.73} 73%|███████▎ | 1832/2500 [7:11:15<2:43:45, 14.71s/it] 73%|███████▎ | 1833/2500 [7:11:30<2:46:35, 14.99s/it] {'loss': 0.0011, 'grad_norm': 0.07387361768242183, 'learning_rate': 2.668e-07, 'completion_length': 60.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.73} 73%|███████▎ | 1833/2500 [7:11:30<2:46:35, 14.99s/it] 73%|███████▎ | 1834/2500 [7:11:44<2:42:38, 14.65s/it] {'loss': 0.0014, 'grad_norm': 0.06323163438122407, 'learning_rate': 2.664e-07, 'completion_length': 46.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03472900390625, 'epoch': 0.73} 73%|███████▎ | 1834/2500 [7:11:44<2:42:38, 14.65s/it] 73%|███████▎ | 1835/2500 [7:11:58<2:41:02, 14.53s/it] {'loss': 0.0021, 'grad_norm': 0.06690399794757798, 'learning_rate': 2.66e-07, 'completion_length': 57.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0533447265625, 'epoch': 0.73} 73%|███████▎ | 1835/2500 [7:11:58<2:41:02, 14.53s/it] 73%|███████▎ | 1836/2500 [7:12:12<2:38:28, 14.32s/it] {'loss': 0.0016, 'grad_norm': 0.07327927803883964, 'learning_rate': 2.656e-07, 'completion_length': 51.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038818359375, 'epoch': 0.73} 73%|███████▎ | 1836/2500 [7:12:12<2:38:28, 14.32s/it] 73%|███████▎ | 1837/2500 [7:12:27<2:40:50, 14.56s/it] {'loss': 0.0009, 'grad_norm': 0.08874535573038275, 'learning_rate': 2.6519999999999997e-07, 'completion_length': 
53.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.73} 73%|███████▎ | 1837/2500 [7:12:27<2:40:50, 14.56s/it] 74%|███████▎ | 1838/2500 [7:12:41<2:39:06, 14.42s/it] {'loss': 0.0022, 'grad_norm': 0.8557845299506738, 'learning_rate': 2.648e-07, 'completion_length': 55.46428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0540771484375, 'epoch': 0.74} 74%|███████▎ | 1838/2500 [7:12:41<2:39:06, 14.42s/it] 74%|███████▎ | 1839/2500 [7:12:56<2:38:47, 14.41s/it] {'loss': 0.0011, 'grad_norm': 0.06038836454263576, 'learning_rate': 2.644e-07, 'completion_length': 61.05357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02655029296875, 'epoch': 0.74} 74%|███████▎ | 1839/2500 [7:12:56<2:38:47, 14.41s/it] 74%|███████▎ | 1840/2500 [7:13:10<2:38:29, 14.41s/it] {'loss': 0.0011, 'grad_norm': 0.06826055712904622, 'learning_rate': 2.64e-07, 'completion_length': 58.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028076171875, 'epoch': 0.74} 74%|███████▎ | 1840/2500 [7:13:10<2:38:29, 14.41s/it] 74%|███████▎ | 1841/2500 [7:13:26<2:44:19, 14.96s/it] {'loss': 0.0015, 'grad_norm': 5.291968407716158, 'learning_rate': 2.636e-07, 'completion_length': 59.44643211364746, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.07695359364151955, 'kl': 0.037109375, 'epoch': 0.74} 74%|███████▎ | 1841/2500 [7:13:26<2:44:19, 14.96s/it] 74%|███████▎ | 1842/2500 [7:13:40<2:40:39, 14.65s/it] {'loss': 0.0008, 'grad_norm': 0.10972637185344024, 'learning_rate': 2.632e-07, 'completion_length': 55.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0205078125, 'epoch': 0.74} 74%|███████▎ | 1842/2500 [7:13:40<2:40:39, 14.65s/it] 74%|███████▎ | 1843/2500 [7:13:54<2:36:32, 14.30s/it] {'loss': 0.0007, 'grad_norm': 0.08727319831657694, 'learning_rate': 2.6279999999999994e-07, 'completion_length': 47.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01641845703125, 'epoch': 0.74} 74%|███████▎ | 1843/2500 [7:13:54<2:36:32, 14.30s/it] 74%|███████▍ | 1844/2500 [7:14:08<2:36:58, 14.36s/it] {'loss': 0.0012, 'grad_norm': 0.10695593208365665, 'learning_rate': 2.624e-07, 'completion_length': 58.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03057861328125, 'epoch': 0.74} 74%|███████▍ | 1844/2500 [7:14:08<2:36:58, 14.36s/it] 74%|███████▍ | 1845/2500 [7:14:22<2:33:33, 14.07s/it] {'loss': 0.0012, 'grad_norm': 0.07106243391139647, 'learning_rate': 2.62e-07, 'completion_length': 53.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.74} 74%|███████▍ | 1845/2500 [7:14:22<2:33:33, 14.07s/it] 74%|███████▍ | 1846/2500 [7:14:35<2:31:39, 13.91s/it] {'loss': 0.0013, 'grad_norm': 0.06786092950606729, 'learning_rate': 2.616e-07, 'completion_length': 50.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0335693359375, 'epoch': 0.74} 74%|███████▍ | 1846/2500 [7:14:35<2:31:39, 13.91s/it] 74%|███████▍ | 1847/2500 [7:14:49<2:30:33, 13.83s/it] {'loss': 0.0014, 'grad_norm': 0.08378884830629979, 'learning_rate': 2.612e-07, 'completion_length': 52.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0362548828125, 'epoch': 0.74} 74%|███████▍ | 1847/2500 [7:14:49<2:30:33, 13.83s/it] 74%|███████▍ | 1848/2500 [7:15:04<2:34:06, 14.18s/it] 
{'loss': 0.0007, 'grad_norm': 0.05970310928304336, 'learning_rate': 2.6079999999999995e-07, 'completion_length': 48.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0169677734375, 'epoch': 0.74} 74%|███████▍ | 1848/2500 [7:15:04<2:34:06, 14.18s/it] 74%|███████▍ | 1849/2500 [7:15:17<2:31:27, 13.96s/it] {'loss': 0.0009, 'grad_norm': 0.0960470282878352, 'learning_rate': 2.6040000000000003e-07, 'completion_length': 56.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0216064453125, 'epoch': 0.74} 74%|███████▍ | 1849/2500 [7:15:17<2:31:27, 13.96s/it] 74%|███████▍ | 1850/2500 [7:15:31<2:31:25, 13.98s/it] {'loss': 0.0008, 'grad_norm': 0.13668544153087456, 'learning_rate': 2.6e-07, 'completion_length': 64.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02032470703125, 'epoch': 0.74} 74%|███████▍ | 1850/2500 [7:15:31<2:31:25, 13.98s/it] 74%|███████▍ | 1851/2500 [7:15:46<2:33:57, 14.23s/it] {'loss': 0.0014, 'grad_norm': 0.17338665927099892, 'learning_rate': 2.5959999999999997e-07, 'completion_length': 58.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0350341796875, 'epoch': 0.74} 74%|███████▍ | 1851/2500 [7:15:46<2:33:57, 14.23s/it] 74%|███████▍ | 1852/2500 [7:16:00<2:33:20, 14.20s/it] {'loss': 0.0011, 'grad_norm': 0.08343307989651655, 'learning_rate': 2.592e-07, 'completion_length': 62.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0262451171875, 'epoch': 0.74} 74%|███████▍ | 1852/2500 [7:16:00<2:33:20, 14.20s/it] 74%|███████▍ | 1853/2500 [7:16:16<2:36:52, 14.55s/it] {'loss': 0.0008, 'grad_norm': 0.38257154094972395, 'learning_rate': 2.5879999999999996e-07, 'completion_length': 58.30357551574707, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020263671875, 'epoch': 0.74} 74%|███████▍ | 1853/2500 [7:16:16<2:36:52, 14.55s/it] 74%|███████▍ | 1854/2500 [7:16:30<2:35:16, 14.42s/it] {'loss': 0.0011, 'grad_norm': 0.12594145919115776, 'learning_rate': 2.584e-07, 'completion_length': 57.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027374267578125, 'epoch': 0.74} 74%|███████▍ | 1854/2500 [7:16:30<2:35:16, 14.42s/it] 74%|███████▍ | 1855/2500 [7:16:44<2:34:26, 14.37s/it] {'loss': 0.0016, 'grad_norm': 0.06905337191918175, 'learning_rate': 2.58e-07, 'completion_length': 63.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0399169921875, 'epoch': 0.74} 74%|███████▍ | 1855/2500 [7:16:44<2:34:26, 14.37s/it] 74%|███████▍ | 1856/2500 [7:16:58<2:34:42, 14.41s/it] {'loss': 0.0013, 'grad_norm': 1.3622536110831973, 'learning_rate': 2.576e-07, 'completion_length': 63.44643020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0333251953125, 'epoch': 0.74} 74%|███████▍ | 1856/2500 [7:16:58<2:34:42, 14.41s/it] 74%|███████▍ | 1857/2500 [7:17:12<2:32:45, 14.25s/it] {'loss': 0.0008, 'grad_norm': 0.11055477700364302, 'learning_rate': 2.5719999999999995e-07, 'completion_length': 56.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.74} 74%|███████▍ | 1857/2500 [7:17:12<2:32:45, 14.25s/it] 74%|███████▍ | 1858/2500 [7:17:28<2:36:12, 14.60s/it] {'loss': 0.0007, 'grad_norm': 0.768288571950435, 'learning_rate': 2.5679999999999997e-07, 'completion_length': 64.30357551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 
0.0357142873108387, 'kl': 0.017822265625, 'epoch': 0.74} 74%|███████▍ | 1858/2500 [7:17:28<2:36:12, 14.60s/it] 74%|███████▍ | 1859/2500 [7:17:42<2:34:25, 14.45s/it] {'loss': 0.001, 'grad_norm': 0.07187452203514114, 'learning_rate': 2.564e-07, 'completion_length': 58.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.74} 74%|███████▍ | 1859/2500 [7:17:42<2:34:25, 14.45s/it] 74%|███████▍ | 1860/2500 [7:17:56<2:34:21, 14.47s/it] {'loss': 0.0008, 'grad_norm': 0.054530160846816306, 'learning_rate': 2.56e-07, 'completion_length': 53.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0196533203125, 'epoch': 0.74} 74%|███████▍ | 1860/2500 [7:17:56<2:34:21, 14.47s/it] 74%|███████▍ | 1861/2500 [7:18:11<2:33:14, 14.39s/it] {'loss': 0.0013, 'grad_norm': 0.09667103723953195, 'learning_rate': 2.556e-07, 'completion_length': 54.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032958984375, 'epoch': 0.74} 74%|███████▍ | 1861/2500 [7:18:11<2:33:14, 14.39s/it] 74%|███████▍ | 1862/2500 [7:18:26<2:34:43, 14.55s/it] {'loss': 0.0009, 'grad_norm': 0.07229170107894294, 'learning_rate': 2.5519999999999996e-07, 'completion_length': 59.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02362060546875, 'epoch': 0.74} 74%|███████▍ | 1862/2500 [7:18:26<2:34:43, 14.55s/it] 75%|███████▍ | 1863/2500 [7:18:45<2:49:22, 15.95s/it] {'loss': 0.0005, 'grad_norm': 0.3722985895334413, 'learning_rate': 2.5480000000000003e-07, 'completion_length': 64.82143211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.013214111328125, 'epoch': 0.75} 75%|███████▍ | 1863/2500 [7:18:45<2:49:22, 
15.95s/it] 75%|███████▍ | 1864/2500 [7:18:59<2:43:59, 15.47s/it] {'loss': 0.0011, 'grad_norm': 0.07760603460336395, 'learning_rate': 2.544e-07, 'completion_length': 54.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02685546875, 'epoch': 0.75} 75%|███████▍ | 1864/2500 [7:18:59<2:43:59, 15.47s/it] 75%|███████▍ | 1865/2500 [7:19:15<2:44:01, 15.50s/it] {'loss': 0.0009, 'grad_norm': 0.1544713602799484, 'learning_rate': 2.5399999999999997e-07, 'completion_length': 62.000003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02337646484375, 'epoch': 0.75} 75%|███████▍ | 1865/2500 [7:19:15<2:44:01, 15.50s/it] 75%|███████▍ | 1866/2500 [7:19:29<2:40:48, 15.22s/it] {'loss': 0.0019, 'grad_norm': 0.09271956557320221, 'learning_rate': 2.536e-07, 'completion_length': 54.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.048095703125, 'epoch': 0.75} 75%|███████▍ | 1866/2500 [7:19:29<2:40:48, 15.22s/it] 75%|███████▍ | 1867/2500 [7:19:43<2:35:16, 14.72s/it] {'loss': 0.0011, 'grad_norm': 0.08205258254421353, 'learning_rate': 2.5319999999999996e-07, 'completion_length': 53.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02740478515625, 'epoch': 0.75} 75%|███████▍ | 1867/2500 [7:19:43<2:35:16, 14.72s/it] 75%|███████▍ | 1868/2500 [7:19:56<2:31:31, 14.38s/it] {'loss': 0.0011, 'grad_norm': 0.08485125325458116, 'learning_rate': 2.528e-07, 'completion_length': 50.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02874755859375, 'epoch': 0.75} 75%|███████▍ | 1868/2500 [7:19:56<2:31:31, 14.38s/it] 75%|███████▍ | 1869/2500 [7:20:10<2:29:44, 14.24s/it] {'loss': 0.0015, 'grad_norm': 0.09896316260320348, 'learning_rate': 2.524e-07, 'completion_length': 
58.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0372314453125, 'epoch': 0.75} 75%|███████▍ | 1869/2500 [7:20:10<2:29:44, 14.24s/it] 75%|███████▍ | 1870/2500 [7:20:24<2:28:19, 14.13s/it] {'loss': 0.0007, 'grad_norm': 0.06380065213608918, 'learning_rate': 2.52e-07, 'completion_length': 52.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017822265625, 'epoch': 0.75} 75%|███████▍ | 1870/2500 [7:20:24<2:28:19, 14.13s/it] 75%|███████▍ | 1871/2500 [7:20:38<2:26:27, 13.97s/it] {'loss': 0.0006, 'grad_norm': 0.49834282641956595, 'learning_rate': 2.516e-07, 'completion_length': 52.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014892578125, 'epoch': 0.75} 75%|███████▍ | 1871/2500 [7:20:38<2:26:27, 13.97s/it] 75%|███████▍ | 1872/2500 [7:20:53<2:29:18, 14.27s/it] {'loss': 0.0012, 'grad_norm': 0.07686249440994618, 'learning_rate': 2.5119999999999997e-07, 'completion_length': 52.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.75} 75%|███████▍ | 1872/2500 [7:20:53<2:29:18, 14.27s/it] 75%|███████▍ | 1873/2500 [7:21:06<2:26:45, 14.04s/it] {'loss': 0.0011, 'grad_norm': 0.04304001045676219, 'learning_rate': 2.508e-07, 'completion_length': 51.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02655029296875, 'epoch': 0.75} 75%|███████▍ | 1873/2500 [7:21:06<2:26:45, 14.04s/it] 75%|███████▍ | 1874/2500 [7:21:20<2:25:36, 13.96s/it] {'loss': 0.0007, 'grad_norm': 0.054628344377309165, 'learning_rate': 2.504e-07, 'completion_length': 53.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01702880859375, 'epoch': 0.75} 75%|███████▍ | 1874/2500 
[7:21:20<2:25:36, 13.96s/it] 75%|███████▌ | 1875/2500 [7:21:35<2:28:44, 14.28s/it] {'loss': 0.0017, 'grad_norm': 1.0627318027741954, 'learning_rate': 2.5e-07, 'completion_length': 61.57143211364746, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0419921875, 'epoch': 0.75} 75%|███████▌ | 1875/2500 [7:21:35<2:28:44, 14.28s/it] 75%|███████▌ | 1876/2500 [7:21:49<2:28:33, 14.28s/it] {'loss': 0.0008, 'grad_norm': 0.08489505010277715, 'learning_rate': 2.4959999999999996e-07, 'completion_length': 59.375003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0204620361328125, 'epoch': 0.75} 75%|███████▌ | 1876/2500 [7:21:49<2:28:33, 14.28s/it] 75%|███████▌ | 1877/2500 [7:22:04<2:29:55, 14.44s/it] {'loss': 0.0012, 'grad_norm': 0.13863746278769937, 'learning_rate': 2.492e-07, 'completion_length': 64.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03076171875, 'epoch': 0.75} 75%|███████▌ | 1877/2500 [7:22:04<2:29:55, 14.44s/it] 75%|███████▌ | 1878/2500 [7:22:18<2:27:18, 14.21s/it] {'loss': 0.001, 'grad_norm': 0.07637955385221251, 'learning_rate': 2.488e-07, 'completion_length': 50.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.75} 75%|███████▌ | 1878/2500 [7:22:18<2:27:18, 14.21s/it] 75%|███████▌ | 1879/2500 [7:22:32<2:26:12, 14.13s/it] {'loss': 0.0008, 'grad_norm': 0.049397989311642056, 'learning_rate': 2.484e-07, 'completion_length': 55.000003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01922607421875, 'epoch': 0.75} 75%|███████▌ | 1879/2500 [7:22:32<2:26:12, 14.13s/it] 75%|███████▌ | 1880/2500 [7:22:46<2:26:55, 14.22s/it] {'loss': 0.0019, 'grad_norm': 0.16410177003803067, 
'learning_rate': 2.48e-07, 'completion_length': 59.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0462646484375, 'epoch': 0.75} 75%|███████▌ | 1880/2500 [7:22:46<2:26:55, 14.22s/it] 75%|███████▌ | 1881/2500 [7:22:59<2:23:59, 13.96s/it] {'loss': 0.0003, 'grad_norm': 0.07321251209565949, 'learning_rate': 2.4759999999999997e-07, 'completion_length': 51.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0077972412109375, 'epoch': 0.75} 75%|███████▌ | 1881/2500 [7:22:59<2:23:59, 13.96s/it] 75%|███████▌ | 1882/2500 [7:23:15<2:27:34, 14.33s/it] {'loss': 0.0011, 'grad_norm': 2.48689057690803, 'learning_rate': 2.472e-07, 'completion_length': 68.41071701049805, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02813720703125, 'epoch': 0.75} 75%|███████▌ | 1882/2500 [7:23:15<2:27:34, 14.33s/it] 75%|███████▌ | 1883/2500 [7:23:30<2:29:06, 14.50s/it] {'loss': 0.0026, 'grad_norm': 5.142777086352517, 'learning_rate': 2.4679999999999996e-07, 'completion_length': 61.75000190734863, 'rewards/accuracy_reward': 0.910714328289032, 'rewards/format_reward': 1.0, 'reward': 1.910714328289032, 'reward_std': 0.07695359364151955, 'kl': 0.065673828125, 'epoch': 0.75} 75%|███████▌ | 1883/2500 [7:23:30<2:29:06, 14.50s/it] 75%|███████▌ | 1884/2500 [7:23:43<2:25:18, 14.15s/it] {'loss': 0.0008, 'grad_norm': 0.06633505032844267, 'learning_rate': 2.464e-07, 'completion_length': 50.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019317626953125, 'epoch': 0.75} 75%|███████▌ | 1884/2500 [7:23:43<2:25:18, 14.15s/it] 75%|███████▌ | 1885/2500 [7:23:57<2:23:52, 14.04s/it] {'loss': 0.0015, 'grad_norm': 0.06409281006869695, 'learning_rate': 2.46e-07, 'completion_length': 56.642860412597656, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.75} 75%|███████▌ | 1885/2500 [7:23:57<2:23:52, 14.04s/it] 75%|███████▌ | 1886/2500 [7:24:10<2:22:53, 13.96s/it] {'loss': 0.0011, 'grad_norm': 0.06534725796728966, 'learning_rate': 2.456e-07, 'completion_length': 59.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0281982421875, 'epoch': 0.75} 75%|███████▌ | 1886/2500 [7:24:10<2:22:53, 13.96s/it] 75%|███████▌ | 1887/2500 [7:24:23<2:19:43, 13.68s/it] {'loss': 0.0016, 'grad_norm': 2.801337115297586, 'learning_rate': 2.452e-07, 'completion_length': 53.30357360839844, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0394287109375, 'epoch': 0.75} 75%|███████▌ | 1887/2500 [7:24:24<2:19:43, 13.68s/it] 76%|███████▌ | 1888/2500 [7:24:38<2:20:56, 13.82s/it] {'loss': 0.0014, 'grad_norm': 0.2768029388362597, 'learning_rate': 2.4479999999999997e-07, 'completion_length': 60.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03369140625, 'epoch': 0.76} 76%|███████▌ | 1888/2500 [7:24:38<2:20:56, 13.82s/it] 76%|███████▌ | 1889/2500 [7:24:51<2:19:43, 13.72s/it] {'loss': 0.001, 'grad_norm': 1.247084506387907, 'learning_rate': 2.444e-07, 'completion_length': 58.250003814697266, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.024139404296875, 'epoch': 0.76} 76%|███████▌ | 1889/2500 [7:24:51<2:19:43, 13.72s/it] 76%|███████▌ | 1890/2500 [7:25:05<2:20:53, 13.86s/it] {'loss': 0.0009, 'grad_norm': 0.05807507102996408, 'learning_rate': 2.4399999999999996e-07, 'completion_length': 62.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.02203369140625, 'epoch': 0.76} 76%|███████▌ | 1890/2500 [7:25:05<2:20:53, 13.86s/it] 76%|███████▌ | 1891/2500 [7:25:19<2:20:34, 13.85s/it] {'loss': 0.0015, 'grad_norm': 0.0746891403630753, 'learning_rate': 2.436e-07, 'completion_length': 57.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038665771484375, 'epoch': 0.76} 76%|███████▌ | 1891/2500 [7:25:19<2:20:34, 13.85s/it] 76%|███████▌ | 1892/2500 [7:25:34<2:22:30, 14.06s/it] {'loss': 0.001, 'grad_norm': 0.07212656412700302, 'learning_rate': 2.432e-07, 'completion_length': 64.96429061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024505615234375, 'epoch': 0.76} 76%|███████▌ | 1892/2500 [7:25:34<2:22:30, 14.06s/it] 76%|███████▌ | 1893/2500 [7:25:47<2:19:13, 13.76s/it] {'loss': 0.0013, 'grad_norm': 0.09329753816369356, 'learning_rate': 2.428e-07, 'completion_length': 48.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03228759765625, 'epoch': 0.76} 76%|███████▌ | 1893/2500 [7:25:47<2:19:13, 13.76s/it] 76%|███████▌ | 1894/2500 [7:26:02<2:24:02, 14.26s/it] {'loss': 0.0013, 'grad_norm': 0.07079328232382845, 'learning_rate': 2.424e-07, 'completion_length': 57.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03314208984375, 'epoch': 0.76} 76%|███████▌ | 1894/2500 [7:26:02<2:24:02, 14.26s/it] 76%|███████▌ | 1895/2500 [7:26:16<2:21:10, 14.00s/it] {'loss': 0.0009, 'grad_norm': 0.08293070543856805, 'learning_rate': 2.4199999999999997e-07, 'completion_length': 56.16071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.76} 76%|███████▌ | 1895/2500 [7:26:16<2:21:10, 14.00s/it] 76%|███████▌ | 1896/2500 
[7:26:30<2:22:32, 14.16s/it] {'loss': 0.0009, 'grad_norm': 0.08057705596968719, 'learning_rate': 2.416e-07, 'completion_length': 66.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021881103515625, 'epoch': 0.76} 76%|███████▌ | 1896/2500 [7:26:30<2:22:32, 14.16s/it] 76%|███████▌ | 1897/2500 [7:26:43<2:19:43, 13.90s/it] {'loss': 0.0009, 'grad_norm': 0.14807300256357456, 'learning_rate': 2.4119999999999996e-07, 'completion_length': 52.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021728515625, 'epoch': 0.76} 76%|███████▌ | 1897/2500 [7:26:43<2:19:43, 13.90s/it] 76%|███████▌ | 1898/2500 [7:27:01<2:30:00, 14.95s/it] {'loss': 0.0007, 'grad_norm': 0.10918506977962379, 'learning_rate': 2.408e-07, 'completion_length': 61.857147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0181884765625, 'epoch': 0.76} 76%|███████▌ | 1898/2500 [7:27:01<2:30:00, 14.95s/it] 76%|███████▌ | 1899/2500 [7:27:16<2:29:09, 14.89s/it] {'loss': 0.0009, 'grad_norm': 0.08319107069486911, 'learning_rate': 2.404e-07, 'completion_length': 59.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022308349609375, 'epoch': 0.76} 76%|███████▌ | 1899/2500 [7:27:16<2:29:09, 14.89s/it] 76%|███████▌ | 1900/2500 [7:27:30<2:27:40, 14.77s/it] {'loss': 0.0009, 'grad_norm': 1.4672590902834228, 'learning_rate': 2.4e-07, 'completion_length': 59.91071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0223388671875, 'epoch': 0.76} 76%|███████▌ | 1900/2500 [7:27:30<2:27:40, 14.77s/it] 76%|███████▌ | 1901/2500 [7:28:43<5:22:29, 32.30s/it] {'loss': 0.0005, 'grad_norm': 0.07794301558482225, 'learning_rate': 2.396e-07, 'completion_length': 
56.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01171875, 'epoch': 0.76} 76%|███████▌ | 1901/2500 [7:28:43<5:22:29, 32.30s/it] 76%|███████▌ | 1902/2500 [7:28:57<4:25:57, 26.69s/it] {'loss': 0.0014, 'grad_norm': 0.10350227632710701, 'learning_rate': 2.3919999999999997e-07, 'completion_length': 55.83928680419922, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 0.76} 76%|███████▌ | 1902/2500 [7:28:57<4:25:57, 26.69s/it] 76%|███████▌ | 1903/2500 [7:29:11<3:48:49, 23.00s/it] {'loss': 0.0013, 'grad_norm': 0.377698358714927, 'learning_rate': 2.388e-07, 'completion_length': 60.982147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031494140625, 'epoch': 0.76} 76%|███████▌ | 1903/2500 [7:29:11<3:48:49, 23.00s/it] 76%|███████▌ | 1904/2500 [7:29:25<3:21:09, 20.25s/it] {'loss': 0.0012, 'grad_norm': 0.06605866275239068, 'learning_rate': 2.384e-07, 'completion_length': 55.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03125, 'epoch': 0.76} 76%|███████▌ | 1904/2500 [7:29:25<3:21:09, 20.25s/it] 76%|███████▌ | 1905/2500 [7:29:39<3:02:14, 18.38s/it] {'loss': 0.0011, 'grad_norm': 0.1478925158929713, 'learning_rate': 2.38e-07, 'completion_length': 59.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02703857421875, 'epoch': 0.76} 76%|███████▌ | 1905/2500 [7:29:39<3:02:14, 18.38s/it] 76%|███████▌ | 1906/2500 [7:29:53<2:48:18, 17.00s/it] {'loss': 0.0012, 'grad_norm': 0.18863468919565274, 'learning_rate': 2.3759999999999998e-07, 'completion_length': 55.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02886962890625, 'epoch': 0.76} 
76%|███████▌ | 1906/2500 [7:29:53<2:48:18, 17.00s/it] 76%|███████▋ | 1907/2500 [7:30:07<2:38:29, 16.04s/it] {'loss': 0.0014, 'grad_norm': 0.09363401732581025, 'learning_rate': 2.3719999999999998e-07, 'completion_length': 60.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03466796875, 'epoch': 0.76} 76%|███████▋ | 1907/2500 [7:30:07<2:38:29, 16.04s/it] 76%|███████▋ | 1908/2500 [7:30:20<2:31:40, 15.37s/it] {'loss': 0.0009, 'grad_norm': 0.08924688866477888, 'learning_rate': 2.368e-07, 'completion_length': 55.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.76} 76%|███████▋ | 1908/2500 [7:30:20<2:31:40, 15.37s/it] 76%|███████▋ | 1909/2500 [7:30:34<2:26:53, 14.91s/it] {'loss': 0.0012, 'grad_norm': 0.0895496172595628, 'learning_rate': 2.364e-07, 'completion_length': 57.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.76} 76%|███████▋ | 1909/2500 [7:30:34<2:26:53, 14.91s/it] 76%|███████▋ | 1910/2500 [7:30:49<2:26:33, 14.90s/it] {'loss': 0.001, 'grad_norm': 0.062068663670769235, 'learning_rate': 2.3599999999999997e-07, 'completion_length': 64.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024139404296875, 'epoch': 0.76} 76%|███████▋ | 1910/2500 [7:30:49<2:26:33, 14.90s/it] 76%|███████▋ | 1911/2500 [7:31:04<2:25:45, 14.85s/it] {'loss': 0.0009, 'grad_norm': 0.08323198199310473, 'learning_rate': 2.356e-07, 'completion_length': 65.85714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02325439453125, 'epoch': 0.76} 76%|███████▋ | 1911/2500 [7:31:04<2:25:45, 14.85s/it] 76%|███████▋ | 1912/2500 [7:31:19<2:24:45, 14.77s/it] {'loss': 0.0011, 'grad_norm': 0.0876612704638848, 
'learning_rate': 2.352e-07, 'completion_length': 63.03571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.027252197265625, 'epoch': 0.76} 76%|███████▋ | 1912/2500 [7:31:19<2:24:45, 14.77s/it] 77%|███████▋ | 1913/2500 [7:31:32<2:21:14, 14.44s/it] {'loss': 0.0005, 'grad_norm': 0.09568520079579144, 'learning_rate': 2.3479999999999998e-07, 'completion_length': 52.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01220703125, 'epoch': 0.77} 77%|███████▋ | 1913/2500 [7:31:32<2:21:14, 14.44s/it] 77%|███████▋ | 1914/2500 [7:31:47<2:22:08, 14.55s/it] {'loss': 0.0008, 'grad_norm': 0.0858295222436728, 'learning_rate': 2.3439999999999998e-07, 'completion_length': 56.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018951416015625, 'epoch': 0.77} 77%|███████▋ | 1914/2500 [7:31:47<2:22:08, 14.55s/it] 77%|███████▋ | 1915/2500 [7:32:00<2:17:09, 14.07s/it] {'loss': 0.0006, 'grad_norm': 0.20688984323943993, 'learning_rate': 2.34e-07, 'completion_length': 55.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013916015625, 'epoch': 0.77} 77%|███████▋ | 1915/2500 [7:32:00<2:17:09, 14.07s/it] 77%|███████▋ | 1916/2500 [7:32:14<2:17:15, 14.10s/it] {'loss': 0.0011, 'grad_norm': 0.11399875023842099, 'learning_rate': 2.336e-07, 'completion_length': 54.23214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838134765625, 'epoch': 0.77} 77%|███████▋ | 1916/2500 [7:32:14<2:17:15, 14.10s/it] 77%|███████▋ | 1917/2500 [7:32:28<2:16:31, 14.05s/it] {'loss': 0.0005, 'grad_norm': 1.4848097342140048, 'learning_rate': 2.3319999999999997e-07, 'completion_length': 54.44643020629883, 'rewards/accuracy_reward': 0.9464285969734192, 
'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.011566162109375, 'epoch': 0.77} 77%|███████▋ | 1917/2500 [7:32:28<2:16:31, 14.05s/it] 77%|███████▋ | 1918/2500 [7:32:41<2:13:31, 13.77s/it] {'loss': 0.0005, 'grad_norm': 0.07291435065052401, 'learning_rate': 2.328e-07, 'completion_length': 52.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013519287109375, 'epoch': 0.77} 77%|███████▋ | 1918/2500 [7:32:41<2:13:31, 13.77s/it] 77%|███████▋ | 1919/2500 [7:32:55<2:14:04, 13.85s/it] {'loss': 0.0013, 'grad_norm': 0.9571961803501852, 'learning_rate': 2.324e-07, 'completion_length': 60.85714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.033203125, 'epoch': 0.77} 77%|███████▋ | 1919/2500 [7:32:55<2:14:04, 13.85s/it] 77%|███████▋ | 1920/2500 [7:33:10<2:17:46, 14.25s/it] {'loss': 0.0015, 'grad_norm': 0.10634478552609446, 'learning_rate': 2.32e-07, 'completion_length': 56.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038330078125, 'epoch': 0.77} 77%|███████▋ | 1920/2500 [7:33:10<2:17:46, 14.25s/it] 77%|███████▋ | 1921/2500 [7:33:28<2:26:21, 15.17s/it] {'loss': 0.001, 'grad_norm': 0.8041555315925368, 'learning_rate': 2.3159999999999998e-07, 'completion_length': 72.9285774230957, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02423095703125, 'epoch': 0.77} 77%|███████▋ | 1921/2500 [7:33:28<2:26:21, 15.17s/it] 77%|███████▋ | 1922/2500 [7:33:41<2:21:21, 14.67s/it] {'loss': 0.0011, 'grad_norm': 0.06496331505211217, 'learning_rate': 2.3119999999999998e-07, 'completion_length': 55.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.0282440185546875, 'epoch': 0.77} 77%|███████▋ | 1922/2500 [7:33:41<2:21:21, 14.67s/it] 77%|███████▋ | 1923/2500 [7:33:56<2:20:05, 14.57s/it] {'loss': 0.0014, 'grad_norm': 0.08083976863185043, 'learning_rate': 2.308e-07, 'completion_length': 63.500003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03472900390625, 'epoch': 0.77} 77%|███████▋ | 1923/2500 [7:33:56<2:20:05, 14.57s/it] 77%|███████▋ | 1924/2500 [7:34:10<2:20:30, 14.64s/it] {'loss': 0.0012, 'grad_norm': 0.07974497048405506, 'learning_rate': 2.3039999999999997e-07, 'completion_length': 66.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02899169921875, 'epoch': 0.77} 77%|███████▋ | 1924/2500 [7:34:10<2:20:30, 14.64s/it] 77%|███████▋ | 1925/2500 [7:34:25<2:18:59, 14.50s/it] {'loss': 0.0012, 'grad_norm': 0.04506665808791631, 'learning_rate': 2.3e-07, 'completion_length': 56.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0296630859375, 'epoch': 0.77} 77%|███████▋ | 1925/2500 [7:34:25<2:18:59, 14.50s/it] 77%|███████▋ | 1926/2500 [7:34:38<2:16:05, 14.22s/it] {'loss': 0.0016, 'grad_norm': 0.9318506824207513, 'learning_rate': 2.296e-07, 'completion_length': 51.82143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03973388671875, 'epoch': 0.77} 77%|███████▋ | 1926/2500 [7:34:38<2:16:05, 14.22s/it] 77%|███████▋ | 1927/2500 [7:34:52<2:13:57, 14.03s/it] {'loss': 0.0015, 'grad_norm': 0.1426732846302086, 'learning_rate': 2.292e-07, 'completion_length': 50.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03717041015625, 'epoch': 0.77} 77%|███████▋ | 1927/2500 [7:34:52<2:13:57, 14.03s/it] 77%|███████▋ | 
1928/2500 [7:35:06<2:15:47, 14.24s/it] {'loss': 0.0012, 'grad_norm': 0.12364580120531177, 'learning_rate': 2.2879999999999998e-07, 'completion_length': 68.21429061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02984619140625, 'epoch': 0.77} 77%|███████▋ | 1928/2500 [7:35:06<2:15:47, 14.24s/it] 77%|███████▋ | 1929/2500 [7:35:22<2:18:04, 14.51s/it] {'loss': 0.0013, 'grad_norm': 1.509562304030711, 'learning_rate': 2.2839999999999998e-07, 'completion_length': 54.07143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.032958984375, 'epoch': 0.77} 77%|███████▋ | 1929/2500 [7:35:22<2:18:04, 14.51s/it] 77%|███████▋ | 1930/2500 [7:35:35<2:14:51, 14.20s/it] {'loss': 0.0021, 'grad_norm': 1.108275030343899, 'learning_rate': 2.28e-07, 'completion_length': 50.48214530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.051513671875, 'epoch': 0.77} 77%|███████▋ | 1930/2500 [7:35:35<2:14:51, 14.20s/it] 77%|███████▋ | 1931/2500 [7:35:50<2:15:48, 14.32s/it] {'loss': 0.0013, 'grad_norm': 0.0476746118368817, 'learning_rate': 2.2759999999999997e-07, 'completion_length': 62.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03253173828125, 'epoch': 0.77} 77%|███████▋ | 1931/2500 [7:35:50<2:15:48, 14.32s/it] 77%|███████▋ | 1932/2500 [7:36:03<2:13:10, 14.07s/it] {'loss': 0.0005, 'grad_norm': 0.06103584491962964, 'learning_rate': 2.272e-07, 'completion_length': 54.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0113067626953125, 'epoch': 0.77} 77%|███████▋ | 1932/2500 [7:36:03<2:13:10, 14.07s/it] 77%|███████▋ | 1933/2500 [7:36:16<2:10:34, 13.82s/it] {'loss': 0.0009, 
'grad_norm': 0.0731369146977755, 'learning_rate': 2.268e-07, 'completion_length': 50.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0235595703125, 'epoch': 0.77} 77%|███████▋ | 1933/2500 [7:36:16<2:10:34, 13.82s/it] 77%|███████▋ | 1934/2500 [7:36:31<2:14:09, 14.22s/it] {'loss': 0.0012, 'grad_norm': 0.0870908036496999, 'learning_rate': 2.264e-07, 'completion_length': 59.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02923583984375, 'epoch': 0.77} 77%|███████▋ | 1934/2500 [7:36:31<2:14:09, 14.22s/it] 77%|███████▋ | 1935/2500 [7:36:46<2:14:55, 14.33s/it] {'loss': 0.0008, 'grad_norm': 0.10157607357468222, 'learning_rate': 2.2599999999999999e-07, 'completion_length': 56.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02008056640625, 'epoch': 0.77} 77%|███████▋ | 1935/2500 [7:36:46<2:14:55, 14.33s/it] 77%|███████▋ | 1936/2500 [7:37:01<2:16:54, 14.56s/it] {'loss': 0.0009, 'grad_norm': 0.057628339729365985, 'learning_rate': 2.2559999999999998e-07, 'completion_length': 57.785715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023345947265625, 'epoch': 0.77} 77%|███████▋ | 1936/2500 [7:37:01<2:16:54, 14.56s/it] 77%|███████▋ | 1937/2500 [7:37:14<2:13:04, 14.18s/it] {'loss': 0.0012, 'grad_norm': 0.1062010924039462, 'learning_rate': 2.252e-07, 'completion_length': 55.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02935791015625, 'epoch': 0.77} 77%|███████▋ | 1937/2500 [7:37:14<2:13:04, 14.18s/it] 78%|███████▊ | 1938/2500 [7:37:28<2:10:59, 13.98s/it] {'loss': 0.0016, 'grad_norm': 0.11758706072823473, 'learning_rate': 2.248e-07, 'completion_length': 51.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04071044921875, 'epoch': 0.78} 78%|███████▊ | 1938/2500 [7:37:28<2:10:59, 13.98s/it] 78%|███████▊ | 1939/2500 [7:37:42<2:12:08, 14.13s/it] {'loss': 0.0017, 'grad_norm': 1.8306402120997762, 'learning_rate': 2.2439999999999997e-07, 'completion_length': 59.73214530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0419921875, 'epoch': 0.78} 78%|███████▊ | 1939/2500 [7:37:42<2:12:08, 14.13s/it] 78%|███████▊ | 1940/2500 [7:37:56<2:10:59, 14.04s/it] {'loss': 0.001, 'grad_norm': 0.06188340274886641, 'learning_rate': 2.24e-07, 'completion_length': 55.660715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02484130859375, 'epoch': 0.78} 78%|███████▊ | 1940/2500 [7:37:56<2:10:59, 14.04s/it] 78%|███████▊ | 1941/2500 [7:38:11<2:12:18, 14.20s/it] {'loss': 0.001, 'grad_norm': 0.05629199560801959, 'learning_rate': 2.236e-07, 'completion_length': 55.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0245361328125, 'epoch': 0.78} 78%|███████▊ | 1941/2500 [7:38:11<2:12:18, 14.20s/it] 78%|███████▊ | 1942/2500 [7:38:24<2:10:04, 13.99s/it] {'loss': 0.0014, 'grad_norm': 0.08987885613096637, 'learning_rate': 2.232e-07, 'completion_length': 55.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03472900390625, 'epoch': 0.78} 78%|███████▊ | 1942/2500 [7:38:24<2:10:04, 13.99s/it] 78%|███████▊ | 1943/2500 [7:38:40<2:15:19, 14.58s/it] {'loss': 0.0011, 'grad_norm': 1.7586755868743293, 'learning_rate': 2.2279999999999998e-07, 'completion_length': 58.625003814697266, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02789306640625, 'epoch': 0.78} 
78%|███████▊ | 1943/2500 [7:38:40<2:15:19, 14.58s/it] 78%|███████▊ | 1944/2500 [7:38:54<2:12:58, 14.35s/it] {'loss': 0.0006, 'grad_norm': 0.23540096825906856, 'learning_rate': 2.2239999999999998e-07, 'completion_length': 52.55357360839844, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.015289306640625, 'epoch': 0.78} 78%|███████▊ | 1944/2500 [7:38:54<2:12:58, 14.35s/it] 78%|███████▊ | 1945/2500 [7:39:08<2:12:46, 14.35s/it] {'loss': 0.0008, 'grad_norm': 0.043576025186975356, 'learning_rate': 2.22e-07, 'completion_length': 54.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02001953125, 'epoch': 0.78} 78%|███████▊ | 1945/2500 [7:39:08<2:12:46, 14.35s/it] 78%|███████▊ | 1946/2500 [7:39:23<2:11:41, 14.26s/it] {'loss': 0.0012, 'grad_norm': 0.11201465002189054, 'learning_rate': 2.2159999999999997e-07, 'completion_length': 53.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029083251953125, 'epoch': 0.78} 78%|███████▊ | 1946/2500 [7:39:23<2:11:41, 14.26s/it] 78%|███████▊ | 1947/2500 [7:39:37<2:12:19, 14.36s/it] {'loss': 0.0009, 'grad_norm': 0.15592866589914228, 'learning_rate': 2.212e-07, 'completion_length': 66.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022064208984375, 'epoch': 0.78} 78%|███████▊ | 1947/2500 [7:39:37<2:12:19, 14.36s/it] 78%|███████▊ | 1948/2500 [7:39:54<2:18:53, 15.10s/it] {'loss': 0.0013, 'grad_norm': 0.1375060887049792, 'learning_rate': 2.208e-07, 'completion_length': 76.0714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.78} 78%|███████▊ | 1948/2500 [7:39:54<2:18:53, 15.10s/it] 78%|███████▊ | 1949/2500 [7:40:07<2:14:16, 14.62s/it] {'loss': 0.001, 'grad_norm': 
0.07945094811185338, 'learning_rate': 2.2040000000000001e-07, 'completion_length': 55.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.78} 78%|███████▊ | 1949/2500 [7:40:07<2:14:16, 14.62s/it] 78%|███████▊ | 1950/2500 [7:40:21<2:10:34, 14.25s/it] {'loss': 0.001, 'grad_norm': 0.0724057634221543, 'learning_rate': 2.1999999999999998e-07, 'completion_length': 47.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0242919921875, 'epoch': 0.78} 78%|███████▊ | 1950/2500 [7:40:21<2:10:34, 14.25s/it] 78%|███████▊ | 1951/2500 [7:40:35<2:08:56, 14.09s/it] {'loss': 0.0018, 'grad_norm': 0.05831169546087124, 'learning_rate': 2.1959999999999998e-07, 'completion_length': 54.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0458984375, 'epoch': 0.78} 78%|███████▊ | 1951/2500 [7:40:35<2:08:56, 14.09s/it] 78%|███████▊ | 1952/2500 [7:40:48<2:07:59, 14.01s/it] {'loss': 0.0018, 'grad_norm': 0.06785798299628162, 'learning_rate': 2.192e-07, 'completion_length': 56.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0460205078125, 'epoch': 0.78} 78%|███████▊ | 1952/2500 [7:40:48<2:07:59, 14.01s/it] 78%|███████▊ | 1953/2500 [7:41:03<2:08:48, 14.13s/it] {'loss': 0.0013, 'grad_norm': 0.05472867896491541, 'learning_rate': 2.1879999999999997e-07, 'completion_length': 58.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03289794921875, 'epoch': 0.78} 78%|███████▊ | 1953/2500 [7:41:03<2:08:48, 14.13s/it] 78%|███████▊ | 1954/2500 [7:41:16<2:07:22, 14.00s/it] {'loss': 0.0007, 'grad_norm': 0.052775026551083126, 'learning_rate': 2.184e-07, 'completion_length': 54.03571701049805, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017333984375, 'epoch': 0.78} 78%|███████▊ | 1954/2500 [7:41:16<2:07:22, 14.00s/it] 78%|███████▊ | 1955/2500 [7:41:30<2:06:12, 13.89s/it] {'loss': 0.0009, 'grad_norm': 0.05655341573887514, 'learning_rate': 2.18e-07, 'completion_length': 57.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021575927734375, 'epoch': 0.78} 78%|███████▊ | 1955/2500 [7:41:30<2:06:12, 13.89s/it] 78%|███████▊ | 1956/2500 [7:41:43<2:03:49, 13.66s/it] {'loss': 0.0015, 'grad_norm': 0.06458775160228804, 'learning_rate': 2.176e-07, 'completion_length': 49.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03619384765625, 'epoch': 0.78} 78%|███████▊ | 1956/2500 [7:41:43<2:03:49, 13.66s/it] 78%|███████▊ | 1957/2500 [7:41:56<2:02:22, 13.52s/it] {'loss': 0.001, 'grad_norm': 0.050552476989013134, 'learning_rate': 2.1719999999999999e-07, 'completion_length': 48.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.78} 78%|███████▊ | 1957/2500 [7:41:56<2:02:22, 13.52s/it] 78%|███████▊ | 1958/2500 [7:42:10<2:02:07, 13.52s/it] {'loss': 0.0018, 'grad_norm': 0.07838277494699018, 'learning_rate': 2.1679999999999998e-07, 'completion_length': 52.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.046142578125, 'epoch': 0.78} 78%|███████▊ | 1958/2500 [7:42:10<2:02:07, 13.52s/it] 78%|███████▊ | 1959/2500 [7:42:24<2:03:25, 13.69s/it] {'loss': 0.0011, 'grad_norm': 0.07617589018344904, 'learning_rate': 2.164e-07, 'completion_length': 59.53571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.02703857421875, 'epoch': 0.78} 78%|███████▊ | 1959/2500 
[7:42:24<2:03:25, 13.69s/it] 78%|███████▊ | 1960/2500 [7:42:39<2:05:55, 13.99s/it] {'loss': 0.0012, 'grad_norm': 0.06457269354264393, 'learning_rate': 2.1599999999999998e-07, 'completion_length': 58.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029632568359375, 'epoch': 0.78} 78%|███████▊ | 1960/2500 [7:42:39<2:05:55, 13.99s/it] 78%|███████▊ | 1961/2500 [7:42:53<2:07:30, 14.19s/it] {'loss': 0.0012, 'grad_norm': 0.624530264510639, 'learning_rate': 2.156e-07, 'completion_length': 55.642860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03057861328125, 'epoch': 0.78} 78%|███████▊ | 1961/2500 [7:42:53<2:07:30, 14.19s/it] 78%|███████▊ | 1962/2500 [7:43:12<2:18:53, 15.49s/it] {'loss': 0.0005, 'grad_norm': 0.05003671915725767, 'learning_rate': 2.152e-07, 'completion_length': 66.50000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0118865966796875, 'epoch': 0.78} 78%|███████▊ | 1962/2500 [7:43:12<2:18:53, 15.49s/it] 79%|███████▊ | 1963/2500 [7:43:27<2:18:02, 15.42s/it] {'loss': 0.0008, 'grad_norm': 0.07193431725597305, 'learning_rate': 2.148e-07, 'completion_length': 63.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01898193359375, 'epoch': 0.79} 79%|███████▊ | 1963/2500 [7:43:27<2:18:02, 15.42s/it] 79%|███████▊ | 1964/2500 [7:43:41<2:13:34, 14.95s/it] {'loss': 0.0014, 'grad_norm': 0.08653264423985388, 'learning_rate': 2.144e-07, 'completion_length': 54.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0350341796875, 'epoch': 0.79} 79%|███████▊ | 1964/2500 [7:43:41<2:13:34, 14.95s/it] 79%|███████▊ | 1965/2500 [7:43:55<2:11:33, 14.75s/it] {'loss': 0.0012, 'grad_norm': 
0.10809468571518979, 'learning_rate': 2.1399999999999998e-07, 'completion_length': 60.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029388427734375, 'epoch': 0.79} 79%|███████▊ | 1965/2500 [7:43:55<2:11:33, 14.75s/it] 79%|███████▊ | 1966/2500 [7:44:09<2:08:04, 14.39s/it] {'loss': 0.0024, 'grad_norm': 0.06597596360009443, 'learning_rate': 2.136e-07, 'completion_length': 55.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0587158203125, 'epoch': 0.79} 79%|███████▊ | 1966/2500 [7:44:09<2:08:04, 14.39s/it] 79%|███████▊ | 1967/2500 [7:44:23<2:06:49, 14.28s/it] {'loss': 0.002, 'grad_norm': 7.180393899430958, 'learning_rate': 2.132e-07, 'completion_length': 59.678571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.05023193359375, 'epoch': 0.79} 79%|███████▊ | 1967/2500 [7:44:23<2:06:49, 14.28s/it] 79%|███████▊ | 1968/2500 [7:44:36<2:03:58, 13.98s/it] {'loss': 0.0009, 'grad_norm': 0.11782745338919594, 'learning_rate': 2.1279999999999997e-07, 'completion_length': 53.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0218505859375, 'epoch': 0.79} 79%|███████▊ | 1968/2500 [7:44:36<2:03:58, 13.98s/it] 79%|███████▉ | 1969/2500 [7:44:51<2:05:30, 14.18s/it] {'loss': 0.0012, 'grad_norm': 0.05246535175738851, 'learning_rate': 2.124e-07, 'completion_length': 58.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0294189453125, 'epoch': 0.79} 79%|███████▉ | 1969/2500 [7:44:51<2:05:30, 14.18s/it] 79%|███████▉ | 1970/2500 [7:45:04<2:03:20, 13.96s/it] {'loss': 0.0009, 'grad_norm': 0.05719797661267249, 'learning_rate': 2.12e-07, 'completion_length': 54.10714530944824, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02154541015625, 'epoch': 0.79} 79%|███████▉ | 1970/2500 [7:45:04<2:03:20, 13.96s/it] 79%|███████▉ | 1971/2500 [7:45:18<2:03:09, 13.97s/it] {'loss': 0.001, 'grad_norm': 0.09440367204592605, 'learning_rate': 2.116e-07, 'completion_length': 47.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0257568359375, 'epoch': 0.79} 79%|███████▉ | 1971/2500 [7:45:18<2:03:09, 13.97s/it] 79%|███████▉ | 1972/2500 [7:45:32<2:01:57, 13.86s/it] {'loss': 0.0017, 'grad_norm': 0.09806423979885347, 'learning_rate': 2.1119999999999999e-07, 'completion_length': 61.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04150390625, 'epoch': 0.79} 79%|███████▉ | 1972/2500 [7:45:32<2:01:57, 13.86s/it] 79%|███████▉ | 1973/2500 [7:45:46<2:01:59, 13.89s/it] {'loss': 0.0008, 'grad_norm': 0.054676000251436555, 'learning_rate': 2.1079999999999998e-07, 'completion_length': 54.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01910400390625, 'epoch': 0.79} 79%|███████▉ | 1973/2500 [7:45:46<2:01:59, 13.89s/it] 79%|███████▉ | 1974/2500 [7:46:02<2:07:53, 14.59s/it] {'loss': 0.0003, 'grad_norm': 0.05428365968908143, 'learning_rate': 2.104e-07, 'completion_length': 65.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.008026123046875, 'epoch': 0.79} 79%|███████▉ | 1974/2500 [7:46:02<2:07:53, 14.59s/it] 79%|███████▉ | 1975/2500 [7:46:16<2:06:53, 14.50s/it] {'loss': 0.0016, 'grad_norm': 0.06385591118802052, 'learning_rate': 2.0999999999999997e-07, 'completion_length': 61.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0389404296875, 'epoch': 0.79} 79%|███████▉ | 1975/2500 [7:46:16<2:06:53, 
14.50s/it] 79%|███████▉ | 1976/2500 [7:46:30<2:05:16, 14.34s/it] {'loss': 0.0014, 'grad_norm': 0.04998819189725579, 'learning_rate': 2.096e-07, 'completion_length': 60.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0341796875, 'epoch': 0.79} 79%|███████▉ | 1976/2500 [7:46:30<2:05:16, 14.34s/it] 79%|███████▉ | 1977/2500 [7:46:44<2:03:13, 14.14s/it] {'loss': 0.0012, 'grad_norm': 0.05638829966743597, 'learning_rate': 2.092e-07, 'completion_length': 55.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02923583984375, 'epoch': 0.79} 79%|███████▉ | 1977/2500 [7:46:44<2:03:13, 14.14s/it] 79%|███████▉ | 1978/2500 [7:46:59<2:05:24, 14.42s/it] {'loss': 0.0006, 'grad_norm': 0.0396251152903268, 'learning_rate': 2.0880000000000002e-07, 'completion_length': 60.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015716552734375, 'epoch': 0.79} 79%|███████▉ | 1978/2500 [7:46:59<2:05:24, 14.42s/it] 79%|███████▉ | 1979/2500 [7:47:12<2:02:39, 14.13s/it] {'loss': 0.0013, 'grad_norm': 1.0732827072658575, 'learning_rate': 2.0839999999999999e-07, 'completion_length': 61.96428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.032958984375, 'epoch': 0.79} 79%|███████▉ | 1979/2500 [7:47:12<2:02:39, 14.13s/it] 79%|███████▉ | 1980/2500 [7:47:26<2:01:11, 13.98s/it] {'loss': 0.0017, 'grad_norm': 0.1262797708425925, 'learning_rate': 2.0799999999999998e-07, 'completion_length': 54.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0418701171875, 'epoch': 0.79} 79%|███████▉ | 1980/2500 [7:47:26<2:01:11, 13.98s/it] 79%|███████▉ | 1981/2500 [7:47:40<2:00:33, 13.94s/it] {'loss': 0.0012, 'grad_norm': 
0.9180145502006944, 'learning_rate': 2.076e-07, 'completion_length': 57.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.79} 79%|███████▉ | 1981/2500 [7:47:40<2:00:33, 13.94s/it] 79%|███████▉ | 1982/2500 [7:47:54<2:00:46, 13.99s/it] {'loss': 0.0008, 'grad_norm': 0.12447674182274977, 'learning_rate': 2.0719999999999998e-07, 'completion_length': 55.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02044677734375, 'epoch': 0.79} 79%|███████▉ | 1982/2500 [7:47:54<2:00:46, 13.99s/it] 79%|███████▉ | 1983/2500 [7:48:08<2:01:34, 14.11s/it] {'loss': 0.0009, 'grad_norm': 0.06770415590286041, 'learning_rate': 2.068e-07, 'completion_length': 50.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0228271484375, 'epoch': 0.79} 79%|███████▉ | 1983/2500 [7:48:08<2:01:34, 14.11s/it] 79%|███████▉ | 1984/2500 [7:48:22<2:00:02, 13.96s/it] {'loss': 0.0007, 'grad_norm': 1.409392007274456, 'learning_rate': 2.064e-07, 'completion_length': 54.55357360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.01739501953125, 'epoch': 0.79} 79%|███████▉ | 1984/2500 [7:48:22<2:00:02, 13.96s/it] 79%|███████▉ | 1985/2500 [7:48:36<1:59:25, 13.91s/it] {'loss': 0.0011, 'grad_norm': 1.2029981762706805, 'learning_rate': 2.06e-07, 'completion_length': 58.267860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0262451171875, 'epoch': 0.79} 79%|███████▉ | 1985/2500 [7:48:36<1:59:25, 13.91s/it] 79%|███████▉ | 1986/2500 [7:48:50<2:00:05, 14.02s/it] {'loss': 0.0012, 'grad_norm': 0.08168435003563446, 'learning_rate': 2.056e-07, 'completion_length': 60.107147216796875, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029571533203125, 'epoch': 0.79} 79%|███████▉ | 1986/2500 [7:48:50<2:00:05, 14.02s/it] 79%|███████▉ | 1987/2500 [7:49:04<1:58:52, 13.90s/it] {'loss': 0.0007, 'grad_norm': 0.07348930316467082, 'learning_rate': 2.0519999999999998e-07, 'completion_length': 56.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01788330078125, 'epoch': 0.79} 79%|███████▉ | 1987/2500 [7:49:04<1:58:52, 13.90s/it] 80%|███████▉ | 1988/2500 [7:49:17<1:57:25, 13.76s/it] {'loss': 0.0012, 'grad_norm': 0.05850135159188962, 'learning_rate': 2.048e-07, 'completion_length': 59.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029571533203125, 'epoch': 0.8} 80%|███████▉ | 1988/2500 [7:49:17<1:57:25, 13.76s/it] 80%|███████▉ | 1989/2500 [7:49:31<1:57:08, 13.75s/it] {'loss': 0.0011, 'grad_norm': 1.5154052773447613, 'learning_rate': 2.0439999999999998e-07, 'completion_length': 50.26785850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02642822265625, 'epoch': 0.8} 80%|███████▉ | 1989/2500 [7:49:31<1:57:08, 13.75s/it] 80%|███████▉ | 1990/2500 [7:49:45<1:57:54, 13.87s/it] {'loss': 0.0013, 'grad_norm': 0.07543989999153422, 'learning_rate': 2.0399999999999997e-07, 'completion_length': 58.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03143310546875, 'epoch': 0.8} 80%|███████▉ | 1990/2500 [7:49:45<1:57:54, 13.87s/it] 80%|███████▉ | 1991/2500 [7:49:59<1:56:31, 13.74s/it] {'loss': 0.001, 'grad_norm': 1.6429493179668133, 'learning_rate': 2.036e-07, 'completion_length': 49.69643211364746, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 
'reward_std': 0.04123930633068085, 'kl': 0.0257720947265625, 'epoch': 0.8} 80%|███████▉ | 1991/2500 [7:49:59<1:56:31, 13.74s/it] 80%|███████▉ | 1992/2500 [7:50:12<1:55:53, 13.69s/it] {'loss': 0.0017, 'grad_norm': 0.052407419769585586, 'learning_rate': 2.032e-07, 'completion_length': 51.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041748046875, 'epoch': 0.8} 80%|███████▉ | 1992/2500 [7:50:12<1:55:53, 13.69s/it] 80%|███████▉ | 1993/2500 [7:50:26<1:57:05, 13.86s/it] {'loss': 0.0024, 'grad_norm': 0.08844689612547703, 'learning_rate': 2.028e-07, 'completion_length': 59.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.060302734375, 'epoch': 0.8} 80%|███████▉ | 1993/2500 [7:50:26<1:57:05, 13.86s/it] 80%|███████▉ | 1994/2500 [7:50:41<1:57:55, 13.98s/it] {'loss': 0.0011, 'grad_norm': 0.06894273669895697, 'learning_rate': 2.0239999999999999e-07, 'completion_length': 52.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026763916015625, 'epoch': 0.8} 80%|███████▉ | 1994/2500 [7:50:41<1:57:55, 13.98s/it] 80%|███████▉ | 1995/2500 [7:50:56<2:01:00, 14.38s/it] {'loss': 0.0015, 'grad_norm': 0.06442041300844334, 'learning_rate': 2.02e-07, 'completion_length': 67.14286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.037109375, 'epoch': 0.8} 80%|███████▉ | 1995/2500 [7:50:56<2:01:00, 14.38s/it] 80%|███████▉ | 1996/2500 [7:51:10<2:01:11, 14.43s/it] {'loss': 0.0016, 'grad_norm': 2.7854845899119725, 'learning_rate': 2.016e-07, 'completion_length': 56.08928871154785, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.04083251953125, 'epoch': 0.8} 80%|███████▉ | 1996/2500 [7:51:10<2:01:11, 14.43s/it] 80%|███████▉ | 
1997/2500 [7:51:25<2:01:31, 14.50s/it] {'loss': 0.0009, 'grad_norm': 0.7549686148510366, 'learning_rate': 2.0119999999999998e-07, 'completion_length': 54.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.022125244140625, 'epoch': 0.8} 80%|███████▉ | 1997/2500 [7:51:25<2:01:31, 14.50s/it] 80%|███████▉ | 1998/2500 [7:51:41<2:03:44, 14.79s/it] {'loss': 0.0009, 'grad_norm': 0.05548854717809375, 'learning_rate': 2.008e-07, 'completion_length': 62.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02264404296875, 'epoch': 0.8} 80%|███████▉ | 1998/2500 [7:51:41<2:03:44, 14.79s/it] 80%|███████▉ | 1999/2500 [7:51:55<2:02:54, 14.72s/it] {'loss': 0.0014, 'grad_norm': 0.07283224935301338, 'learning_rate': 2.004e-07, 'completion_length': 56.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034912109375, 'epoch': 0.8} 80%|███████▉ | 1999/2500 [7:51:55<2:02:54, 14.72s/it] 80%|████████ | 2000/2500 [7:52:08<1:58:50, 14.26s/it] {'loss': 0.0009, 'grad_norm': 0.11515901585958718, 'learning_rate': 2e-07, 'completion_length': 52.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02264404296875, 'epoch': 0.8} 80%|████████ | 2000/2500 [7:52:08<1:58:50, 14.26s/it] 80%|████████ | 2001/2500 [7:53:21<4:24:10, 31.77s/it] {'loss': 0.0006, 'grad_norm': 0.060016465704614885, 'learning_rate': 1.996e-07, 'completion_length': 51.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015167236328125, 'epoch': 0.8} 80%|████████ | 2001/2500 [7:53:21<4:24:10, 31.77s/it] 80%|████████ | 2002/2500 [7:53:35<3:40:14, 26.53s/it] {'loss': 0.0007, 'grad_norm': 0.05699870989154904, 'learning_rate': 1.9919999999999998e-07, 
'completion_length': 54.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01806640625, 'epoch': 0.8} 80%|████████ | 2002/2500 [7:53:35<3:40:14, 26.53s/it] 80%|████████ | 2003/2500 [7:53:50<3:11:34, 23.13s/it] {'loss': 0.0011, 'grad_norm': 0.06481141922509233, 'learning_rate': 1.988e-07, 'completion_length': 60.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028076171875, 'epoch': 0.8} 80%|████████ | 2003/2500 [7:53:50<3:11:34, 23.13s/it] 80%|████████ | 2004/2500 [7:54:03<2:45:56, 20.07s/it] {'loss': 0.0014, 'grad_norm': 5.47544920448142, 'learning_rate': 1.9839999999999998e-07, 'completion_length': 47.48214530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03424072265625, 'epoch': 0.8} 80%|████████ | 2004/2500 [7:54:03<2:45:56, 20.07s/it] 80%|████████ | 2005/2500 [7:54:17<2:29:19, 18.10s/it] {'loss': 0.001, 'grad_norm': 0.11118839225222185, 'learning_rate': 1.98e-07, 'completion_length': 57.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02392578125, 'epoch': 0.8} 80%|████████ | 2005/2500 [7:54:17<2:29:19, 18.10s/it] 80%|████████ | 2006/2500 [7:54:31<2:19:43, 16.97s/it] {'loss': 0.0006, 'grad_norm': 0.08628680152034955, 'learning_rate': 1.976e-07, 'completion_length': 51.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014434814453125, 'epoch': 0.8} 80%|████████ | 2006/2500 [7:54:31<2:19:43, 16.97s/it] 80%|████████ | 2007/2500 [7:54:45<2:10:53, 15.93s/it] {'loss': 0.0008, 'grad_norm': 0.05608843921848965, 'learning_rate': 1.9719999999999997e-07, 'completion_length': 55.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 
0.0, 'kl': 0.019927978515625, 'epoch': 0.8} 80%|████████ | 2007/2500 [7:54:45<2:10:53, 15.93s/it] 80%|████████ | 2008/2500 [7:54:58<2:04:59, 15.24s/it] {'loss': 0.0006, 'grad_norm': 0.05602487778927404, 'learning_rate': 1.968e-07, 'completion_length': 54.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015045166015625, 'epoch': 0.8} 80%|████████ | 2008/2500 [7:54:58<2:04:59, 15.24s/it] 80%|████████ | 2009/2500 [7:55:12<2:00:51, 14.77s/it] {'loss': 0.0025, 'grad_norm': 0.1728970405638541, 'learning_rate': 1.9639999999999999e-07, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0618896484375, 'epoch': 0.8} 80%|████████ | 2009/2500 [7:55:12<2:00:51, 14.77s/it] 80%|████████ | 2010/2500 [7:55:28<2:02:43, 15.03s/it] {'loss': 0.0012, 'grad_norm': 0.0755032979693856, 'learning_rate': 1.96e-07, 'completion_length': 61.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0291748046875, 'epoch': 0.8} 80%|████████ | 2010/2500 [7:55:28<2:02:43, 15.03s/it] 80%|████████ | 2011/2500 [7:55:41<1:59:02, 14.61s/it] {'loss': 0.001, 'grad_norm': 0.6844031806193447, 'learning_rate': 1.9559999999999998e-07, 'completion_length': 55.32143020629883, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.025146484375, 'epoch': 0.8} 80%|████████ | 2011/2500 [7:55:41<1:59:02, 14.61s/it] 80%|████████ | 2012/2500 [7:55:55<1:56:42, 14.35s/it] {'loss': 0.001, 'grad_norm': 0.06784454909870068, 'learning_rate': 1.952e-07, 'completion_length': 54.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025848388671875, 'epoch': 0.8} 80%|████████ | 2012/2500 [7:55:55<1:56:42, 14.35s/it] 81%|████████ | 2013/2500 
[7:56:08<1:53:36, 14.00s/it] {'loss': 0.0008, 'grad_norm': 0.08491721179148039, 'learning_rate': 1.948e-07, 'completion_length': 54.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019287109375, 'epoch': 0.81} 81%|████████ | 2013/2500 [7:56:08<1:53:36, 14.00s/it] 81%|████████ | 2014/2500 [7:56:22<1:52:51, 13.93s/it] {'loss': 0.0012, 'grad_norm': 0.12817547829710416, 'learning_rate': 1.944e-07, 'completion_length': 56.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0303955078125, 'epoch': 0.81} 81%|████████ | 2014/2500 [7:56:22<1:52:51, 13.93s/it] 81%|████████ | 2015/2500 [7:56:37<1:55:23, 14.28s/it] {'loss': 0.0011, 'grad_norm': 0.07914359609401797, 'learning_rate': 1.94e-07, 'completion_length': 56.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0283203125, 'epoch': 0.81} 81%|████████ | 2015/2500 [7:56:37<1:55:23, 14.28s/it] 81%|████████ | 2016/2500 [7:56:52<1:56:05, 14.39s/it] {'loss': 0.0011, 'grad_norm': 0.054380980217113066, 'learning_rate': 1.9359999999999999e-07, 'completion_length': 55.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026458740234375, 'epoch': 0.81} 81%|████████ | 2016/2500 [7:56:52<1:56:05, 14.39s/it] 81%|████████ | 2017/2500 [7:57:06<1:56:38, 14.49s/it] {'loss': 0.0013, 'grad_norm': 0.05756130467598877, 'learning_rate': 1.932e-07, 'completion_length': 60.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03350830078125, 'epoch': 0.81} 81%|████████ | 2017/2500 [7:57:06<1:56:38, 14.49s/it] 81%|████████ | 2018/2500 [7:57:20<1:55:09, 14.33s/it] {'loss': 0.0009, 'grad_norm': 0.07235733405461843, 'learning_rate': 1.9279999999999998e-07, 'completion_length': 57.41071701049805, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02374267578125, 'epoch': 0.81} 81%|████████ | 2018/2500 [7:57:20<1:55:09, 14.33s/it] 81%|████████ | 2019/2500 [7:57:36<1:57:08, 14.61s/it] {'loss': 0.0015, 'grad_norm': 0.06685013148887171, 'learning_rate': 1.9239999999999998e-07, 'completion_length': 58.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0384521484375, 'epoch': 0.81} 81%|████████ | 2019/2500 [7:57:36<1:57:08, 14.61s/it] 81%|████████ | 2020/2500 [7:57:49<1:54:44, 14.34s/it] {'loss': 0.0007, 'grad_norm': 0.15904751934914632, 'learning_rate': 1.92e-07, 'completion_length': 52.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017303466796875, 'epoch': 0.81} 81%|████████ | 2020/2500 [7:57:49<1:54:44, 14.34s/it] 81%|████████ | 2021/2500 [7:58:03<1:52:22, 14.08s/it] {'loss': 0.0004, 'grad_norm': 0.047144330325969375, 'learning_rate': 1.916e-07, 'completion_length': 56.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.00885009765625, 'epoch': 0.81} 81%|████████ | 2021/2500 [7:58:03<1:52:22, 14.08s/it] 81%|████████ | 2022/2500 [7:58:17<1:51:36, 14.01s/it] {'loss': 0.001, 'grad_norm': 0.057638553583339934, 'learning_rate': 1.912e-07, 'completion_length': 49.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02386474609375, 'epoch': 0.81} 81%|████████ | 2022/2500 [7:58:17<1:51:36, 14.01s/it] 81%|████████ | 2023/2500 [7:58:31<1:51:42, 14.05s/it] {'loss': 0.001, 'grad_norm': 0.07384195119727129, 'learning_rate': 1.908e-07, 'completion_length': 54.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.81} 81%|████████ | 2023/2500 [7:58:31<1:51:42, 
14.05s/it] 81%|████████ | 2024/2500 [7:58:44<1:50:09, 13.89s/it] {'loss': 0.0014, 'grad_norm': 0.0676949865189673, 'learning_rate': 1.904e-07, 'completion_length': 54.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.81} 81%|████████ | 2024/2500 [7:58:44<1:50:09, 13.89s/it] 81%|████████ | 2025/2500 [7:58:58<1:50:04, 13.91s/it] {'loss': 0.0011, 'grad_norm': 1.9087019674790975, 'learning_rate': 1.8999999999999998e-07, 'completion_length': 54.41071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02679443359375, 'epoch': 0.81} 81%|████████ | 2025/2500 [7:58:58<1:50:04, 13.91s/it] 81%|████████ | 2026/2500 [7:59:12<1:50:03, 13.93s/it] {'loss': 0.0013, 'grad_norm': 0.07548273835997794, 'learning_rate': 1.8959999999999998e-07, 'completion_length': 63.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032562255859375, 'epoch': 0.81} 81%|████████ | 2026/2500 [7:59:12<1:50:03, 13.93s/it] 81%|████████ | 2027/2500 [7:59:28<1:53:41, 14.42s/it] {'loss': 0.0005, 'grad_norm': 1.8820114871925395, 'learning_rate': 1.892e-07, 'completion_length': 57.26785850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.01207733154296875, 'epoch': 0.81} 81%|████████ | 2027/2500 [7:59:28<1:53:41, 14.42s/it] 81%|████████ | 2028/2500 [7:59:42<1:52:53, 14.35s/it] {'loss': 0.001, 'grad_norm': 0.0649473309842829, 'learning_rate': 1.888e-07, 'completion_length': 54.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0244140625, 'epoch': 0.81} 81%|████████ | 2028/2500 [7:59:42<1:52:53, 14.35s/it] 81%|████████ | 2029/2500 [7:59:56<1:51:53, 14.25s/it] {'loss': 
0.0003, 'grad_norm': 0.07580156172560555, 'learning_rate': 1.884e-07, 'completion_length': 55.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0069427490234375, 'epoch': 0.81} 81%|████████ | 2029/2500 [7:59:56<1:51:53, 14.25s/it] 81%|████████ | 2030/2500 [8:00:10<1:51:24, 14.22s/it] {'loss': 0.001, 'grad_norm': 0.08393694699126487, 'learning_rate': 1.88e-07, 'completion_length': 60.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02508544921875, 'epoch': 0.81} 81%|████████ | 2030/2500 [8:00:10<1:51:24, 14.22s/it] 81%|████████ | 2031/2500 [8:00:23<1:48:36, 13.89s/it] {'loss': 0.001, 'grad_norm': 1.163081855549353, 'learning_rate': 1.8759999999999999e-07, 'completion_length': 53.267860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02398681640625, 'epoch': 0.81} 81%|████████ | 2031/2500 [8:00:23<1:48:36, 13.89s/it] 81%|████████▏ | 2032/2500 [8:00:38<1:49:32, 14.04s/it] {'loss': 0.0011, 'grad_norm': 0.263115901152485, 'learning_rate': 1.872e-07, 'completion_length': 57.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028411865234375, 'epoch': 0.81} 81%|████████▏ | 2032/2500 [8:00:38<1:49:32, 14.04s/it] 81%|████████▏ | 2033/2500 [8:00:51<1:48:14, 13.91s/it] {'loss': 0.0014, 'grad_norm': 0.08438983655797705, 'learning_rate': 1.8679999999999998e-07, 'completion_length': 55.392860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0347900390625, 'epoch': 0.81} 81%|████████▏ | 2033/2500 [8:00:51<1:48:14, 13.91s/it] 81%|████████▏ | 2034/2500 [8:01:05<1:48:04, 13.92s/it] {'loss': 0.0021, 'grad_norm': 1.1950248373103174, 'learning_rate': 1.864e-07, 'completion_length': 56.44643211364746, 
'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0535888671875, 'epoch': 0.81} 81%|████████▏ | 2034/2500 [8:01:05<1:48:04, 13.92s/it] 81%|████████▏ | 2035/2500 [8:01:20<1:48:47, 14.04s/it] {'loss': 0.001, 'grad_norm': 0.0805991007051389, 'learning_rate': 1.86e-07, 'completion_length': 58.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02520751953125, 'epoch': 0.81} 81%|████████▏ | 2035/2500 [8:01:20<1:48:47, 14.04s/it] 81%|████████▏ | 2036/2500 [8:01:34<1:49:56, 14.22s/it] {'loss': 0.0013, 'grad_norm': 0.06648944962004233, 'learning_rate': 1.8559999999999997e-07, 'completion_length': 57.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032958984375, 'epoch': 0.81} 81%|████████▏ | 2036/2500 [8:01:34<1:49:56, 14.22s/it] 81%|████████▏ | 2037/2500 [8:01:49<1:50:21, 14.30s/it] {'loss': 0.0011, 'grad_norm': 0.05755759964067568, 'learning_rate': 1.852e-07, 'completion_length': 58.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0264892578125, 'epoch': 0.81} 81%|████████▏ | 2037/2500 [8:01:49<1:50:21, 14.30s/it] 82%|████████▏ | 2038/2500 [8:02:05<1:55:45, 15.03s/it] {'loss': 0.0013, 'grad_norm': 0.0921188832974156, 'learning_rate': 1.848e-07, 'completion_length': 63.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03131103515625, 'epoch': 0.82} 82%|████████▏ | 2038/2500 [8:02:05<1:55:45, 15.03s/it] 82%|████████▏ | 2039/2500 [8:02:19<1:53:06, 14.72s/it] {'loss': 0.0011, 'grad_norm': 2.37452522242531, 'learning_rate': 1.844e-07, 'completion_length': 54.96428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 
0.0357142873108387, 'kl': 0.02630615234375, 'epoch': 0.82} 82%|████████▏ | 2039/2500 [8:02:19<1:53:06, 14.72s/it] 82%|████████▏ | 2040/2500 [8:02:34<1:51:47, 14.58s/it] {'loss': 0.0013, 'grad_norm': 1.0391916764276068, 'learning_rate': 1.8399999999999998e-07, 'completion_length': 57.142860412597656, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0313720703125, 'epoch': 0.82} 82%|████████▏ | 2040/2500 [8:02:34<1:51:47, 14.58s/it] 82%|████████▏ | 2041/2500 [8:02:47<1:49:01, 14.25s/it] {'loss': 0.0015, 'grad_norm': 0.1473774032254446, 'learning_rate': 1.836e-07, 'completion_length': 54.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03790283203125, 'epoch': 0.82} 82%|████████▏ | 2041/2500 [8:02:47<1:49:01, 14.25s/it] 82%|████████▏ | 2042/2500 [8:03:02<1:49:10, 14.30s/it] {'loss': 0.0014, 'grad_norm': 0.0898960987707345, 'learning_rate': 1.832e-07, 'completion_length': 60.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03582763671875, 'epoch': 0.82} 82%|████████▏ | 2042/2500 [8:03:02<1:49:10, 14.30s/it] 82%|████████▏ | 2043/2500 [8:03:15<1:46:25, 13.97s/it] {'loss': 0.0012, 'grad_norm': 1.6845253200957364, 'learning_rate': 1.8279999999999997e-07, 'completion_length': 49.57143020629883, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0303955078125, 'epoch': 0.82} 82%|████████▏ | 2043/2500 [8:03:15<1:46:25, 13.97s/it] 82%|████████▏ | 2044/2500 [8:03:28<1:45:30, 13.88s/it] {'loss': 0.0013, 'grad_norm': 0.10588571603840029, 'learning_rate': 1.824e-07, 'completion_length': 52.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03375244140625, 'epoch': 0.82} 
82%|████████▏ | 2044/2500 [8:03:29<1:45:30, 13.88s/it] 82%|████████▏ | 2045/2500 [8:03:42<1:44:23, 13.77s/it] {'loss': 0.002, 'grad_norm': 0.10901110372231022, 'learning_rate': 1.82e-07, 'completion_length': 51.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.051025390625, 'epoch': 0.82} 82%|████████▏ | 2045/2500 [8:03:42<1:44:23, 13.77s/it] 82%|████████▏ | 2046/2500 [8:03:56<1:45:34, 13.95s/it] {'loss': 0.0018, 'grad_norm': 0.10655261662036399, 'learning_rate': 1.816e-07, 'completion_length': 58.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0440673828125, 'epoch': 0.82} 82%|████████▏ | 2046/2500 [8:03:56<1:45:34, 13.95s/it] 82%|████████▏ | 2047/2500 [8:04:11<1:45:47, 14.01s/it] {'loss': 0.0012, 'grad_norm': 0.07851324816519395, 'learning_rate': 1.8119999999999998e-07, 'completion_length': 57.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02923583984375, 'epoch': 0.82} 82%|████████▏ | 2047/2500 [8:04:11<1:45:47, 14.01s/it] 82%|████████▏ | 2048/2500 [8:04:26<1:47:45, 14.30s/it] {'loss': 0.0007, 'grad_norm': 0.06817756405313406, 'learning_rate': 1.8079999999999998e-07, 'completion_length': 51.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018707275390625, 'epoch': 0.82} 82%|████████▏ | 2048/2500 [8:04:26<1:47:45, 14.30s/it] 82%|████████▏ | 2049/2500 [8:04:40<1:46:49, 14.21s/it] {'loss': 0.0007, 'grad_norm': 0.09200779427815024, 'learning_rate': 1.804e-07, 'completion_length': 56.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01806640625, 'epoch': 0.82} 82%|████████▏ | 2049/2500 [8:04:40<1:46:49, 14.21s/it] 82%|████████▏ | 2050/2500 [8:04:53<1:44:10, 13.89s/it] {'loss': 0.0005, 'grad_norm': 
0.06167713965635363, 'learning_rate': 1.8e-07, 'completion_length': 56.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0124053955078125, 'epoch': 0.82} 82%|████████▏ | 2050/2500 [8:04:53<1:44:10, 13.89s/it] 82%|████████▏ | 2051/2500 [8:05:06<1:43:08, 13.78s/it] {'loss': 0.0011, 'grad_norm': 0.08512014055947455, 'learning_rate': 1.796e-07, 'completion_length': 58.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027801513671875, 'epoch': 0.82} 82%|████████▏ | 2051/2500 [8:05:06<1:43:08, 13.78s/it] 82%|████████▏ | 2052/2500 [8:05:20<1:42:37, 13.74s/it] {'loss': 0.0019, 'grad_norm': 0.11626611413646247, 'learning_rate': 1.792e-07, 'completion_length': 53.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0474853515625, 'epoch': 0.82} 82%|████████▏ | 2052/2500 [8:05:20<1:42:37, 13.74s/it] 82%|████████▏ | 2053/2500 [8:05:35<1:44:39, 14.05s/it] {'loss': 0.0017, 'grad_norm': 0.08110482719307155, 'learning_rate': 1.7879999999999999e-07, 'completion_length': 52.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04150390625, 'epoch': 0.82} 82%|████████▏ | 2053/2500 [8:05:35<1:44:39, 14.05s/it] 82%|████████▏ | 2054/2500 [8:05:48<1:42:59, 13.85s/it] {'loss': 0.0005, 'grad_norm': 0.05598555514132886, 'learning_rate': 1.7839999999999998e-07, 'completion_length': 59.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0115966796875, 'epoch': 0.82} 82%|████████▏ | 2054/2500 [8:05:48<1:42:59, 13.85s/it] 82%|████████▏ | 2055/2500 [8:06:03<1:44:46, 14.13s/it] {'loss': 0.0011, 'grad_norm': 0.06101623780630976, 'learning_rate': 1.7799999999999998e-07, 'completion_length': 58.69643020629883, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02789306640625, 'epoch': 0.82} 82%|████████▏ | 2055/2500 [8:06:03<1:44:46, 14.13s/it] 82%|████████▏ | 2056/2500 [8:06:17<1:44:31, 14.12s/it] {'loss': 0.0006, 'grad_norm': 0.05472320658082128, 'learning_rate': 1.776e-07, 'completion_length': 53.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013946533203125, 'epoch': 0.82} 82%|████████▏ | 2056/2500 [8:06:17<1:44:31, 14.12s/it] 82%|████████▏ | 2057/2500 [8:06:30<1:42:58, 13.95s/it] {'loss': 0.0013, 'grad_norm': 0.06135171686611695, 'learning_rate': 1.772e-07, 'completion_length': 51.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033599853515625, 'epoch': 0.82} 82%|████████▏ | 2057/2500 [8:06:30<1:42:58, 13.95s/it] 82%|████████▏ | 2058/2500 [8:06:46<1:46:55, 14.51s/it] {'loss': 0.0021, 'grad_norm': 0.05691900995667633, 'learning_rate': 1.768e-07, 'completion_length': 64.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.052978515625, 'epoch': 0.82} 82%|████████▏ | 2058/2500 [8:06:46<1:46:55, 14.51s/it] 82%|████████▏ | 2059/2500 [8:07:01<1:46:48, 14.53s/it] {'loss': 0.0016, 'grad_norm': 0.9186968145572942, 'learning_rate': 1.764e-07, 'completion_length': 59.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0406494140625, 'epoch': 0.82} 82%|████████▏ | 2059/2500 [8:07:01<1:46:48, 14.53s/it] 82%|████████▏ | 2060/2500 [8:07:15<1:44:56, 14.31s/it] {'loss': 0.001, 'grad_norm': 0.05370051616828978, 'learning_rate': 1.76e-07, 'completion_length': 56.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.82} 82%|████████▏ | 2060/2500 [8:07:15<1:44:56, 14.31s/it] 82%|████████▏ | 2061/2500 
[8:07:29<1:44:01, 14.22s/it] {'loss': 0.0011, 'grad_norm': 0.04614080743676179, 'learning_rate': 1.756e-07, 'completion_length': 51.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02691650390625, 'epoch': 0.82} 82%|████████▏ | 2061/2500 [8:07:29<1:44:01, 14.22s/it] 82%|████████▏ | 2062/2500 [8:07:42<1:43:01, 14.11s/it] {'loss': 0.0009, 'grad_norm': 0.08295548342002586, 'learning_rate': 1.7519999999999998e-07, 'completion_length': 53.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02301025390625, 'epoch': 0.82} 82%|████████▏ | 2062/2500 [8:07:42<1:43:01, 14.11s/it] 83%|████████▎ | 2063/2500 [8:07:57<1:43:42, 14.24s/it] {'loss': 0.0014, 'grad_norm': 0.15789359967827266, 'learning_rate': 1.748e-07, 'completion_length': 58.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03607177734375, 'epoch': 0.83} 83%|████████▎ | 2063/2500 [8:07:57<1:43:42, 14.24s/it] 83%|████████▎ | 2064/2500 [8:08:11<1:42:13, 14.07s/it] {'loss': 0.0018, 'grad_norm': 0.1468989389877041, 'learning_rate': 1.744e-07, 'completion_length': 60.714290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.83} 83%|████████▎ | 2064/2500 [8:08:11<1:42:13, 14.07s/it] 83%|████████▎ | 2065/2500 [8:08:25<1:42:06, 14.08s/it] {'loss': 0.0012, 'grad_norm': 0.07535782917084725, 'learning_rate': 1.7399999999999997e-07, 'completion_length': 63.732147216796875, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03057861328125, 'epoch': 0.83} 83%|████████▎ | 2065/2500 [8:08:25<1:42:06, 14.08s/it] 83%|████████▎ | 2066/2500 [8:08:40<1:44:22, 14.43s/it] {'loss': 0.0019, 'grad_norm': 0.7039876487092134, 'learning_rate': 1.736e-07, 
'completion_length': 60.35714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.048095703125, 'epoch': 0.83} 83%|████████▎ | 2066/2500 [8:08:40<1:44:22, 14.43s/it] 83%|████████▎ | 2067/2500 [8:08:57<1:48:32, 15.04s/it] {'loss': 0.0016, 'grad_norm': 0.05882233078691127, 'learning_rate': 1.732e-07, 'completion_length': 60.875003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0389404296875, 'epoch': 0.83} 83%|████████▎ | 2067/2500 [8:08:57<1:48:32, 15.04s/it] 83%|████████▎ | 2068/2500 [8:09:11<1:46:18, 14.76s/it] {'loss': 0.0011, 'grad_norm': 0.08751999282828966, 'learning_rate': 1.728e-07, 'completion_length': 52.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02874755859375, 'epoch': 0.83} 83%|████████▎ | 2068/2500 [8:09:11<1:46:18, 14.76s/it] 83%|████████▎ | 2069/2500 [8:09:25<1:44:55, 14.61s/it] {'loss': 0.0013, 'grad_norm': 0.08476640663955405, 'learning_rate': 1.7239999999999998e-07, 'completion_length': 53.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0316162109375, 'epoch': 0.83} 83%|████████▎ | 2069/2500 [8:09:25<1:44:55, 14.61s/it] 83%|████████▎ | 2070/2500 [8:09:38<1:42:06, 14.25s/it] {'loss': 0.0012, 'grad_norm': 0.06076873703456928, 'learning_rate': 1.7199999999999998e-07, 'completion_length': 52.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03131103515625, 'epoch': 0.83} 83%|████████▎ | 2070/2500 [8:09:38<1:42:06, 14.25s/it] 83%|████████▎ | 2071/2500 [8:09:53<1:42:15, 14.30s/it] {'loss': 0.0006, 'grad_norm': 0.07331703715415945, 'learning_rate': 1.716e-07, 'completion_length': 62.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 
2.0, 'reward_std': 0.0, 'kl': 0.01495361328125, 'epoch': 0.83} 83%|████████▎ | 2071/2500 [8:09:53<1:42:15, 14.30s/it] 83%|████████▎ | 2072/2500 [8:10:06<1:40:20, 14.07s/it] {'loss': 0.0005, 'grad_norm': 0.12690282745934783, 'learning_rate': 1.7119999999999997e-07, 'completion_length': 51.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0135498046875, 'epoch': 0.83} 83%|████████▎ | 2072/2500 [8:10:06<1:40:20, 14.07s/it] 83%|████████▎ | 2073/2500 [8:10:19<1:37:12, 13.66s/it] {'loss': 0.0011, 'grad_norm': 0.051956769942487366, 'learning_rate': 1.708e-07, 'completion_length': 48.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02862548828125, 'epoch': 0.83} 83%|████████▎ | 2073/2500 [8:10:19<1:37:12, 13.66s/it] 83%|████████▎ | 2074/2500 [8:10:33<1:38:46, 13.91s/it] {'loss': 0.001, 'grad_norm': 0.11468348360852222, 'learning_rate': 1.704e-07, 'completion_length': 56.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025634765625, 'epoch': 0.83} 83%|████████▎ | 2074/2500 [8:10:33<1:38:46, 13.91s/it] 83%|████████▎ | 2075/2500 [8:10:48<1:39:18, 14.02s/it] {'loss': 0.0015, 'grad_norm': 0.06919965748501235, 'learning_rate': 1.7000000000000001e-07, 'completion_length': 56.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0364990234375, 'epoch': 0.83} 83%|████████▎ | 2075/2500 [8:10:48<1:39:18, 14.02s/it] 83%|████████▎ | 2076/2500 [8:11:03<1:41:03, 14.30s/it] {'loss': 0.0007, 'grad_norm': 0.07314589229933913, 'learning_rate': 1.6959999999999998e-07, 'completion_length': 62.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018218994140625, 'epoch': 0.83} 83%|████████▎ | 2076/2500 [8:11:03<1:41:03, 14.30s/it] 83%|████████▎ | 2077/2500 
[8:11:16<1:38:03, 13.91s/it] {'loss': 0.001, 'grad_norm': 0.1145338163195673, 'learning_rate': 1.6919999999999998e-07, 'completion_length': 45.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.83} 83%|████████▎ | 2077/2500 [8:11:16<1:38:03, 13.91s/it] 83%|████████▎ | 2078/2500 [8:11:30<1:38:15, 13.97s/it] {'loss': 0.0005, 'grad_norm': 0.053782618781289926, 'learning_rate': 1.688e-07, 'completion_length': 58.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01348876953125, 'epoch': 0.83} 83%|████████▎ | 2078/2500 [8:11:30<1:38:15, 13.97s/it] 83%|████████▎ | 2079/2500 [8:11:44<1:38:30, 14.04s/it] {'loss': 0.0012, 'grad_norm': 0.06695183393560819, 'learning_rate': 1.684e-07, 'completion_length': 59.000003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02899169921875, 'epoch': 0.83} 83%|████████▎ | 2079/2500 [8:11:44<1:38:30, 14.04s/it] 83%|████████▎ | 2080/2500 [8:11:58<1:39:08, 14.16s/it] {'loss': 0.0007, 'grad_norm': 0.04747076762556126, 'learning_rate': 1.68e-07, 'completion_length': 50.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0177001953125, 'epoch': 0.83} 83%|████████▎ | 2080/2500 [8:11:58<1:39:08, 14.16s/it] 83%|████████▎ | 2081/2500 [8:12:13<1:38:46, 14.14s/it] {'loss': 0.0017, 'grad_norm': 0.09808337587617716, 'learning_rate': 1.676e-07, 'completion_length': 53.76785850524902, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.04296875, 'epoch': 0.83} 83%|████████▎ | 2081/2500 [8:12:13<1:38:46, 14.14s/it] 83%|████████▎ | 2082/2500 [8:12:26<1:38:06, 14.08s/it] {'loss': 0.001, 'grad_norm': 0.10232653595910293, 'learning_rate': 1.672e-07, 'completion_length': 
50.910715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024627685546875, 'epoch': 0.83} 83%|████████▎ | 2082/2500 [8:12:26<1:38:06, 14.08s/it] 83%|████████▎ | 2083/2500 [8:12:40<1:35:52, 13.79s/it] {'loss': 0.001, 'grad_norm': 0.051364377600629424, 'learning_rate': 1.6679999999999998e-07, 'completion_length': 51.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.83} 83%|████████▎ | 2083/2500 [8:12:40<1:35:52, 13.79s/it] 83%|████████▎ | 2084/2500 [8:12:53<1:35:26, 13.77s/it] {'loss': 0.0014, 'grad_norm': 0.05529869375763711, 'learning_rate': 1.6639999999999998e-07, 'completion_length': 56.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03509521484375, 'epoch': 0.83} 83%|████████▎ | 2084/2500 [8:12:53<1:35:26, 13.77s/it] 83%|████████▎ | 2085/2500 [8:13:06<1:33:13, 13.48s/it] {'loss': 0.0004, 'grad_norm': 0.3103862968433045, 'learning_rate': 1.66e-07, 'completion_length': 52.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.011016845703125, 'epoch': 0.83} 83%|████████▎ | 2085/2500 [8:13:06<1:33:13, 13.48s/it] 83%|████████▎ | 2086/2500 [8:13:21<1:36:13, 13.95s/it] {'loss': 0.0014, 'grad_norm': 0.06524866201920224, 'learning_rate': 1.656e-07, 'completion_length': 57.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0355224609375, 'epoch': 0.83} 83%|████████▎ | 2086/2500 [8:13:21<1:36:13, 13.95s/it] 83%|████████▎ | 2087/2500 [8:13:36<1:37:08, 14.11s/it] {'loss': 0.0011, 'grad_norm': 0.04138035959548164, 'learning_rate': 1.652e-07, 'completion_length': 58.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.83} 
83%|████████▎ | 2087/2500 [8:13:36<1:37:08, 14.11s/it] 84%|████████▎ | 2088/2500 [8:13:55<1:46:45, 15.55s/it] {'loss': 0.0013, 'grad_norm': 0.052322752954013896, 'learning_rate': 1.648e-07, 'completion_length': 65.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.84} 84%|████████▎ | 2088/2500 [8:13:55<1:46:45, 15.55s/it] 84%|████████▎ | 2089/2500 [8:14:10<1:46:05, 15.49s/it] {'loss': 0.0011, 'grad_norm': 0.07490768624723675, 'learning_rate': 1.644e-07, 'completion_length': 59.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.028350830078125, 'epoch': 0.84} 84%|████████▎ | 2089/2500 [8:14:10<1:46:05, 15.49s/it] 84%|████████▎ | 2090/2500 [8:14:25<1:45:01, 15.37s/it] {'loss': 0.0008, 'grad_norm': 0.05508697531949601, 'learning_rate': 1.64e-07, 'completion_length': 51.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020843505859375, 'epoch': 0.84} 84%|████████▎ | 2090/2500 [8:14:25<1:45:01, 15.37s/it] 84%|████████▎ | 2091/2500 [8:14:39<1:42:12, 14.99s/it] {'loss': 0.001, 'grad_norm': 0.06170584632364958, 'learning_rate': 1.6359999999999998e-07, 'completion_length': 66.48214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.84} 84%|████████▎ | 2091/2500 [8:14:39<1:42:12, 14.99s/it] 84%|████████▎ | 2092/2500 [8:14:54<1:40:47, 14.82s/it] {'loss': 0.0012, 'grad_norm': 0.06699103405487972, 'learning_rate': 1.632e-07, 'completion_length': 57.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03021240234375, 'epoch': 0.84} 84%|████████▎ | 2092/2500 [8:14:54<1:40:47, 14.82s/it] 84%|████████▎ | 2093/2500 [8:15:08<1:40:18, 14.79s/it] {'loss': 0.0007, 'grad_norm': 0.06356085593626261, 
'learning_rate': 1.628e-07, 'completion_length': 58.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01702880859375, 'epoch': 0.84} 84%|████████▎ | 2093/2500 [8:15:08<1:40:18, 14.79s/it] 84%|████████▍ | 2094/2500 [8:15:21<1:36:33, 14.27s/it] {'loss': 0.0005, 'grad_norm': 0.11124833263228079, 'learning_rate': 1.6239999999999997e-07, 'completion_length': 54.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0136871337890625, 'epoch': 0.84} 84%|████████▍ | 2094/2500 [8:15:21<1:36:33, 14.27s/it] 84%|████████▍ | 2095/2500 [8:15:39<1:43:12, 15.29s/it] {'loss': 0.0008, 'grad_norm': 0.08414639422310406, 'learning_rate': 1.62e-07, 'completion_length': 50.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0198974609375, 'epoch': 0.84} 84%|████████▍ | 2095/2500 [8:15:39<1:43:12, 15.29s/it] 84%|████████▍ | 2096/2500 [8:15:54<1:42:25, 15.21s/it] {'loss': 0.0021, 'grad_norm': 0.05397292583729733, 'learning_rate': 1.616e-07, 'completion_length': 65.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0518798828125, 'epoch': 0.84} 84%|████████▍ | 2096/2500 [8:15:54<1:42:25, 15.21s/it] 84%|████████▍ | 2097/2500 [8:16:12<1:47:47, 16.05s/it] {'loss': 0.0009, 'grad_norm': 0.06865540930171916, 'learning_rate': 1.6120000000000001e-07, 'completion_length': 67.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0228271484375, 'epoch': 0.84} 84%|████████▍ | 2097/2500 [8:16:12<1:47:47, 16.05s/it] 84%|████████▍ | 2098/2500 [8:16:28<1:47:49, 16.09s/it] {'loss': 0.0013, 'grad_norm': 0.7538685014930925, 'learning_rate': 1.6079999999999998e-07, 'completion_length': 56.00000190734863, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 
1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.0325927734375, 'epoch': 0.84} 84%|████████▍ | 2098/2500 [8:16:28<1:47:49, 16.09s/it] 84%|████████▍ | 2099/2500 [8:16:41<1:41:24, 15.17s/it] {'loss': 0.0013, 'grad_norm': 0.04947932806620339, 'learning_rate': 1.6039999999999998e-07, 'completion_length': 53.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.84} 84%|████████▍ | 2099/2500 [8:16:41<1:41:24, 15.17s/it] 84%|████████▍ | 2100/2500 [8:16:56<1:40:27, 15.07s/it] {'loss': 0.0004, 'grad_norm': 0.04657526737424761, 'learning_rate': 1.6e-07, 'completion_length': 64.16071701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0092010498046875, 'epoch': 0.84} 84%|████████▍ | 2100/2500 [8:16:56<1:40:27, 15.07s/it] 84%|████████▍ | 2101/2500 [8:18:08<3:32:59, 32.03s/it] {'loss': 0.0005, 'grad_norm': 0.05378376702216442, 'learning_rate': 1.5959999999999997e-07, 'completion_length': 61.10714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012664794921875, 'epoch': 0.84} 84%|████████▍ | 2101/2500 [8:18:08<3:32:59, 32.03s/it] 84%|████████▍ | 2102/2500 [8:18:21<2:55:03, 26.39s/it] {'loss': 0.0009, 'grad_norm': 0.06450061481931338, 'learning_rate': 1.592e-07, 'completion_length': 52.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.84} 84%|████████▍ | 2102/2500 [8:18:21<2:55:03, 26.39s/it] 84%|████████▍ | 2103/2500 [8:18:34<2:29:06, 22.54s/it] {'loss': 0.001, 'grad_norm': 0.20862440235980012, 'learning_rate': 1.588e-07, 'completion_length': 59.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025634765625, 'epoch': 0.84} 84%|████████▍ | 
2103/2500 [8:18:34<2:29:06, 22.54s/it] 84%|████████▍ | 2104/2500 [8:18:49<2:12:53, 20.13s/it] {'loss': 0.001, 'grad_norm': 0.04566495376957443, 'learning_rate': 1.5840000000000002e-07, 'completion_length': 61.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02496337890625, 'epoch': 0.84} 84%|████████▍ | 2104/2500 [8:18:49<2:12:53, 20.13s/it] 84%|████████▍ | 2105/2500 [8:19:04<2:02:17, 18.58s/it] {'loss': 0.001, 'grad_norm': 0.05305886202537494, 'learning_rate': 1.5799999999999999e-07, 'completion_length': 62.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.84} 84%|████████▍ | 2105/2500 [8:19:04<2:02:17, 18.58s/it] 84%|████████▍ | 2106/2500 [8:19:17<1:52:07, 17.07s/it] {'loss': 0.0013, 'grad_norm': 0.053337153225033275, 'learning_rate': 1.5759999999999998e-07, 'completion_length': 52.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031494140625, 'epoch': 0.84} 84%|████████▍ | 2106/2500 [8:19:17<1:52:07, 17.07s/it] 84%|████████▍ | 2107/2500 [8:19:32<1:47:18, 16.38s/it] {'loss': 0.0009, 'grad_norm': 1.832630709526247, 'learning_rate': 1.572e-07, 'completion_length': 56.160715103149414, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0223388671875, 'epoch': 0.84} 84%|████████▍ | 2107/2500 [8:19:32<1:47:18, 16.38s/it] 84%|████████▍ | 2108/2500 [8:19:47<1:43:08, 15.79s/it] {'loss': 0.0004, 'grad_norm': 0.08541357652327726, 'learning_rate': 1.5679999999999997e-07, 'completion_length': 62.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.010650634765625, 'epoch': 0.84} 84%|████████▍ | 2108/2500 [8:19:47<1:43:08, 15.79s/it] 84%|████████▍ | 2109/2500 [8:20:00<1:38:41, 
15.15s/it] {'loss': 0.002, 'grad_norm': 0.08536353658771234, 'learning_rate': 1.564e-07, 'completion_length': 57.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.05126953125, 'epoch': 0.84} 84%|████████▍ | 2109/2500 [8:20:00<1:38:41, 15.15s/it] 84%|████████▍ | 2110/2500 [8:20:15<1:37:20, 14.98s/it] {'loss': 0.0015, 'grad_norm': 0.09776140319889062, 'learning_rate': 1.56e-07, 'completion_length': 59.107147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03662109375, 'epoch': 0.84} 84%|████████▍ | 2110/2500 [8:20:15<1:37:20, 14.98s/it] 84%|████████▍ | 2111/2500 [8:20:29<1:34:34, 14.59s/it] {'loss': 0.0012, 'grad_norm': 0.031942820441478705, 'learning_rate': 1.556e-07, 'completion_length': 51.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02972412109375, 'epoch': 0.84} 84%|████████▍ | 2111/2500 [8:20:29<1:34:34, 14.59s/it] 84%|████████▍ | 2112/2500 [8:20:42<1:32:53, 14.37s/it] {'loss': 0.0008, 'grad_norm': 0.044366100132993934, 'learning_rate': 1.552e-07, 'completion_length': 53.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02056884765625, 'epoch': 0.84} 84%|████████▍ | 2112/2500 [8:20:42<1:32:53, 14.37s/it] 85%|████████▍ | 2113/2500 [8:20:57<1:32:45, 14.38s/it] {'loss': 0.0013, 'grad_norm': 0.13875052827119144, 'learning_rate': 1.5479999999999998e-07, 'completion_length': 66.8035774230957, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03143310546875, 'epoch': 0.85} 85%|████████▍ | 2113/2500 [8:20:57<1:32:45, 14.38s/it] 85%|████████▍ | 2114/2500 [8:21:11<1:31:23, 14.21s/it] {'loss': 0.0008, 'grad_norm': 0.05507456442241036, 'learning_rate': 1.544e-07, 'completion_length': 56.250003814697266, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02105712890625, 'epoch': 0.85} 85%|████████▍ | 2114/2500 [8:21:11<1:31:23, 14.21s/it] 85%|████████▍ | 2115/2500 [8:21:26<1:32:32, 14.42s/it] {'loss': 0.0008, 'grad_norm': 0.07905358263221914, 'learning_rate': 1.54e-07, 'completion_length': 55.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.85} 85%|████████▍ | 2115/2500 [8:21:26<1:32:32, 14.42s/it] 85%|████████▍ | 2116/2500 [8:21:40<1:31:52, 14.36s/it] {'loss': 0.0011, 'grad_norm': 0.04223604815192293, 'learning_rate': 1.5359999999999997e-07, 'completion_length': 52.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0279541015625, 'epoch': 0.85} 85%|████████▍ | 2116/2500 [8:21:40<1:31:52, 14.36s/it] 85%|████████▍ | 2117/2500 [8:21:53<1:30:09, 14.12s/it] {'loss': 0.0007, 'grad_norm': 0.06367860349706228, 'learning_rate': 1.532e-07, 'completion_length': 62.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01751708984375, 'epoch': 0.85} 85%|████████▍ | 2117/2500 [8:21:53<1:30:09, 14.12s/it] 85%|████████▍ | 2118/2500 [8:22:06<1:27:43, 13.78s/it] {'loss': 0.0013, 'grad_norm': 0.07887544880630061, 'learning_rate': 1.528e-07, 'completion_length': 48.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0328369140625, 'epoch': 0.85} 85%|████████▍ | 2118/2500 [8:22:06<1:27:43, 13.78s/it] 85%|████████▍ | 2119/2500 [8:22:20<1:28:08, 13.88s/it] {'loss': 0.0014, 'grad_norm': 0.08533206709800313, 'learning_rate': 1.524e-07, 'completion_length': 55.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034423828125, 'epoch': 0.85} 85%|████████▍ | 2119/2500 [8:22:20<1:28:08, 13.88s/it] 85%|████████▍ 
| 2120/2500 [8:22:34<1:26:40, 13.68s/it] {'loss': 0.0011, 'grad_norm': 0.053837263361159605, 'learning_rate': 1.5199999999999998e-07, 'completion_length': 54.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02740478515625, 'epoch': 0.85} 85%|████████▍ | 2120/2500 [8:22:34<1:26:40, 13.68s/it] 85%|████████▍ | 2121/2500 [8:22:47<1:26:01, 13.62s/it] {'loss': 0.001, 'grad_norm': 0.055045050827776006, 'learning_rate': 1.516e-07, 'completion_length': 52.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0244140625, 'epoch': 0.85} 85%|████████▍ | 2121/2500 [8:22:47<1:26:01, 13.62s/it] 85%|████████▍ | 2122/2500 [8:23:06<1:36:06, 15.26s/it] {'loss': 0.0011, 'grad_norm': 0.541611550821805, 'learning_rate': 1.512e-07, 'completion_length': 60.73214340209961, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.027130126953125, 'epoch': 0.85} 85%|████████▍ | 2122/2500 [8:23:06<1:36:06, 15.26s/it] 85%|████████▍ | 2123/2500 [8:23:21<1:34:42, 15.07s/it] {'loss': 0.0006, 'grad_norm': 0.08898049385730673, 'learning_rate': 1.5079999999999997e-07, 'completion_length': 55.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014251708984375, 'epoch': 0.85} 85%|████████▍ | 2123/2500 [8:23:21<1:34:42, 15.07s/it] 85%|████████▍ | 2124/2500 [8:23:36<1:34:30, 15.08s/it] {'loss': 0.0004, 'grad_norm': 0.05842668025022377, 'learning_rate': 1.504e-07, 'completion_length': 56.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01068115234375, 'epoch': 0.85} 85%|████████▍ | 2124/2500 [8:23:36<1:34:30, 15.08s/it] 85%|████████▌ | 2125/2500 [8:23:50<1:31:30, 14.64s/it] {'loss': 0.0006, 'grad_norm': 1.3488028575335842, 
'learning_rate': 1.5e-07, 'completion_length': 56.142860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.01605224609375, 'epoch': 0.85} 85%|████████▌ | 2125/2500 [8:23:50<1:31:30, 14.64s/it] 85%|████████▌ | 2126/2500 [8:24:04<1:31:01, 14.60s/it] {'loss': 0.0006, 'grad_norm': 0.19209799356899, 'learning_rate': 1.4960000000000002e-07, 'completion_length': 60.67857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016204833984375, 'epoch': 0.85} 85%|████████▌ | 2126/2500 [8:24:04<1:31:01, 14.60s/it] 85%|████████▌ | 2127/2500 [8:24:19<1:31:07, 14.66s/it] {'loss': 0.0014, 'grad_norm': 0.0717966934981999, 'learning_rate': 1.4919999999999999e-07, 'completion_length': 53.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03436279296875, 'epoch': 0.85} 85%|████████▌ | 2127/2500 [8:24:19<1:31:07, 14.66s/it] 85%|████████▌ | 2128/2500 [8:24:33<1:29:31, 14.44s/it] {'loss': 0.0012, 'grad_norm': 0.0669138369948112, 'learning_rate': 1.4879999999999998e-07, 'completion_length': 57.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02960205078125, 'epoch': 0.85} 85%|████████▌ | 2128/2500 [8:24:33<1:29:31, 14.44s/it] 85%|████████▌ | 2129/2500 [8:24:46<1:26:09, 13.93s/it] {'loss': 0.0011, 'grad_norm': 0.07314722575792566, 'learning_rate': 1.484e-07, 'completion_length': 48.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.85} 85%|████████▌ | 2129/2500 [8:24:46<1:26:09, 13.93s/it] 85%|████████▌ | 2130/2500 [8:24:59<1:25:28, 13.86s/it] {'loss': 0.001, 'grad_norm': 0.06955653065657146, 'learning_rate': 1.4799999999999998e-07, 'completion_length': 58.78571891784668, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026092529296875, 'epoch': 0.85} 85%|████████▌ | 2130/2500 [8:24:59<1:25:28, 13.86s/it] 85%|████████▌ | 2131/2500 [8:25:13<1:24:38, 13.76s/it] {'loss': 0.001, 'grad_norm': 0.07845854932882071, 'learning_rate': 1.476e-07, 'completion_length': 54.96428871154785, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.85} 85%|████████▌ | 2131/2500 [8:25:13<1:24:38, 13.76s/it] 85%|████████▌ | 2132/2500 [8:25:27<1:24:26, 13.77s/it] {'loss': 0.001, 'grad_norm': 0.050238667917304955, 'learning_rate': 1.472e-07, 'completion_length': 58.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0242919921875, 'epoch': 0.85} 85%|████████▌ | 2132/2500 [8:25:27<1:24:26, 13.77s/it] 85%|████████▌ | 2133/2500 [8:25:40<1:24:24, 13.80s/it] {'loss': 0.0009, 'grad_norm': 0.059285050087957966, 'learning_rate': 1.4680000000000002e-07, 'completion_length': 58.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022430419921875, 'epoch': 0.85} 85%|████████▌ | 2133/2500 [8:25:40<1:24:24, 13.80s/it] 85%|████████▌ | 2134/2500 [8:25:54<1:24:11, 13.80s/it] {'loss': 0.0013, 'grad_norm': 2.383795691259048, 'learning_rate': 1.464e-07, 'completion_length': 51.142860412597656, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03271484375, 'epoch': 0.85} 85%|████████▌ | 2134/2500 [8:25:54<1:24:11, 13.80s/it] 85%|████████▌ | 2135/2500 [8:26:09<1:25:18, 14.02s/it] {'loss': 0.0007, 'grad_norm': 0.056494125947406784, 'learning_rate': 1.4599999999999998e-07, 'completion_length': 54.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.016448974609375, 'epoch': 0.85} 85%|████████▌ | 2135/2500 [8:26:09<1:25:18, 14.02s/it] 85%|████████▌ | 2136/2500 [8:26:23<1:25:32, 14.10s/it] {'loss': 0.0011, 'grad_norm': 0.045490241064835193, 'learning_rate': 1.456e-07, 'completion_length': 56.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0279541015625, 'epoch': 0.85} 85%|████████▌ | 2136/2500 [8:26:23<1:25:32, 14.10s/it] 85%|████████▌ | 2137/2500 [8:26:36<1:23:58, 13.88s/it] {'loss': 0.0012, 'grad_norm': 0.07470698259785044, 'learning_rate': 1.4519999999999998e-07, 'completion_length': 51.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0308837890625, 'epoch': 0.85} 85%|████████▌ | 2137/2500 [8:26:36<1:23:58, 13.88s/it] 86%|████████▌ | 2138/2500 [8:26:50<1:23:39, 13.87s/it] {'loss': 0.0015, 'grad_norm': 2.6985599038962618, 'learning_rate': 1.448e-07, 'completion_length': 44.46428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0364990234375, 'epoch': 0.86} 86%|████████▌ | 2138/2500 [8:26:50<1:23:39, 13.87s/it] 86%|████████▌ | 2139/2500 [8:27:04<1:23:07, 13.81s/it] {'loss': 0.0011, 'grad_norm': 0.24152695163367452, 'learning_rate': 1.444e-07, 'completion_length': 51.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.86} 86%|████████▌ | 2139/2500 [8:27:04<1:23:07, 13.81s/it] 86%|████████▌ | 2140/2500 [8:27:19<1:24:24, 14.07s/it] {'loss': 0.0011, 'grad_norm': 0.06832306626798375, 'learning_rate': 1.44e-07, 'completion_length': 53.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02740478515625, 'epoch': 0.86} 86%|████████▌ | 2140/2500 [8:27:19<1:24:24, 14.07s/it] 
86%|████████▌ | 2141/2500 [8:27:33<1:24:38, 14.15s/it] {'loss': 0.0013, 'grad_norm': 0.06308446337333376, 'learning_rate': 1.436e-07, 'completion_length': 63.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03192138671875, 'epoch': 0.86} 86%|████████▌ | 2141/2500 [8:27:33<1:24:38, 14.15s/it] 86%|████████▌ | 2142/2500 [8:27:46<1:22:55, 13.90s/it] {'loss': 0.0006, 'grad_norm': 0.06607319800432135, 'learning_rate': 1.4319999999999999e-07, 'completion_length': 52.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015838623046875, 'epoch': 0.86} 86%|████████▌ | 2142/2500 [8:27:46<1:22:55, 13.90s/it] 86%|████████▌ | 2143/2500 [8:28:01<1:24:23, 14.18s/it] {'loss': 0.001, 'grad_norm': 0.07195022743310882, 'learning_rate': 1.428e-07, 'completion_length': 58.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.86} 86%|████████▌ | 2143/2500 [8:28:01<1:24:23, 14.18s/it] 86%|████████▌ | 2144/2500 [8:28:16<1:25:44, 14.45s/it] {'loss': 0.0006, 'grad_norm': 0.047333668847377657, 'learning_rate': 1.424e-07, 'completion_length': 63.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015594482421875, 'epoch': 0.86} 86%|████████▌ | 2144/2500 [8:28:16<1:25:44, 14.45s/it] 86%|████████▌ | 2145/2500 [8:28:29<1:23:33, 14.12s/it] {'loss': 0.0012, 'grad_norm': 0.07348493981958211, 'learning_rate': 1.4199999999999997e-07, 'completion_length': 48.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029388427734375, 'epoch': 0.86} 86%|████████▌ | 2145/2500 [8:28:29<1:23:33, 14.12s/it] 86%|████████▌ | 2146/2500 [8:28:44<1:23:47, 14.20s/it] {'loss': 0.0014, 'grad_norm': 0.08497097101553364, 'learning_rate': 1.416e-07, 
'completion_length': 55.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0361328125, 'epoch': 0.86} 86%|████████▌ | 2146/2500 [8:28:44<1:23:47, 14.20s/it] 86%|████████▌ | 2147/2500 [8:28:59<1:24:41, 14.39s/it] {'loss': 0.0012, 'grad_norm': 0.08855074711356078, 'learning_rate': 1.412e-07, 'completion_length': 61.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03070068359375, 'epoch': 0.86} 86%|████████▌ | 2147/2500 [8:28:59<1:24:41, 14.39s/it] 86%|████████▌ | 2148/2500 [8:29:13<1:24:28, 14.40s/it] {'loss': 0.0012, 'grad_norm': 0.07330392993363113, 'learning_rate': 1.408e-07, 'completion_length': 53.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0294189453125, 'epoch': 0.86} 86%|████████▌ | 2148/2500 [8:29:13<1:24:28, 14.40s/it] 86%|████████▌ | 2149/2500 [8:29:33<1:34:06, 16.09s/it] {'loss': 0.0006, 'grad_norm': 0.4417694731561179, 'learning_rate': 1.4039999999999999e-07, 'completion_length': 65.00000190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.016021728515625, 'epoch': 0.86} 86%|████████▌ | 2149/2500 [8:29:33<1:34:06, 16.09s/it] 86%|████████▌ | 2150/2500 [8:29:46<1:28:56, 15.25s/it] {'loss': 0.0011, 'grad_norm': 0.07329507233822743, 'learning_rate': 1.4e-07, 'completion_length': 53.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026763916015625, 'epoch': 0.86} 86%|████████▌ | 2150/2500 [8:29:46<1:28:56, 15.25s/it] 86%|████████▌ | 2151/2500 [8:30:01<1:27:09, 14.99s/it] {'loss': 0.0012, 'grad_norm': 0.09740890825602196, 'learning_rate': 1.396e-07, 'completion_length': 60.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 
2.0, 'reward_std': 0.0, 'kl': 0.02947998046875, 'epoch': 0.86} 86%|████████▌ | 2151/2500 [8:30:01<1:27:09, 14.99s/it] 86%|████████▌ | 2152/2500 [8:30:15<1:25:06, 14.67s/it] {'loss': 0.0007, 'grad_norm': 0.04502429944067633, 'learning_rate': 1.3919999999999998e-07, 'completion_length': 55.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0179443359375, 'epoch': 0.86} 86%|████████▌ | 2152/2500 [8:30:15<1:25:06, 14.67s/it] 86%|████████▌ | 2153/2500 [8:30:30<1:25:40, 14.81s/it] {'loss': 0.0012, 'grad_norm': 0.0470266994863095, 'learning_rate': 1.388e-07, 'completion_length': 55.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03033447265625, 'epoch': 0.86} 86%|████████▌ | 2153/2500 [8:30:30<1:25:40, 14.81s/it] 86%|████████▌ | 2154/2500 [8:30:44<1:23:22, 14.46s/it] {'loss': 0.0014, 'grad_norm': 0.15505831054381125, 'learning_rate': 1.384e-07, 'completion_length': 54.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.86} 86%|████████▌ | 2154/2500 [8:30:44<1:23:22, 14.46s/it] 86%|████████▌ | 2155/2500 [8:30:57<1:21:16, 14.14s/it] {'loss': 0.0015, 'grad_norm': 0.09249449636074207, 'learning_rate': 1.3800000000000002e-07, 'completion_length': 52.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0367431640625, 'epoch': 0.86} 86%|████████▌ | 2155/2500 [8:30:57<1:21:16, 14.14s/it] 86%|████████▌ | 2156/2500 [8:31:11<1:20:41, 14.07s/it] {'loss': 0.0009, 'grad_norm': 0.16997829637073186, 'learning_rate': 1.376e-07, 'completion_length': 50.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02142333984375, 'epoch': 0.86} 86%|████████▌ | 2156/2500 [8:31:11<1:20:41, 14.07s/it] 86%|████████▋ | 2157/2500 [8:31:24<1:19:11, 
13.85s/it] {'loss': 0.0011, 'grad_norm': 0.06229483531319164, 'learning_rate': 1.3719999999999998e-07, 'completion_length': 49.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02777099609375, 'epoch': 0.86} 86%|████████▋ | 2157/2500 [8:31:24<1:19:11, 13.85s/it] 86%|████████▋ | 2158/2500 [8:31:38<1:18:48, 13.83s/it] {'loss': 0.0013, 'grad_norm': 1.28324403226878, 'learning_rate': 1.368e-07, 'completion_length': 53.85714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03155517578125, 'epoch': 0.86} 86%|████████▋ | 2158/2500 [8:31:38<1:18:48, 13.83s/it] 86%|████████▋ | 2159/2500 [8:31:52<1:18:19, 13.78s/it] {'loss': 0.0011, 'grad_norm': 0.04793755215976915, 'learning_rate': 1.3639999999999998e-07, 'completion_length': 58.71428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.86} 86%|████████▋ | 2159/2500 [8:31:52<1:18:19, 13.78s/it] 86%|████████▋ | 2160/2500 [8:32:06<1:19:50, 14.09s/it] {'loss': 0.0015, 'grad_norm': 0.07351284459872802, 'learning_rate': 1.36e-07, 'completion_length': 66.4464340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0377197265625, 'epoch': 0.86} 86%|████████▋ | 2160/2500 [8:32:06<1:19:50, 14.09s/it] 86%|████████▋ | 2161/2500 [8:32:21<1:20:14, 14.20s/it] {'loss': 0.0008, 'grad_norm': 0.04721601458445849, 'learning_rate': 1.356e-07, 'completion_length': 57.535715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020111083984375, 'epoch': 0.86} 86%|████████▋ | 2161/2500 [8:32:21<1:20:14, 14.20s/it] 86%|████████▋ | 2162/2500 [8:32:35<1:20:18, 14.26s/it] {'loss': 0.001, 'grad_norm': 0.04207458902212934, 'learning_rate': 1.352e-07, 'completion_length': 
62.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0244140625, 'epoch': 0.86} 86%|████████▋ | 2162/2500 [8:32:35<1:20:18, 14.26s/it] 87%|████████▋ | 2163/2500 [8:32:50<1:20:37, 14.35s/it] {'loss': 0.0012, 'grad_norm': 0.1233699672292535, 'learning_rate': 1.348e-07, 'completion_length': 55.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0308837890625, 'epoch': 0.87} 87%|████████▋ | 2163/2500 [8:32:50<1:20:37, 14.35s/it] 87%|████████▋ | 2164/2500 [8:33:03<1:19:05, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.06977499190658562, 'learning_rate': 1.3439999999999999e-07, 'completion_length': 58.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021728515625, 'epoch': 0.87} 87%|████████▋ | 2164/2500 [8:33:03<1:19:05, 14.12s/it] 87%|████████▋ | 2165/2500 [8:33:17<1:17:33, 13.89s/it] {'loss': 0.0008, 'grad_norm': 0.07945158514268079, 'learning_rate': 1.34e-07, 'completion_length': 55.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0194091796875, 'epoch': 0.87} 87%|████████▋ | 2165/2500 [8:33:17<1:17:33, 13.89s/it] 87%|████████▋ | 2166/2500 [8:33:31<1:17:23, 13.90s/it] {'loss': 0.0012, 'grad_norm': 0.06514728711577085, 'learning_rate': 1.3359999999999998e-07, 'completion_length': 56.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02996826171875, 'epoch': 0.87} 87%|████████▋ | 2166/2500 [8:33:31<1:17:23, 13.90s/it] 87%|████████▋ | 2167/2500 [8:33:46<1:18:52, 14.21s/it] {'loss': 0.0007, 'grad_norm': 0.07897413899799746, 'learning_rate': 1.332e-07, 'completion_length': 57.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01776123046875, 'epoch': 0.87} 
87%|████████▋ | 2167/2500 [8:33:46<1:18:52, 14.21s/it] 87%|████████▋ | 2168/2500 [8:33:59<1:17:33, 14.02s/it] {'loss': 0.001, 'grad_norm': 0.08394169846226347, 'learning_rate': 1.328e-07, 'completion_length': 60.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02386474609375, 'epoch': 0.87} 87%|████████▋ | 2168/2500 [8:33:59<1:17:33, 14.02s/it] 87%|████████▋ | 2169/2500 [8:34:15<1:20:10, 14.53s/it] {'loss': 0.0007, 'grad_norm': 0.1320821598932628, 'learning_rate': 1.324e-07, 'completion_length': 60.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01788330078125, 'epoch': 0.87} 87%|████████▋ | 2169/2500 [8:34:15<1:20:10, 14.53s/it] 87%|████████▋ | 2170/2500 [8:34:28<1:18:05, 14.20s/it] {'loss': 0.001, 'grad_norm': 0.059992500119448545, 'learning_rate': 1.32e-07, 'completion_length': 50.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02587890625, 'epoch': 0.87} 87%|████████▋ | 2170/2500 [8:34:28<1:18:05, 14.20s/it] 87%|████████▋ | 2171/2500 [8:34:42<1:16:12, 13.90s/it] {'loss': 0.001, 'grad_norm': 0.0478159832610925, 'learning_rate': 1.316e-07, 'completion_length': 50.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02398681640625, 'epoch': 0.87} 87%|████████▋ | 2171/2500 [8:34:42<1:16:12, 13.90s/it] 87%|████████▋ | 2172/2500 [8:34:57<1:19:10, 14.48s/it] {'loss': 0.0008, 'grad_norm': 0.08242556270681357, 'learning_rate': 1.312e-07, 'completion_length': 60.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02044677734375, 'epoch': 0.87} 87%|████████▋ | 2172/2500 [8:34:57<1:19:10, 14.48s/it] 87%|████████▋ | 2173/2500 [8:35:11<1:17:19, 14.19s/it] {'loss': 0.001, 'grad_norm': 0.272462740326752, 'learning_rate': 1.308e-07, 
'completion_length': 54.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025390625, 'epoch': 0.87} 87%|████████▋ | 2173/2500 [8:35:11<1:17:19, 14.19s/it] 87%|████████▋ | 2174/2500 [8:35:26<1:18:06, 14.37s/it] {'loss': 0.0014, 'grad_norm': 0.08915082748194103, 'learning_rate': 1.3039999999999998e-07, 'completion_length': 64.64286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03521728515625, 'epoch': 0.87} 87%|████████▋ | 2174/2500 [8:35:26<1:18:06, 14.37s/it] 87%|████████▋ | 2175/2500 [8:35:41<1:19:06, 14.60s/it] {'loss': 0.0011, 'grad_norm': 0.06284437304039299, 'learning_rate': 1.3e-07, 'completion_length': 52.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02880859375, 'epoch': 0.87} 87%|████████▋ | 2175/2500 [8:35:41<1:19:06, 14.60s/it] 87%|████████▋ | 2176/2500 [8:35:56<1:19:49, 14.78s/it] {'loss': 0.0008, 'grad_norm': 0.0465820994439433, 'learning_rate': 1.296e-07, 'completion_length': 57.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0208740234375, 'epoch': 0.87} 87%|████████▋ | 2176/2500 [8:35:56<1:19:49, 14.78s/it] 87%|████████▋ | 2177/2500 [8:36:10<1:18:17, 14.54s/it] {'loss': 0.0006, 'grad_norm': 0.06794159242650731, 'learning_rate': 1.292e-07, 'completion_length': 50.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013824462890625, 'epoch': 0.87} 87%|████████▋ | 2177/2500 [8:36:10<1:18:17, 14.54s/it] 87%|████████▋ | 2178/2500 [8:36:24<1:17:18, 14.40s/it] {'loss': 0.0012, 'grad_norm': 0.09146604189825336, 'learning_rate': 1.288e-07, 'completion_length': 60.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03057861328125, 'epoch': 0.87} 
87%|████████▋ | 2178/2500 [8:36:24<1:17:18, 14.40s/it] 87%|████████▋ | 2179/2500 [8:36:38<1:16:27, 14.29s/it] {'loss': 0.0012, 'grad_norm': 0.0872529389963767, 'learning_rate': 1.2839999999999999e-07, 'completion_length': 55.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02911376953125, 'epoch': 0.87} 87%|████████▋ | 2179/2500 [8:36:38<1:16:27, 14.29s/it] 87%|████████▋ | 2180/2500 [8:36:51<1:14:36, 13.99s/it] {'loss': 0.0019, 'grad_norm': 0.09214656765015933, 'learning_rate': 1.28e-07, 'completion_length': 49.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0482177734375, 'epoch': 0.87} 87%|████████▋ | 2180/2500 [8:36:51<1:14:36, 13.99s/it] 87%|████████▋ | 2181/2500 [8:37:05<1:13:15, 13.78s/it] {'loss': 0.0013, 'grad_norm': 0.1006187997136639, 'learning_rate': 1.2759999999999998e-07, 'completion_length': 56.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03179931640625, 'epoch': 0.87} 87%|████████▋ | 2181/2500 [8:37:05<1:13:15, 13.78s/it] 87%|████████▋ | 2182/2500 [8:37:19<1:13:04, 13.79s/it] {'loss': 0.0007, 'grad_norm': 0.04969805273953612, 'learning_rate': 1.272e-07, 'completion_length': 52.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01641845703125, 'epoch': 0.87} 87%|████████▋ | 2182/2500 [8:37:19<1:13:04, 13.79s/it] 87%|████████▋ | 2183/2500 [8:37:32<1:12:31, 13.73s/it] {'loss': 0.0009, 'grad_norm': 0.06540973394818717, 'learning_rate': 1.268e-07, 'completion_length': 59.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0233154296875, 'epoch': 0.87} 87%|████████▋ | 2183/2500 [8:37:32<1:12:31, 13.73s/it] 87%|████████▋ | 2184/2500 [8:37:47<1:14:02, 14.06s/it] {'loss': 0.0008, 'grad_norm': 
0.03957410928106887, 'learning_rate': 1.264e-07, 'completion_length': 54.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02032470703125, 'epoch': 0.87} 87%|████████▋ | 2184/2500 [8:37:47<1:14:02, 14.06s/it] 87%|████████▋ | 2185/2500 [8:38:00<1:12:49, 13.87s/it] {'loss': 0.0005, 'grad_norm': 0.042380589104578176, 'learning_rate': 1.26e-07, 'completion_length': 50.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.011688232421875, 'epoch': 0.87} 87%|████████▋ | 2185/2500 [8:38:00<1:12:49, 13.87s/it] 87%|████████▋ | 2186/2500 [8:38:15<1:13:49, 14.11s/it] {'loss': 0.001, 'grad_norm': 0.08546074756862918, 'learning_rate': 1.2559999999999999e-07, 'completion_length': 61.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0250244140625, 'epoch': 0.87} 87%|████████▋ | 2186/2500 [8:38:15<1:13:49, 14.11s/it] 87%|████████▋ | 2187/2500 [8:38:29<1:12:54, 13.98s/it] {'loss': 0.0009, 'grad_norm': 0.051120177512980425, 'learning_rate': 1.252e-07, 'completion_length': 55.17857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023193359375, 'epoch': 0.87} 87%|████████▋ | 2187/2500 [8:38:29<1:12:54, 13.98s/it] 88%|████████▊ | 2188/2500 [8:38:43<1:13:23, 14.11s/it] {'loss': 0.0015, 'grad_norm': 0.06449076484481953, 'learning_rate': 1.2479999999999998e-07, 'completion_length': 62.10714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.036865234375, 'epoch': 0.88} 88%|████████▊ | 2188/2500 [8:38:43<1:13:23, 14.11s/it] 88%|████████▊ | 2189/2500 [8:38:57<1:12:48, 14.05s/it] {'loss': 0.0011, 'grad_norm': 0.04673323353767022, 'learning_rate': 1.244e-07, 'completion_length': 54.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838134765625, 'epoch': 0.88} 88%|████████▊ | 2189/2500 [8:38:57<1:12:48, 14.05s/it] 88%|████████▊ | 2190/2500 [8:39:11<1:12:19, 14.00s/it] {'loss': 0.0013, 'grad_norm': 0.097735661431744, 'learning_rate': 1.24e-07, 'completion_length': 55.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.88} 88%|████████▊ | 2190/2500 [8:39:11<1:12:19, 14.00s/it] 88%|████████▊ | 2191/2500 [8:39:25<1:11:29, 13.88s/it] {'loss': 0.0008, 'grad_norm': 0.05163154070920784, 'learning_rate': 1.236e-07, 'completion_length': 58.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0191650390625, 'epoch': 0.88} 88%|████████▊ | 2191/2500 [8:39:25<1:11:29, 13.88s/it] 88%|████████▊ | 2192/2500 [8:39:39<1:12:23, 14.10s/it] {'loss': 0.0007, 'grad_norm': 0.29679915100652887, 'learning_rate': 1.232e-07, 'completion_length': 57.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018280029296875, 'epoch': 0.88} 88%|████████▊ | 2192/2500 [8:39:39<1:12:23, 14.10s/it] 88%|████████▊ | 2193/2500 [8:39:52<1:10:56, 13.86s/it] {'loss': 0.0013, 'grad_norm': 0.06630508673696807, 'learning_rate': 1.228e-07, 'completion_length': 49.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03228759765625, 'epoch': 0.88} 88%|████████▊ | 2193/2500 [8:39:52<1:10:56, 13.86s/it] 88%|████████▊ | 2194/2500 [8:40:06<1:10:02, 13.73s/it] {'loss': 0.0012, 'grad_norm': 1.9130689247211012, 'learning_rate': 1.2239999999999998e-07, 'completion_length': 53.78571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0289306640625, 'epoch': 0.88} 88%|████████▊ | 2194/2500 [8:40:06<1:10:02, 
13.73s/it] 88%|████████▊ | 2195/2500 [8:40:21<1:11:57, 14.15s/it] {'loss': 0.0014, 'grad_norm': 0.06794677737529485, 'learning_rate': 1.2199999999999998e-07, 'completion_length': 56.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0350341796875, 'epoch': 0.88} 88%|████████▊ | 2195/2500 [8:40:21<1:11:57, 14.15s/it] 88%|████████▊ | 2196/2500 [8:40:36<1:12:28, 14.31s/it] {'loss': 0.0009, 'grad_norm': 0.10847092825072435, 'learning_rate': 1.216e-07, 'completion_length': 56.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022857666015625, 'epoch': 0.88} 88%|████████▊ | 2196/2500 [8:40:36<1:12:28, 14.31s/it] 88%|████████▊ | 2197/2500 [8:40:50<1:11:34, 14.17s/it] {'loss': 0.0005, 'grad_norm': 0.04647275339207203, 'learning_rate': 1.212e-07, 'completion_length': 59.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0130462646484375, 'epoch': 0.88} 88%|████████▊ | 2197/2500 [8:40:50<1:11:34, 14.17s/it] 88%|████████▊ | 2198/2500 [8:41:03<1:10:11, 13.94s/it] {'loss': 0.0008, 'grad_norm': 0.04977917328508591, 'learning_rate': 1.208e-07, 'completion_length': 48.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021087646484375, 'epoch': 0.88} 88%|████████▊ | 2198/2500 [8:41:03<1:10:11, 13.94s/it] 88%|████████▊ | 2199/2500 [8:41:17<1:09:40, 13.89s/it] {'loss': 0.0009, 'grad_norm': 0.10596133987561188, 'learning_rate': 1.204e-07, 'completion_length': 51.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023590087890625, 'epoch': 0.88} 88%|████████▊ | 2199/2500 [8:41:17<1:09:40, 13.89s/it] 88%|████████▊ | 2200/2500 [8:41:31<1:09:26, 13.89s/it] {'loss': 0.0012, 'grad_norm': 0.13861789991034773, 'learning_rate': 1.2e-07, 
'completion_length': 55.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0302734375, 'epoch': 0.88} 88%|████████▊ | 2200/2500 [8:41:31<1:09:26, 13.89s/it] 88%|████████▊ | 2201/2500 [8:42:38<2:29:10, 29.93s/it] {'loss': 0.0006, 'grad_norm': 0.06849602546582076, 'learning_rate': 1.1959999999999999e-07, 'completion_length': 57.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014892578125, 'epoch': 0.88} 88%|████████▊ | 2201/2500 [8:42:38<2:29:10, 29.93s/it] 88%|████████▊ | 2202/2500 [8:42:51<2:03:57, 24.96s/it] {'loss': 0.0006, 'grad_norm': 0.04615730475491253, 'learning_rate': 1.192e-07, 'completion_length': 53.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014312744140625, 'epoch': 0.88} 88%|████████▊ | 2202/2500 [8:42:51<2:03:57, 24.96s/it] 88%|████████▊ | 2203/2500 [8:43:05<1:46:09, 21.45s/it] {'loss': 0.0009, 'grad_norm': 0.08025866409904865, 'learning_rate': 1.1879999999999999e-07, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0235595703125, 'epoch': 0.88} 88%|████████▊ | 2203/2500 [8:43:05<1:46:09, 21.45s/it] 88%|████████▊ | 2204/2500 [8:43:19<1:35:21, 19.33s/it] {'loss': 0.0009, 'grad_norm': 0.05651419735669035, 'learning_rate': 1.184e-07, 'completion_length': 62.85714340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022003173828125, 'epoch': 0.88} 88%|████████▊ | 2204/2500 [8:43:19<1:35:21, 19.33s/it] 88%|████████▊ | 2205/2500 [8:43:37<1:32:25, 18.80s/it] {'loss': 0.0006, 'grad_norm': 0.06333126961674787, 'learning_rate': 1.1799999999999998e-07, 'completion_length': 60.55357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.015228271484375, 'epoch': 0.88} 88%|████████▊ | 2205/2500 [8:43:37<1:32:25, 18.80s/it] 88%|████████▊ | 2206/2500 [8:43:51<1:26:21, 17.63s/it] {'loss': 0.0006, 'grad_norm': 0.04936498189196168, 'learning_rate': 1.176e-07, 'completion_length': 55.750003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01611328125, 'epoch': 0.88} 88%|████████▊ | 2206/2500 [8:43:51<1:26:21, 17.63s/it] 88%|████████▊ | 2207/2500 [8:44:06<1:21:39, 16.72s/it] {'loss': 0.0009, 'grad_norm': 0.05078130956638234, 'learning_rate': 1.1719999999999999e-07, 'completion_length': 58.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0234375, 'epoch': 0.88} 88%|████████▊ | 2207/2500 [8:44:06<1:21:39, 16.72s/it] 88%|████████▊ | 2208/2500 [8:44:21<1:19:05, 16.25s/it] {'loss': 0.0007, 'grad_norm': 1.7745584247692598, 'learning_rate': 1.168e-07, 'completion_length': 60.76786231994629, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.016998291015625, 'epoch': 0.88} 88%|████████▊ | 2208/2500 [8:44:21<1:19:05, 16.25s/it] 88%|████████▊ | 2209/2500 [8:44:35<1:15:22, 15.54s/it] {'loss': 0.0006, 'grad_norm': 0.04679179527712105, 'learning_rate': 1.164e-07, 'completion_length': 53.535715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016021728515625, 'epoch': 0.88} 88%|████████▊ | 2209/2500 [8:44:35<1:15:22, 15.54s/it] 88%|████████▊ | 2210/2500 [8:44:49<1:12:54, 15.08s/it] {'loss': 0.0008, 'grad_norm': 0.058994127786413116, 'learning_rate': 1.16e-07, 'completion_length': 57.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019439697265625, 'epoch': 0.88} 88%|████████▊ | 2210/2500 [8:44:49<1:12:54, 15.08s/it] 88%|████████▊ | 2211/2500 
[8:45:03<1:11:37, 14.87s/it] {'loss': 0.0013, 'grad_norm': 0.16977455202689765, 'learning_rate': 1.1559999999999999e-07, 'completion_length': 59.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0318603515625, 'epoch': 0.88} 88%|████████▊ | 2211/2500 [8:45:03<1:11:37, 14.87s/it] 88%|████████▊ | 2212/2500 [8:45:18<1:10:23, 14.66s/it] {'loss': 0.001, 'grad_norm': 2.77792474245732, 'learning_rate': 1.1519999999999999e-07, 'completion_length': 65.51786041259766, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02471923828125, 'epoch': 0.88} 88%|████████▊ | 2212/2500 [8:45:18<1:10:23, 14.66s/it] 89%|████████▊ | 2213/2500 [8:45:39<1:20:27, 16.82s/it] {'loss': 0.0011, 'grad_norm': 0.6644540638664386, 'learning_rate': 1.148e-07, 'completion_length': 77.71429061889648, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.0263671875, 'epoch': 0.89} 89%|████████▊ | 2213/2500 [8:45:39<1:20:27, 16.82s/it] 89%|████████▊ | 2214/2500 [8:45:54<1:16:23, 16.03s/it] {'loss': 0.0011, 'grad_norm': 0.07165574671913566, 'learning_rate': 1.1439999999999999e-07, 'completion_length': 53.32143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02691650390625, 'epoch': 0.89} 89%|████████▊ | 2214/2500 [8:45:54<1:16:23, 16.03s/it] 89%|████████▊ | 2215/2500 [8:46:08<1:13:29, 15.47s/it] {'loss': 0.0008, 'grad_norm': 0.05074561943549086, 'learning_rate': 1.14e-07, 'completion_length': 54.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.89} 89%|████████▊ | 2215/2500 [8:46:08<1:13:29, 15.47s/it] 89%|████████▊ | 2216/2500 [8:46:23<1:12:56, 15.41s/it] {'loss': 0.0013, 
'grad_norm': 0.07687281244895885, 'learning_rate': 1.136e-07, 'completion_length': 65.53571510314941, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03271484375, 'epoch': 0.89} 89%|████████▊ | 2216/2500 [8:46:23<1:12:56, 15.41s/it] 89%|████████▊ | 2217/2500 [8:46:37<1:10:45, 15.00s/it] {'loss': 0.001, 'grad_norm': 0.06677990591856851, 'learning_rate': 1.132e-07, 'completion_length': 55.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024383544921875, 'epoch': 0.89} 89%|████████▊ | 2217/2500 [8:46:37<1:10:45, 15.00s/it] 89%|████████▊ | 2218/2500 [8:46:51<1:08:47, 14.63s/it] {'loss': 0.002, 'grad_norm': 0.16124896386919998, 'learning_rate': 1.1279999999999999e-07, 'completion_length': 55.37500190734863, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.0491943359375, 'epoch': 0.89} 89%|████████▊ | 2218/2500 [8:46:51<1:08:47, 14.63s/it] 89%|████████▉ | 2219/2500 [8:47:05<1:08:01, 14.53s/it] {'loss': 0.0009, 'grad_norm': 1.4590542877255013, 'learning_rate': 1.124e-07, 'completion_length': 51.67857360839844, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.021636962890625, 'epoch': 0.89} 89%|████████▉ | 2219/2500 [8:47:05<1:08:01, 14.53s/it] 89%|████████▉ | 2220/2500 [8:47:21<1:09:14, 14.84s/it] {'loss': 0.001, 'grad_norm': 0.0859126757001809, 'learning_rate': 1.12e-07, 'completion_length': 55.375003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0247802734375, 'epoch': 0.89} 89%|████████▉ | 2220/2500 [8:47:21<1:09:14, 14.84s/it] 89%|████████▉ | 2221/2500 [8:47:35<1:08:45, 14.79s/it] {'loss': 0.0012, 'grad_norm': 0.057724609724356786, 'learning_rate': 1.116e-07, 'completion_length': 
66.41071510314941, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030029296875, 'epoch': 0.89} 89%|████████▉ | 2221/2500 [8:47:35<1:08:45, 14.79s/it] 89%|████████▉ | 2222/2500 [8:47:48<1:06:05, 14.27s/it] {'loss': 0.0009, 'grad_norm': 0.05256431813262693, 'learning_rate': 1.1119999999999999e-07, 'completion_length': 50.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022216796875, 'epoch': 0.89} 89%|████████▉ | 2222/2500 [8:47:48<1:06:05, 14.27s/it] 89%|████████▉ | 2223/2500 [8:48:04<1:06:57, 14.50s/it] {'loss': 0.0011, 'grad_norm': 0.08323562139565184, 'learning_rate': 1.1079999999999999e-07, 'completion_length': 52.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02777099609375, 'epoch': 0.89} 89%|████████▉ | 2223/2500 [8:48:04<1:06:57, 14.50s/it] 89%|████████▉ | 2224/2500 [8:48:19<1:07:29, 14.67s/it] {'loss': 0.0011, 'grad_norm': 0.05048974295057472, 'learning_rate': 1.104e-07, 'completion_length': 60.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02813720703125, 'epoch': 0.89} 89%|████████▉ | 2224/2500 [8:48:19<1:07:29, 14.67s/it] 89%|████████▉ | 2225/2500 [8:48:31<1:04:41, 14.12s/it] {'loss': 0.0014, 'grad_norm': 0.06856969863449389, 'learning_rate': 1.0999999999999999e-07, 'completion_length': 51.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03411865234375, 'epoch': 0.89} 89%|████████▉ | 2225/2500 [8:48:31<1:04:41, 14.12s/it] 89%|████████▉ | 2226/2500 [8:48:45<1:04:11, 14.06s/it] {'loss': 0.0011, 'grad_norm': 0.07777241549132632, 'learning_rate': 1.096e-07, 'completion_length': 57.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02777099609375, 
'epoch': 0.89} 89%|████████▉ | 2226/2500 [8:48:45<1:04:11, 14.06s/it] 89%|████████▉ | 2227/2500 [8:49:00<1:04:45, 14.23s/it] {'loss': 0.0015, 'grad_norm': 0.05895070221578132, 'learning_rate': 1.092e-07, 'completion_length': 53.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03668212890625, 'epoch': 0.89} 89%|████████▉ | 2227/2500 [8:49:00<1:04:45, 14.23s/it] 89%|████████▉ | 2228/2500 [8:49:16<1:07:03, 14.79s/it] {'loss': 0.0007, 'grad_norm': 0.052606004699824814, 'learning_rate': 1.088e-07, 'completion_length': 63.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01849365234375, 'epoch': 0.89} 89%|████████▉ | 2228/2500 [8:49:16<1:07:03, 14.79s/it] 89%|████████▉ | 2229/2500 [8:49:30<1:05:05, 14.41s/it] {'loss': 0.0006, 'grad_norm': 5.753268264417203, 'learning_rate': 1.0839999999999999e-07, 'completion_length': 54.12500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.015960693359375, 'epoch': 0.89} 89%|████████▉ | 2229/2500 [8:49:30<1:05:05, 14.41s/it] 89%|████████▉ | 2230/2500 [8:49:44<1:04:20, 14.30s/it] {'loss': 0.0018, 'grad_norm': 0.09639425101745037, 'learning_rate': 1.0799999999999999e-07, 'completion_length': 59.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0439453125, 'epoch': 0.89} 89%|████████▉ | 2230/2500 [8:49:44<1:04:20, 14.30s/it] 89%|████████▉ | 2231/2500 [8:49:57<1:03:18, 14.12s/it] {'loss': 0.0009, 'grad_norm': 0.07645215338030614, 'learning_rate': 1.076e-07, 'completion_length': 53.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02276611328125, 'epoch': 0.89} 89%|████████▉ | 2231/2500 [8:49:57<1:03:18, 14.12s/it] 89%|████████▉ | 2232/2500 
[8:50:11<1:02:59, 14.10s/it] {'loss': 0.0013, 'grad_norm': 0.13340709042256169, 'learning_rate': 1.072e-07, 'completion_length': 51.23214340209961, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.03204345703125, 'epoch': 0.89} 89%|████████▉ | 2232/2500 [8:50:11<1:02:59, 14.10s/it] 89%|████████▉ | 2233/2500 [8:50:25<1:02:30, 14.05s/it] {'loss': 0.0007, 'grad_norm': 0.07812250903025929, 'learning_rate': 1.068e-07, 'completion_length': 57.892860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017974853515625, 'epoch': 0.89} 89%|████████▉ | 2233/2500 [8:50:25<1:02:30, 14.05s/it] 89%|████████▉ | 2234/2500 [8:50:39<1:02:14, 14.04s/it] {'loss': 0.0007, 'grad_norm': 0.05039725412546188, 'learning_rate': 1.0639999999999999e-07, 'completion_length': 55.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016510009765625, 'epoch': 0.89} 89%|████████▉ | 2234/2500 [8:50:39<1:02:14, 14.04s/it] 89%|████████▉ | 2235/2500 [8:50:53<1:01:52, 14.01s/it] {'loss': 0.001, 'grad_norm': 0.12883402042023995, 'learning_rate': 1.06e-07, 'completion_length': 55.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0240478515625, 'epoch': 0.89} 89%|████████▉ | 2235/2500 [8:50:53<1:01:52, 14.01s/it] 89%|████████▉ | 2236/2500 [8:51:07<1:00:55, 13.85s/it] {'loss': 0.0006, 'grad_norm': 0.08977541855575642, 'learning_rate': 1.0559999999999999e-07, 'completion_length': 57.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0162353515625, 'epoch': 0.89} 89%|████████▉ | 2236/2500 [8:51:07<1:00:55, 13.85s/it] 89%|████████▉ | 2237/2500 [8:51:21<1:01:36, 14.06s/it] {'loss': 0.001, 'grad_norm': 0.7701012415069597, 'learning_rate': 1.052e-07, 
'completion_length': 59.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02618408203125, 'epoch': 0.89} 89%|████████▉ | 2237/2500 [8:51:21<1:01:36, 14.06s/it] 90%|████████▉ | 2238/2500 [8:51:35<1:00:58, 13.96s/it] {'loss': 0.0009, 'grad_norm': 0.04624860243137238, 'learning_rate': 1.048e-07, 'completion_length': 57.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0224609375, 'epoch': 0.9} 90%|████████▉ | 2238/2500 [8:51:35<1:00:58, 13.96s/it] 90%|████████▉ | 2239/2500 [8:51:49<1:00:36, 13.93s/it] {'loss': 0.0008, 'grad_norm': 0.3062962223705206, 'learning_rate': 1.0440000000000001e-07, 'completion_length': 60.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01995849609375, 'epoch': 0.9} 90%|████████▉ | 2239/2500 [8:51:49<1:00:36, 13.93s/it] 90%|████████▉ | 2240/2500 [8:52:04<1:02:29, 14.42s/it] {'loss': 0.0012, 'grad_norm': 0.07739900992021924, 'learning_rate': 1.0399999999999999e-07, 'completion_length': 65.03571510314941, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02972412109375, 'epoch': 0.9} 90%|████████▉ | 2240/2500 [8:52:04<1:02:29, 14.42s/it] 90%|████████▉ | 2241/2500 [8:52:18<1:01:24, 14.23s/it] {'loss': 0.0004, 'grad_norm': 0.0447750237073913, 'learning_rate': 1.0359999999999999e-07, 'completion_length': 53.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01055908203125, 'epoch': 0.9} 90%|████████▉ | 2241/2500 [8:52:18<1:01:24, 14.23s/it] 90%|████████▉ | 2242/2500 [8:52:32<1:00:10, 13.99s/it] {'loss': 0.001, 'grad_norm': 0.06677884873606775, 'learning_rate': 1.032e-07, 'completion_length': 59.250003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.02471923828125, 'epoch': 0.9} 90%|████████▉ | 2242/2500 [8:52:32<1:00:10, 13.99s/it] 90%|████████▉ | 2243/2500 [8:52:46<59:58, 14.00s/it] {'loss': 0.0008, 'grad_norm': 0.10600201467058423, 'learning_rate': 1.028e-07, 'completion_length': 55.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0198974609375, 'epoch': 0.9} 90%|████████▉ | 2243/2500 [8:52:46<59:58, 14.00s/it] 90%|████████▉ | 2244/2500 [8:53:00<59:40, 13.99s/it] {'loss': 0.0009, 'grad_norm': 1.4057228000751048, 'learning_rate': 1.024e-07, 'completion_length': 53.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021942138671875, 'epoch': 0.9} 90%|████████▉ | 2244/2500 [8:53:00<59:40, 13.99s/it] 90%|████████▉ | 2245/2500 [8:53:13<58:56, 13.87s/it] {'loss': 0.0007, 'grad_norm': 0.049184839900431494, 'learning_rate': 1.0199999999999999e-07, 'completion_length': 59.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0169677734375, 'epoch': 0.9} 90%|████████▉ | 2245/2500 [8:53:13<58:56, 13.87s/it] 90%|████████▉ | 2246/2500 [8:53:26<57:21, 13.55s/it] {'loss': 0.0003, 'grad_norm': 0.0487522435286162, 'learning_rate': 1.016e-07, 'completion_length': 49.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.008392333984375, 'epoch': 0.9} 90%|████████▉ | 2246/2500 [8:53:26<57:21, 13.55s/it] 90%|████████▉ | 2247/2500 [8:53:40<57:37, 13.66s/it] {'loss': 0.001, 'grad_norm': 0.048834125211773896, 'learning_rate': 1.0119999999999999e-07, 'completion_length': 56.607147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.9} 90%|████████▉ | 2247/2500 [8:53:40<57:37, 13.66s/it] 90%|████████▉ | 2248/2500 [8:53:54<58:14, 13.87s/it] {'loss': 0.0016, 'grad_norm': 
0.049435157836755235, 'learning_rate': 1.008e-07, 'completion_length': 60.76786231994629, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.040283203125, 'epoch': 0.9} 90%|████████▉ | 2248/2500 [8:53:54<58:14, 13.87s/it] 90%|████████▉ | 2249/2500 [8:54:10<59:39, 14.26s/it] {'loss': 0.0005, 'grad_norm': 0.08937253554586, 'learning_rate': 1.004e-07, 'completion_length': 51.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0137176513671875, 'epoch': 0.9} 90%|████████▉ | 2249/2500 [8:54:10<59:39, 14.26s/it] 90%|█████████ | 2250/2500 [8:54:23<57:59, 13.92s/it] {'loss': 0.0012, 'grad_norm': 0.04739801293217019, 'learning_rate': 1e-07, 'completion_length': 52.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.030242919921875, 'epoch': 0.9} 90%|█████████ | 2250/2500 [8:54:23<57:59, 13.92s/it] 90%|█████████ | 2251/2500 [8:54:37<57:42, 13.91s/it] {'loss': 0.0013, 'grad_norm': 1.6697973489120805, 'learning_rate': 9.959999999999999e-08, 'completion_length': 54.41071701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0313720703125, 'epoch': 0.9} 90%|█████████ | 2251/2500 [8:54:37<57:42, 13.91s/it] 90%|█████████ | 2252/2500 [8:54:52<59:11, 14.32s/it] {'loss': 0.001, 'grad_norm': 0.35296594840467793, 'learning_rate': 9.919999999999999e-08, 'completion_length': 60.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02606201171875, 'epoch': 0.9} 90%|█████████ | 2252/2500 [8:54:52<59:11, 14.32s/it] 90%|█████████ | 2253/2500 [8:55:06<58:58, 14.33s/it] {'loss': 0.0012, 'grad_norm': 0.09292845855794671, 'learning_rate': 9.88e-08, 'completion_length': 55.05357360839844, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03125, 'epoch': 0.9} 90%|█████████ | 2253/2500 [8:55:06<58:58, 14.33s/it] 90%|█████████ | 2254/2500 [8:55:20<58:12, 14.20s/it] {'loss': 0.0013, 'grad_norm': 0.042453642846344994, 'learning_rate': 9.84e-08, 'completion_length': 52.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03338623046875, 'epoch': 0.9} 90%|█████████ | 2254/2500 [8:55:20<58:12, 14.20s/it] 90%|█████████ | 2255/2500 [8:55:34<58:13, 14.26s/it] {'loss': 0.0014, 'grad_norm': 0.07284314422324152, 'learning_rate': 9.8e-08, 'completion_length': 61.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0343017578125, 'epoch': 0.9} 90%|█████████ | 2255/2500 [8:55:34<58:13, 14.26s/it] 90%|█████████ | 2256/2500 [8:55:49<58:54, 14.48s/it] {'loss': 0.0007, 'grad_norm': 0.08543976511460355, 'learning_rate': 9.76e-08, 'completion_length': 61.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017608642578125, 'epoch': 0.9} 90%|█████████ | 2256/2500 [8:55:49<58:54, 14.48s/it] 90%|█████████ | 2257/2500 [8:56:03<58:04, 14.34s/it] {'loss': 0.0015, 'grad_norm': 1.0563924609911761, 'learning_rate': 9.72e-08, 'completion_length': 55.78571701049805, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0372314453125, 'epoch': 0.9} 90%|█████████ | 2257/2500 [8:56:03<58:04, 14.34s/it] 90%|█████████ | 2258/2500 [8:56:18<58:31, 14.51s/it] {'loss': 0.0013, 'grad_norm': 0.07418314677625563, 'learning_rate': 9.679999999999999e-08, 'completion_length': 63.285715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03192138671875, 'epoch': 0.9} 90%|█████████ | 2258/2500 [8:56:18<58:31, 14.51s/it] 
90%|█████████ | 2259/2500 [8:56:32<57:09, 14.23s/it] {'loss': 0.0011, 'grad_norm': 0.0622064256552031, 'learning_rate': 9.639999999999999e-08, 'completion_length': 51.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02838897705078125, 'epoch': 0.9} 90%|█████████ | 2259/2500 [8:56:32<57:09, 14.23s/it] 90%|█████████ | 2260/2500 [8:56:46<56:25, 14.11s/it] {'loss': 0.0003, 'grad_norm': 0.1040241982404546, 'learning_rate': 9.6e-08, 'completion_length': 53.51785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.00778961181640625, 'epoch': 0.9} 90%|█████████ | 2260/2500 [8:56:46<56:25, 14.11s/it] 90%|█████████ | 2261/2500 [8:57:00<56:57, 14.30s/it] {'loss': 0.0008, 'grad_norm': 0.1956916371659817, 'learning_rate': 9.56e-08, 'completion_length': 65.6964340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02069091796875, 'epoch': 0.9} 90%|█████████ | 2261/2500 [8:57:00<56:57, 14.30s/it] 90%|█████████ | 2262/2500 [8:57:14<55:59, 14.12s/it] {'loss': 0.001, 'grad_norm': 0.07452260326712688, 'learning_rate': 9.52e-08, 'completion_length': 62.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0260009765625, 'epoch': 0.9} 90%|█████████ | 2262/2500 [8:57:14<55:59, 14.12s/it] 91%|█████████ | 2263/2500 [8:57:31<58:56, 14.92s/it] {'loss': 0.0011, 'grad_norm': 0.0834219476378447, 'learning_rate': 9.479999999999999e-08, 'completion_length': 67.12500381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0279541015625, 'epoch': 0.91} 91%|█████████ | 2263/2500 [8:57:31<58:56, 14.92s/it] 91%|█████████ | 2264/2500 [8:57:49<1:01:46, 15.71s/it] {'loss': 0.0007, 'grad_norm': 0.08772630334061651, 'learning_rate': 9.44e-08, 'completion_length': 63.42857551574707, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01727294921875, 'epoch': 0.91} 91%|█████████ | 2264/2500 [8:57:49<1:01:46, 15.71s/it] 91%|█████████ | 2265/2500 [8:58:03<1:00:14, 15.38s/it] {'loss': 0.0014, 'grad_norm': 7.269338582860819, 'learning_rate': 9.4e-08, 'completion_length': 54.41071701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.03387451171875, 'epoch': 0.91} 91%|█████████ | 2265/2500 [8:58:03<1:00:14, 15.38s/it] 91%|█████████ | 2266/2500 [8:58:19<1:00:03, 15.40s/it] {'loss': 0.0011, 'grad_norm': 1.8597232729955606, 'learning_rate': 9.36e-08, 'completion_length': 56.53571701049805, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0277099609375, 'epoch': 0.91} 91%|█████████ | 2266/2500 [8:58:19<1:00:03, 15.40s/it] 91%|█████████ | 2267/2500 [8:58:33<58:34, 15.08s/it] {'loss': 0.001, 'grad_norm': 0.06890593930622675, 'learning_rate': 9.32e-08, 'completion_length': 57.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02618408203125, 'epoch': 0.91} 91%|█████████ | 2267/2500 [8:58:33<58:34, 15.08s/it] 91%|█████████ | 2268/2500 [8:58:48<58:00, 15.00s/it] {'loss': 0.0008, 'grad_norm': 0.0431419969091637, 'learning_rate': 9.279999999999998e-08, 'completion_length': 58.69643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02056884765625, 'epoch': 0.91} 91%|█████████ | 2268/2500 [8:58:48<58:00, 15.00s/it] 91%|█████████ | 2269/2500 [8:59:03<57:58, 15.06s/it] {'loss': 0.001, 'grad_norm': 0.06224267782423364, 'learning_rate': 9.24e-08, 'completion_length': 63.339290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 
'kl': 0.02435302734375, 'epoch': 0.91} 91%|█████████ | 2269/2500 [8:59:03<57:58, 15.06s/it] 91%|█████████ | 2270/2500 [8:59:17<56:55, 14.85s/it] {'loss': 0.0014, 'grad_norm': 0.046356544144321554, 'learning_rate': 9.199999999999999e-08, 'completion_length': 62.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03448486328125, 'epoch': 0.91} 91%|█████████ | 2270/2500 [8:59:17<56:55, 14.85s/it] 91%|█████████ | 2271/2500 [8:59:31<54:53, 14.38s/it] {'loss': 0.001, 'grad_norm': 0.04837827210037384, 'learning_rate': 9.16e-08, 'completion_length': 53.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02557373046875, 'epoch': 0.91} 91%|█████████ | 2271/2500 [8:59:31<54:53, 14.38s/it] 91%|█████████ | 2272/2500 [8:59:48<57:33, 15.15s/it] {'loss': 0.0008, 'grad_norm': 0.06062177985909929, 'learning_rate': 9.12e-08, 'completion_length': 68.14286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020904541015625, 'epoch': 0.91} 91%|█████████ | 2272/2500 [8:59:48<57:33, 15.15s/it] 91%|█████████ | 2273/2500 [9:00:02<56:07, 14.84s/it] {'loss': 0.0011, 'grad_norm': 0.12504956054904157, 'learning_rate': 9.08e-08, 'completion_length': 50.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0283203125, 'epoch': 0.91} 91%|█████████ | 2273/2500 [9:00:02<56:07, 14.84s/it] 91%|█████████ | 2274/2500 [9:00:15<54:44, 14.54s/it] {'loss': 0.0005, 'grad_norm': 0.05595680220560321, 'learning_rate': 9.039999999999999e-08, 'completion_length': 57.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0128326416015625, 'epoch': 0.91} 91%|█████████ | 2274/2500 [9:00:15<54:44, 14.54s/it] 91%|█████████ | 2275/2500 [9:00:29<53:33, 14.28s/it] {'loss': 0.0008, 'grad_norm': 
0.05300938726484843, 'learning_rate': 9e-08, 'completion_length': 51.46428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019775390625, 'epoch': 0.91} 91%|█████████ | 2275/2500 [9:00:29<53:33, 14.28s/it] 91%|█████████ | 2276/2500 [9:00:43<52:58, 14.19s/it] {'loss': 0.0011, 'grad_norm': 0.05246839050961563, 'learning_rate': 8.96e-08, 'completion_length': 58.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02728271484375, 'epoch': 0.91} 91%|█████████ | 2276/2500 [9:00:43<52:58, 14.19s/it] 91%|█████████ | 2277/2500 [9:00:57<52:18, 14.07s/it] {'loss': 0.0014, 'grad_norm': 0.07325235119197596, 'learning_rate': 8.919999999999999e-08, 'completion_length': 55.42857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.035888671875, 'epoch': 0.91} 91%|█████████ | 2277/2500 [9:00:57<52:18, 14.07s/it] 91%|█████████ | 2278/2500 [9:01:11<52:08, 14.09s/it] {'loss': 0.0006, 'grad_norm': 0.05414472613620158, 'learning_rate': 8.88e-08, 'completion_length': 54.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01593017578125, 'epoch': 0.91} 91%|█████████ | 2278/2500 [9:01:11<52:08, 14.09s/it] 91%|█████████ | 2279/2500 [9:01:25<51:47, 14.06s/it] {'loss': 0.0005, 'grad_norm': 0.05431973699380077, 'learning_rate': 8.84e-08, 'completion_length': 60.464290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01300048828125, 'epoch': 0.91} 91%|█████████ | 2279/2500 [9:01:25<51:47, 14.06s/it] 91%|█████████ | 2280/2500 [9:01:41<53:39, 14.64s/it] {'loss': 0.001, 'grad_norm': 0.05377873600461697, 'learning_rate': 8.8e-08, 'completion_length': 62.107147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 
0.02508544921875, 'epoch': 0.91} 91%|█████████ | 2280/2500 [9:01:41<53:39, 14.64s/it] 91%|█████████ | 2281/2500 [9:01:55<52:27, 14.37s/it] {'loss': 0.0014, 'grad_norm': 1.9041059043877928, 'learning_rate': 8.759999999999999e-08, 'completion_length': 58.42857551574707, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.034912109375, 'epoch': 0.91} 91%|█████████ | 2281/2500 [9:01:55<52:27, 14.37s/it] 91%|█████████▏| 2282/2500 [9:02:09<52:17, 14.39s/it] {'loss': 0.0007, 'grad_norm': 0.08662890596421166, 'learning_rate': 8.72e-08, 'completion_length': 58.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018768310546875, 'epoch': 0.91} 91%|█████████▏| 2282/2500 [9:02:09<52:17, 14.39s/it] 91%|█████████▏| 2283/2500 [9:02:24<52:07, 14.41s/it] {'loss': 0.0008, 'grad_norm': 0.03903357349146185, 'learning_rate': 8.68e-08, 'completion_length': 64.25000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01971435546875, 'epoch': 0.91} 91%|█████████▏| 2283/2500 [9:02:24<52:07, 14.41s/it] 91%|█████████▏| 2284/2500 [9:02:40<53:45, 14.93s/it] {'loss': 0.0007, 'grad_norm': 0.12261877158814022, 'learning_rate': 8.64e-08, 'completion_length': 56.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018310546875, 'epoch': 0.91} 91%|█████████▏| 2284/2500 [9:02:40<53:45, 14.93s/it] 91%|█████████▏| 2285/2500 [9:02:55<54:13, 15.13s/it] {'loss': 0.001, 'grad_norm': 0.06611722860594754, 'learning_rate': 8.599999999999999e-08, 'completion_length': 58.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02423095703125, 'epoch': 0.91} 91%|█████████▏| 2285/2500 [9:02:55<54:13, 15.13s/it] 91%|█████████▏| 2286/2500 [9:03:10<52:57, 14.85s/it] 
{'loss': 0.0012, 'grad_norm': 0.14076509723324301, 'learning_rate': 8.559999999999999e-08, 'completion_length': 58.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0311279296875, 'epoch': 0.91} 91%|█████████▏| 2286/2500 [9:03:10<52:57, 14.85s/it] 91%|█████████▏| 2287/2500 [9:03:24<51:52, 14.61s/it] {'loss': 0.0011, 'grad_norm': 1.3112380929192633, 'learning_rate': 8.52e-08, 'completion_length': 60.410715103149414, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02716064453125, 'epoch': 0.91} 91%|█████████▏| 2287/2500 [9:03:24<51:52, 14.61s/it] 92%|█████████▏| 2288/2500 [9:03:38<51:36, 14.60s/it] {'loss': 0.0008, 'grad_norm': 0.07156564560058622, 'learning_rate': 8.479999999999999e-08, 'completion_length': 58.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019287109375, 'epoch': 0.92} 92%|█████████▏| 2288/2500 [9:03:38<51:36, 14.60s/it] 92%|█████████▏| 2289/2500 [9:03:52<50:48, 14.45s/it] {'loss': 0.0011, 'grad_norm': 0.053453907603775164, 'learning_rate': 8.44e-08, 'completion_length': 64.26786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027374267578125, 'epoch': 0.92} 92%|█████████▏| 2289/2500 [9:03:52<50:48, 14.45s/it] 92%|█████████▏| 2290/2500 [9:04:07<50:45, 14.50s/it] {'loss': 0.0008, 'grad_norm': 0.09679203023884658, 'learning_rate': 8.4e-08, 'completion_length': 57.64285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0205078125, 'epoch': 0.92} 92%|█████████▏| 2290/2500 [9:04:07<50:45, 14.50s/it] 92%|█████████▏| 2291/2500 [9:04:21<49:41, 14.27s/it] {'loss': 0.0009, 'grad_norm': 0.0650957439709445, 'learning_rate': 8.36e-08, 'completion_length': 61.142860412597656, 
'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02215576171875, 'epoch': 0.92} 92%|█████████▏| 2291/2500 [9:04:21<49:41, 14.27s/it] 92%|█████████▏| 2292/2500 [9:04:36<50:09, 14.47s/it] {'loss': 0.0004, 'grad_norm': 0.07014435781867143, 'learning_rate': 8.319999999999999e-08, 'completion_length': 63.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0087432861328125, 'epoch': 0.92} 92%|█████████▏| 2292/2500 [9:04:36<50:09, 14.47s/it] 92%|█████████▏| 2293/2500 [9:04:50<50:00, 14.49s/it] {'loss': 0.0011, 'grad_norm': 0.06025589562156579, 'learning_rate': 8.28e-08, 'completion_length': 57.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027496337890625, 'epoch': 0.92} 92%|█████████▏| 2293/2500 [9:04:50<50:00, 14.49s/it] 92%|█████████▏| 2294/2500 [9:05:04<49:10, 14.32s/it] {'loss': 0.0012, 'grad_norm': 0.07998917455407073, 'learning_rate': 8.24e-08, 'completion_length': 55.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0294189453125, 'epoch': 0.92} 92%|█████████▏| 2294/2500 [9:05:04<49:10, 14.32s/it] 92%|█████████▏| 2295/2500 [9:05:19<50:00, 14.64s/it] {'loss': 0.0016, 'grad_norm': 0.05969164490854028, 'learning_rate': 8.2e-08, 'completion_length': 59.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03955078125, 'epoch': 0.92} 92%|█████████▏| 2295/2500 [9:05:19<50:00, 14.64s/it] 92%|█████████▏| 2296/2500 [9:05:34<49:32, 14.57s/it] {'loss': 0.0011, 'grad_norm': 0.05200809615388933, 'learning_rate': 8.16e-08, 'completion_length': 57.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0274658203125, 'epoch': 0.92} 92%|█████████▏| 2296/2500 [9:05:34<49:32, 14.57s/it] 
92%|█████████▏| 2297/2500 [9:05:47<48:14, 14.26s/it] {'loss': 0.0011, 'grad_norm': 0.09994429369140599, 'learning_rate': 8.119999999999999e-08, 'completion_length': 50.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0264892578125, 'epoch': 0.92} 92%|█████████▏| 2297/2500 [9:05:47<48:14, 14.26s/it] 92%|█████████▏| 2298/2500 [9:06:07<53:07, 15.78s/it] {'loss': 0.0011, 'grad_norm': 2.164648604428095, 'learning_rate': 8.08e-08, 'completion_length': 66.1785717010498, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 0.9821428656578064, 'reward': 1.9285714626312256, 'reward_std': 0.11266788095235825, 'kl': 0.0279541015625, 'epoch': 0.92} 92%|█████████▏| 2298/2500 [9:06:07<53:07, 15.78s/it] 92%|█████████▏| 2299/2500 [9:06:20<50:27, 15.06s/it] {'loss': 0.0011, 'grad_norm': 0.061407029553733984, 'learning_rate': 8.039999999999999e-08, 'completion_length': 56.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02642822265625, 'epoch': 0.92} 92%|█████████▏| 2299/2500 [9:06:20<50:27, 15.06s/it] 92%|█████████▏| 2300/2500 [9:06:34<48:44, 14.62s/it] {'loss': 0.0009, 'grad_norm': 0.1721517478398587, 'learning_rate': 8e-08, 'completion_length': 53.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02142333984375, 'epoch': 0.92} 92%|█████████▏| 2300/2500 [9:06:34<48:44, 14.62s/it] 92%|█████████▏| 2301/2500 [9:07:41<1:40:42, 30.37s/it] {'loss': 0.0005, 'grad_norm': 0.069633371246512, 'learning_rate': 7.96e-08, 'completion_length': 45.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012237548828125, 'epoch': 0.92} 92%|█████████▏| 2301/2500 [9:07:41<1:40:42, 30.37s/it] 92%|█████████▏| 2302/2500 [9:07:55<1:23:57, 25.44s/it] {'loss': 0.0007, 'grad_norm': 0.061082026111582866, 
'learning_rate': 7.920000000000001e-08, 'completion_length': 61.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018035888671875, 'epoch': 0.92} 92%|█████████▏| 2302/2500 [9:07:55<1:23:57, 25.44s/it] 92%|█████████▏| 2303/2500 [9:08:09<1:12:07, 21.96s/it] {'loss': 0.0006, 'grad_norm': 0.055804507009790304, 'learning_rate': 7.879999999999999e-08, 'completion_length': 54.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0146636962890625, 'epoch': 0.92} 92%|█████████▏| 2303/2500 [9:08:09<1:12:07, 21.96s/it] 92%|█████████▏| 2304/2500 [9:08:22<1:03:13, 19.36s/it] {'loss': 0.0007, 'grad_norm': 0.05699077770025499, 'learning_rate': 7.839999999999999e-08, 'completion_length': 55.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018768310546875, 'epoch': 0.92} 92%|█████████▏| 2304/2500 [9:08:22<1:03:13, 19.36s/it] 92%|█████████▏| 2305/2500 [9:08:37<58:20, 17.95s/it] {'loss': 0.0016, 'grad_norm': 0.08243746049078013, 'learning_rate': 7.8e-08, 'completion_length': 63.62500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0400390625, 'epoch': 0.92} 92%|█████████▏| 2305/2500 [9:08:37<58:20, 17.95s/it] 92%|█████████▏| 2306/2500 [9:08:51<54:42, 16.92s/it] {'loss': 0.001, 'grad_norm': 0.03693263747698727, 'learning_rate': 7.76e-08, 'completion_length': 61.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0242919921875, 'epoch': 0.92} 92%|█████████▏| 2306/2500 [9:08:51<54:42, 16.92s/it] 92%|█████████▏| 2307/2500 [9:09:06<52:32, 16.33s/it] {'loss': 0.002, 'grad_norm': 0.10424329824377432, 'learning_rate': 7.72e-08, 'completion_length': 56.78571701049805, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 
1.9285714626312256, 'reward_std': 0.0, 'kl': 0.050048828125, 'epoch': 0.92} 92%|█████████▏| 2307/2500 [9:09:06<52:32, 16.33s/it] 92%|█████████▏| 2308/2500 [9:09:19<49:11, 15.37s/it] {'loss': 0.0009, 'grad_norm': 0.06533766793788892, 'learning_rate': 7.679999999999999e-08, 'completion_length': 51.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02294921875, 'epoch': 0.92} 92%|█████████▏| 2308/2500 [9:09:19<49:11, 15.37s/it] 92%|█████████▏| 2309/2500 [9:09:33<47:27, 14.91s/it] {'loss': 0.0008, 'grad_norm': 0.06021753776188137, 'learning_rate': 7.64e-08, 'completion_length': 56.19643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0210418701171875, 'epoch': 0.92} 92%|█████████▏| 2309/2500 [9:09:33<47:27, 14.91s/it] 92%|█████████▏| 2310/2500 [9:09:47<46:02, 14.54s/it] {'loss': 0.0011, 'grad_norm': 0.09019015234801189, 'learning_rate': 7.599999999999999e-08, 'completion_length': 53.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.92} 92%|█████████▏| 2310/2500 [9:09:47<46:02, 14.54s/it] 92%|█████████▏| 2311/2500 [9:10:00<44:48, 14.23s/it] {'loss': 0.001, 'grad_norm': 0.05907988174872481, 'learning_rate': 7.56e-08, 'completion_length': 53.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.024169921875, 'epoch': 0.92} 92%|█████████▏| 2311/2500 [9:10:00<44:48, 14.23s/it] 92%|█████████▏| 2312/2500 [9:10:15<45:00, 14.37s/it] {'loss': 0.001, 'grad_norm': 0.1223292288073117, 'learning_rate': 7.52e-08, 'completion_length': 64.83929061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02593994140625, 'epoch': 0.92} 92%|█████████▏| 2312/2500 [9:10:15<45:00, 14.37s/it] 93%|█████████▎| 2313/2500 [9:10:28<43:56, 14.10s/it] 
{'loss': 0.0004, 'grad_norm': 0.03957158840410053, 'learning_rate': 7.480000000000001e-08, 'completion_length': 49.30357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0104522705078125, 'epoch': 0.93} 93%|█████████▎| 2313/2500 [9:10:28<43:56, 14.10s/it] 93%|█████████▎| 2314/2500 [9:10:42<43:30, 14.04s/it] {'loss': 0.0006, 'grad_norm': 0.12830439613298872, 'learning_rate': 7.439999999999999e-08, 'completion_length': 61.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01556396484375, 'epoch': 0.93} 93%|█████████▎| 2314/2500 [9:10:42<43:30, 14.04s/it] 93%|█████████▎| 2315/2500 [9:10:57<44:05, 14.30s/it] {'loss': 0.0005, 'grad_norm': 0.05126399001459402, 'learning_rate': 7.399999999999999e-08, 'completion_length': 63.92857551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01324462890625, 'epoch': 0.93} 93%|█████████▎| 2315/2500 [9:10:57<44:05, 14.30s/it] 93%|█████████▎| 2316/2500 [9:11:12<44:01, 14.36s/it] {'loss': 0.0017, 'grad_norm': 1.2141436765441733, 'learning_rate': 7.36e-08, 'completion_length': 56.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0435943603515625, 'epoch': 0.93} 93%|█████████▎| 2316/2500 [9:11:12<44:01, 14.36s/it] 93%|█████████▎| 2317/2500 [9:11:26<43:22, 14.22s/it] {'loss': 0.0012, 'grad_norm': 0.05172363719252381, 'learning_rate': 7.32e-08, 'completion_length': 58.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03076171875, 'epoch': 0.93} 93%|█████████▎| 2317/2500 [9:11:26<43:22, 14.22s/it] 93%|█████████▎| 2318/2500 [9:11:39<42:26, 13.99s/it] {'loss': 0.0013, 'grad_norm': 0.04386782834229738, 'learning_rate': 7.28e-08, 'completion_length': 48.94643020629883, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03177642822265625, 'epoch': 0.93} 93%|█████████▎| 2318/2500 [9:11:39<42:26, 13.99s/it] 93%|█████████▎| 2319/2500 [9:11:53<42:32, 14.10s/it] {'loss': 0.0008, 'grad_norm': 0.15728151077729846, 'learning_rate': 7.24e-08, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01995849609375, 'epoch': 0.93} 93%|█████████▎| 2319/2500 [9:11:53<42:32, 14.10s/it] 93%|█████████▎| 2320/2500 [9:12:09<43:52, 14.62s/it] {'loss': 0.0008, 'grad_norm': 0.10421706373774825, 'learning_rate': 7.2e-08, 'completion_length': 64.89286041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019195556640625, 'epoch': 0.93} 93%|█████████▎| 2320/2500 [9:12:09<43:52, 14.62s/it] 93%|█████████▎| 2321/2500 [9:12:23<42:35, 14.28s/it] {'loss': 0.0008, 'grad_norm': 0.1304604503065208, 'learning_rate': 7.159999999999999e-08, 'completion_length': 51.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02069091796875, 'epoch': 0.93} 93%|█████████▎| 2321/2500 [9:12:23<42:35, 14.28s/it] 93%|█████████▎| 2322/2500 [9:12:37<42:39, 14.38s/it] {'loss': 0.0008, 'grad_norm': 0.07696111104080733, 'learning_rate': 7.12e-08, 'completion_length': 54.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0189208984375, 'epoch': 0.93} 93%|█████████▎| 2322/2500 [9:12:37<42:39, 14.38s/it] 93%|█████████▎| 2323/2500 [9:12:51<42:10, 14.29s/it] {'loss': 0.0019, 'grad_norm': 0.05727301529146745, 'learning_rate': 7.08e-08, 'completion_length': 58.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.048095703125, 'epoch': 0.93} 93%|█████████▎| 2323/2500 [9:12:51<42:10, 14.29s/it] 93%|█████████▎| 2324/2500 
[9:13:05<41:24, 14.12s/it] {'loss': 0.0013, 'grad_norm': 0.08796529856822523, 'learning_rate': 7.04e-08, 'completion_length': 61.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.033203125, 'epoch': 0.93} 93%|█████████▎| 2324/2500 [9:13:05<41:24, 14.12s/it] 93%|█████████▎| 2325/2500 [9:13:20<41:36, 14.27s/it] {'loss': 0.0011, 'grad_norm': 0.04179138140839959, 'learning_rate': 7e-08, 'completion_length': 57.517860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02764892578125, 'epoch': 0.93} 93%|█████████▎| 2325/2500 [9:13:20<41:36, 14.27s/it] 93%|█████████▎| 2326/2500 [9:13:33<40:52, 14.09s/it] {'loss': 0.0009, 'grad_norm': 0.05089514908923286, 'learning_rate': 6.959999999999999e-08, 'completion_length': 52.00000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022003173828125, 'epoch': 0.93} 93%|█████████▎| 2326/2500 [9:13:33<40:52, 14.09s/it] 93%|█████████▎| 2327/2500 [9:13:50<42:30, 14.75s/it] {'loss': 0.0013, 'grad_norm': 0.08697561240181646, 'learning_rate': 6.92e-08, 'completion_length': 61.01785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0322265625, 'epoch': 0.93} 93%|█████████▎| 2327/2500 [9:13:50<42:30, 14.75s/it] 93%|█████████▎| 2328/2500 [9:14:05<42:35, 14.86s/it] {'loss': 0.0011, 'grad_norm': 0.08260325490842056, 'learning_rate': 6.88e-08, 'completion_length': 61.053571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02655029296875, 'epoch': 0.93} 93%|█████████▎| 2328/2500 [9:14:05<42:35, 14.86s/it] 93%|█████████▎| 2329/2500 [9:14:18<41:13, 14.47s/it] {'loss': 0.0008, 'grad_norm': 0.05173731967050396, 'learning_rate': 6.84e-08, 'completion_length': 58.500003814697266, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.020782470703125, 'epoch': 0.93} 93%|█████████▎| 2329/2500 [9:14:18<41:13, 14.47s/it] 93%|█████████▎| 2330/2500 [9:14:32<40:23, 14.26s/it] {'loss': 0.0008, 'grad_norm': 0.05154180422278199, 'learning_rate': 6.8e-08, 'completion_length': 59.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0194091796875, 'epoch': 0.93} 93%|█████████▎| 2330/2500 [9:14:32<40:23, 14.26s/it] 93%|█████████▎| 2331/2500 [9:14:46<39:26, 14.00s/it] {'loss': 0.0006, 'grad_norm': 0.06297577639801175, 'learning_rate': 6.76e-08, 'completion_length': 45.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0147552490234375, 'epoch': 0.93} 93%|█████████▎| 2331/2500 [9:14:46<39:26, 14.00s/it] 93%|█████████▎| 2332/2500 [9:14:59<39:07, 13.98s/it] {'loss': 0.0005, 'grad_norm': 0.06879189251538702, 'learning_rate': 6.719999999999999e-08, 'completion_length': 65.48214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012451171875, 'epoch': 0.93} 93%|█████████▎| 2332/2500 [9:14:59<39:07, 13.98s/it] 93%|█████████▎| 2333/2500 [9:15:13<38:47, 13.94s/it] {'loss': 0.0007, 'grad_norm': 0.0514587733999085, 'learning_rate': 6.679999999999999e-08, 'completion_length': 62.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.018585205078125, 'epoch': 0.93} 93%|█████████▎| 2333/2500 [9:15:13<38:47, 13.94s/it] 93%|█████████▎| 2334/2500 [9:15:28<39:03, 14.12s/it] {'loss': 0.0012, 'grad_norm': 0.0487243112347703, 'learning_rate': 6.64e-08, 'completion_length': 65.85714721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02978515625, 'epoch': 0.93} 93%|█████████▎| 2334/2500 [9:15:28<39:03, 14.12s/it] 93%|█████████▎| 2335/2500 
[9:15:41<38:11, 13.89s/it] {'loss': 0.0008, 'grad_norm': 0.11984235498312155, 'learning_rate': 6.6e-08, 'completion_length': 53.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01947021484375, 'epoch': 0.93} 93%|█████████▎| 2335/2500 [9:15:41<38:11, 13.89s/it] 93%|█████████▎| 2336/2500 [9:15:57<39:10, 14.33s/it] {'loss': 0.0014, 'grad_norm': 0.11370060258889839, 'learning_rate': 6.56e-08, 'completion_length': 57.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03387451171875, 'epoch': 0.93} 93%|█████████▎| 2336/2500 [9:15:57<39:10, 14.33s/it] 93%|█████████▎| 2337/2500 [9:16:12<39:37, 14.59s/it] {'loss': 0.0008, 'grad_norm': 0.05976208118148079, 'learning_rate': 6.519999999999999e-08, 'completion_length': 59.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02093505859375, 'epoch': 0.93} 93%|█████████▎| 2337/2500 [9:16:12<39:37, 14.59s/it] 94%|█████████▎| 2338/2500 [9:16:26<38:46, 14.36s/it] {'loss': 0.0005, 'grad_norm': 0.06426184476895379, 'learning_rate': 6.48e-08, 'completion_length': 57.44643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.011871337890625, 'epoch': 0.94} 94%|█████████▎| 2338/2500 [9:16:26<38:46, 14.36s/it] 94%|█████████▎| 2339/2500 [9:16:41<39:12, 14.61s/it] {'loss': 0.0016, 'grad_norm': 0.046466804373277944, 'learning_rate': 6.44e-08, 'completion_length': 59.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.041015625, 'epoch': 0.94} 94%|█████████▎| 2339/2500 [9:16:41<39:12, 14.61s/it] 94%|█████████▎| 2340/2500 [9:16:54<38:05, 14.28s/it] {'loss': 0.0004, 'grad_norm': 0.06666068877845509, 'learning_rate': 6.4e-08, 'completion_length': 55.23214530944824, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.00994873046875, 'epoch': 0.94} 94%|█████████▎| 2340/2500 [9:16:54<38:05, 14.28s/it] 94%|█████████▎| 2341/2500 [9:17:09<38:24, 14.49s/it] {'loss': 0.0014, 'grad_norm': 0.07194555083619086, 'learning_rate': 6.36e-08, 'completion_length': 56.80357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034332275390625, 'epoch': 0.94} 94%|█████████▎| 2341/2500 [9:17:09<38:24, 14.49s/it] 94%|█████████▎| 2342/2500 [9:17:23<37:17, 14.16s/it] {'loss': 0.0013, 'grad_norm': 0.05941380405257556, 'learning_rate': 6.32e-08, 'completion_length': 58.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03155517578125, 'epoch': 0.94} 94%|█████████▎| 2342/2500 [9:17:23<37:17, 14.16s/it] 94%|█████████▎| 2343/2500 [9:17:36<36:39, 14.01s/it] {'loss': 0.0009, 'grad_norm': 0.0750893706420253, 'learning_rate': 6.279999999999999e-08, 'completion_length': 57.482147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.023590087890625, 'epoch': 0.94} 94%|█████████▎| 2343/2500 [9:17:36<36:39, 14.01s/it] 94%|█████████▍| 2344/2500 [9:17:50<36:20, 13.98s/it] {'loss': 0.0008, 'grad_norm': 0.04396969686768377, 'learning_rate': 6.239999999999999e-08, 'completion_length': 53.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01898193359375, 'epoch': 0.94} 94%|█████████▍| 2344/2500 [9:17:50<36:20, 13.98s/it] 94%|█████████▍| 2345/2500 [9:18:05<36:45, 14.23s/it] {'loss': 0.0011, 'grad_norm': 0.04148432859724577, 'learning_rate': 6.2e-08, 'completion_length': 70.33929061889648, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.027099609375, 'epoch': 0.94} 94%|█████████▍| 2345/2500 [9:18:05<36:45, 14.23s/it] 94%|█████████▍| 
2346/2500 [9:18:19<36:21, 14.16s/it] {'loss': 0.0017, 'grad_norm': 0.09900134061826413, 'learning_rate': 6.16e-08, 'completion_length': 55.03571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0423583984375, 'epoch': 0.94} 94%|█████████▍| 2346/2500 [9:18:19<36:21, 14.16s/it] 94%|█████████▍| 2347/2500 [9:18:33<36:18, 14.24s/it] {'loss': 0.0014, 'grad_norm': 0.8933734811973313, 'learning_rate': 6.119999999999999e-08, 'completion_length': 60.76785850524902, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.03515625, 'epoch': 0.94} 94%|█████████▍| 2347/2500 [9:18:33<36:18, 14.24s/it] 94%|█████████▍| 2348/2500 [9:18:48<36:03, 14.23s/it] {'loss': 0.001, 'grad_norm': 0.07496103361307789, 'learning_rate': 6.08e-08, 'completion_length': 55.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0245361328125, 'epoch': 0.94} 94%|█████████▍| 2348/2500 [9:18:48<36:03, 14.23s/it] 94%|█████████▍| 2349/2500 [9:19:02<35:56, 14.28s/it] {'loss': 0.0007, 'grad_norm': 0.07851868584162343, 'learning_rate': 6.04e-08, 'completion_length': 50.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017974853515625, 'epoch': 0.94} 94%|█████████▍| 2349/2500 [9:19:02<35:56, 14.28s/it] 94%|█████████▍| 2350/2500 [9:19:18<36:39, 14.67s/it] {'loss': 0.0011, 'grad_norm': 1.3380288187758371, 'learning_rate': 6e-08, 'completion_length': 64.62500381469727, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.0284423828125, 'epoch': 0.94} 94%|█████████▍| 2350/2500 [9:19:18<36:39, 14.67s/it] 94%|█████████▍| 2351/2500 [9:19:34<37:45, 15.20s/it] {'loss': 0.0012, 'grad_norm': 0.06448507963170923, 'learning_rate': 5.96e-08, 
'completion_length': 66.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0296630859375, 'epoch': 0.94} 94%|█████████▍| 2351/2500 [9:19:34<37:45, 15.20s/it] 94%|█████████▍| 2352/2500 [9:19:47<35:57, 14.58s/it] {'loss': 0.0008, 'grad_norm': 0.0423634133315015, 'learning_rate': 5.92e-08, 'completion_length': 52.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0201416015625, 'epoch': 0.94} 94%|█████████▍| 2352/2500 [9:19:47<35:57, 14.58s/it] 94%|█████████▍| 2353/2500 [9:20:03<36:17, 14.81s/it] {'loss': 0.0015, 'grad_norm': 0.1455554510205247, 'learning_rate': 5.88e-08, 'completion_length': 63.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.037841796875, 'epoch': 0.94} 94%|█████████▍| 2353/2500 [9:20:03<36:17, 14.81s/it] 94%|█████████▍| 2354/2500 [9:20:18<36:12, 14.88s/it] {'loss': 0.0008, 'grad_norm': 0.04670454794983477, 'learning_rate': 5.84e-08, 'completion_length': 70.01786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019287109375, 'epoch': 0.94} 94%|█████████▍| 2354/2500 [9:20:18<36:12, 14.88s/it] 94%|█████████▍| 2355/2500 [9:20:33<36:12, 14.99s/it] {'loss': 0.0011, 'grad_norm': 0.03869429512483751, 'learning_rate': 5.8e-08, 'completion_length': 56.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0281982421875, 'epoch': 0.94} 94%|█████████▍| 2355/2500 [9:20:33<36:12, 14.99s/it] 94%|█████████▍| 2356/2500 [9:20:48<35:47, 14.91s/it] {'loss': 0.0013, 'grad_norm': 0.11609302533160597, 'learning_rate': 5.759999999999999e-08, 'completion_length': 58.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0325927734375, 'epoch': 0.94} 94%|█████████▍| 2356/2500 
[9:20:48<35:47, 14.91s/it] 94%|█████████▍| 2357/2500 [9:21:02<34:55, 14.66s/it] {'loss': 0.0012, 'grad_norm': 0.07970088732361251, 'learning_rate': 5.7199999999999996e-08, 'completion_length': 59.87500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0306396484375, 'epoch': 0.94} 94%|█████████▍| 2357/2500 [9:21:02<34:55, 14.66s/it] 94%|█████████▍| 2358/2500 [9:21:16<34:27, 14.56s/it] {'loss': 0.002, 'grad_norm': 0.07220462746566005, 'learning_rate': 5.68e-08, 'completion_length': 58.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04876708984375, 'epoch': 0.94} 94%|█████████▍| 2358/2500 [9:21:16<34:27, 14.56s/it] 94%|█████████▍| 2359/2500 [9:21:29<33:26, 14.23s/it] {'loss': 0.0006, 'grad_norm': 0.0481708461258808, 'learning_rate': 5.6399999999999995e-08, 'completion_length': 53.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01470947265625, 'epoch': 0.94} 94%|█████████▍| 2359/2500 [9:21:29<33:26, 14.23s/it] 94%|█████████▍| 2360/2500 [9:21:44<33:29, 14.36s/it] {'loss': 0.0008, 'grad_norm': 5.1465519158955635, 'learning_rate': 5.6e-08, 'completion_length': 60.35714530944824, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.0714285746216774, 'kl': 0.02056884765625, 'epoch': 0.94} 94%|█████████▍| 2360/2500 [9:21:44<33:29, 14.36s/it] 94%|█████████▍| 2361/2500 [9:21:59<33:43, 14.56s/it] {'loss': 0.0009, 'grad_norm': 0.0647302306554435, 'learning_rate': 5.5599999999999995e-08, 'completion_length': 59.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0228271484375, 'epoch': 0.94} 94%|█████████▍| 2361/2500 [9:21:59<33:43, 14.56s/it] 94%|█████████▍| 2362/2500 [9:22:13<32:42, 14.22s/it] {'loss': 0.0012, 'grad_norm': 
0.06575343629369702, 'learning_rate': 5.52e-08, 'completion_length': 56.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03045654296875, 'epoch': 0.94} 94%|█████████▍| 2362/2500 [9:22:13<32:42, 14.22s/it] 95%|█████████▍| 2363/2500 [9:22:25<31:24, 13.75s/it] {'loss': 0.0019, 'grad_norm': 0.0631132228911447, 'learning_rate': 5.48e-08, 'completion_length': 49.76785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04864501953125, 'epoch': 0.95} 95%|█████████▍| 2363/2500 [9:22:25<31:24, 13.75s/it] 95%|█████████▍| 2364/2500 [9:22:42<33:15, 14.67s/it] {'loss': 0.001, 'grad_norm': 2.2147544438774065, 'learning_rate': 5.44e-08, 'completion_length': 65.83928871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02508544921875, 'epoch': 0.95} 95%|█████████▍| 2364/2500 [9:22:42<33:15, 14.67s/it] 95%|█████████▍| 2365/2500 [9:22:57<33:28, 14.88s/it] {'loss': 0.0005, 'grad_norm': 0.11547784140220269, 'learning_rate': 5.3999999999999994e-08, 'completion_length': 66.01786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0115966796875, 'epoch': 0.95} 95%|█████████▍| 2365/2500 [9:22:57<33:28, 14.88s/it] 95%|█████████▍| 2366/2500 [9:23:12<33:07, 14.83s/it] {'loss': 0.0016, 'grad_norm': 0.04922419196801834, 'learning_rate': 5.36e-08, 'completion_length': 51.42857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03961181640625, 'epoch': 0.95} 95%|█████████▍| 2366/2500 [9:23:12<33:07, 14.83s/it] 95%|█████████▍| 2367/2500 [9:23:25<31:37, 14.27s/it] {'loss': 0.0003, 'grad_norm': 0.03667474402921907, 'learning_rate': 5.319999999999999e-08, 'completion_length': 51.57143211364746, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0074462890625, 'epoch': 0.95} 95%|█████████▍| 2367/2500 [9:23:25<31:37, 14.27s/it] 95%|█████████▍| 2368/2500 [9:23:39<31:28, 14.31s/it] {'loss': 0.0006, 'grad_norm': 0.06192213816557467, 'learning_rate': 5.2799999999999996e-08, 'completion_length': 57.19643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015289306640625, 'epoch': 0.95} 95%|█████████▍| 2368/2500 [9:23:39<31:28, 14.31s/it] 95%|█████████▍| 2369/2500 [9:23:52<30:20, 13.90s/it] {'loss': 0.0019, 'grad_norm': 0.126926396344822, 'learning_rate': 5.24e-08, 'completion_length': 45.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04833984375, 'epoch': 0.95} 95%|█████████▍| 2369/2500 [9:23:52<30:20, 13.90s/it] 95%|█████████▍| 2370/2500 [9:24:06<29:50, 13.78s/it] {'loss': 0.0013, 'grad_norm': 0.0488287915075368, 'learning_rate': 5.1999999999999996e-08, 'completion_length': 54.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032470703125, 'epoch': 0.95} 95%|█████████▍| 2370/2500 [9:24:06<29:50, 13.78s/it] 95%|█████████▍| 2371/2500 [9:24:19<29:12, 13.59s/it] {'loss': 0.0015, 'grad_norm': 0.07377492094977818, 'learning_rate': 5.16e-08, 'completion_length': 48.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03759765625, 'epoch': 0.95} 95%|█████████▍| 2371/2500 [9:24:19<29:12, 13.59s/it] 95%|█████████▍| 2372/2500 [9:24:33<29:31, 13.84s/it] {'loss': 0.0011, 'grad_norm': 0.04012478669644821, 'learning_rate': 5.12e-08, 'completion_length': 53.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026611328125, 'epoch': 0.95} 95%|█████████▍| 2372/2500 [9:24:33<29:31, 13.84s/it] 95%|█████████▍| 2373/2500 
[9:24:47<29:20, 13.86s/it] {'loss': 0.0014, 'grad_norm': 0.1086818930035927, 'learning_rate': 5.08e-08, 'completion_length': 59.232147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.034912109375, 'epoch': 0.95} 95%|█████████▍| 2373/2500 [9:24:47<29:20, 13.86s/it] 95%|█████████▍| 2374/2500 [9:25:00<28:35, 13.62s/it] {'loss': 0.0009, 'grad_norm': 2.806001618616178, 'learning_rate': 5.04e-08, 'completion_length': 51.71428871154785, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.02197265625, 'epoch': 0.95} 95%|█████████▍| 2374/2500 [9:25:00<28:35, 13.62s/it] 95%|█████████▌| 2375/2500 [9:25:14<28:24, 13.64s/it] {'loss': 0.001, 'grad_norm': 0.06549969320405244, 'learning_rate': 5e-08, 'completion_length': 57.58928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.025146484375, 'epoch': 0.95} 95%|█████████▌| 2375/2500 [9:25:14<28:24, 13.64s/it] 95%|█████████▌| 2376/2500 [9:25:28<28:12, 13.65s/it] {'loss': 0.0018, 'grad_norm': 0.06972727061670428, 'learning_rate': 4.9599999999999994e-08, 'completion_length': 56.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0443115234375, 'epoch': 0.95} 95%|█████████▌| 2376/2500 [9:25:28<28:12, 13.65s/it] 95%|█████████▌| 2377/2500 [9:25:42<28:25, 13.87s/it] {'loss': 0.0014, 'grad_norm': 0.07552336694718217, 'learning_rate': 4.92e-08, 'completion_length': 66.35714721679688, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.034637451171875, 'epoch': 0.95} 95%|█████████▌| 2377/2500 [9:25:42<28:25, 13.87s/it] 95%|█████████▌| 2378/2500 [9:25:56<28:23, 13.97s/it] {'loss': 0.001, 'grad_norm': 0.07948487363834106, 'learning_rate': 4.88e-08, 'completion_length': 
62.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.95} 95%|█████████▌| 2378/2500 [9:25:56<28:23, 13.97s/it] 95%|█████████▌| 2379/2500 [9:26:10<27:40, 13.72s/it] {'loss': 0.0019, 'grad_norm': 0.05412079199769872, 'learning_rate': 4.8399999999999997e-08, 'completion_length': 50.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.047119140625, 'epoch': 0.95} 95%|█████████▌| 2379/2500 [9:26:10<27:40, 13.72s/it] 95%|█████████▌| 2380/2500 [9:26:23<27:27, 13.73s/it] {'loss': 0.0006, 'grad_norm': 0.06272705562559613, 'learning_rate': 4.8e-08, 'completion_length': 54.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0137481689453125, 'epoch': 0.95} 95%|█████████▌| 2380/2500 [9:26:23<27:27, 13.73s/it] 95%|█████████▌| 2381/2500 [9:26:37<27:14, 13.73s/it] {'loss': 0.0006, 'grad_norm': 0.03895336283153596, 'learning_rate': 4.76e-08, 'completion_length': 53.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01470947265625, 'epoch': 0.95} 95%|█████████▌| 2381/2500 [9:26:37<27:14, 13.73s/it] 95%|█████████▌| 2382/2500 [9:26:51<27:26, 13.95s/it] {'loss': 0.0009, 'grad_norm': 0.11831167010031639, 'learning_rate': 4.72e-08, 'completion_length': 57.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0225830078125, 'epoch': 0.95} 95%|█████████▌| 2382/2500 [9:26:51<27:26, 13.95s/it] 95%|█████████▌| 2383/2500 [9:27:06<27:48, 14.26s/it] {'loss': 0.0008, 'grad_norm': 0.05033694897024354, 'learning_rate': 4.68e-08, 'completion_length': 68.37500381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02044677734375, 'epoch': 0.95} 95%|█████████▌| 2383/2500 
[9:27:06<27:48, 14.26s/it] 95%|█████████▌| 2384/2500 [9:27:21<27:55, 14.45s/it] {'loss': 0.0008, 'grad_norm': 0.056832680597146, 'learning_rate': 4.639999999999999e-08, 'completion_length': 66.23214721679688, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02020263671875, 'epoch': 0.95} 95%|█████████▌| 2384/2500 [9:27:21<27:55, 14.45s/it] 95%|█████████▌| 2385/2500 [9:27:36<27:35, 14.40s/it] {'loss': 0.0006, 'grad_norm': 0.11805928122739251, 'learning_rate': 4.5999999999999995e-08, 'completion_length': 52.83928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.014617919921875, 'epoch': 0.95} 95%|█████████▌| 2385/2500 [9:27:36<27:35, 14.40s/it] 95%|█████████▌| 2386/2500 [9:27:49<26:52, 14.15s/it] {'loss': 0.0006, 'grad_norm': 0.0971636563950686, 'learning_rate': 4.56e-08, 'completion_length': 55.60714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0155029296875, 'epoch': 0.95} 95%|█████████▌| 2386/2500 [9:27:49<26:52, 14.15s/it] 95%|█████████▌| 2387/2500 [9:28:02<25:54, 13.76s/it] {'loss': 0.0003, 'grad_norm': 0.04389584662502929, 'learning_rate': 4.5199999999999994e-08, 'completion_length': 47.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0074462890625, 'epoch': 0.95} 95%|█████████▌| 2387/2500 [9:28:02<25:54, 13.76s/it] 96%|█████████▌| 2388/2500 [9:28:16<25:57, 13.90s/it] {'loss': 0.0012, 'grad_norm': 0.06825594140838133, 'learning_rate': 4.48e-08, 'completion_length': 61.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02972412109375, 'epoch': 0.96} 96%|█████████▌| 2388/2500 [9:28:16<25:57, 13.90s/it] 96%|█████████▌| 2389/2500 [9:28:30<25:36, 13.85s/it] {'loss': 0.0009, 'grad_norm': 0.07106136195728227, 'learning_rate': 4.44e-08, 
'completion_length': 53.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.96} 96%|█████████▌| 2389/2500 [9:28:30<25:36, 13.85s/it] 96%|█████████▌| 2390/2500 [9:28:44<25:24, 13.86s/it] {'loss': 0.0006, 'grad_norm': 0.05423292396653097, 'learning_rate': 4.4e-08, 'completion_length': 53.50000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01611328125, 'epoch': 0.96} 96%|█████████▌| 2390/2500 [9:28:44<25:24, 13.86s/it] 96%|█████████▌| 2391/2500 [9:28:57<24:53, 13.70s/it] {'loss': 0.0006, 'grad_norm': 0.07165087660065735, 'learning_rate': 4.36e-08, 'completion_length': 53.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0137939453125, 'epoch': 0.96} 96%|█████████▌| 2391/2500 [9:28:57<24:53, 13.70s/it] 96%|█████████▌| 2392/2500 [9:29:10<24:15, 13.48s/it] {'loss': 0.0007, 'grad_norm': 0.05540391231986388, 'learning_rate': 4.32e-08, 'completion_length': 53.46428680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01861572265625, 'epoch': 0.96} 96%|█████████▌| 2392/2500 [9:29:10<24:15, 13.48s/it] 96%|█████████▌| 2393/2500 [9:29:24<24:24, 13.68s/it] {'loss': 0.0013, 'grad_norm': 0.06101835915748216, 'learning_rate': 4.279999999999999e-08, 'completion_length': 57.10714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03155517578125, 'epoch': 0.96} 96%|█████████▌| 2393/2500 [9:29:24<24:24, 13.68s/it] 96%|█████████▌| 2394/2500 [9:29:38<24:01, 13.60s/it] {'loss': 0.0014, 'grad_norm': 0.055629422924363875, 'learning_rate': 4.2399999999999996e-08, 'completion_length': 52.82143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0362548828125, 'epoch': 0.96} 
96%|█████████▌| 2394/2500 [9:29:38<24:01, 13.60s/it] 96%|█████████▌| 2395/2500 [9:29:52<23:57, 13.69s/it] {'loss': 0.0004, 'grad_norm': 0.039111182450512115, 'learning_rate': 4.2e-08, 'completion_length': 54.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.009521484375, 'epoch': 0.96} 96%|█████████▌| 2395/2500 [9:29:52<23:57, 13.69s/it] 96%|█████████▌| 2396/2500 [9:30:05<23:44, 13.70s/it] {'loss': 0.001, 'grad_norm': 0.9584964027458647, 'learning_rate': 4.1599999999999995e-08, 'completion_length': 57.96428680419922, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.0257568359375, 'epoch': 0.96} 96%|█████████▌| 2396/2500 [9:30:05<23:44, 13.70s/it] 96%|█████████▌| 2397/2500 [9:30:27<27:25, 15.98s/it] {'loss': 0.0007, 'grad_norm': 0.27383826525802496, 'learning_rate': 4.12e-08, 'completion_length': 74.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016448974609375, 'epoch': 0.96} 96%|█████████▌| 2397/2500 [9:30:27<27:25, 15.98s/it] 96%|█████████▌| 2398/2500 [9:30:41<26:27, 15.56s/it] {'loss': 0.0014, 'grad_norm': 0.06794283259917551, 'learning_rate': 4.08e-08, 'completion_length': 63.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03582763671875, 'epoch': 0.96} 96%|█████████▌| 2398/2500 [9:30:41<26:27, 15.56s/it] 96%|█████████▌| 2399/2500 [9:30:58<26:54, 15.99s/it] {'loss': 0.0015, 'grad_norm': 0.08666700396734683, 'learning_rate': 4.04e-08, 'completion_length': 62.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03802490234375, 'epoch': 0.96} 96%|█████████▌| 2399/2500 [9:30:58<26:54, 15.99s/it] 96%|█████████▌| 2400/2500 [9:31:12<25:44, 15.45s/it] {'loss': 0.0005, 'grad_norm': 
0.05120315476703455, 'learning_rate': 4e-08, 'completion_length': 53.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0118408203125, 'epoch': 0.96} 96%|█████████▌| 2400/2500 [9:31:12<25:44, 15.45s/it] 96%|█████████▌| 2401/2500 [9:32:26<54:20, 32.94s/it] {'loss': 0.001, 'grad_norm': 0.06627713177765937, 'learning_rate': 3.9600000000000004e-08, 'completion_length': 53.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02410888671875, 'epoch': 0.96} 96%|█████████▌| 2401/2500 [9:32:26<54:20, 32.94s/it] 96%|█████████▌| 2402/2500 [9:32:41<45:07, 27.63s/it] {'loss': 0.0016, 'grad_norm': 0.10387224516653926, 'learning_rate': 3.9199999999999994e-08, 'completion_length': 66.83928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0406494140625, 'epoch': 0.96} 96%|█████████▌| 2402/2500 [9:32:41<45:07, 27.63s/it] 96%|█████████▌| 2403/2500 [9:32:56<38:22, 23.74s/it] {'loss': 0.0013, 'grad_norm': 0.03974553143864577, 'learning_rate': 3.88e-08, 'completion_length': 60.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03240966796875, 'epoch': 0.96} 96%|█████████▌| 2403/2500 [9:32:56<38:22, 23.74s/it] 96%|█████████▌| 2404/2500 [9:33:10<33:09, 20.73s/it] {'loss': 0.001, 'grad_norm': 0.06803335378623943, 'learning_rate': 3.839999999999999e-08, 'completion_length': 58.26785850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02557373046875, 'epoch': 0.96} 96%|█████████▌| 2404/2500 [9:33:10<33:09, 20.73s/it] 96%|█████████▌| 2405/2500 [9:33:27<30:59, 19.57s/it] {'loss': 0.0005, 'grad_norm': 0.05364347439381113, 'learning_rate': 3.7999999999999996e-08, 'completion_length': 62.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 
'reward': 2.0, 'reward_std': 0.0, 'kl': 0.011871337890625, 'epoch': 0.96} 96%|█████████▌| 2405/2500 [9:33:27<30:59, 19.57s/it] 96%|█████████▌| 2406/2500 [9:33:40<27:57, 17.84s/it] {'loss': 0.0008, 'grad_norm': 1.0264057210388857, 'learning_rate': 3.76e-08, 'completion_length': 53.69643211364746, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.021240234375, 'epoch': 0.96} 96%|█████████▌| 2406/2500 [9:33:40<27:57, 17.84s/it] 96%|█████████▋| 2407/2500 [9:33:56<26:41, 17.22s/it] {'loss': 0.0006, 'grad_norm': 0.049536776592118754, 'learning_rate': 3.7199999999999996e-08, 'completion_length': 64.17857551574707, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.013885498046875, 'epoch': 0.96} 96%|█████████▋| 2407/2500 [9:33:56<26:41, 17.22s/it] 96%|█████████▋| 2408/2500 [9:34:10<24:54, 16.24s/it] {'loss': 0.0008, 'grad_norm': 0.09705668513749267, 'learning_rate': 3.68e-08, 'completion_length': 56.92857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0206298828125, 'epoch': 0.96} 96%|█████████▋| 2408/2500 [9:34:10<24:54, 16.24s/it] 96%|█████████▋| 2409/2500 [9:34:23<23:06, 15.23s/it] {'loss': 0.0011, 'grad_norm': 0.04891834725270609, 'learning_rate': 3.64e-08, 'completion_length': 52.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.96} 96%|█████████▋| 2409/2500 [9:34:23<23:06, 15.23s/it] 96%|█████████▋| 2410/2500 [9:34:36<21:52, 14.58s/it] {'loss': 0.0012, 'grad_norm': 0.042166604140208334, 'learning_rate': 3.6e-08, 'completion_length': 46.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03094482421875, 'epoch': 0.96} 96%|█████████▋| 2410/2500 [9:34:36<21:52, 
14.58s/it] 96%|█████████▋| 2411/2500 [9:34:50<21:25, 14.45s/it] {'loss': 0.0012, 'grad_norm': 0.06457623334909696, 'learning_rate': 3.56e-08, 'completion_length': 54.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02923583984375, 'epoch': 0.96} 96%|█████████▋| 2411/2500 [9:34:50<21:25, 14.45s/it] 96%|█████████▋| 2412/2500 [9:35:05<21:08, 14.42s/it] {'loss': 0.0008, 'grad_norm': 0.09724476355483642, 'learning_rate': 3.52e-08, 'completion_length': 56.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02008056640625, 'epoch': 0.96} 96%|█████████▋| 2412/2500 [9:35:05<21:08, 14.42s/it] 97%|█████████▋| 2413/2500 [9:35:18<20:27, 14.11s/it] {'loss': 0.0011, 'grad_norm': 0.04226125983152054, 'learning_rate': 3.4799999999999994e-08, 'completion_length': 52.125003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02630615234375, 'epoch': 0.97} 97%|█████████▋| 2413/2500 [9:35:18<20:27, 14.11s/it] 97%|█████████▋| 2414/2500 [9:35:31<19:50, 13.84s/it] {'loss': 0.0014, 'grad_norm': 0.06527754814600724, 'learning_rate': 3.44e-08, 'completion_length': 52.73214340209961, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03436279296875, 'epoch': 0.97} 97%|█████████▋| 2414/2500 [9:35:31<19:50, 13.84s/it] 97%|█████████▋| 2415/2500 [9:35:45<19:31, 13.79s/it] {'loss': 0.0006, 'grad_norm': 0.12596691085929368, 'learning_rate': 3.4e-08, 'completion_length': 55.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.015380859375, 'epoch': 0.97} 97%|█████████▋| 2415/2500 [9:35:45<19:31, 13.79s/it] 97%|█████████▋| 2416/2500 [9:35:58<19:12, 13.72s/it] {'loss': 0.0012, 'grad_norm': 0.05564568154949949, 'learning_rate': 3.3599999999999996e-08, 'completion_length': 
52.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.029541015625, 'epoch': 0.97} 97%|█████████▋| 2416/2500 [9:35:58<19:12, 13.72s/it] 97%|█████████▋| 2417/2500 [9:36:12<19:04, 13.79s/it] {'loss': 0.0006, 'grad_norm': 0.08764832716556607, 'learning_rate': 3.32e-08, 'completion_length': 54.57143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0144500732421875, 'epoch': 0.97} 97%|█████████▋| 2417/2500 [9:36:12<19:04, 13.79s/it] 97%|█████████▋| 2418/2500 [9:36:29<20:03, 14.67s/it] {'loss': 0.001, 'grad_norm': 0.07310434137629285, 'learning_rate': 3.28e-08, 'completion_length': 63.39285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02545166015625, 'epoch': 0.97} 97%|█████████▋| 2418/2500 [9:36:29<20:03, 14.67s/it] 97%|█████████▋| 2419/2500 [9:36:43<19:23, 14.36s/it] {'loss': 0.0009, 'grad_norm': 0.08224832711696971, 'learning_rate': 3.24e-08, 'completion_length': 48.53571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02197265625, 'epoch': 0.97} 97%|█████████▋| 2419/2500 [9:36:43<19:23, 14.36s/it] 97%|█████████▋| 2420/2500 [9:36:56<18:48, 14.11s/it] {'loss': 0.0015, 'grad_norm': 0.09037635667491813, 'learning_rate': 3.2e-08, 'completion_length': 59.55357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03875732421875, 'epoch': 0.97} 97%|█████████▋| 2420/2500 [9:36:56<18:48, 14.11s/it] 97%|█████████▋| 2421/2500 [9:37:10<18:13, 13.85s/it] {'loss': 0.0005, 'grad_norm': 0.04651626814010446, 'learning_rate': 3.16e-08, 'completion_length': 52.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.012451171875, 'epoch': 0.97} 97%|█████████▋| 2421/2500 [9:37:10<18:13, 13.85s/it] 
97%|█████████▋| 2422/2500 [9:37:24<18:13, 14.02s/it] {'loss': 0.0009, 'grad_norm': 0.14122988146340407, 'learning_rate': 3.1199999999999995e-08, 'completion_length': 58.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021881103515625, 'epoch': 0.97} 97%|█████████▋| 2422/2500 [9:37:24<18:13, 14.02s/it] 97%|█████████▋| 2423/2500 [9:37:39<18:16, 14.24s/it] {'loss': 0.0008, 'grad_norm': 0.11698574792500192, 'learning_rate': 3.08e-08, 'completion_length': 55.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0198974609375, 'epoch': 0.97} 97%|█████████▋| 2423/2500 [9:37:39<18:16, 14.24s/it] 97%|█████████▋| 2424/2500 [9:37:52<17:39, 13.94s/it] {'loss': 0.001, 'grad_norm': 0.08055925018672311, 'learning_rate': 3.04e-08, 'completion_length': 55.82143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0244140625, 'epoch': 0.97} 97%|█████████▋| 2424/2500 [9:37:52<17:39, 13.94s/it] 97%|█████████▋| 2425/2500 [9:38:07<17:43, 14.18s/it] {'loss': 0.0012, 'grad_norm': 0.06961114410761714, 'learning_rate': 3e-08, 'completion_length': 61.16071891784668, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03106689453125, 'epoch': 0.97} 97%|█████████▋| 2425/2500 [9:38:07<17:43, 14.18s/it] 97%|█████████▋| 2426/2500 [9:38:21<17:27, 14.15s/it] {'loss': 0.0009, 'grad_norm': 0.09033263696630395, 'learning_rate': 2.96e-08, 'completion_length': 57.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022216796875, 'epoch': 0.97} 97%|█████████▋| 2426/2500 [9:38:21<17:27, 14.15s/it] 97%|█████████▋| 2427/2500 [9:38:36<17:29, 14.37s/it] {'loss': 0.0016, 'grad_norm': 0.1181372028151482, 'learning_rate': 2.92e-08, 'completion_length': 60.64285850524902, 'rewards/accuracy_reward': 
1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.038818359375, 'epoch': 0.97} 97%|█████████▋| 2427/2500 [9:38:36<17:29, 14.37s/it] 97%|█████████▋| 2428/2500 [9:38:50<17:18, 14.43s/it] {'loss': 0.0004, 'grad_norm': 0.04700495182153842, 'learning_rate': 2.8799999999999996e-08, 'completion_length': 58.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01104736328125, 'epoch': 0.97} 97%|█████████▋| 2428/2500 [9:38:50<17:18, 14.43s/it] 97%|█████████▋| 2429/2500 [9:39:03<16:36, 14.04s/it] {'loss': 0.0017, 'grad_norm': 0.0741009592972157, 'learning_rate': 2.84e-08, 'completion_length': 53.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04296875, 'epoch': 0.97} 97%|█████████▋| 2429/2500 [9:39:03<16:36, 14.04s/it] 97%|█████████▋| 2430/2500 [9:39:20<17:25, 14.94s/it] {'loss': 0.0009, 'grad_norm': 2.089950997383588, 'learning_rate': 2.8e-08, 'completion_length': 65.10714530944824, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.023590087890625, 'epoch': 0.97} 97%|█████████▋| 2430/2500 [9:39:20<17:25, 14.94s/it] 97%|█████████▋| 2431/2500 [9:39:33<16:27, 14.31s/it] {'loss': 0.0016, 'grad_norm': 0.056259731094057276, 'learning_rate': 2.76e-08, 'completion_length': 51.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04022216796875, 'epoch': 0.97} 97%|█████████▋| 2431/2500 [9:39:33<16:27, 14.31s/it] 97%|█████████▋| 2432/2500 [9:39:48<16:15, 14.35s/it] {'loss': 0.0013, 'grad_norm': 0.08992620170315882, 'learning_rate': 2.72e-08, 'completion_length': 55.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03228759765625, 'epoch': 0.97} 97%|█████████▋| 2432/2500 [9:39:48<16:15, 
14.35s/it] 97%|█████████▋| 2433/2500 [9:40:01<15:47, 14.14s/it] {'loss': 0.0008, 'grad_norm': 0.032280983808850035, 'learning_rate': 2.68e-08, 'completion_length': 61.71428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0211181640625, 'epoch': 0.97} 97%|█████████▋| 2433/2500 [9:40:01<15:47, 14.14s/it] 97%|█████████▋| 2434/2500 [9:40:16<15:54, 14.46s/it] {'loss': 0.0012, 'grad_norm': 0.07880923077915342, 'learning_rate': 2.6399999999999998e-08, 'completion_length': 58.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0302734375, 'epoch': 0.97} 97%|█████████▋| 2434/2500 [9:40:16<15:54, 14.46s/it] 97%|█████████▋| 2435/2500 [9:40:31<15:31, 14.33s/it] {'loss': 0.001, 'grad_norm': 0.07254885747453377, 'learning_rate': 2.5999999999999998e-08, 'completion_length': 65.12500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02618408203125, 'epoch': 0.97} 97%|█████████▋| 2435/2500 [9:40:31<15:31, 14.33s/it] 97%|█████████▋| 2436/2500 [9:40:44<15:05, 14.14s/it] {'loss': 0.0013, 'grad_norm': 0.1817797863125862, 'learning_rate': 2.56e-08, 'completion_length': 56.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03363037109375, 'epoch': 0.97} 97%|█████████▋| 2436/2500 [9:40:44<15:05, 14.14s/it] 97%|█████████▋| 2437/2500 [9:41:00<15:13, 14.50s/it] {'loss': 0.0009, 'grad_norm': 0.06272268244442664, 'learning_rate': 2.52e-08, 'completion_length': 62.625003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.022705078125, 'epoch': 0.97} 97%|█████████▋| 2437/2500 [9:41:00<15:13, 14.50s/it] 98%|█████████▊| 2438/2500 [9:41:13<14:45, 14.28s/it] {'loss': 0.0015, 'grad_norm': 0.08416371325802842, 'learning_rate': 2.4799999999999997e-08, 'completion_length': 
58.08928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03857421875, 'epoch': 0.98} 98%|█████████▊| 2438/2500 [9:41:13<14:45, 14.28s/it] 98%|█████████▊| 2439/2500 [9:41:27<14:18, 14.08s/it] {'loss': 0.0012, 'grad_norm': 0.08503854921874754, 'learning_rate': 2.44e-08, 'completion_length': 63.05357551574707, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03057861328125, 'epoch': 0.98} 98%|█████████▊| 2439/2500 [9:41:27<14:18, 14.08s/it] 98%|█████████▊| 2440/2500 [9:41:41<14:03, 14.07s/it] {'loss': 0.0012, 'grad_norm': 0.08955380461564967, 'learning_rate': 2.4e-08, 'completion_length': 53.28571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03082275390625, 'epoch': 0.98} 98%|█████████▊| 2440/2500 [9:41:41<14:03, 14.07s/it] 98%|█████████▊| 2441/2500 [9:41:56<13:59, 14.22s/it] {'loss': 0.0017, 'grad_norm': 0.11105813996934809, 'learning_rate': 2.36e-08, 'completion_length': 62.78571701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0428466796875, 'epoch': 0.98} 98%|█████████▊| 2441/2500 [9:41:56<13:59, 14.22s/it] 98%|█████████▊| 2442/2500 [9:42:09<13:36, 14.08s/it] {'loss': 0.0008, 'grad_norm': 0.12867460826981275, 'learning_rate': 2.3199999999999996e-08, 'completion_length': 55.25000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0191650390625, 'epoch': 0.98} 98%|█████████▊| 2442/2500 [9:42:09<13:36, 14.08s/it] 98%|█████████▊| 2443/2500 [9:42:23<13:14, 13.94s/it] {'loss': 0.0013, 'grad_norm': 0.0464351111506738, 'learning_rate': 2.28e-08, 'completion_length': 55.89285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.98} 98%|█████████▊| 2443/2500 [9:42:23<13:14, 
13.94s/it] 98%|█████████▊| 2444/2500 [9:42:38<13:27, 14.42s/it] {'loss': 0.0008, 'grad_norm': 0.6426445827563675, 'learning_rate': 2.24e-08, 'completion_length': 69.12500190734863, 'rewards/accuracy_reward': 0.9821428656578064, 'rewards/format_reward': 1.0, 'reward': 1.9821429252624512, 'reward_std': 0.0357142873108387, 'kl': 0.020751953125, 'epoch': 0.98} 98%|█████████▊| 2444/2500 [9:42:38<13:27, 14.42s/it] 98%|█████████▊| 2445/2500 [9:42:52<12:55, 14.10s/it] {'loss': 0.0013, 'grad_norm': 0.04663410880683586, 'learning_rate': 2.2e-08, 'completion_length': 62.05357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031494140625, 'epoch': 0.98} 98%|█████████▊| 2445/2500 [9:42:52<12:55, 14.10s/it] 98%|█████████▊| 2446/2500 [9:43:05<12:26, 13.82s/it] {'loss': 0.0014, 'grad_norm': 0.04570271298408026, 'learning_rate': 2.16e-08, 'completion_length': 51.160715103149414, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03515625, 'epoch': 0.98} 98%|█████████▊| 2446/2500 [9:43:05<12:26, 13.82s/it] 98%|█████████▊| 2447/2500 [9:43:18<12:07, 13.73s/it] {'loss': 0.0007, 'grad_norm': 0.05719723929474366, 'learning_rate': 2.1199999999999998e-08, 'completion_length': 50.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.016845703125, 'epoch': 0.98} 98%|█████████▊| 2447/2500 [9:43:19<12:07, 13.73s/it] 98%|█████████▊| 2448/2500 [9:43:32<11:49, 13.65s/it] {'loss': 0.0016, 'grad_norm': 0.2277663713660961, 'learning_rate': 2.0799999999999998e-08, 'completion_length': 51.267860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.040283203125, 'epoch': 0.98} 98%|█████████▊| 2448/2500 [9:43:32<11:49, 13.65s/it] 98%|█████████▊| 2449/2500 [9:43:47<11:54, 14.00s/it] {'loss': 0.0013, 'grad_norm': 0.14990288158073972, 'learning_rate': 2.04e-08, 
'completion_length': 67.07143020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03167724609375, 'epoch': 0.98} 98%|█████████▊| 2449/2500 [9:43:47<11:54, 14.00s/it] 98%|█████████▊| 2450/2500 [9:44:01<11:45, 14.11s/it] {'loss': 0.0011, 'grad_norm': 0.14087476598937979, 'learning_rate': 2e-08, 'completion_length': 53.37500190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0269775390625, 'epoch': 0.98} 98%|█████████▊| 2450/2500 [9:44:01<11:45, 14.11s/it] 98%|█████████▊| 2451/2500 [9:44:16<11:38, 14.25s/it] {'loss': 0.0009, 'grad_norm': 0.08889900001925512, 'learning_rate': 1.9599999999999997e-08, 'completion_length': 58.142860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021636962890625, 'epoch': 0.98} 98%|█████████▊| 2451/2500 [9:44:16<11:38, 14.25s/it] 98%|█████████▊| 2452/2500 [9:44:30<11:20, 14.17s/it] {'loss': 0.0012, 'grad_norm': 0.054985801090086786, 'learning_rate': 1.9199999999999997e-08, 'completion_length': 62.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0286865234375, 'epoch': 0.98} 98%|█████████▊| 2452/2500 [9:44:30<11:20, 14.17s/it] 98%|█████████▊| 2453/2500 [9:44:42<10:40, 13.63s/it] {'loss': 0.0012, 'grad_norm': 0.06556716762725968, 'learning_rate': 1.88e-08, 'completion_length': 47.375003814697266, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03021240234375, 'epoch': 0.98} 98%|█████████▊| 2453/2500 [9:44:42<10:40, 13.63s/it] 98%|█████████▊| 2454/2500 [9:44:56<10:35, 13.82s/it] {'loss': 0.0017, 'grad_norm': 0.12669339265281312, 'learning_rate': 1.84e-08, 'completion_length': 56.44643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0413818359375, 'epoch': 0.98} 
98%|█████████▊| 2454/2500 [9:44:56<10:35, 13.82s/it] 98%|█████████▊| 2455/2500 [9:45:11<10:35, 14.12s/it] {'loss': 0.0007, 'grad_norm': 0.10548618421735785, 'learning_rate': 1.8e-08, 'completion_length': 61.07143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01861572265625, 'epoch': 0.98} 98%|█████████▊| 2455/2500 [9:45:11<10:35, 14.12s/it] 98%|█████████▊| 2456/2500 [9:45:26<10:27, 14.27s/it] {'loss': 0.0018, 'grad_norm': 0.05527606122128575, 'learning_rate': 1.76e-08, 'completion_length': 61.57143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0445556640625, 'epoch': 0.98} 98%|█████████▊| 2456/2500 [9:45:26<10:27, 14.27s/it] 98%|█████████▊| 2457/2500 [9:45:39<10:02, 14.00s/it] {'loss': 0.001, 'grad_norm': 0.0696657386552902, 'learning_rate': 1.72e-08, 'completion_length': 52.14285850524902, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02587890625, 'epoch': 0.98} 98%|█████████▊| 2457/2500 [9:45:39<10:02, 14.00s/it] 98%|█████████▊| 2458/2500 [9:45:53<09:41, 13.84s/it] {'loss': 0.0015, 'grad_norm': 0.06747313880815775, 'learning_rate': 1.6799999999999998e-08, 'completion_length': 50.17857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0384521484375, 'epoch': 0.98} 98%|█████████▊| 2458/2500 [9:45:53<09:41, 13.84s/it] 98%|█████████▊| 2459/2500 [9:46:07<09:35, 14.04s/it] {'loss': 0.0013, 'grad_norm': 0.23126927632158542, 'learning_rate': 1.64e-08, 'completion_length': 58.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.031982421875, 'epoch': 0.98} 98%|█████████▊| 2459/2500 [9:46:07<09:35, 14.04s/it] 98%|█████████▊| 2460/2500 [9:46:21<09:13, 13.84s/it] {'loss': 0.0016, 'grad_norm': 0.10624925071412179, 'learning_rate': 1.6e-08, 
'completion_length': 55.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03875732421875, 'epoch': 0.98} 98%|█████████▊| 2460/2500 [9:46:21<09:13, 13.84s/it] 98%|█████████▊| 2461/2500 [9:46:35<09:07, 14.04s/it] {'loss': 0.0008, 'grad_norm': 0.05781238268438955, 'learning_rate': 1.5599999999999997e-08, 'completion_length': 59.017860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0207061767578125, 'epoch': 0.98} 98%|█████████▊| 2461/2500 [9:46:35<09:07, 14.04s/it] 98%|█████████▊| 2462/2500 [9:46:49<08:57, 14.15s/it] {'loss': 0.001, 'grad_norm': 0.05145442784560932, 'learning_rate': 1.52e-08, 'completion_length': 57.73214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0239715576171875, 'epoch': 0.98} 98%|█████████▊| 2462/2500 [9:46:49<08:57, 14.15s/it] 99%|█████████▊| 2463/2500 [9:47:04<08:51, 14.36s/it] {'loss': 0.0015, 'grad_norm': 0.06399109150097622, 'learning_rate': 1.48e-08, 'completion_length': 56.67857360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.037689208984375, 'epoch': 0.99} 99%|█████████▊| 2463/2500 [9:47:04<08:51, 14.36s/it] 99%|█████████▊| 2464/2500 [9:47:18<08:30, 14.18s/it] {'loss': 0.0017, 'grad_norm': 0.07406714608536967, 'learning_rate': 1.4399999999999998e-08, 'completion_length': 58.33928680419922, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04248046875, 'epoch': 0.99} 99%|█████████▊| 2464/2500 [9:47:18<08:30, 14.18s/it] 99%|█████████▊| 2465/2500 [9:47:32<08:13, 14.11s/it] {'loss': 0.0013, 'grad_norm': 0.06686571401280055, 'learning_rate': 1.4e-08, 'completion_length': 58.33928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0318603515625, 'epoch': 0.99} 
99%|█████████▊| 2465/2500 [9:47:32<08:13, 14.11s/it] 99%|█████████▊| 2466/2500 [9:47:47<08:12, 14.48s/it] {'loss': 0.0017, 'grad_norm': 0.05616607971832614, 'learning_rate': 1.36e-08, 'completion_length': 48.48214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.04248046875, 'epoch': 0.99} 99%|█████████▊| 2466/2500 [9:47:47<08:12, 14.48s/it] 99%|█████████▊| 2467/2500 [9:48:01<07:47, 14.16s/it] {'loss': 0.0009, 'grad_norm': 0.10404763245077363, 'learning_rate': 1.3199999999999999e-08, 'completion_length': 55.85714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0220947265625, 'epoch': 0.99} 99%|█████████▊| 2467/2500 [9:48:01<07:47, 14.16s/it] 99%|█████████▊| 2468/2500 [9:48:15<07:37, 14.30s/it] {'loss': 0.0008, 'grad_norm': 0.06588418311577017, 'learning_rate': 1.28e-08, 'completion_length': 65.01786041259766, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02008056640625, 'epoch': 0.99} 99%|█████████▊| 2468/2500 [9:48:15<07:37, 14.30s/it] 99%|█████████▉| 2469/2500 [9:48:30<07:27, 14.44s/it] {'loss': 0.0015, 'grad_norm': 0.046071384795165975, 'learning_rate': 1.2399999999999999e-08, 'completion_length': 58.44643211364746, 'rewards/accuracy_reward': 0.9285714626312256, 'rewards/format_reward': 1.0, 'reward': 1.9285714626312256, 'reward_std': 0.0, 'kl': 0.037109375, 'epoch': 0.99} 99%|█████████▉| 2469/2500 [9:48:30<07:27, 14.44s/it] 99%|█████████▉| 2470/2500 [9:48:44<07:06, 14.22s/it] {'loss': 0.0009, 'grad_norm': 0.044302805585372036, 'learning_rate': 1.2e-08, 'completion_length': 61.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02227783203125, 'epoch': 0.99} 99%|█████████▉| 2470/2500 [9:48:44<07:06, 14.22s/it] 99%|█████████▉| 2471/2500 [9:48:57<06:47, 14.04s/it] {'loss': 0.0008, 'grad_norm': 
0.04919638359048527, 'learning_rate': 1.1599999999999998e-08, 'completion_length': 53.91071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01934814453125, 'epoch': 0.99} 99%|█████████▉| 2471/2500 [9:48:57<06:47, 14.04s/it] 99%|█████████▉| 2472/2500 [9:49:10<06:24, 13.73s/it] {'loss': 0.0014, 'grad_norm': 0.08367753963876007, 'learning_rate': 1.12e-08, 'completion_length': 47.767860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03582763671875, 'epoch': 0.99} 99%|█████████▉| 2472/2500 [9:49:10<06:24, 13.73s/it] 99%|█████████▉| 2473/2500 [9:49:24<06:09, 13.68s/it] {'loss': 0.0007, 'grad_norm': 0.07450781429830901, 'learning_rate': 1.08e-08, 'completion_length': 55.32143211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017791748046875, 'epoch': 0.99} 99%|█████████▉| 2473/2500 [9:49:24<06:09, 13.68s/it] 99%|█████████▉| 2474/2500 [9:49:37<05:52, 13.55s/it] {'loss': 0.0012, 'grad_norm': 0.08767829787379937, 'learning_rate': 1.0399999999999999e-08, 'completion_length': 50.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03082275390625, 'epoch': 0.99} 99%|█████████▉| 2474/2500 [9:49:37<05:52, 13.55s/it] 99%|█████████▉| 2475/2500 [9:49:51<05:37, 13.49s/it] {'loss': 0.0012, 'grad_norm': 0.35933392010791243, 'learning_rate': 1e-08, 'completion_length': 52.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0311279296875, 'epoch': 0.99} 99%|█████████▉| 2475/2500 [9:49:51<05:37, 13.49s/it] 99%|█████████▉| 2476/2500 [9:50:05<05:28, 13.70s/it] {'loss': 0.0011, 'grad_norm': 0.10503607579256417, 'learning_rate': 9.599999999999998e-09, 'completion_length': 66.75000381469727, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 
'reward_std': 0.0, 'kl': 0.028533935546875, 'epoch': 0.99} 99%|█████████▉| 2476/2500 [9:50:05<05:28, 13.70s/it] 99%|█████████▉| 2477/2500 [9:50:18<05:14, 13.68s/it] {'loss': 0.0008, 'grad_norm': 0.04928181215680156, 'learning_rate': 9.2e-09, 'completion_length': 60.857147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.019287109375, 'epoch': 0.99} 99%|█████████▉| 2477/2500 [9:50:18<05:14, 13.68s/it] 99%|█████████▉| 2478/2500 [9:50:33<05:09, 14.08s/it] {'loss': 0.0011, 'grad_norm': 0.04510773707770444, 'learning_rate': 8.8e-09, 'completion_length': 63.857147216796875, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02716064453125, 'epoch': 0.99} 99%|█████████▉| 2478/2500 [9:50:33<05:09, 14.08s/it] 99%|█████████▉| 2479/2500 [9:50:49<05:02, 14.40s/it] {'loss': 0.001, 'grad_norm': 0.0569848971702345, 'learning_rate': 8.399999999999999e-09, 'completion_length': 60.80357360839844, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.026123046875, 'epoch': 0.99} 99%|█████████▉| 2479/2500 [9:50:49<05:02, 14.40s/it] 99%|█████████▉| 2480/2500 [9:51:05<04:57, 14.88s/it] {'loss': 0.0012, 'grad_norm': 0.10432361895884448, 'learning_rate': 8e-09, 'completion_length': 63.69643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0306396484375, 'epoch': 0.99} 99%|█████████▉| 2480/2500 [9:51:05<04:57, 14.88s/it] 99%|█████████▉| 2481/2500 [9:51:21<04:53, 15.46s/it] {'loss': 0.0007, 'grad_norm': 0.08718817788686978, 'learning_rate': 7.6e-09, 'completion_length': 65.98214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.017608642578125, 'epoch': 0.99} 99%|█████████▉| 2481/2500 [9:51:21<04:53, 15.46s/it] 99%|█████████▉| 2482/2500 [9:51:35<04:27, 14.87s/it] {'loss': 0.0008, 'grad_norm': 
0.04258851283074975, 'learning_rate': 7.199999999999999e-09, 'completion_length': 56.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0206298828125, 'epoch': 0.99} 99%|█████████▉| 2482/2500 [9:51:35<04:27, 14.87s/it] 99%|█████████▉| 2483/2500 [9:51:49<04:10, 14.74s/it] {'loss': 0.0006, 'grad_norm': 0.05788386693849491, 'learning_rate': 6.8e-09, 'completion_length': 63.35714530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.013946533203125, 'epoch': 0.99} 99%|█████████▉| 2483/2500 [9:51:49<04:10, 14.74s/it] 99%|█████████▉| 2484/2500 [9:52:03<03:50, 14.40s/it] {'loss': 0.001, 'grad_norm': 2.1482029100363915, 'learning_rate': 6.4e-09, 'completion_length': 54.87500190734863, 'rewards/accuracy_reward': 0.9642857313156128, 'rewards/format_reward': 1.0, 'reward': 1.9642857313156128, 'reward_std': 0.04123930633068085, 'kl': 0.02520751953125, 'epoch': 0.99} 99%|█████████▉| 2484/2500 [9:52:03<03:50, 14.40s/it] 99%|█████████▉| 2485/2500 [9:52:17<03:33, 14.24s/it] {'loss': 0.0011, 'grad_norm': 0.04973377554695801, 'learning_rate': 6e-09, 'completion_length': 53.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02783203125, 'epoch': 0.99} 99%|█████████▉| 2485/2500 [9:52:17<03:33, 14.24s/it] 99%|█████████▉| 2486/2500 [9:52:33<03:29, 14.93s/it] {'loss': 0.0009, 'grad_norm': 0.130850490548099, 'learning_rate': 5.6e-09, 'completion_length': 57.58928871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.021697998046875, 'epoch': 0.99} 99%|█████████▉| 2486/2500 [9:52:33<03:29, 14.93s/it] 99%|█████████▉| 2487/2500 [9:52:48<03:11, 14.72s/it] {'loss': 0.0009, 'grad_norm': 0.08771305670189225, 'learning_rate': 5.1999999999999994e-09, 'completion_length': 57.00000190734863, 'rewards/accuracy_reward': 1.0, 
'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0220947265625, 'epoch': 0.99} 99%|█████████▉| 2487/2500 [9:52:48<03:11, 14.72s/it] 100%|█████████▉| 2488/2500 [9:53:02<02:56, 14.71s/it] {'loss': 0.0009, 'grad_norm': 0.04179122174314287, 'learning_rate': 4.799999999999999e-09, 'completion_length': 66.23214530944824, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02239990234375, 'epoch': 1.0} 100%|█████████▉| 2488/2500 [9:53:02<02:56, 14.71s/it] 100%|█████████▉| 2489/2500 [9:53:17<02:43, 14.84s/it] {'loss': 0.0013, 'grad_norm': 0.06655408655799651, 'learning_rate': 4.4e-09, 'completion_length': 61.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.032958984375, 'epoch': 1.0} 100%|█████████▉| 2489/2500 [9:53:17<02:43, 14.84s/it] 100%|█████████▉| 2490/2500 [9:53:33<02:30, 15.02s/it] {'loss': 0.0012, 'grad_norm': 0.07969131851919331, 'learning_rate': 4e-09, 'completion_length': 63.41071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02874755859375, 'epoch': 1.0} 100%|█████████▉| 2490/2500 [9:53:33<02:30, 15.02s/it] 100%|█████████▉| 2491/2500 [9:53:47<02:13, 14.87s/it] {'loss': 0.0005, 'grad_norm': 0.10237941078872656, 'learning_rate': 3.5999999999999996e-09, 'completion_length': 51.642860412597656, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.011566162109375, 'epoch': 1.0} 100%|█████████▉| 2491/2500 [9:53:47<02:13, 14.87s/it] 100%|█████████▉| 2492/2500 [9:54:02<01:58, 14.86s/it] {'loss': 0.0009, 'grad_norm': 2.7438290006078505, 'learning_rate': 3.2e-09, 'completion_length': 60.94643020629883, 'rewards/accuracy_reward': 0.9464285969734192, 'rewards/format_reward': 1.0, 'reward': 1.9464285969734192, 'reward_std': 0.0357142873108387, 'kl': 0.02375030517578125, 'epoch': 1.0} 100%|█████████▉| 2492/2500 
[9:54:02<01:58, 14.86s/it] 100%|█████████▉| 2493/2500 [9:54:17<01:44, 14.94s/it] {'loss': 0.0013, 'grad_norm': 0.0759621178898159, 'learning_rate': 2.8e-09, 'completion_length': 68.16071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.03125, 'epoch': 1.0} 100%|█████████▉| 2493/2500 [9:54:17<01:44, 14.94s/it] 100%|█████████▉| 2494/2500 [9:54:32<01:28, 14.81s/it] {'loss': 0.0011, 'grad_norm': 0.07314549453402322, 'learning_rate': 2.3999999999999996e-09, 'completion_length': 60.96428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02716064453125, 'epoch': 1.0} 100%|█████████▉| 2494/2500 [9:54:32<01:28, 14.81s/it] 100%|█████████▉| 2495/2500 [9:54:46<01:13, 14.63s/it] {'loss': 0.0004, 'grad_norm': 0.23679726955425723, 'learning_rate': 2e-09, 'completion_length': 53.94643211364746, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01123046875, 'epoch': 1.0} 100%|█████████▉| 2495/2500 [9:54:46<01:13, 14.63s/it] 100%|█████████▉| 2496/2500 [9:55:00<00:58, 14.53s/it] {'loss': 0.001, 'grad_norm': 0.24691371601337758, 'learning_rate': 1.6e-09, 'completion_length': 59.66071701049805, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02484130859375, 'epoch': 1.0} 100%|█████████▉| 2496/2500 [9:55:00<00:58, 14.53s/it] 100%|█████████▉| 2497/2500 [9:55:14<00:42, 14.32s/it] {'loss': 0.0007, 'grad_norm': 0.03932949672704467, 'learning_rate': 1.1999999999999998e-09, 'completion_length': 57.21428871154785, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01812744140625, 'epoch': 1.0} 100%|█████████▉| 2497/2500 [9:55:14<00:42, 14.32s/it] 100%|█████████▉| 2498/2500 [9:55:28<00:28, 14.24s/it] {'loss': 0.0005, 'grad_norm': 0.07408437308791482, 'learning_rate': 8e-10, 'completion_length': 
52.94643020629883, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.01324462890625, 'epoch': 1.0} 100%|█████████▉| 2498/2500 [9:55:28<00:28, 14.24s/it] 100%|█████████▉| 2499/2500 [9:55:42<00:14, 14.01s/it] {'loss': 0.001, 'grad_norm': 0.06156563864092088, 'learning_rate': 4e-10, 'completion_length': 63.089290618896484, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.02386474609375, 'epoch': 1.0} 100%|█████████▉| 2499/2500 [9:55:42<00:14, 14.01s/it] 100%|██████████| 2500/2500 [9:55:56<00:00, 14.09s/it] {'loss': 0.0014, 'grad_norm': 0.0539107029005011, 'learning_rate': 0.0, 'completion_length': 59.75000190734863, 'rewards/accuracy_reward': 1.0, 'rewards/format_reward': 1.0, 'reward': 2.0, 'reward_std': 0.0, 'kl': 0.0357666015625, 'epoch': 1.0} 100%|██████████| 2500/2500 [9:55:56<00:00, 14.09s/it] {'train_runtime': 35810.0483, 'train_samples_per_second': 0.977, 'train_steps_per_second': 0.07, 'train_loss': 0.0013002554619376105, 'epoch': 1.0} 100%|██████████| 2500/2500 [9:56:49<00:00, 14.09s/it] 100%|██████████| 2500/2500 [9:56:49<00:00, 14.32s/it] wandb: wandb: 🚀 View run VLLM-Correct-Qwen2-VL-7B-GRPO-ClevrMath-35k-2025-02-13-03-38-44 at: https://wandb.ai/tanhuajie264-peking-university/vison-open-r1/runs/49fy2ac2 wandb: Find logs at: wandb/run-20250213_034120-49fy2ac2/logs