groot-bs16 / training.log
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 2.0.8 (you have 1.4.18). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
/home/ubuntu/Isaac-GR00T/gr00t/experiment/experiment.py:98: UserWarning: image_crop_size and image_target_size will be deprecated in the future. Please use shortest_image_edge and crop_fraction instead.
warnings.warn(
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 2.0.8 (you have 1.4.18). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 2.0.8 (you have 1.4.18). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 2.0.8 (you have 1.4.18). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
/home/ubuntu/Isaac-GR00T/gr00t/experiment/experiment.py:98: UserWarning: image_crop_size and image_target_size will be deprecated in the future. Please use shortest_image_edge and crop_fraction instead.
warnings.warn(
/home/ubuntu/Isaac-GR00T/gr00t/experiment/experiment.py:98: UserWarning: image_crop_size and image_target_size will be deprecated in the future. Please use shortest_image_edge and crop_fraction instead.
warnings.warn(
/home/ubuntu/Isaac-GR00T/gr00t/experiment/experiment.py:98: UserWarning: image_crop_size and image_target_size will be deprecated in the future. Please use shortest_image_edge and crop_fraction instead.
warnings.warn(
05/27/2026 10:29:46 - INFO - Saved config to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg
wandb: Currently logged in as: lucafrat (lucafrat-microsoft) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLForConditionalGeneration is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLForConditionalGeneration is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLVisionModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLForConditionalGeneration is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLVisionModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLVisionModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLTextModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLTextModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLTextModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
wandb: Tracking run with wandb version 0.23.0
wandb: Run data is saved locally in /home/ubuntu/Isaac-GR00T/wandb/run-20260527_102947-11o59yla
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run g1_finetune-20260527-102938
wandb: ⭐️ View project at https://wandb.ai/lucafrat-microsoft/groot-finetune
wandb: 🚀 View run at https://wandb.ai/lucafrat-microsoft/groot-finetune/runs/11o59yla
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLForConditionalGeneration is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLVisionModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3VLTextModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:255: FutureWarning: Accessing config attribute `compute_dtype` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'compute_dtype' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.compute_dtype'.
embedding_dim=self.inner_dim, compute_dtype=self.compute_dtype
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:286: FutureWarning: Accessing config attribute `output_dim` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'output_dim' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.output_dim'.
self.proj_out_2 = nn.Linear(self.inner_dim, self.output_dim)
Total number of DiT parameters: 1091722240
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:255: FutureWarning: Accessing config attribute `compute_dtype` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'compute_dtype' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.compute_dtype'.
embedding_dim=self.inner_dim, compute_dtype=self.compute_dtype
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:255: FutureWarning: Accessing config attribute `compute_dtype` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'compute_dtype' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.compute_dtype'.
embedding_dim=self.inner_dim, compute_dtype=self.compute_dtype
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:286: FutureWarning: Accessing config attribute `output_dim` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'output_dim' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.output_dim'.
self.proj_out_2 = nn.Linear(self.inner_dim, self.output_dim)
Total number of DiT parameters: 1091722240
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:286: FutureWarning: Accessing config attribute `output_dim` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'output_dim' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.output_dim'.
self.proj_out_2 = nn.Linear(self.inner_dim, self.output_dim)
Total number of DiT parameters: 1091722240
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:255: FutureWarning: Accessing config attribute `compute_dtype` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'compute_dtype' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.compute_dtype'.
embedding_dim=self.inner_dim, compute_dtype=self.compute_dtype
/home/ubuntu/Isaac-GR00T/gr00t/model/modules/dit.py:286: FutureWarning: Accessing config attribute `output_dim` directly via 'AlternateVLDiT' object attribute is deprecated. Please access 'output_dim' over 'AlternateVLDiT's config object instead, e.g. 'unet.config.output_dim'.
self.proj_out_2 = nn.Linear(self.inner_dim, self.output_dim)
Total number of DiT parameters: 1091722240
05/27/2026 10:29:51 - INFO - Using AlternateVLDiT for diffusion model
Total number of SelfAttentionTransformer parameters: 201433088
Total number of SelfAttentionTransformer parameters: 201433088
Total number of SelfAttentionTransformer parameters: 201433088
Total number of SelfAttentionTransformer parameters: 201433088
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.82s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.83s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.87s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.80s/it]
05/27/2026 10:30:00 - INFO - Total parameters: 3,144,016,000
05/27/2026 10:30:00 - INFO - Trainable parameters: 1,620,515,968 (51.54%)
Initializing datasets: 0%| | 0/1 [00:00<?, ?it/s]
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank3]:[W527 10:30:03.764444723 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
Initializing datasets: 0%| | 0/1 [00:00<?, ?it/s]
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank1]:[W527 10:30:03.008662679 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
Initializing datasets: 0%| | 0/1 [00:00<?, ?it/s]
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank2]:[W527 10:30:03.272989398 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
Initializing datasets: 0%| | 0/1 [00:00<?, ?it/s]
Generating stats for /home/ubuntu/groot-files/dataset
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank0]:[W527 10:30:04.610969276 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
Generated 64 shards for dataset /home/ubuntu/groot-files/dataset
Total steps: 64712, average shard length: 1011.125, shard length std: 17.115873480485885
Initializing datasets: 100%|██████████| 1/1 [00:01<00:00, 1.43s/it]
Generated 64 shards for dataset /home/ubuntu/groot-files/dataset
Total steps: 64712, average shard length: 1011.125, shard length std: 17.115873480485885
Initializing datasets: 100%|██████████| 1/1 [00:00<00:00, 1.02it/s]
Generated 64 shards for dataset /home/ubuntu/groot-files/dataset
Generated 64 shards for dataset /home/ubuntu/groot-files/dataset
Total steps: 64712, average shard length: 1011.125, shard length std: 17.115873480485885
Total steps: 64712, average shard length: 1011.125, shard length std: 17.115873480485885
Initializing datasets: 100%|██████████| 1/1 [00:01<00:00, 1.67s/it]
Initializing datasets: 100%|██████████| 1/1 [00:01<00:00, 1.17s/it]
05/27/2026 10:30:04 - INFO - Overriding statistics for embodiment 'unitree_g1_sonic'
05/27/2026 10:30:05 - INFO - Saved dataset statistics for inference
05/27/2026 10:30:06 - WARNING - No valid checkpoint found in output directory (/home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938)
05/27/2026 10:30:06 - INFO - 🚀 Starting training...
05/27/2026 10:30:06 - WARNING - No valid checkpoint found in output directory (/home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938)
05/27/2026 10:30:06 - WARNING - No valid checkpoint found in output directory (/home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938)
05/27/2026 10:30:06 - WARNING - No valid checkpoint found in output directory (/home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938)
Current global step: 0
Creating custom train dataloader
Current global step: 0
Creating custom train dataloader
Current global step: 0
Creating custom train dataloader
Current global step: 0
Creating custom train dataloader
0%| | 0/4000 [00:00<?, ?it/s]
Rank 1, Worker 4: Caching shard...
Rank 1, Worker 1: Caching shard...
Rank 1, Worker 0: Caching shard...
Rank 1, Worker 3: Caching shard...
Rank 1, Worker 2: Caching shard...
Rank 1, Worker 5: Caching shard...
Rank 2, Worker 0: Caching shard...
Rank 2, Worker 2: Caching shard...
Rank 2, Worker 3: Caching shard...
Rank 2, Worker 5: Caching shard...
Rank 2, Worker 1: Caching shard...
Rank 2, Worker 4: Caching shard...
Rank 3, Worker 0: Caching shard...
Rank 3, Worker 1: Caching shard...
Rank 3, Worker 2: Caching shard...
Rank 3, Worker 4: Caching shard...
Rank 3, Worker 3: Caching shard...
Rank 3, Worker 5: Caching shard...
Rank 0, Worker 2: Caching shard...
Rank 0, Worker 3: Caching shard...
Rank 0, Worker 0: Caching shard...
Rank 0, Worker 5: Caching shard...
Rank 0, Worker 4: Caching shard...
Rank 0, Worker 1: Caching shard...
Rank 2, Worker 3: Wait for shard 20 in dataset 0 in 16.63 seconds
Rank 2, Worker 3: Caching shard...
Rank 3, Worker 1: Wait for shard 61 in dataset 0 in 16.54 seconds
Rank 3, Worker 1: Caching shard...
Rank 1, Worker 3: Wait for shard 2 in dataset 0 in 16.78 seconds
Rank 1, Worker 3: Caching shard...
Rank 3, Worker 4: Wait for shard 62 in dataset 0 in 16.81 seconds
Rank 3, Worker 4: Caching shard...
Rank 2, Worker 5: Wait for shard 51 in dataset 0 in 16.91 seconds
Rank 2, Worker 5: Caching shard...
Rank 2, Worker 1: Wait for shard 32 in dataset 0 in 16.91 seconds
Rank 2, Worker 1: Caching shard...
Rank 1, Worker 2: Wait for shard 0 in dataset 0 in 16.93 seconds
Rank 1, Worker 2: Caching shard...
Rank 0, Worker 0: Wait for shard 33 in dataset 0 in 16.33 seconds
Rank 0, Worker 0: Caching shard...
Rank 3, Worker 0: Wait for shard 38 in dataset 0 in 17.00 seconds
Rank 3, Worker 0: Caching shard...
Rank 1, Worker 0: Wait for shard 46 in dataset 0 in 17.14 seconds
Rank 1, Worker 0: Caching shard...
Rank 3, Worker 2: Wait for shard 4 in dataset 0 in 17.04 seconds
Rank 3, Worker 2: Caching shard...
Rank 2, Worker 4: Wait for shard 56 in dataset 0 in 17.19 seconds
Rank 2, Worker 4: Caching shard...
Rank 1, Worker 4: Wait for shard 18 in dataset 0 in 17.25 seconds
Rank 1, Worker 4: Caching shard...
Rank 0, Worker 3: Wait for shard 22 in dataset 0 in 16.67 seconds
Rank 0, Worker 3: Caching shard...
Rank 1, Worker 5: Wait for shard 9 in dataset 0 in 17.41 seconds
Rank 1, Worker 5: Caching shard...
Rank 1, Worker 1: Wait for shard 59 in dataset 0 in 17.46 seconds
Rank 1, Worker 1: Caching shard...
Rank 3, Worker 3: Wait for shard 57 in dataset 0 in 17.37 seconds
Rank 3, Worker 3: Caching shard...
Rank 2, Worker 0: Wait for shard 10 in dataset 0 in 17.50 seconds
Rank 2, Worker 0: Caching shard...
Rank 2, Worker 2: Wait for shard 34 in dataset 0 in 17.52 seconds
Rank 2, Worker 2: Caching shard...
Rank 0, Worker 1: Wait for shard 48 in dataset 0 in 16.87 seconds
Rank 0, Worker 1: Caching shard...
Rank 0, Worker 5: Wait for shard 24 in dataset 0 in 16.96 seconds
Rank 0, Worker 5: Caching shard...
Rank 3, Worker 5: Wait for shard 55 in dataset 0 in 17.59 seconds
Rank 3, Worker 5: Caching shard...
Rank 0, Worker 2: Wait for shard 3 in dataset 0 in 17.03 seconds
Rank 0, Worker 2: Caching shard...
Rank 0, Worker 4: Wait for shard 37 in dataset 0 in 17.51 seconds
Rank 0, Worker 4: Caching shard...
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%| | 10/4000 [00:21<32:54, 2.02it/s] {'loss': 1.2127, 'grad_norm': 0.31839632987976074, 'learning_rate': 4.5e-06}
0%| | 20/4000 [00:24<17:02, 3.89it/s] {'loss': 1.1842, 'grad_norm': 0.19609376788139343, 'learning_rate': 9.5e-06}
1%| | 30/4000 [00:26<16:09, 4.10it/s] {'loss': 1.1838, 'grad_norm': 0.4635807275772095, 'learning_rate': 1.45e-05}
1%| | 40/4000 [00:28<16:16, 4.06it/s] {'loss': 1.1629, 'grad_norm': 0.3277273178100586, 'learning_rate': 1.9500000000000003e-05}
1%|▏ | 50/4000 [00:31<15:57, 4.12it/s] {'loss': 1.1264, 'grad_norm': 0.40489643812179565, 'learning_rate': 2.45e-05}
2%|▏ | 60/4000 [00:33<16:51, 3.89it/s] {'loss': 1.1117, 'grad_norm': 1.2546579837799072, 'learning_rate': 2.95e-05}
2%|▏ | 70/4000 [00:35<11:28, 5.71it/s] {'loss': 1.1066, 'grad_norm': 0.3789711594581604, 'learning_rate': 3.45e-05}
2%|▏ | 80/4000 [00:37<10:20, 6.32it/s] {'loss': 1.1082, 'grad_norm': 0.3612673878669739, 'learning_rate': 3.9500000000000005e-05}
2%|▏ | 90/4000 [00:38<10:16, 6.34it/s] {'loss': 1.1211, 'grad_norm': 0.28633758425712585, 'learning_rate': 4.4500000000000004e-05}
2%|▎ | 100/4000 [00:40<10:14, 6.35it/s] {'loss': 1.0998, 'grad_norm': 0.41268351674079895, 'learning_rate': 4.9500000000000004e-05}
3%|▎ | 110/4000 [00:42<10:13, 6.34it/s] {'loss': 1.1131, 'grad_norm': 0.2796061933040619, 'learning_rate': 5.45e-05}
3%|▎ | 120/4000 [00:43<10:09, 6.36it/s] {'loss': 1.1004, 'grad_norm': 0.3423110544681549, 'learning_rate': 5.95e-05}
3%|▎ | 130/4000 [00:45<10:11, 6.33it/s] {'loss': 1.0887, 'grad_norm': 0.33770301938056946, 'learning_rate': 6.450000000000001e-05}
4%|▎ | 140/4000 [00:46<10:07, 6.35it/s] {'loss': 1.1043, 'grad_norm': 0.4104395806789398, 'learning_rate': 6.95e-05}
4%|▍ | 150/4000 [00:48<10:08, 6.33it/s] {'loss': 1.093, 'grad_norm': 0.38792213797569275, 'learning_rate': 7.450000000000001e-05}
4%|▍ | 160/4000 [00:50<10:05, 6.35it/s] {'loss': 1.0992, 'grad_norm': 0.31532591581344604, 'learning_rate': 7.950000000000001e-05}
4%|▍ | 170/4000 [00:51<10:01, 6.37it/s] {'loss': 1.1033, 'grad_norm': 0.37639284133911133, 'learning_rate': 8.450000000000001e-05}
4%|▍ | 180/4000 [00:53<09:59, 6.37it/s] {'loss': 1.0834, 'grad_norm': 0.4326634407043457, 'learning_rate': 8.950000000000001e-05}
5%|▍ | 190/4000 [00:54<09:58, 6.37it/s] {'loss': 1.0852, 'grad_norm': 0.4046025276184082, 'learning_rate': 9.449999999999999e-05}
5%|▌ | 200/4000 [00:56<09:57, 6.36it/s] {'loss': 1.0775, 'grad_norm': 0.45471683144569397, 'learning_rate': 9.95e-05}
5%|▌ | 210/4000 [00:57<09:56, 6.35it/s] {'loss': 1.0781, 'grad_norm': 0.5478058457374573, 'learning_rate': 9.999861593790126e-05}
5%|β–Œ | 210/4000 [00:57<09:56, 6.35it/s] 5%|β–Œ | 211/4000 [00:58<09:58, 6.33it/s] 5%|β–Œ | 212/4000 [00:58<09:57, 6.34it/s] 5%|β–Œ | 213/4000 [00:58<09:56, 6.35it/s] 5%|β–Œ | 214/4000 [00:58<09:56, 6.35it/s] 5%|β–Œ | 215/4000 [00:58<09:54, 6.37it/s] 5%|β–Œ | 216/4000 [00:58<09:54, 6.36it/s] 5%|β–Œ | 217/4000 [00:58<09:54, 6.37it/s] 5%|β–Œ | 218/4000 [00:59<09:53, 6.37it/s] 5%|β–Œ | 219/4000 [00:59<09:53, 6.37it/s] 6%|β–Œ | 220/4000 [00:59<09:53, 6.37it/s] {'loss': 1.0828, 'grad_norm': 0.3471389412879944, 'learning_rate': 9.999383162408304e-05}
6%|β–Œ | 220/4000 [00:59<09:53, 6.37it/s] 6%|β–Œ | 221/4000 [00:59<09:54, 6.36it/s] 6%|β–Œ | 222/4000 [00:59<09:53, 6.36it/s] 6%|β–Œ | 223/4000 [00:59<09:56, 6.33it/s] 6%|β–Œ | 224/4000 [01:00<09:55, 6.34it/s] 6%|β–Œ | 225/4000 [01:00<09:55, 6.34it/s] 6%|β–Œ | 226/4000 [01:00<09:54, 6.35it/s] 6%|β–Œ | 227/4000 [01:00<09:53, 6.36it/s] 6%|β–Œ | 228/4000 [01:00<09:53, 6.35it/s] 6%|β–Œ | 229/4000 [01:00<09:53, 6.35it/s] 6%|β–Œ | 230/4000 [01:01<09:53, 6.36it/s] {'loss': 1.0568, 'grad_norm': 0.4683222770690918, 'learning_rate': 9.998563029828259e-05}
6%|β–Œ | 230/4000 [01:01<09:53, 6.36it/s] 6%|β–Œ | 231/4000 [01:01<09:53, 6.35it/s] 6%|β–Œ | 232/4000 [01:01<09:55, 6.32it/s] 6%|β–Œ | 233/4000 [01:01<09:54, 6.34it/s] 6%|β–Œ | 234/4000 [01:01<09:53, 6.35it/s] 6%|β–Œ | 235/4000 [01:01<09:53, 6.35it/s] 6%|β–Œ | 236/4000 [01:01<09:52, 6.35it/s] 6%|β–Œ | 237/4000 [01:02<09:52, 6.35it/s] 6%|β–Œ | 238/4000 [01:02<09:54, 6.32it/s] 6%|β–Œ | 239/4000 [01:02<09:54, 6.33it/s] 6%|β–Œ | 240/4000 [01:02<09:53, 6.33it/s] {'loss': 1.0364, 'grad_norm': 0.3368767499923706, 'learning_rate': 9.997401252104962e-05}
6%|β–Œ | 240/4000 [01:02<09:53, 6.33it/s] 6%|β–Œ | 241/4000 [01:02<09:53, 6.33it/s] 6%|β–Œ | 242/4000 [01:02<09:52, 6.34it/s] 6%|β–Œ | 243/4000 [01:03<09:51, 6.35it/s] 6%|β–Œ | 244/4000 [01:03<09:53, 6.32it/s] 6%|β–Œ | 245/4000 [01:03<09:52, 6.34it/s] 6%|β–Œ | 246/4000 [01:03<09:51, 6.35it/s] 6%|β–Œ | 247/4000 [01:03<09:51, 6.35it/s] 6%|β–Œ | 248/4000 [01:03<09:51, 6.35it/s] 6%|β–Œ | 249/4000 [01:04<09:50, 6.36it/s] 6%|β–‹ | 250/4000 [01:04<09:52, 6.32it/s] {'loss': 1.0061, 'grad_norm': 0.425786554813385, 'learning_rate': 9.995897908644378e-05}
6%|β–‹ | 250/4000 [01:04<09:52, 6.32it/s] 6%|β–‹ | 251/4000 [01:04<09:52, 6.33it/s] 6%|β–‹ | 252/4000 [01:04<09:53, 6.31it/s] 6%|β–‹ | 253/4000 [01:04<09:51, 6.33it/s] 6%|β–‹ | 254/4000 [01:04<09:51, 6.33it/s] 6%|β–‹ | 255/4000 [01:04<09:49, 6.35it/s] 6%|β–‹ | 256/4000 [01:05<09:52, 6.32it/s] 6%|β–‹ | 257/4000 [01:05<09:51, 6.33it/s] 6%|β–‹ | 258/4000 [01:05<09:50, 6.34it/s] 6%|β–‹ | 259/4000 [01:05<09:49, 6.35it/s] 6%|β–‹ | 260/4000 [01:05<09:47, 6.36it/s] {'loss': 1.0064, 'grad_norm': 0.38125181198120117, 'learning_rate': 9.994053102198034e-05}
6%|β–‹ | 260/4000 [01:05<09:47, 6.36it/s] 7%|β–‹ | 261/4000 [01:05<09:49, 6.34it/s] 7%|β–‹ | 262/4000 [01:06<09:53, 6.30it/s] 7%|β–‹ | 263/4000 [01:06<09:51, 6.32it/s] 7%|β–‹ | 264/4000 [01:06<09:50, 6.33it/s] 7%|β–‹ | 265/4000 [01:06<09:49, 6.34it/s] 7%|β–‹ | 266/4000 [01:06<09:47, 6.35it/s] 7%|β–‹ | 267/4000 [01:06<09:48, 6.34it/s] 7%|β–‹ | 268/4000 [01:07<09:52, 6.30it/s] 7%|β–‹ | 269/4000 [01:07<09:49, 6.33it/s] 7%|β–‹ | 270/4000 [01:07<09:50, 6.32it/s] {'loss': 1.0119, 'grad_norm': 0.569205105304718, 'learning_rate': 9.991866958856003e-05}
7%|β–‹ | 270/4000 [01:07<09:50, 6.32it/s] 7%|β–‹ | 271/4000 [01:07<09:51, 6.31it/s] 7%|β–‹ | 272/4000 [01:07<09:49, 6.33it/s] 7%|β–‹ | 273/4000 [01:07<09:48, 6.33it/s] 7%|β–‹ | 274/4000 [01:07<09:51, 6.30it/s] 7%|β–‹ | 275/4000 [01:08<09:51, 6.30it/s] 7%|β–‹ | 276/4000 [01:08<09:51, 6.30it/s] 7%|β–‹ | 277/4000 [01:08<09:48, 6.32it/s] 7%|β–‹ | 278/4000 [01:08<09:47, 6.34it/s] 7%|β–‹ | 279/4000 [01:08<09:46, 6.34it/s] 7%|β–‹ | 280/4000 [01:08<09:50, 6.30it/s] {'loss': 0.9783, 'grad_norm': 0.360039621591568, 'learning_rate': 9.989339628038276e-05}
7%|β–‹ | 280/4000 [01:08<09:50, 6.30it/s] 7%|β–‹ | 281/4000 [01:09<09:49, 6.31it/s] 7%|β–‹ | 282/4000 [01:09<09:47, 6.32it/s] 7%|β–‹ | 283/4000 [01:09<09:46, 6.33it/s] 7%|β–‹ | 284/4000 [01:09<09:46, 6.33it/s] 7%|β–‹ | 285/4000 [01:09<09:46, 6.34it/s] 7%|β–‹ | 286/4000 [01:09<09:49, 6.31it/s] 7%|β–‹ | 287/4000 [01:10<09:47, 6.32it/s] 7%|β–‹ | 288/4000 [01:10<09:50, 6.28it/s] 7%|β–‹ | 289/4000 [01:10<09:49, 6.30it/s] 7%|β–‹ | 290/4000 [01:10<09:47, 6.31it/s] {'loss': 0.9876, 'grad_norm': 0.3894808292388916, 'learning_rate': 9.98647128248456e-05}
7%|β–‹ | 290/4000 [01:10<09:47, 6.31it/s] 7%|β–‹ | 291/4000 [01:10<09:48, 6.30it/s] 7%|β–‹ | 292/4000 [01:10<09:51, 6.27it/s] 7%|β–‹ | 293/4000 [01:10<09:48, 6.30it/s] 7%|β–‹ | 294/4000 [01:11<09:46, 6.31it/s] 7%|β–‹ | 295/4000 [01:11<09:46, 6.32it/s] 7%|β–‹ | 296/4000 [01:11<09:46, 6.32it/s] 7%|β–‹ | 297/4000 [01:11<09:45, 6.32it/s] 7%|β–‹ | 298/4000 [01:11<09:47, 6.30it/s] 7%|β–‹ | 299/4000 [01:11<09:46, 6.31it/s] 8%|β–Š | 300/4000 [01:12<09:45, 6.32it/s] {'loss': 0.9774, 'grad_norm': 0.49482613801956177, 'learning_rate': 9.98326211824246e-05}
8%|β–Š | 300/4000 [01:12<09:45, 6.32it/s] 8%|β–Š | 301/4000 [01:12<09:45, 6.32it/s] 8%|β–Š | 302/4000 [01:12<09:44, 6.33it/s] 8%|β–Š | 303/4000 [01:12<09:43, 6.34it/s] 8%|β–Š | 304/4000 [01:12<09:46, 6.30it/s] 8%|β–Š | 305/4000 [01:12<09:44, 6.32it/s] 8%|β–Š | 306/4000 [01:13<09:43, 6.33it/s] 8%|β–Š | 307/4000 [01:13<09:44, 6.32it/s] 8%|β–Š | 308/4000 [01:13<09:43, 6.32it/s] 8%|β–Š | 309/4000 [01:13<09:43, 6.33it/s] 8%|β–Š | 310/4000 [01:13<09:45, 6.30it/s] {'loss': 0.9528, 'grad_norm': 0.43771424889564514, 'learning_rate': 9.979712354654091e-05}
8%|β–Š | 310/4000 [01:13<09:45, 6.30it/s] 8%|β–Š | 311/4000 [01:13<09:45, 6.30it/s] 8%|β–Š | 312/4000 [01:13<09:43, 6.32it/s] 8%|β–Š | 313/4000 [01:14<09:44, 6.31it/s] 8%|β–Š | 314/4000 [01:14<09:44, 6.31it/s] 8%|β–Š | 315/4000 [01:14<09:43, 6.32it/s] 8%|β–Š | 316/4000 [01:14<09:42, 6.32it/s] 8%|β–Š | 317/4000 [01:14<09:41, 6.33it/s] 8%|β–Š | 318/4000 [01:14<09:42, 6.33it/s] 8%|β–Š | 319/4000 [01:15<09:42, 6.32it/s] 8%|β–Š | 320/4000 [01:15<09:46, 6.28it/s] {'loss': 0.949, 'grad_norm': 0.3924391269683838, 'learning_rate': 9.975822234341079e-05}
8%|β–Š | 320/4000 [01:15<09:46, 6.28it/s] 8%|β–Š | 321/4000 [01:15<09:46, 6.28it/s] 8%|β–Š | 322/4000 [01:15<09:44, 6.30it/s] 8%|β–Š | 323/4000 [01:15<09:43, 6.30it/s] 8%|β–Š | 324/4000 [01:15<09:42, 6.31it/s] 8%|β–Š | 325/4000 [01:16<09:41, 6.32it/s] 8%|β–Š | 326/4000 [01:16<09:41, 6.32it/s] 8%|β–Š | 327/4000 [01:16<09:41, 6.32it/s] 8%|β–Š | 328/4000 [01:16<09:43, 6.29it/s] 8%|β–Š | 329/4000 [01:16<09:42, 6.30it/s] 8%|β–Š | 330/4000 [01:16<09:41, 6.31it/s] {'loss': 0.95, 'grad_norm': 0.3948723077774048, 'learning_rate': 9.97159202318798e-05}
8%|β–Š | 330/4000 [01:16<09:41, 6.31it/s] 8%|β–Š | 331/4000 [01:16<09:41, 6.31it/s] 8%|β–Š | 332/4000 [01:17<09:41, 6.31it/s] 8%|β–Š | 333/4000 [01:17<09:40, 6.31it/s] 8%|β–Š | 334/4000 [01:17<09:41, 6.30it/s] 8%|β–Š | 335/4000 [01:17<09:40, 6.31it/s] 8%|β–Š | 336/4000 [01:17<09:40, 6.32it/s] 8%|β–Š | 337/4000 [01:17<09:39, 6.32it/s] 8%|β–Š | 338/4000 [01:18<09:39, 6.32it/s] 8%|β–Š | 339/4000 [01:18<09:39, 6.32it/s] 8%|β–Š | 340/4000 [01:18<09:40, 6.30it/s] {'loss': 0.9263, 'grad_norm': 0.4701744019985199, 'learning_rate': 9.967022010324105e-05}
8%|β–Š | 340/4000 [01:18<09:40, 6.30it/s] 9%|β–Š | 341/4000 [01:18<09:42, 6.28it/s] 9%|β–Š | 342/4000 [01:18<09:40, 6.30it/s] 9%|β–Š | 343/4000 [01:18<09:40, 6.30it/s] 9%|β–Š | 344/4000 [01:19<09:39, 6.31it/s] 9%|β–Š | 345/4000 [01:19<09:39, 6.31it/s] 9%|β–Š | 346/4000 [01:19<09:40, 6.30it/s] 9%|β–Š | 347/4000 [01:19<09:38, 6.31it/s] 9%|β–Š | 348/4000 [01:19<09:37, 6.33it/s] 9%|β–Š | 349/4000 [01:19<09:36, 6.33it/s] 9%|β–‰ | 350/4000 [01:20<09:37, 6.33it/s] {'loss': 0.9409, 'grad_norm': 0.5202460289001465, 'learning_rate': 9.962112508103765e-05}
9%|β–‰ | 350/4000 [01:20<09:37, 6.33it/s] 9%|β–‰ | 351/4000 [01:20<09:39, 6.29it/s] 9%|β–‰ | 352/4000 [01:20<09:41, 6.27it/s] 9%|β–‰ | 353/4000 [01:20<09:39, 6.29it/s] 9%|β–‰ | 354/4000 [01:20<09:38, 6.31it/s] 9%|β–‰ | 355/4000 [01:20<09:37, 6.31it/s] 9%|β–‰ | 356/4000 [01:20<09:36, 6.32it/s] 9%|β–‰ | 357/4000 [01:21<09:36, 6.31it/s] 9%|β–‰ | 358/4000 [01:21<09:37, 6.30it/s] 9%|β–‰ | 359/4000 [01:21<09:36, 6.32it/s] 9%|β–‰ | 360/4000 [01:21<09:36, 6.31it/s] {'loss': 0.9107, 'grad_norm': 0.4282516837120056, 'learning_rate': 9.956863852084914e-05}
9%|β–‰ | 360/4000 [01:21<09:36, 6.31it/s] 9%|β–‰ | 361/4000 [01:21<09:37, 6.30it/s] 9%|β–‰ | 362/4000 [01:21<09:39, 6.28it/s] 9%|β–‰ | 363/4000 [01:22<09:38, 6.29it/s] 9%|β–‰ | 364/4000 [01:22<09:40, 6.27it/s] 9%|β–‰ | 365/4000 [01:22<09:38, 6.29it/s] 9%|β–‰ | 366/4000 [01:22<09:39, 6.27it/s] 9%|β–‰ | 367/4000 [01:22<09:38, 6.28it/s] 9%|β–‰ | 368/4000 [01:22<09:40, 6.26it/s] 9%|β–‰ | 369/4000 [01:23<09:38, 6.27it/s] 9%|β–‰ | 370/4000 [01:23<09:38, 6.28it/s] {'loss': 0.9072, 'grad_norm': 0.5906583666801453, 'learning_rate': 9.951276401006221e-05}
9%|β–‰ | 370/4000 [01:23<09:38, 6.28it/s] 9%|β–‰ | 371/4000 [01:23<09:38, 6.27it/s] 9%|β–‰ | 372/4000 [01:23<09:37, 6.28it/s] 9%|β–‰ | 373/4000 [01:23<09:38, 6.26it/s] 9%|β–‰ | 374/4000 [01:23<09:37, 6.28it/s] 9%|β–‰ | 375/4000 [01:23<09:35, 6.30it/s] 9%|β–‰ | 376/4000 [01:24<09:34, 6.31it/s] 9%|β–‰ | 377/4000 [01:24<09:33, 6.32it/s] 9%|β–‰ | 378/4000 [01:24<09:32, 6.32it/s] 9%|β–‰ | 379/4000 [01:24<09:34, 6.30it/s] 10%|β–‰ | 380/4000 [01:24<09:33, 6.31it/s] {'loss': 0.9142, 'grad_norm': 0.4856233298778534, 'learning_rate': 9.945350536762543e-05}
10%|β–‰ | 380/4000 [01:24<09:33, 6.31it/s] 10%|β–‰ | 381/4000 [01:24<09:35, 6.29it/s] 10%|β–‰ | 382/4000 [01:25<10:04, 5.98it/s] 10%|β–‰ | 383/4000 [01:25<09:59, 6.04it/s] 10%|β–‰ | 384/4000 [01:25<09:50, 6.12it/s] 10%|β–‰ | 385/4000 [01:25<09:48, 6.15it/s] 10%|β–‰ | 386/4000 [01:25<09:43, 6.19it/s] 10%|β–‰ | 387/4000 [01:25<09:40, 6.23it/s] 10%|β–‰ | 388/4000 [01:26<09:37, 6.25it/s] 10%|β–‰ | 389/4000 [01:26<09:35, 6.27it/s] 10%|β–‰ | 390/4000 [01:26<09:33, 6.29it/s] {'loss': 0.8903, 'grad_norm': 0.44197753071784973, 'learning_rate': 9.939086664378829e-05}
10%|β–‰ | 390/4000 [01:26<09:33, 6.29it/s] 10%|β–‰ | 391/4000 [01:26<09:36, 6.26it/s] 10%|β–‰ | 392/4000 [01:26<09:34, 6.28it/s] 10%|β–‰ | 393/4000 [01:26<09:32, 6.30it/s] 10%|β–‰ | 394/4000 [01:27<09:32, 6.30it/s] 10%|β–‰ | 395/4000 [01:27<09:30, 6.32it/s] 10%|β–‰ | 396/4000 [01:27<09:30, 6.32it/s] 10%|β–‰ | 397/4000 [01:27<09:29, 6.33it/s] 10%|β–‰ | 398/4000 [01:27<09:29, 6.33it/s] 10%|β–‰ | 399/4000 [01:27<09:29, 6.33it/s] 10%|β–ˆ | 400/4000 [01:27<09:28, 6.33it/s] {'loss': 0.8942, 'grad_norm': 0.6055497527122498, 'learning_rate': 9.932485211982437e-05}
10%|β–ˆ | 400/4000 [01:27<09:28, 6.33it/s] 10%|β–ˆ | 401/4000 [01:28<09:32, 6.29it/s] 10%|β–ˆ | 402/4000 [01:28<09:30, 6.31it/s] 10%|β–ˆ | 403/4000 [01:28<09:28, 6.33it/s] 10%|β–ˆ | 404/4000 [01:28<09:28, 6.32it/s] 10%|β–ˆ | 405/4000 [01:28<09:29, 6.31it/s] 10%|β–ˆ | 406/4000 [01:28<09:29, 6.32it/s] 10%|β–ˆ | 407/4000 [01:29<09:28, 6.32it/s] 10%|β–ˆ | 408/4000 [01:29<09:27, 6.33it/s] 10%|β–ˆ | 409/4000 [01:29<09:26, 6.34it/s] 10%|β–ˆ | 410/4000 [01:29<09:26, 6.33it/s] {'loss': 0.8765, 'grad_norm': 0.5160871744155884, 'learning_rate': 9.92554663077387e-05}
10%|β–ˆ | 410/4000 [01:29<09:26, 6.33it/s] 10%|β–ˆ | 411/4000 [01:29<09:28, 6.31it/s] 10%|β–ˆ | 412/4000 [01:29<09:28, 6.32it/s] 10%|β–ˆ | 413/4000 [01:30<09:27, 6.32it/s] 10%|β–ˆ | 414/4000 [01:30<09:27, 6.32it/s] 10%|β–ˆ | 415/4000 [01:30<09:26, 6.33it/s] 10%|β–ˆ | 416/4000 [01:30<09:26, 6.32it/s] 10%|β–ˆ | 417/4000 [01:30<09:29, 6.29it/s] 10%|β–ˆ | 418/4000 [01:30<09:27, 6.31it/s] 10%|β–ˆ | 419/4000 [01:30<09:26, 6.32it/s] 10%|β–ˆ | 420/4000 [01:31<09:25, 6.33it/s] {'loss': 0.8774, 'grad_norm': 0.479907751083374, 'learning_rate': 9.918271394995935e-05}
10%|β–ˆ | 420/4000 [01:31<09:25, 6.33it/s] 11%|β–ˆ | 421/4000 [01:31<09:26, 6.32it/s] 11%|β–ˆ | 422/4000 [01:31<09:25, 6.33it/s] 11%|β–ˆ | 423/4000 [01:31<09:25, 6.32it/s] 11%|β–ˆ | 424/4000 [01:31<09:25, 6.33it/s] 11%|β–ˆ | 425/4000 [01:31<09:25, 6.32it/s] 11%|β–ˆ | 426/4000 [01:32<09:26, 6.30it/s] 11%|β–ˆ | 427/4000 [01:32<09:25, 6.32it/s] 11%|β–ˆ | 428/4000 [01:32<09:25, 6.32it/s] 11%|β–ˆ | 429/4000 [01:32<09:24, 6.32it/s] 11%|β–ˆ | 430/4000 [01:32<09:24, 6.32it/s] {'loss': 0.8775, 'grad_norm': 0.5040944218635559, 'learning_rate': 9.910660001901335e-05}
11%|β–ˆ | 430/4000 [01:32<09:24, 6.32it/s] 11%|β–ˆ | 431/4000 [01:32<09:24, 6.32it/s] 11%|β–ˆ | 432/4000 [01:33<09:24, 6.32it/s] 11%|β–ˆ | 433/4000 [01:33<09:23, 6.33it/s] 11%|β–ˆ | 434/4000 [01:33<09:24, 6.31it/s] 11%|β–ˆ | 435/4000 [01:33<09:24, 6.32it/s] 11%|β–ˆ | 436/4000 [01:33<09:23, 6.32it/s] 11%|β–ˆ | 437/4000 [01:33<09:23, 6.33it/s] 11%|β–ˆ | 438/4000 [01:33<09:22, 6.34it/s] 11%|β–ˆ | 439/4000 [01:34<09:23, 6.32it/s] 11%|β–ˆ | 440/4000 [01:34<09:23, 6.32it/s] {'loss': 0.8647, 'grad_norm': 0.5206490755081177, 'learning_rate': 9.902712971718675e-05}
11%|β–ˆ | 440/4000 [01:34<09:23, 6.32it/s] 11%|β–ˆ | 441/4000 [01:34<09:25, 6.29it/s] 11%|β–ˆ | 442/4000 [01:34<09:24, 6.31it/s] 11%|β–ˆ | 443/4000 [01:34<09:23, 6.32it/s] 11%|β–ˆ | 444/4000 [01:34<09:21, 6.33it/s] 11%|β–ˆ | 445/4000 [01:35<09:21, 6.33it/s] 11%|β–ˆ | 446/4000 [01:35<09:21, 6.33it/s] 11%|β–ˆ | 447/4000 [01:35<09:20, 6.33it/s] 11%|β–ˆ | 448/4000 [01:35<09:21, 6.33it/s] 11%|β–ˆ | 449/4000 [01:35<09:22, 6.31it/s] 11%|β–ˆβ– | 450/4000 [01:35<09:22, 6.32it/s] {'loss': 0.8619, 'grad_norm': 0.44746702909469604, 'learning_rate': 9.894430847616915e-05}
11%|β–ˆβ– | 450/4000 [01:35<09:22, 6.32it/s] 11%|β–ˆβ– | 451/4000 [01:36<09:23, 6.29it/s] 11%|β–ˆβ– | 452/4000 [01:36<09:23, 6.30it/s] 11%|β–ˆβ– | 453/4000 [01:36<09:22, 6.30it/s] 11%|β–ˆβ– | 454/4000 [01:36<09:21, 6.32it/s] 11%|β–ˆβ– | 455/4000 [01:36<09:21, 6.31it/s] 11%|β–ˆβ– | 456/4000 [01:36<09:20, 6.33it/s] 11%|β–ˆβ– | 457/4000 [01:36<09:19, 6.33it/s] 11%|β–ˆβ– | 458/4000 [01:37<09:18, 6.34it/s] 11%|β–ˆβ– | 459/4000 [01:37<09:19, 6.33it/s] 12%|β–ˆβ– | 460/4000 [01:37<09:18, 6.34it/s] {'loss': 0.8444, 'grad_norm': 0.4645554721355438, 'learning_rate': 9.885814195668232e-05}
12%|β–ˆβ– | 460/4000 [01:37<09:18, 6.34it/s] 12%|β–ˆβ– | 461/4000 [01:37<09:20, 6.31it/s] 12%|β–ˆβ– | 462/4000 [01:37<09:19, 6.32it/s] 12%|β–ˆβ– | 463/4000 [01:37<09:18, 6.33it/s] 12%|β–ˆβ– | 464/4000 [01:38<09:18, 6.33it/s] 12%|β–ˆβ– | 465/4000 [01:38<09:18, 6.33it/s] 12%|β–ˆβ– | 466/4000 [01:38<09:17, 6.33it/s] 12%|β–ˆβ– | 467/4000 [01:38<09:17, 6.34it/s] 12%|β–ˆβ– | 468/4000 [01:38<09:17, 6.34it/s] 12%|β–ˆβ– | 469/4000 [01:38<09:17, 6.33it/s] 12%|β–ˆβ– | 470/4000 [01:39<09:17, 6.33it/s] {'loss': 0.8333, 'grad_norm': 0.47834452986717224, 'learning_rate': 9.876863604809344e-05}
12%|β–ˆβ– | 470/4000 [01:39<09:17, 6.33it/s] 12%|β–ˆβ– | 471/4000 [01:39<09:19, 6.31it/s] 12%|β–ˆβ– | 472/4000 [01:39<09:18, 6.32it/s] 12%|β–ˆβ– | 473/4000 [01:39<09:17, 6.33it/s] 12%|β–ˆβ– | 474/4000 [01:39<09:16, 6.33it/s] 12%|β–ˆβ– | 475/4000 [01:39<09:16, 6.33it/s] 12%|β–ˆβ– | 476/4000 [01:40<09:16, 6.34it/s] 12%|β–ˆβ– | 477/4000 [01:40<09:16, 6.34it/s] 12%|β–ˆβ– | 478/4000 [01:40<09:16, 6.33it/s] 12%|β–ˆβ– | 479/4000 [01:40<09:15, 6.34it/s] 12%|β–ˆβ– | 480/4000 [01:40<09:15, 6.34it/s] {'loss': 0.8598, 'grad_norm': 0.43582776188850403, 'learning_rate': 9.867579686801245e-05}
12%|β–ˆβ– | 480/4000 [01:40<09:15, 6.34it/s] 12%|β–ˆβ– | 481/4000 [01:40<09:17, 6.31it/s] 12%|β–ˆβ– | 482/4000 [01:40<09:17, 6.32it/s] 12%|β–ˆβ– | 483/4000 [01:41<09:16, 6.32it/s] 12%|β–ˆβ– | 484/4000 [01:41<09:16, 6.32it/s] 12%|β–ˆβ– | 485/4000 [01:41<09:14, 6.34it/s] 12%|β–ˆβ– | 486/4000 [01:41<09:13, 6.35it/s] 12%|β–ˆβ– | 487/4000 [01:41<09:13, 6.35it/s] 12%|β–ˆβ– | 488/4000 [01:41<09:13, 6.35it/s] 12%|β–ˆβ– | 489/4000 [01:42<09:13, 6.35it/s] 12%|β–ˆβ– | 490/4000 [01:42<09:13, 6.34it/s] {'loss': 0.8198, 'grad_norm': 0.4257030189037323, 'learning_rate': 9.8579630761874e-05}
12%|β–ˆβ– | 490/4000 [01:42<09:13, 6.34it/s] 12%|β–ˆβ– | 491/4000 [01:42<09:14, 6.33it/s] 12%|β–ˆβ– | 492/4000 [01:42<09:14, 6.33it/s] 12%|β–ˆβ– | 493/4000 [01:42<09:13, 6.34it/s] 12%|β–ˆβ– | 494/4000 [01:42<09:12, 6.34it/s] 12%|β–ˆβ– | 495/4000 [01:42<09:12, 6.35it/s] 12%|β–ˆβ– | 496/4000 [01:43<09:12, 6.34it/s] 12%|β–ˆβ– | 497/4000 [01:43<09:12, 6.34it/s] 12%|β–ˆβ– | 498/4000 [01:43<09:13, 6.33it/s] 12%|β–ˆβ– | 499/4000 [01:43<09:13, 6.33it/s] 12%|β–ˆβ–Ž | 500/4000 [01:43<09:13, 6.33it/s] {'loss': 0.8271, 'grad_norm': 0.5325835943222046, 'learning_rate': 9.848014430250367e-05}
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-500/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-500
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-500/wandb_config.json
13%|█▎ | 510/4000 [02:12<28:09, 2.07it/s] {'loss': 0.8202, 'grad_norm': 0.46889224648475647, 'learning_rate': 9.837734428966885e-05}
13%|█▎ | 520/4000 [02:13<09:41, 5.98it/s] {'loss': 0.8078, 'grad_norm': 0.5707681775093079, 'learning_rate': 9.827123774961383e-05}
13%|█▎ | 530/4000 [02:15<09:10, 6.30it/s] {'loss': 0.8296, 'grad_norm': 0.5601910352706909, 'learning_rate': 9.816183193457968e-05}
14%|█▎ | 540/4000 [02:17<09:07, 6.32it/s] {'loss': 0.8248, 'grad_norm': 0.6276665925979614, 'learning_rate': 9.804913432230856e-05}
14%|█▍ | 550/4000 [02:18<09:04, 6.33it/s] {'loss': 0.8042, 'grad_norm': 0.6616746187210083, 'learning_rate': 9.793315261553252e-05}
14%|█▍ | 560/4000 [02:20<09:03, 6.33it/s] {'loss': 0.8047, 'grad_norm': 0.5051801204681396, 'learning_rate': 9.781389474144717e-05}
14%|█▍ | 570/4000 [02:21<09:04, 6.30it/s] {'loss': 0.8061, 'grad_norm': 0.45216768980026245, 'learning_rate': 9.76913688511698e-05}
14%|█▍ | 580/4000 [02:23<09:36, 5.94it/s] {'loss': 0.8082, 'grad_norm': 0.5215824246406555, 'learning_rate': 9.756558331918227e-05}
15%|█▍ | 590/4000 [02:25<08:59, 6.33it/s] {'loss': 0.8092, 'grad_norm': 0.5071956515312195, 'learning_rate': 9.743654674275855e-05}
15%|█▌ | 600/4000 [02:26<08:57, 6.32it/s] {'loss': 0.7902, 'grad_norm': 0.4734920263290405, 'learning_rate': 9.730426794137727e-05}
15%|█▌ | 610/4000 [02:28<08:57, 6.31it/s] {'loss': 0.7976, 'grad_norm': 0.6150533556938171, 'learning_rate': 9.716875595611879e-05}
16%|█▌ | 620/4000 [02:30<10:44, 5.24it/s] {'loss': 0.7894, 'grad_norm': 0.6230007410049438, 'learning_rate': 9.703002004904729e-05}
16%|█▌ | 630/4000 [02:32<10:38, 5.28it/s] {'loss': 0.7771, 'grad_norm': 0.47863709926605225, 'learning_rate': 9.688806970257773e-05}
16%|█▌ | 640/4000 [02:33<08:56, 6.26it/s] {'loss': 0.7608, 'grad_norm': 0.5021641850471497, 'learning_rate': 9.674291461882774e-05}
16%|█▋ | 650/4000 [02:35<08:52, 6.29it/s] {'loss': 0.7793, 'grad_norm': 0.6127395033836365, 'learning_rate': 9.659456471895445e-05}
16%|█▋ | 660/4000 [02:36<08:48, 6.33it/s] {'loss': 0.7759, 'grad_norm': 0.5065549612045288, 'learning_rate': 9.644303014247648e-05}
17%|█▋ | 670/4000 [02:38<09:06, 6.09it/s] {'loss': 0.7647, 'grad_norm': 0.5618578195571899, 'learning_rate': 9.628832124658085e-05}
17%|█▋ | 680/4000 [02:40<08:48, 6.28it/s] {'loss': 0.7829, 'grad_norm': 0.7806953191757202, 'learning_rate': 9.613044860541507e-05}
17%|█▋ | 690/4000 [02:41<08:44, 6.31it/s] {'loss': 0.7624, 'grad_norm': 0.48667505383491516, 'learning_rate': 9.596942300936445e-05}
18%|█▊ | 700/4000 [02:43<08:45, 6.28it/s] {'loss': 0.7617, 'grad_norm': 0.4916687309741974, 'learning_rate': 9.580525546431459e-05}
18%|█▊ | 710/4000 [02:45<09:09, 5.98it/s] {'loss': 0.7561, 'grad_norm': 0.382213830947876, 'learning_rate': 9.563795719089911e-05}
18%|█▊ | 720/4000 [02:46<08:42, 6.27it/s] {'loss': 0.7619, 'grad_norm': 0.7185860276222229, 'learning_rate': 9.546753962373281e-05}
18%|█▊ | 730/4000 [02:48<08:50, 6.16it/s] {'loss': 0.773, 'grad_norm': 0.48129522800445557, 'learning_rate': 9.529401441062997e-05}
18%|█▊ | 740/4000 [02:49<08:41, 6.25it/s] {'loss': 0.7669, 'grad_norm': 0.5167684555053711, 'learning_rate': 9.511739341180842e-05}
18%|β–ˆβ–Š | 740/4000 [02:49<08:41, 6.25it/s] 19%|β–ˆβ–Š | 741/4000 [02:50<08:43, 6.23it/s] 19%|β–ˆβ–Š | 742/4000 [02:50<08:41, 6.25it/s] 19%|β–ˆβ–Š | 743/4000 [02:50<08:40, 6.25it/s] 19%|β–ˆβ–Š | 744/4000 [02:50<08:39, 6.27it/s] 19%|β–ˆβ–Š | 745/4000 [02:50<08:40, 6.25it/s] 19%|β–ˆβ–Š | 746/4000 [02:50<08:40, 6.25it/s] 19%|β–ˆβ–Š | 747/4000 [02:51<08:39, 6.26it/s] 19%|β–ˆβ–Š | 748/4000 [02:51<08:39, 6.25it/s] 19%|β–ˆβ–Š | 749/4000 [02:51<08:42, 6.23it/s] 19%|β–ˆβ–‰ | 750/4000 [02:51<08:40, 6.24it/s] {'loss': 0.764, 'grad_norm': 0.5255888104438782, 'learning_rate': 9.493768869907886e-05}
19%|β–ˆβ–‰ | 750/4000 [02:51<08:40, 6.24it/s] 19%|β–ˆβ–‰ | 751/4000 [02:51<08:42, 6.22it/s] 19%|β–ˆβ–‰ | 752/4000 [02:51<08:41, 6.23it/s] 19%|β–ˆβ–‰ | 753/4000 [02:52<08:42, 6.21it/s] 19%|β–ˆβ–‰ | 754/4000 [02:52<08:41, 6.22it/s] 19%|β–ˆβ–‰ | 755/4000 [02:52<08:40, 6.23it/s] 19%|β–ˆβ–‰ | 756/4000 [02:52<08:38, 6.26it/s] 19%|β–ˆβ–‰ | 757/4000 [02:52<08:39, 6.24it/s] 19%|β–ˆβ–‰ | 758/4000 [02:52<08:40, 6.23it/s] 19%|β–ˆβ–‰ | 759/4000 [02:52<08:41, 6.21it/s] 19%|β–ˆβ–‰ | 760/4000 [02:53<08:40, 6.23it/s] {'loss': 0.7532, 'grad_norm': 0.3794105052947998, 'learning_rate': 9.475491255501968e-05}
19%|β–ˆβ–‰ | 760/4000 [02:53<08:40, 6.23it/s] 19%|β–ˆβ–‰ | 761/4000 [02:53<08:40, 6.23it/s] 19%|β–ˆβ–‰ | 762/4000 [02:53<08:39, 6.23it/s] 19%|β–ˆβ–‰ | 763/4000 [02:53<08:42, 6.20it/s] 19%|β–ˆβ–‰ | 764/4000 [02:53<08:42, 6.19it/s] 19%|β–ˆβ–‰ | 765/4000 [02:53<08:43, 6.19it/s] 19%|β–ˆβ–‰ | 766/4000 [02:54<08:43, 6.18it/s] 19%|β–ˆβ–‰ | 767/4000 [02:54<08:39, 6.22it/s] 19%|β–ˆβ–‰ | 768/4000 [02:54<08:40, 6.21it/s] 19%|β–ˆβ–‰ | 769/4000 [02:54<08:48, 6.11it/s] 19%|β–ˆβ–‰ | 770/4000 [02:54<08:49, 6.10it/s] {'loss': 0.7512, 'grad_norm': 0.4577208459377289, 'learning_rate': 9.456907747213748e-05}
19%|β–ˆβ–‰ | 770/4000 [02:54<08:49, 6.10it/s] 19%|β–ˆβ–‰ | 771/4000 [02:54<08:51, 6.07it/s] 19%|β–ˆβ–‰ | 772/4000 [02:55<08:48, 6.10it/s] 19%|β–ˆβ–‰ | 773/4000 [02:55<08:46, 6.13it/s] 19%|β–ˆβ–‰ | 774/4000 [02:55<08:44, 6.15it/s] 19%|β–ˆβ–‰ | 775/4000 [02:55<08:44, 6.15it/s] 19%|β–ˆβ–‰ | 776/4000 [02:55<08:42, 6.17it/s] 19%|β–ˆβ–‰ | 777/4000 [02:55<08:45, 6.14it/s] 19%|β–ˆβ–‰ | 778/4000 [02:56<08:44, 6.14it/s] 19%|β–ˆβ–‰ | 779/4000 [02:56<08:41, 6.17it/s] 20%|β–ˆβ–‰ | 780/4000 [02:56<08:39, 6.20it/s] {'loss': 0.7653, 'grad_norm': 0.47813680768013, 'learning_rate': 9.438019615201336e-05}
20%|β–ˆβ–‰ | 780/4000 [02:56<08:39, 6.20it/s] 20%|β–ˆβ–‰ | 781/4000 [02:56<08:40, 6.18it/s] 20%|β–ˆβ–‰ | 782/4000 [02:56<08:40, 6.19it/s] 20%|β–ˆβ–‰ | 783/4000 [02:56<08:38, 6.20it/s] 20%|β–ˆβ–‰ | 784/4000 [02:57<08:36, 6.23it/s] 20%|β–ˆβ–‰ | 785/4000 [02:57<08:38, 6.20it/s] 20%|β–ˆβ–‰ | 786/4000 [02:57<08:36, 6.22it/s] 20%|β–ˆβ–‰ | 787/4000 [02:57<08:40, 6.18it/s] 20%|β–ˆβ–‰ | 788/4000 [02:57<08:40, 6.17it/s] 20%|β–ˆβ–‰ | 789/4000 [02:57<08:40, 6.16it/s] 20%|β–ˆβ–‰ | 790/4000 [02:58<08:38, 6.19it/s] {'loss': 0.7354, 'grad_norm': 0.48698732256889343, 'learning_rate': 9.418828150443469e-05}
20%|β–ˆβ–‰ | 790/4000 [02:58<08:38, 6.19it/s] 20%|β–ˆβ–‰ | 791/4000 [02:58<08:37, 6.20it/s] 20%|β–ˆβ–‰ | 792/4000 [02:58<08:36, 6.21it/s] 20%|β–ˆβ–‰ | 793/4000 [02:58<08:38, 6.18it/s] 20%|β–ˆβ–‰ | 794/4000 [02:58<08:43, 6.13it/s] 20%|β–ˆβ–‰ | 795/4000 [02:58<08:43, 6.12it/s] 20%|β–ˆβ–‰ | 796/4000 [02:58<08:44, 6.10it/s] 20%|β–ˆβ–‰ | 797/4000 [02:59<08:42, 6.13it/s] 20%|β–ˆβ–‰ | 798/4000 [02:59<08:38, 6.18it/s] 20%|β–ˆβ–‰ | 799/4000 [02:59<08:40, 6.15it/s] 20%|β–ˆβ–ˆ | 800/4000 [02:59<08:41, 6.13it/s] {'loss': 0.7649, 'grad_norm': 0.48488903045654297, 'learning_rate': 9.399334664651262e-05}
20%|β–ˆβ–ˆ | 800/4000 [02:59<08:41, 6.13it/s] 20%|β–ˆβ–ˆ | 801/4000 [02:59<08:43, 6.11it/s] 20%|β–ˆβ–ˆ | 802/4000 [02:59<08:39, 6.16it/s] 20%|β–ˆβ–ˆ | 803/4000 [03:00<08:42, 6.12it/s] 20%|β–ˆβ–ˆ | 804/4000 [03:00<08:39, 6.16it/s] 20%|β–ˆβ–ˆ | 805/4000 [03:00<08:38, 6.17it/s] 20%|β–ˆβ–ˆ | 806/4000 [03:00<08:36, 6.19it/s] 20%|β–ˆβ–ˆ | 807/4000 [03:00<08:40, 6.13it/s] 20%|β–ˆβ–ˆ | 808/4000 [03:00<08:40, 6.14it/s] 20%|β–ˆβ–ˆ | 809/4000 [03:01<08:39, 6.15it/s] 20%|β–ˆβ–ˆ | 810/4000 [03:01<08:37, 6.17it/s] {'loss': 0.7617, 'grad_norm': 0.46380874514579773, 'learning_rate': 9.379540490178581e-05}
20%|β–ˆβ–ˆ | 810/4000 [03:01<08:37, 6.17it/s] 20%|β–ˆβ–ˆ | 811/4000 [03:01<08:39, 6.14it/s] 20%|β–ˆβ–ˆ | 812/4000 [03:01<08:36, 6.17it/s] 20%|β–ˆβ–ˆ | 813/4000 [03:01<08:34, 6.20it/s] 20%|β–ˆβ–ˆ | 814/4000 [03:01<08:36, 6.17it/s] 20%|β–ˆβ–ˆ | 815/4000 [03:02<08:39, 6.14it/s] 20%|β–ˆβ–ˆ | 816/4000 [03:02<08:40, 6.12it/s] 20%|β–ˆβ–ˆ | 817/4000 [03:02<08:42, 6.09it/s] 20%|β–ˆβ–ˆ | 818/4000 [03:02<08:41, 6.11it/s] 20%|β–ˆβ–ˆ | 819/4000 [03:02<08:37, 6.15it/s] 20%|β–ˆβ–ˆ | 820/4000 [03:02<08:35, 6.17it/s] {'loss': 0.7466, 'grad_norm': 0.48268088698387146, 'learning_rate': 9.359446979930955e-05}
20%|β–ˆβ–ˆ | 820/4000 [03:02<08:35, 6.17it/s] 21%|β–ˆβ–ˆ | 821/4000 [03:03<08:39, 6.11it/s] 21%|β–ˆβ–ˆ | 822/4000 [03:03<08:38, 6.13it/s] 21%|β–ˆβ–ˆ | 823/4000 [03:03<08:41, 6.10it/s] 21%|β–ˆβ–ˆ | 824/4000 [03:03<08:43, 6.07it/s] 21%|β–ˆβ–ˆ | 825/4000 [03:03<08:38, 6.12it/s] 21%|β–ˆβ–ˆ | 826/4000 [03:03<08:35, 6.16it/s] 21%|β–ˆβ–ˆ | 827/4000 [03:04<08:46, 6.02it/s] 21%|β–ˆβ–ˆ | 828/4000 [03:04<08:56, 5.91it/s] 21%|β–ˆβ–ˆ | 829/4000 [03:04<08:51, 5.97it/s] 21%|β–ˆβ–ˆ | 830/4000 [03:04<08:49, 5.99it/s] {'loss': 0.7389, 'grad_norm': 0.5104127526283264, 'learning_rate': 9.33905550727312e-05}
21%|β–ˆβ–ˆ | 830/4000 [03:04<08:49, 5.99it/s] 21%|β–ˆβ–ˆ | 831/4000 [03:04<08:49, 5.98it/s] 21%|β–ˆβ–ˆ | 832/4000 [03:04<08:43, 6.06it/s] 21%|β–ˆβ–ˆ | 833/4000 [03:05<08:36, 6.13it/s] 21%|β–ˆβ–ˆ | 834/4000 [03:05<08:36, 6.13it/s] 21%|β–ˆβ–ˆ | 835/4000 [03:05<08:37, 6.12it/s] 21%|β–ˆβ–ˆ | 836/4000 [03:05<08:35, 6.14it/s] 21%|β–ˆβ–ˆ | 837/4000 [03:05<08:42, 6.05it/s] 21%|β–ˆβ–ˆ | 838/4000 [03:05<08:43, 6.04it/s] 21%|β–ˆβ–ˆ | 839/4000 [03:06<08:42, 6.04it/s] 21%|β–ˆβ–ˆ | 840/4000 [03:06<08:44, 6.03it/s] {'loss': 0.7414, 'grad_norm': 0.4325626790523529, 'learning_rate': 9.318367465935142e-05}
21%|β–ˆβ–ˆ | 840/4000 [03:06<08:44, 6.03it/s] 21%|β–ˆβ–ˆ | 841/4000 [03:06<08:48, 5.97it/s] 21%|β–ˆβ–ˆ | 842/4000 [03:06<08:48, 5.98it/s] 21%|β–ˆβ–ˆ | 843/4000 [03:06<08:52, 5.93it/s] 21%|β–ˆβ–ˆ | 844/4000 [03:06<08:52, 5.93it/s] 21%|β–ˆβ–ˆ | 845/4000 [03:07<08:52, 5.93it/s] 21%|β–ˆβ–ˆ | 846/4000 [03:07<08:51, 5.94it/s] 21%|β–ˆβ–ˆ | 847/4000 [03:07<08:52, 5.92it/s] 21%|β–ˆβ–ˆ | 848/4000 [03:07<08:55, 5.89it/s] 21%|β–ˆβ–ˆ | 849/4000 [03:07<08:56, 5.88it/s] 21%|β–ˆβ–ˆβ– | 850/4000 [03:07<08:55, 5.88it/s] {'loss': 0.7517, 'grad_norm': 0.45519083738327026, 'learning_rate': 9.29738426991717e-05}
21%|β–ˆβ–ˆβ– | 850/4000 [03:07<08:55, 5.88it/s] 21%|β–ˆβ–ˆβ– | 851/4000 [03:08<08:55, 5.88it/s] 21%|β–ˆβ–ˆβ– | 852/4000 [03:08<08:52, 5.91it/s] 21%|β–ˆβ–ˆβ– | 853/4000 [03:08<08:52, 5.91it/s] 21%|β–ˆβ–ˆβ– | 854/4000 [03:08<08:49, 5.94it/s] 21%|β–ˆβ–ˆβ– | 855/4000 [03:08<08:52, 5.91it/s] 21%|β–ˆβ–ˆβ– | 856/4000 [03:08<08:50, 5.92it/s] 21%|β–ˆβ–ˆβ– | 857/4000 [03:09<08:47, 5.95it/s] 21%|β–ˆβ–ˆβ– | 858/4000 [03:09<08:45, 5.98it/s] 21%|β–ˆβ–ˆβ– | 859/4000 [03:09<08:47, 5.95it/s] 22%|β–ˆβ–ˆβ– | 860/4000 [03:09<08:47, 5.95it/s] {'loss': 0.7427, 'grad_norm': 0.5354924201965332, 'learning_rate': 9.276107353392774e-05}
22%|β–ˆβ–ˆβ– | 860/4000 [03:09<08:47, 5.95it/s] 22%|β–ˆβ–ˆβ– | 861/4000 [03:09<08:49, 5.93it/s] 22%|β–ˆβ–ˆβ– | 862/4000 [03:09<08:50, 5.92it/s] 22%|β–ˆβ–ˆβ– | 863/4000 [03:10<08:51, 5.90it/s] 22%|β–ˆβ–ˆβ– | 864/4000 [03:10<08:50, 5.91it/s] 22%|β–ˆβ–ˆβ– | 865/4000 [03:10<08:50, 5.91it/s] 22%|β–ˆβ–ˆβ– | 866/4000 [03:10<08:49, 5.92it/s] 22%|β–ˆβ–ˆβ– | 867/4000 [03:10<08:48, 5.93it/s] 22%|β–ˆβ–ˆβ– | 868/4000 [03:10<08:49, 5.92it/s] 22%|β–ˆβ–ˆβ– | 869/4000 [03:11<08:47, 5.94it/s] 22%|β–ˆβ–ˆβ– | 870/4000 [03:11<08:47, 5.94it/s] {'loss': 0.7515, 'grad_norm': 0.5575360655784607, 'learning_rate': 9.254538170610938e-05}
22%|β–ˆβ–ˆβ– | 870/4000 [03:11<08:47, 5.94it/s] 22%|β–ˆβ–ˆβ– | 871/4000 [03:11<08:48, 5.91it/s] 22%|β–ˆβ–ˆβ– | 872/4000 [03:11<08:49, 5.91it/s] 22%|β–ˆβ–ˆβ– | 873/4000 [03:11<08:51, 5.88it/s] 22%|β–ˆβ–ˆβ– | 874/4000 [03:11<08:51, 5.89it/s] 22%|β–ˆβ–ˆβ– | 875/4000 [03:12<08:51, 5.88it/s] 22%|β–ˆβ–ˆβ– | 876/4000 [03:12<08:46, 5.93it/s] 22%|β–ˆβ–ˆβ– | 877/4000 [03:12<08:44, 5.96it/s] 22%|β–ˆβ–ˆβ– | 878/4000 [03:12<08:45, 5.94it/s] 22%|β–ˆβ–ˆβ– | 879/4000 [03:12<08:45, 5.94it/s] 22%|β–ˆβ–ˆβ– | 880/4000 [03:12<08:45, 5.93it/s] {'loss': 0.722, 'grad_norm': 0.5105330944061279, 'learning_rate': 9.232678195796654e-05}
22%|β–ˆβ–ˆβ– | 880/4000 [03:12<08:45, 5.93it/s] 22%|β–ˆβ–ˆβ– | 881/4000 [03:13<08:49, 5.89it/s] 22%|β–ˆβ–ˆβ– | 882/4000 [03:13<08:46, 5.92it/s] 22%|β–ˆβ–ˆβ– | 883/4000 [03:13<08:46, 5.92it/s] 22%|β–ˆβ–ˆβ– | 884/4000 [03:13<08:47, 5.90it/s] 22%|β–ˆβ–ˆβ– | 885/4000 [03:13<08:50, 5.87it/s] 22%|β–ˆβ–ˆβ– | 886/4000 [03:13<08:53, 5.84it/s] 22%|β–ˆβ–ˆβ– | 887/4000 [03:14<08:51, 5.85it/s] 22%|β–ˆβ–ˆβ– | 888/4000 [03:14<08:46, 5.91it/s] 22%|β–ˆβ–ˆβ– | 889/4000 [03:14<08:44, 5.93it/s] 22%|β–ˆβ–ˆβ– | 890/4000 [03:14<08:44, 5.93it/s] {'loss': 0.7215, 'grad_norm': 0.5286569595336914, 'learning_rate': 9.210528923050164e-05}
22%|β–ˆβ–ˆβ– | 890/4000 [03:14<08:44, 5.93it/s] 22%|β–ˆβ–ˆβ– | 891/4000 [03:14<08:45, 5.92it/s] 22%|β–ˆβ–ˆβ– | 892/4000 [03:14<08:46, 5.91it/s] 22%|β–ˆβ–ˆβ– | 893/4000 [03:15<08:49, 5.87it/s] 22%|β–ˆβ–ˆβ– | 894/4000 [03:15<09:17, 5.57it/s] 22%|β–ˆβ–ˆβ– | 895/4000 [03:15<09:03, 5.72it/s] 22%|β–ˆβ–ˆβ– | 896/4000 [03:15<08:57, 5.77it/s] 22%|β–ˆβ–ˆβ– | 897/4000 [03:15<08:49, 5.86it/s] 22%|β–ˆβ–ˆβ– | 898/4000 [03:16<08:45, 5.90it/s] 22%|β–ˆβ–ˆβ– | 899/4000 [03:16<08:42, 5.93it/s] 22%|β–ˆβ–ˆβ–Ž | 900/4000 [03:16<08:36, 6.00it/s] {'loss': 0.72, 'grad_norm': 0.46469196677207947, 'learning_rate': 9.188091866244834e-05}
22%|β–ˆβ–ˆβ–Ž | 900/4000 [03:16<08:36, 6.00it/s] 23%|β–ˆβ–ˆβ–Ž | 901/4000 [03:16<08:35, 6.01it/s] 23%|β–ˆβ–ˆβ–Ž | 902/4000 [03:16<08:34, 6.02it/s] 23%|β–ˆβ–ˆβ–Ž | 903/4000 [03:16<08:32, 6.05it/s] 23%|β–ˆβ–ˆβ–Ž | 904/4000 [03:17<08:30, 6.07it/s] 23%|β–ˆβ–ˆβ–Ž | 905/4000 [03:17<08:27, 6.10it/s] 23%|β–ˆβ–ˆβ–Ž | 906/4000 [03:17<08:25, 6.12it/s] 23%|β–ˆβ–ˆβ–Ž | 907/4000 [03:17<08:23, 6.14it/s] 23%|β–ˆβ–ˆβ–Ž | 908/4000 [03:17<08:23, 6.14it/s] 23%|β–ˆβ–ˆβ–Ž | 909/4000 [03:17<08:21, 6.16it/s] 23%|β–ˆβ–ˆβ–Ž | 910/4000 [03:17<08:21, 6.17it/s] {'loss': 0.7195, 'grad_norm': 0.43992435932159424, 'learning_rate': 9.165368558923695e-05}
23%|β–ˆβ–ˆβ–Ž | 910/4000 [03:17<08:21, 6.17it/s] 23%|β–ˆβ–ˆβ–Ž | 911/4000 [03:18<08:20, 6.17it/s] 23%|β–ˆβ–ˆβ–Ž | 912/4000 [03:18<08:17, 6.20it/s] 23%|β–ˆβ–ˆβ–Ž | 913/4000 [03:18<08:18, 6.19it/s] 23%|β–ˆβ–ˆβ–Ž | 914/4000 [03:18<08:19, 6.17it/s] 23%|β–ˆβ–ˆβ–Ž | 915/4000 [03:18<08:18, 6.18it/s] 23%|β–ˆβ–ˆβ–Ž | 916/4000 [03:18<08:17, 6.20it/s] 23%|β–ˆβ–ˆβ–Ž | 917/4000 [03:19<08:17, 6.20it/s] 23%|β–ˆβ–ˆβ–Ž | 918/4000 [03:19<08:15, 6.22it/s] 23%|β–ˆβ–ˆβ–Ž | 919/4000 [03:19<08:14, 6.23it/s] 23%|β–ˆβ–ˆβ–Ž | 920/4000 [03:19<08:14, 6.22it/s] {'loss': 0.7272, 'grad_norm': 0.4441291391849518, 'learning_rate': 9.142360554194618e-05}
23%|β–ˆβ–ˆβ–Ž | 920/4000 [03:19<08:14, 6.22it/s] 23%|β–ˆβ–ˆβ–Ž | 921/4000 [03:19<08:15, 6.21it/s] 23%|β–ˆβ–ˆβ–Ž | 922/4000 [03:19<08:14, 6.23it/s] 23%|β–ˆβ–ˆβ–Ž | 923/4000 [03:20<08:11, 6.26it/s] 23%|β–ˆβ–ˆβ–Ž | 924/4000 [03:20<08:10, 6.27it/s] 23%|β–ˆβ–ˆβ–Ž | 925/4000 [03:20<08:09, 6.28it/s] 23%|β–ˆβ–ˆβ–Ž | 926/4000 [03:20<08:09, 6.28it/s] 23%|β–ˆβ–ˆβ–Ž | 927/4000 [03:20<08:09, 6.28it/s] 23%|β–ˆβ–ˆβ–Ž | 928/4000 [03:20<08:07, 6.30it/s] 23%|β–ˆβ–ˆβ–Ž | 929/4000 [03:21<08:07, 6.31it/s] 23%|β–ˆβ–ˆβ–Ž | 930/4000 [03:21<08:07, 6.30it/s] {'loss': 0.7142, 'grad_norm': 0.5133683085441589, 'learning_rate': 9.119069424624163e-05}
23%|β–ˆβ–ˆβ–Ž | 930/4000 [03:21<08:07, 6.30it/s] 23%|β–ˆβ–ˆβ–Ž | 931/4000 [03:21<08:09, 6.27it/s] 23%|β–ˆβ–ˆβ–Ž | 932/4000 [03:21<08:10, 6.25it/s] 23%|β–ˆβ–ˆβ–Ž | 933/4000 [03:21<08:09, 6.26it/s] 23%|β–ˆβ–ˆβ–Ž | 934/4000 [03:21<08:08, 6.28it/s] 23%|β–ˆβ–ˆβ–Ž | 935/4000 [03:21<08:06, 6.30it/s] 23%|β–ˆβ–ˆβ–Ž | 936/4000 [03:22<08:06, 6.30it/s] 23%|β–ˆβ–ˆβ–Ž | 937/4000 [03:22<08:05, 6.31it/s] 23%|β–ˆβ–ˆβ–Ž | 938/4000 [03:22<08:06, 6.30it/s] 23%|β–ˆβ–ˆβ–Ž | 939/4000 [03:22<08:05, 6.31it/s] 24%|β–ˆβ–ˆβ–Ž | 940/4000 [03:22<08:03, 6.33it/s] {'loss': 0.7095, 'grad_norm': 0.5577358603477478, 'learning_rate': 9.0954967621301e-05}
24%|β–ˆβ–ˆβ–Ž | 940/4000 [03:22<08:03, 6.33it/s] 24%|β–ˆβ–ˆβ–Ž | 941/4000 [03:22<08:04, 6.31it/s] 24%|β–ˆβ–ˆβ–Ž | 942/4000 [03:23<08:03, 6.33it/s] 24%|β–ˆβ–ˆβ–Ž | 943/4000 [03:23<08:01, 6.34it/s] 24%|β–ˆβ–ˆβ–Ž | 944/4000 [03:23<08:03, 6.32it/s] 24%|β–ˆβ–ˆβ–Ž | 945/4000 [03:23<08:04, 6.31it/s] 24%|β–ˆβ–ˆβ–Ž | 946/4000 [03:23<08:03, 6.31it/s] 24%|β–ˆβ–ˆβ–Ž | 947/4000 [03:23<08:02, 6.33it/s] 24%|β–ˆβ–ˆβ–Ž | 948/4000 [03:24<08:01, 6.34it/s] 24%|β–ˆβ–ˆβ–Ž | 949/4000 [03:24<08:02, 6.32it/s] 24%|β–ˆβ–ˆβ– | 950/4000 [03:24<08:04, 6.30it/s] {'loss': 0.7276, 'grad_norm': 0.621738076210022, 'learning_rate': 9.071644177872594e-05}
24%|β–ˆβ–ˆβ– | 950/4000 [03:24<08:04, 6.30it/s] 24%|β–ˆβ–ˆβ– | 951/4000 [03:24<08:05, 6.28it/s] 24%|β–ˆβ–ˆβ– | 952/4000 [03:24<08:02, 6.31it/s] 24%|β–ˆβ–ˆβ– | 953/4000 [03:24<08:01, 6.32it/s] 24%|β–ˆβ–ˆβ– | 954/4000 [03:24<08:01, 6.33it/s] 24%|β–ˆβ–ˆβ– | 955/4000 [03:25<08:01, 6.33it/s] 24%|β–ˆβ–ˆβ– | 956/4000 [03:25<08:26, 6.01it/s] 24%|β–ˆβ–ˆβ– | 957/4000 [03:25<08:27, 6.00it/s] 24%|β–ˆβ–ˆβ– | 958/4000 [03:25<08:18, 6.11it/s] 24%|β–ˆβ–ˆβ– | 959/4000 [03:25<08:12, 6.18it/s] 24%|β–ˆβ–ˆβ– | 960/4000 [03:25<08:07, 6.23it/s] {'loss': 0.7196, 'grad_norm': 0.4756941497325897, 'learning_rate': 9.047513302144095e-05}
24%|β–ˆβ–ˆβ– | 960/4000 [03:25<08:07, 6.23it/s] 24%|β–ˆβ–ˆβ– | 961/4000 [03:26<08:07, 6.24it/s] 24%|β–ˆβ–ˆβ– | 962/4000 [03:26<08:05, 6.25it/s] 24%|β–ˆβ–ˆβ– | 963/4000 [03:26<08:04, 6.27it/s] 24%|β–ˆβ–ˆβ– | 964/4000 [03:26<08:02, 6.29it/s] 24%|β–ˆβ–ˆβ– | 965/4000 [03:26<08:01, 6.30it/s] 24%|β–ˆβ–ˆβ– | 966/4000 [03:26<08:01, 6.31it/s] 24%|β–ˆβ–ˆβ– | 967/4000 [03:27<08:01, 6.30it/s] 24%|β–ˆβ–ˆβ– | 968/4000 [03:27<08:02, 6.28it/s] 24%|β–ˆβ–ˆβ– | 969/4000 [03:27<08:02, 6.28it/s] 24%|β–ˆβ–ˆβ– | 970/4000 [03:27<08:00, 6.31it/s] {'loss': 0.7159, 'grad_norm': 0.3742871582508087, 'learning_rate': 9.023105784257906e-05}
24%|β–ˆβ–ˆβ– | 970/4000 [03:27<08:00, 6.31it/s] 24%|β–ˆβ–ˆβ– | 971/4000 [03:27<07:59, 6.31it/s] 24%|β–ˆβ–ˆβ– | 972/4000 [03:27<07:59, 6.32it/s] 24%|β–ˆβ–ˆβ– | 973/4000 [03:28<07:59, 6.32it/s] 24%|β–ˆβ–ˆβ– | 974/4000 [03:28<08:00, 6.30it/s] 24%|β–ˆβ–ˆβ– | 975/4000 [03:28<08:00, 6.30it/s] 24%|β–ˆβ–ˆβ– | 976/4000 [03:28<07:59, 6.31it/s] 24%|β–ˆβ–ˆβ– | 977/4000 [03:28<07:58, 6.32it/s] 24%|β–ˆβ–ˆβ– | 978/4000 [03:28<07:57, 6.33it/s] 24%|β–ˆβ–ˆβ– | 979/4000 [03:28<07:57, 6.33it/s] 24%|β–ˆβ–ˆβ– | 980/4000 [03:29<07:58, 6.31it/s] {'loss': 0.7127, 'grad_norm': 0.43546372652053833, 'learning_rate': 8.998423292435454e-05}
24%|β–ˆβ–ˆβ– | 980/4000 [03:29<07:58, 6.31it/s] 25%|β–ˆβ–ˆβ– | 981/4000 [03:29<08:00, 6.29it/s] 25%|β–ˆβ–ˆβ– | 982/4000 [03:29<08:00, 6.28it/s] 25%|β–ˆβ–ˆβ– | 983/4000 [03:29<07:57, 6.31it/s] 25%|β–ˆβ–ˆβ– | 984/4000 [03:29<07:57, 6.32it/s] 25%|β–ˆβ–ˆβ– | 985/4000 [03:29<07:57, 6.31it/s] 25%|β–ˆβ–ˆβ– | 986/4000 [03:30<07:58, 6.30it/s] 25%|β–ˆβ–ˆβ– | 987/4000 [03:30<07:58, 6.30it/s] 25%|β–ˆβ–ˆβ– | 988/4000 [03:30<07:56, 6.32it/s] 25%|β–ˆβ–ˆβ– | 989/4000 [03:30<07:55, 6.33it/s] 25%|β–ˆβ–ˆβ– | 990/4000 [03:30<07:54, 6.34it/s] {'loss': 0.7089, 'grad_norm': 0.5410100817680359, 'learning_rate': 8.973467513692265e-05}
25%|β–ˆβ–ˆβ– | 990/4000 [03:30<07:54, 6.34it/s] 25%|β–ˆβ–ˆβ– | 991/4000 [03:30<07:56, 6.31it/s] 25%|β–ˆβ–ˆβ– | 992/4000 [03:31<07:57, 6.29it/s] 25%|β–ˆβ–ˆβ– | 993/4000 [03:31<07:57, 6.30it/s] 25%|β–ˆβ–ˆβ– | 994/4000 [03:31<07:56, 6.31it/s] 25%|β–ˆβ–ˆβ– | 995/4000 [03:31<07:54, 6.34it/s] 25%|β–ˆβ–ˆβ– | 996/4000 [03:31<07:54, 6.34it/s] 25%|β–ˆβ–ˆβ– | 997/4000 [03:31<07:54, 6.33it/s] 25%|β–ˆβ–ˆβ– | 998/4000 [03:31<07:55, 6.31it/s] 25%|β–ˆβ–ˆβ– | 999/4000 [03:32<07:56, 6.30it/s] 25%|β–ˆβ–ˆβ–Œ | 1000/4000 [03:32<07:55, 6.31it/s] {'loss': 0.6982, 'grad_norm': 0.5081359148025513, 'learning_rate': 8.94824015372267e-05}
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
  warnings.warn(  # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-1000/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-1000
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-1000/wandb_config.json
25%|β–ˆβ–ˆβ–Œ | 1001/4000 [03:59<6:52:15, 8.25s/it] 25%|β–ˆβ–ˆβ–Œ | 1002/4000 [03:59<4:50:51, 5.82s/it] 25%|β–ˆβ–ˆβ–Œ | 1003/4000 [03:59<3:25:54, 4.12s/it] 25%|β–ˆβ–ˆβ–Œ | 1004/4000 [03:59<2:26:29, 2.93s/it] 25%|β–ˆβ–ˆβ–Œ | 1005/4000 [04:00<1:44:51, 2.10s/it] 25%|β–ˆβ–ˆβ–Œ | 1006/4000 [04:00<1:15:45, 1.52s/it] 25%|β–ˆβ–ˆβ–Œ | 1007/4000 [04:00<55:22, 1.11s/it] 25%|β–ˆβ–ˆβ–Œ | 1008/4000 [04:00<41:08, 1.21it/s] 25%|β–ˆβ–ˆβ–Œ | 1009/4000 [04:00<31:11, 1.60it/s] 25%|β–ˆβ–ˆβ–Œ | 1010/4000 [04:00<24:10, 2.06it/s] {'loss': 0.7099, 'grad_norm': 0.49883362650871277, 'learning_rate': 8.922742936783207e-05}
25%|β–ˆβ–ˆβ–Œ | 1010/4000 [04:00<24:10, 2.06it/s] 25%|β–ˆβ–ˆβ–Œ | 1011/4000 [04:01<19:18, 2.58it/s] 25%|β–ˆβ–ˆβ–Œ | 1012/4000 [04:01<15:52, 3.14it/s] 25%|β–ˆβ–ˆβ–Œ | 1013/4000 [04:01<13:27, 3.70it/s] 25%|β–ˆβ–ˆβ–Œ | 1014/4000 [04:01<11:46, 4.22it/s] 25%|β–ˆβ–ˆβ–Œ | 1015/4000 [04:01<10:36, 4.69it/s] 25%|β–ˆβ–ˆβ–Œ | 1016/4000 [04:01<09:46, 5.09it/s] 25%|β–ˆβ–ˆβ–Œ | 1017/4000 [04:01<09:12, 5.40it/s] 25%|β–ˆβ–ˆβ–Œ | 1018/4000 [04:02<08:49, 5.64it/s] 25%|β–ˆβ–ˆβ–Œ | 1019/4000 [04:02<08:30, 5.83it/s] 26%|β–ˆβ–ˆβ–Œ | 1020/4000 [04:02<08:18, 5.98it/s] {'loss': 0.7041, 'grad_norm': 0.5988953709602356, 'learning_rate': 8.896977605574788e-05}
26%|β–ˆβ–ˆβ–Œ | 1020/4000 [04:02<08:18, 5.98it/s] 26%|β–ˆβ–ˆβ–Œ | 1021/4000 [04:02<08:11, 6.06it/s] 26%|β–ˆβ–ˆβ–Œ | 1022/4000 [04:02<08:05, 6.14it/s] 26%|β–ˆβ–ˆβ–Œ | 1023/4000 [04:02<08:01, 6.18it/s] 26%|β–ˆβ–ˆβ–Œ | 1024/4000 [04:03<07:57, 6.23it/s] 26%|β–ˆβ–ˆβ–Œ | 1025/4000 [04:03<07:56, 6.25it/s] 26%|β–ˆβ–ˆβ–Œ | 1026/4000 [04:03<07:54, 6.27it/s] 26%|β–ˆβ–ˆβ–Œ | 1027/4000 [04:03<07:53, 6.28it/s] 26%|β–ˆβ–ˆβ–Œ | 1028/4000 [04:03<07:51, 6.30it/s] 26%|β–ˆβ–ˆβ–Œ | 1029/4000 [04:03<07:52, 6.29it/s] 26%|β–ˆβ–ˆβ–Œ | 1030/4000 [04:04<07:52, 6.29it/s] {'loss': 0.7184, 'grad_norm': 0.648632824420929, 'learning_rate': 8.870945921123576e-05}
26%|β–ˆβ–ˆβ–Œ | 1030/4000 [04:04<07:52, 6.29it/s] 26%|β–ˆβ–ˆβ–Œ | 1031/4000 [04:04<07:54, 6.26it/s] 26%|β–ˆβ–ˆβ–Œ | 1032/4000 [04:04<07:52, 6.28it/s] 26%|β–ˆβ–ˆβ–Œ | 1033/4000 [04:04<07:51, 6.29it/s] 26%|β–ˆβ–ˆβ–Œ | 1034/4000 [04:04<07:52, 6.28it/s] 26%|β–ˆβ–ˆβ–Œ | 1035/4000 [04:04<07:52, 6.28it/s] 26%|β–ˆβ–ˆβ–Œ | 1036/4000 [04:04<07:50, 6.30it/s] 26%|β–ˆβ–ˆβ–Œ | 1037/4000 [04:05<07:49, 6.31it/s] 26%|β–ˆβ–ˆβ–Œ | 1038/4000 [04:05<07:48, 6.32it/s] 26%|β–ˆβ–ˆβ–Œ | 1039/4000 [04:05<07:49, 6.31it/s] 26%|β–ˆβ–ˆβ–Œ | 1040/4000 [04:05<07:48, 6.32it/s] {'loss': 0.7186, 'grad_norm': 0.4201085865497589, 'learning_rate': 8.844649662660624e-05}
26%|β–ˆβ–ˆβ–Œ | 1040/4000 [04:05<07:48, 6.32it/s] 26%|β–ˆβ–ˆβ–Œ | 1041/4000 [04:05<07:50, 6.29it/s] 26%|β–ˆβ–ˆβ–Œ | 1042/4000 [04:05<07:49, 6.30it/s] 26%|β–ˆβ–ˆβ–Œ | 1043/4000 [04:06<07:48, 6.32it/s] 26%|β–ˆβ–ˆβ–Œ | 1044/4000 [04:06<07:48, 6.31it/s] 26%|β–ˆβ–ˆβ–Œ | 1045/4000 [04:06<07:48, 6.31it/s] 26%|β–ˆβ–ˆβ–Œ | 1046/4000 [04:06<07:47, 6.33it/s] 26%|β–ˆβ–ˆβ–Œ | 1047/4000 [04:06<07:48, 6.31it/s] 26%|β–ˆβ–ˆβ–Œ | 1048/4000 [04:06<07:47, 6.32it/s] 26%|β–ˆβ–ˆβ–Œ | 1049/4000 [04:07<07:46, 6.32it/s] 26%|β–ˆβ–ˆβ–‹ | 1050/4000 [04:07<07:47, 6.31it/s] {'loss': 0.7271, 'grad_norm': 0.5427833199501038, 'learning_rate': 8.818090627500266e-05}
26%|β–ˆβ–ˆβ–‹ | 1050/4000 [04:07<07:47, 6.31it/s] 26%|β–ˆβ–ˆβ–‹ | 1051/4000 [04:07<07:48, 6.29it/s] 26%|β–ˆβ–ˆβ–‹ | 1052/4000 [04:07<07:47, 6.31it/s] 26%|β–ˆβ–ˆβ–‹ | 1053/4000 [04:07<07:48, 6.29it/s] 26%|β–ˆβ–ˆβ–‹ | 1054/4000 [04:07<07:47, 6.31it/s] 26%|β–ˆβ–ˆβ–‹ | 1055/4000 [04:08<07:47, 6.30it/s] 26%|β–ˆβ–ˆβ–‹ | 1056/4000 [04:08<07:49, 6.27it/s] 26%|β–ˆβ–ˆβ–‹ | 1057/4000 [04:08<07:49, 6.27it/s] 26%|β–ˆβ–ˆβ–‹ | 1058/4000 [04:08<07:48, 6.28it/s] 26%|β–ˆβ–ˆβ–‹ | 1059/4000 [04:08<07:49, 6.26it/s] 26%|β–ˆβ–ˆβ–‹ | 1060/4000 [04:08<07:47, 6.28it/s] {'loss': 0.7092, 'grad_norm': 0.5188281536102295, 'learning_rate': 8.791270630917275e-05}
26%|β–ˆβ–ˆβ–‹ | 1060/4000 [04:08<07:47, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1061/4000 [04:08<07:47, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1062/4000 [04:09<07:47, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1063/4000 [04:09<07:46, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1064/4000 [04:09<07:45, 6.31it/s] 27%|β–ˆβ–ˆβ–‹ | 1065/4000 [04:09<07:47, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1066/4000 [04:09<07:45, 6.31it/s] 27%|β–ˆβ–ˆβ–‹ | 1067/4000 [04:09<07:46, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1068/4000 [04:10<07:45, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1069/4000 [04:10<07:45, 6.30it/s] 27%|β–ˆβ–ˆβ–‹ | 1070/4000 [04:10<07:45, 6.29it/s] {'loss': 0.7114, 'grad_norm': 0.6617937684059143, 'learning_rate': 8.764191506022795e-05}
27%|β–ˆβ–ˆβ–‹ | 1070/4000 [04:10<07:45, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1071/4000 [04:10<07:46, 6.27it/s] 27%|β–ˆβ–ˆβ–‹ | 1072/4000 [04:10<07:45, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1073/4000 [04:10<07:47, 6.26it/s] 27%|β–ˆβ–ˆβ–‹ | 1074/4000 [04:11<07:46, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1075/4000 [04:11<07:44, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1076/4000 [04:11<07:44, 6.30it/s] 27%|β–ˆβ–ˆβ–‹ | 1077/4000 [04:11<07:44, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1078/4000 [04:11<07:42, 6.31it/s] 27%|β–ˆβ–ˆβ–‹ | 1079/4000 [04:11<07:43, 6.31it/s] 27%|β–ˆβ–ˆβ–‹ | 1080/4000 [04:11<07:42, 6.31it/s] {'loss': 0.6944, 'grad_norm': 0.5660023093223572, 'learning_rate': 8.736855103639037e-05}
27%|β–ˆβ–ˆβ–‹ | 1080/4000 [04:11<07:42, 6.31it/s] 27%|β–ˆβ–ˆβ–‹ | 1081/4000 [04:12<07:45, 6.26it/s] 27%|β–ˆβ–ˆβ–‹ | 1082/4000 [04:12<07:44, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1083/4000 [04:12<07:44, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1084/4000 [04:12<07:42, 6.30it/s] 27%|β–ˆβ–ˆβ–‹ | 1085/4000 [04:12<07:42, 6.30it/s] 27%|β–ˆβ–ˆβ–‹ | 1086/4000 [04:12<07:42, 6.30it/s] 27%|β–ˆβ–ˆβ–‹ | 1087/4000 [04:13<07:41, 6.31it/s] 27%|β–ˆβ–ˆβ–‹ | 1088/4000 [04:13<07:42, 6.30it/s] 27%|β–ˆβ–ˆβ–‹ | 1089/4000 [04:13<07:43, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1090/4000 [04:13<07:42, 6.29it/s] {'loss': 0.7176, 'grad_norm': 0.45464205741882324, 'learning_rate': 8.709263292172794e-05}
27%|β–ˆβ–ˆβ–‹ | 1090/4000 [04:13<07:42, 6.29it/s] 27%|β–ˆβ–ˆβ–‹ | 1091/4000 [04:13<07:43, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1092/4000 [04:13<07:43, 6.27it/s] 27%|β–ˆβ–ˆβ–‹ | 1093/4000 [04:14<07:42, 6.28it/s] 27%|β–ˆβ–ˆβ–‹ | 1094/4000 [04:14<07:48, 6.21it/s] 27%|β–ˆβ–ˆβ–‹ | 1095/4000 [04:14<08:29, 5.71it/s] 27%|β–ˆβ–ˆβ–‹ | 1096/4000 [04:14<08:13, 5.88it/s] 27%|β–ˆβ–ˆβ–‹ | 1097/4000 [04:14<08:03, 6.01it/s] 27%|β–ˆβ–ˆβ–‹ | 1098/4000 [04:14<07:58, 6.06it/s] 27%|β–ˆβ–ˆβ–‹ | 1099/4000 [04:15<07:53, 6.12it/s] 28%|β–ˆβ–ˆβ–Š | 1100/4000 [04:15<07:51, 6.14it/s] {'loss': 0.7053, 'grad_norm': 0.44230857491493225, 'learning_rate': 8.681417957487729e-05}
28%|β–ˆβ–ˆβ–Š | 1100/4000 [04:15<07:51, 6.14it/s] 28%|β–ˆβ–ˆβ–Š | 1101/4000 [04:15<07:49, 6.18it/s] 28%|β–ˆβ–ˆβ–Š | 1102/4000 [04:15<07:45, 6.23it/s] 28%|β–ˆβ–ˆβ–Š | 1103/4000 [04:15<07:44, 6.23it/s] 28%|β–ˆβ–ˆβ–Š | 1104/4000 [04:15<07:42, 6.27it/s] 28%|β–ˆβ–ˆβ–Š | 1105/4000 [04:16<07:40, 6.29it/s] 28%|β–ˆβ–ˆβ–Š | 1106/4000 [04:16<07:40, 6.29it/s] 28%|β–ˆβ–ˆβ–Š | 1107/4000 [04:16<07:40, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1108/4000 [04:16<07:40, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1109/4000 [04:16<07:41, 6.27it/s] 28%|β–ˆβ–ˆβ–Š | 1110/4000 [04:16<07:40, 6.28it/s] {'loss': 0.7061, 'grad_norm': 0.5209611654281616, 'learning_rate': 8.653321002775478e-05}
28%|β–ˆβ–ˆβ–Š | 1110/4000 [04:16<07:40, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1111/4000 [04:16<07:40, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1112/4000 [04:17<07:40, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1113/4000 [04:17<07:39, 6.29it/s] 28%|β–ˆβ–ˆβ–Š | 1114/4000 [04:17<07:38, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1115/4000 [04:17<07:39, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1116/4000 [04:17<07:37, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1117/4000 [04:17<07:37, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1118/4000 [04:18<07:36, 6.32it/s] 28%|β–ˆβ–ˆβ–Š | 1119/4000 [04:18<07:36, 6.31it/s] 28%|β–ˆβ–ˆβ–Š | 1120/4000 [04:18<07:36, 6.30it/s] {'loss': 0.7162, 'grad_norm': 0.5073080658912659, 'learning_rate': 8.624974348425574e-05}
28%|β–ˆβ–ˆβ–Š | 1120/4000 [04:18<07:36, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1121/4000 [04:18<07:39, 6.27it/s] 28%|β–ˆβ–ˆβ–Š | 1122/4000 [04:18<07:37, 6.29it/s] 28%|β–ˆβ–ˆβ–Š | 1123/4000 [04:18<07:36, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1124/4000 [04:19<07:36, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1125/4000 [04:19<07:36, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1126/4000 [04:19<07:35, 6.31it/s] 28%|β–ˆβ–ˆβ–Š | 1127/4000 [04:19<07:36, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1128/4000 [04:19<07:36, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1129/4000 [04:19<07:36, 6.29it/s] 28%|β–ˆβ–ˆβ–Š | 1130/4000 [04:19<07:35, 6.30it/s] {'loss': 0.7059, 'grad_norm': 0.5269641280174255, 'learning_rate': 8.596379931894188e-05}
28%|β–ˆβ–ˆβ–Š | 1130/4000 [04:19<07:35, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1131/4000 [04:20<07:39, 6.24it/s] 28%|β–ˆβ–ˆβ–Š | 1132/4000 [04:20<07:37, 6.27it/s] 28%|β–ˆβ–ˆβ–Š | 1133/4000 [04:20<07:36, 6.28it/s] 28%|β–ˆβ–ˆβ–Š | 1134/4000 [04:20<07:34, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1135/4000 [04:20<07:34, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1136/4000 [04:20<07:34, 6.31it/s] 28%|β–ˆβ–ˆβ–Š | 1137/4000 [04:21<07:34, 6.30it/s] 28%|β–ˆβ–ˆβ–Š | 1138/4000 [04:21<07:32, 6.32it/s] 28%|β–ˆβ–ˆβ–Š | 1139/4000 [04:21<07:33, 6.31it/s] 28%|β–ˆβ–ˆβ–Š | 1140/4000 [04:21<07:32, 6.32it/s] {'loss': 0.7021, 'grad_norm': 0.45516136288642883, 'learning_rate': 8.567539707571703e-05}
28%|β–ˆβ–ˆβ–Š | 1140/4000 [04:21<07:32, 6.32it/s] 29%|β–ˆβ–ˆβ–Š | 1141/4000 [04:21<07:33, 6.30it/s] 29%|β–ˆβ–ˆβ–Š | 1142/4000 [04:21<07:32, 6.32it/s] 29%|β–ˆβ–ˆβ–Š | 1143/4000 [04:22<07:32, 6.31it/s] 29%|β–ˆβ–ˆβ–Š | 1144/4000 [04:22<07:35, 6.27it/s] 29%|β–ˆβ–ˆβ–Š | 1145/4000 [04:22<07:35, 6.27it/s] 29%|β–ˆβ–ˆβ–Š | 1146/4000 [04:22<07:33, 6.29it/s] 29%|β–ˆβ–ˆβ–Š | 1147/4000 [04:22<07:33, 6.29it/s] 29%|β–ˆβ–ˆβ–Š | 1148/4000 [04:22<07:31, 6.31it/s] 29%|β–ˆβ–ˆβ–Š | 1149/4000 [04:23<07:31, 6.31it/s] 29%|β–ˆβ–ˆβ–‰ | 1150/4000 [04:23<07:30, 6.33it/s] {'loss': 0.6988, 'grad_norm': 0.4557558298110962, 'learning_rate': 8.538455646649146e-05}
29%|β–ˆβ–ˆβ–‰ | 1150/4000 [04:23<07:30, 6.33it/s] 29%|β–ˆβ–ˆβ–‰ | 1151/4000 [04:23<07:32, 6.30it/s] 29%|β–ˆβ–ˆβ–‰ | 1152/4000 [04:23<07:32, 6.30it/s] 29%|β–ˆβ–ˆβ–‰ | 1153/4000 [04:23<07:31, 6.30it/s] 29%|β–ˆβ–ˆβ–‰ | 1154/4000 [04:23<07:30, 6.32it/s] 29%|β–ˆβ–ˆβ–‰ | 1155/4000 [04:23<07:31, 6.30it/s] 29%|β–ˆβ–ˆβ–‰ | 1156/4000 [04:24<07:30, 6.31it/s] 29%|β–ˆβ–ˆβ–‰ | 1157/4000 [04:24<07:32, 6.28it/s] 29%|β–ˆβ–ˆβ–‰ | 1158/4000 [04:24<07:32, 6.27it/s] 29%|β–ˆβ–ˆβ–‰ | 1159/4000 [04:24<07:32, 6.28it/s] 29%|β–ˆβ–ˆβ–‰ | 1160/4000 [04:24<07:32, 6.28it/s] {'loss': 0.7003, 'grad_norm': 0.5678983926773071, 'learning_rate': 8.509129736983446e-05}
29%|β–ˆβ–ˆβ–‰ | 1160/4000 [04:24<07:32, 6.28it/s] 29%|β–ˆβ–ˆβ–‰ | 1161/4000 [04:24<07:33, 6.26it/s] 29%|β–ˆβ–ˆβ–‰ | 1162/4000 [04:25<07:33, 6.26it/s] 29%|β–ˆβ–ˆβ–‰ | 1163/4000 [04:25<07:45, 6.09it/s] 29%|β–ˆβ–ˆβ–‰ | 1164/4000 [04:25<08:12, 5.75it/s] 29%|β–ˆβ–ˆβ–‰ | 1165/4000 [04:25<08:31, 5.54it/s] 29%|β–ˆβ–ˆβ–‰ | 1166/4000 [04:25<08:41, 5.44it/s] 29%|β–ˆβ–ˆβ–‰ | 1167/4000 [04:26<08:32, 5.52it/s] 29%|β–ˆβ–ˆβ–‰ | 1168/4000 [04:26<08:13, 5.74it/s] 29%|β–ˆβ–ˆβ–‰ | 1169/4000 [04:26<08:02, 5.87it/s] 29%|β–ˆβ–ˆβ–‰ | 1170/4000 [04:26<07:52, 5.99it/s] {'loss': 0.7027, 'grad_norm': 0.5725619196891785, 'learning_rate': 8.479563982961571e-05}
29%|β–ˆβ–ˆβ–‰ | 1170/4000 [04:26<07:52, 5.99it/s] 29%|β–ˆβ–ˆβ–‰ | 1171/4000 [04:26<07:49, 6.03it/s] 29%|β–ˆβ–ˆβ–‰ | 1172/4000 [04:26<07:43, 6.10it/s] 29%|β–ˆβ–ˆβ–‰ | 1173/4000 [04:26<07:41, 6.13it/s] 29%|β–ˆβ–ˆβ–‰ | 1174/4000 [04:27<07:36, 6.19it/s] 29%|β–ˆβ–ˆβ–‰ | 1175/4000 [04:27<07:34, 6.22it/s] 29%|β–ˆβ–ˆβ–‰ | 1176/4000 [04:27<07:32, 6.25it/s] 29%|β–ˆβ–ˆβ–‰ | 1177/4000 [04:27<07:31, 6.25it/s] 29%|β–ˆβ–ˆβ–‰ | 1178/4000 [04:27<07:29, 6.28it/s] 29%|β–ˆβ–ˆβ–‰ | 1179/4000 [04:27<07:29, 6.27it/s] 30%|β–ˆβ–ˆβ–‰ | 1180/4000 [04:28<07:28, 6.29it/s] {'loss': 0.7054, 'grad_norm': 0.46510183811187744, 'learning_rate': 8.449760405363539e-05}
30%|β–ˆβ–ˆβ–‰ | 1180/4000 [04:28<07:28, 6.29it/s] 30%|β–ˆβ–ˆβ–‰ | 1181/4000 [04:28<07:31, 6.24it/s] 30%|β–ˆβ–ˆβ–‰ | 1182/4000 [04:28<07:29, 6.27it/s] 30%|β–ˆβ–ˆβ–‰ | 1183/4000 [04:28<07:29, 6.27it/s] 30%|β–ˆβ–ˆβ–‰ | 1184/4000 [04:28<07:28, 6.27it/s] 30%|β–ˆβ–ˆβ–‰ | 1185/4000 [04:28<07:28, 6.27it/s] 30%|β–ˆβ–ˆβ–‰ | 1186/4000 [04:29<07:28, 6.28it/s] 30%|β–ˆβ–ˆβ–‰ | 1187/4000 [04:29<07:28, 6.28it/s] 30%|β–ˆβ–ˆβ–‰ | 1188/4000 [04:29<07:27, 6.29it/s] 30%|β–ˆβ–ˆβ–‰ | 1189/4000 [04:29<07:27, 6.29it/s] 30%|β–ˆβ–ˆβ–‰ | 1190/4000 [04:29<07:26, 6.29it/s] {'loss': 0.6948, 'grad_norm': 0.5408710241317749, 'learning_rate': 8.419721041224287e-05}
30%|β–ˆβ–ˆβ–‰ | 1190/4000 [04:29<07:26, 6.29it/s] 30%|β–ˆβ–ˆβ–‰ | 1191/4000 [04:29<07:26, 6.29it/s] 30%|β–ˆβ–ˆβ–‰ | 1192/4000 [04:29<07:26, 6.29it/s] 30%|β–ˆβ–ˆβ–‰ | 1193/4000 [04:30<07:25, 6.30it/s] 30%|β–ˆβ–ˆβ–‰ | 1194/4000 [04:30<07:24, 6.31it/s] 30%|β–ˆβ–ˆβ–‰ | 1195/4000 [04:30<07:24, 6.31it/s] 30%|β–ˆβ–ˆβ–‰ | 1196/4000 [04:30<07:24, 6.31it/s] 30%|β–ˆβ–ˆβ–‰ | 1197/4000 [04:30<07:24, 6.30it/s] 30%|β–ˆβ–ˆβ–‰ | 1198/4000 [04:30<07:24, 6.30it/s] 30%|β–ˆβ–ˆβ–‰ | 1199/4000 [04:31<07:23, 6.31it/s] 30%|β–ˆβ–ˆβ–ˆ | 1200/4000 [04:31<07:23, 6.32it/s] {'loss': 0.7045, 'grad_norm': 0.46410325169563293, 'learning_rate': 8.389447943694451e-05}
30%|β–ˆβ–ˆβ–ˆ | 1200/4000 [04:31<07:23, 6.32it/s] 30%|β–ˆβ–ˆβ–ˆ | 1201/4000 [04:31<07:34, 6.16it/s] 30%|β–ˆβ–ˆβ–ˆ | 1202/4000 [04:31<07:31, 6.20it/s] 30%|β–ˆβ–ˆβ–ˆ | 1203/4000 [04:31<07:28, 6.24it/s] 30%|β–ˆβ–ˆβ–ˆ | 1204/4000 [04:31<07:26, 6.26it/s] 30%|β–ˆβ–ˆβ–ˆ | 1205/4000 [04:32<07:25, 6.27it/s] 30%|β–ˆβ–ˆβ–ˆ | 1206/4000 [04:32<07:25, 6.27it/s] 30%|β–ˆβ–ˆβ–ˆ | 1207/4000 [04:32<07:24, 6.29it/s] 30%|β–ˆβ–ˆβ–ˆ | 1208/4000 [04:32<07:22, 6.31it/s] 30%|β–ˆβ–ˆβ–ˆ | 1209/4000 [04:32<07:22, 6.30it/s] 30%|β–ˆβ–ˆβ–ˆ | 1210/4000 [04:32<07:23, 6.30it/s] {'loss': 0.6939, 'grad_norm': 0.5080893039703369, 'learning_rate': 8.358943181900032e-05}
30%|β–ˆβ–ˆβ–ˆ | 1210/4000 [04:32<07:23, 6.30it/s] 30%|β–ˆβ–ˆβ–ˆ | 1211/4000 [04:33<07:23, 6.29it/s] 30%|β–ˆβ–ˆβ–ˆ | 1212/4000 [04:33<07:22, 6.30it/s] 30%|β–ˆβ–ˆβ–ˆ | 1213/4000 [04:33<07:22, 6.29it/s] 30%|β–ˆβ–ˆβ–ˆ | 1214/4000 [04:33<07:23, 6.28it/s] 30%|β–ˆβ–ˆβ–ˆ | 1215/4000 [04:33<07:23, 6.28it/s] 30%|β–ˆβ–ˆβ–ˆ | 1216/4000 [04:33<07:22, 6.29it/s] 30%|β–ˆβ–ˆβ–ˆ | 1217/4000 [04:33<07:22, 6.29it/s] 30%|β–ˆβ–ˆβ–ˆ | 1218/4000 [04:34<07:21, 6.30it/s] 30%|β–ˆβ–ˆβ–ˆ | 1219/4000 [04:34<07:21, 6.30it/s] 30%|β–ˆβ–ˆβ–ˆ | 1220/4000 [04:34<07:20, 6.31it/s] {'loss': 0.695, 'grad_norm': 0.5485800504684448, 'learning_rate': 8.328208840800981e-05}
30%|β–ˆβ–ˆβ–ˆ | 1220/4000 [04:34<07:20, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆ | 1221/4000 [04:34<07:22, 6.28it/s] 31%|β–ˆβ–ˆβ–ˆ | 1222/4000 [04:34<07:21, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1223/4000 [04:34<07:21, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1224/4000 [04:35<07:21, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1225/4000 [04:35<07:21, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1226/4000 [04:35<07:20, 6.30it/s] 31%|β–ˆβ–ˆβ–ˆ | 1227/4000 [04:35<07:19, 6.30it/s] 31%|β–ˆβ–ˆβ–ˆ | 1228/4000 [04:35<07:19, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆ | 1229/4000 [04:35<07:19, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆ | 1230/4000 [04:36<07:21, 6.28it/s] {'loss': 0.7029, 'grad_norm': 0.5217918157577515, 'learning_rate': 8.297247021048686e-05}
31%|β–ˆβ–ˆβ–ˆ | 1230/4000 [04:36<07:21, 6.28it/s] 31%|β–ˆβ–ˆβ–ˆ | 1231/4000 [04:36<07:23, 6.25it/s] 31%|β–ˆβ–ˆβ–ˆ | 1232/4000 [04:36<07:22, 6.26it/s] 31%|β–ˆβ–ˆβ–ˆ | 1233/4000 [04:36<07:23, 6.25it/s] 31%|β–ˆβ–ˆβ–ˆ | 1234/4000 [04:36<07:21, 6.26it/s] 31%|β–ˆβ–ˆβ–ˆ | 1235/4000 [04:36<07:19, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1236/4000 [04:36<07:20, 6.28it/s] 31%|β–ˆβ–ˆβ–ˆ | 1237/4000 [04:37<07:19, 6.28it/s] 31%|β–ˆβ–ˆβ–ˆ | 1238/4000 [04:37<07:19, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1239/4000 [04:37<07:19, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1240/4000 [04:37<07:21, 6.25it/s] {'loss': 0.7023, 'grad_norm': 0.5817974209785461, 'learning_rate': 8.266059838842396e-05}
31%|β–ˆβ–ˆβ–ˆ | 1240/4000 [04:37<07:21, 6.25it/s] 31%|β–ˆβ–ˆβ–ˆ | 1241/4000 [04:37<07:21, 6.25it/s] 31%|β–ˆβ–ˆβ–ˆ | 1242/4000 [04:37<07:20, 6.26it/s] 31%|β–ˆβ–ˆβ–ˆ | 1243/4000 [04:38<07:19, 6.28it/s] 31%|β–ˆβ–ˆβ–ˆ | 1244/4000 [04:38<07:18, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆ | 1245/4000 [04:38<07:16, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆ | 1246/4000 [04:38<07:16, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆ | 1247/4000 [04:38<07:16, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆ | 1248/4000 [04:38<07:15, 6.32it/s] 31%|β–ˆβ–ˆβ–ˆ | 1249/4000 [04:39<07:15, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1250/4000 [04:39<07:15, 6.31it/s] {'loss': 0.6997, 'grad_norm': 0.4946444034576416, 'learning_rate': 8.23464942578459e-05}
31%|β–ˆβ–ˆβ–ˆβ– | 1250/4000 [04:39<07:15, 6.31it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1251/4000 [04:39<07:17, 6.28it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1252/4000 [04:39<07:16, 6.30it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1253/4000 [04:39<07:16, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1254/4000 [04:39<07:16, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1255/4000 [04:40<07:15, 6.30it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1256/4000 [04:40<07:16, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1257/4000 [04:40<07:16, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1258/4000 [04:40<07:15, 6.29it/s] 31%|β–ˆβ–ˆβ–ˆβ– | 1259/4000 [04:40<07:15, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1260/4000 [04:40<07:16, 6.28it/s] {'loss': 0.7013, 'grad_norm': 0.5017271041870117, 'learning_rate': 8.203017928735277e-05}
32%|β–ˆβ–ˆβ–ˆβ– | 1260/4000 [04:40<07:16, 6.28it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1261/4000 [04:40<07:17, 6.27it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1262/4000 [04:41<07:15, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1263/4000 [04:41<07:14, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1264/4000 [04:41<07:13, 6.31it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1265/4000 [04:41<07:15, 6.28it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1266/4000 [04:41<07:14, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1267/4000 [04:41<07:13, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1268/4000 [04:42<07:12, 6.32it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1269/4000 [04:42<07:13, 6.31it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1270/4000 [04:42<07:13, 6.29it/s] {'loss': 0.6914, 'grad_norm': 0.4506302773952484, 'learning_rate': 8.17116750966526e-05}
32%|β–ˆβ–ˆβ–ˆβ– | 1270/4000 [04:42<07:13, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1271/4000 [04:42<07:14, 6.27it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1272/4000 [04:42<07:14, 6.28it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1273/4000 [04:42<07:13, 6.28it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1274/4000 [04:43<07:12, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1275/4000 [04:43<07:11, 6.32it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1276/4000 [04:43<07:10, 6.33it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1277/4000 [04:43<07:10, 6.33it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1278/4000 [04:43<07:09, 6.34it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1279/4000 [04:43<07:10, 6.32it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1280/4000 [04:43<07:11, 6.30it/s] {'loss': 0.7048, 'grad_norm': 0.5228651762008667, 'learning_rate': 8.139100345508377e-05}
32%|β–ˆβ–ˆβ–ˆβ– | 1280/4000 [04:43<07:11, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1281/4000 [04:44<07:13, 6.28it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1282/4000 [04:44<07:12, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1283/4000 [04:44<07:11, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1284/4000 [04:44<07:11, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1285/4000 [04:44<07:10, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1286/4000 [04:44<07:10, 6.31it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1287/4000 [04:45<07:10, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1288/4000 [04:45<07:10, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1289/4000 [04:45<07:10, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1290/4000 [04:45<07:10, 6.30it/s] {'loss': 0.6809, 'grad_norm': 0.47274133563041687, 'learning_rate': 8.106818628012697e-05}
32%|β–ˆβ–ˆβ–ˆβ– | 1290/4000 [04:45<07:10, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1291/4000 [04:45<07:11, 6.28it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1292/4000 [04:45<07:09, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1293/4000 [04:46<07:10, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1294/4000 [04:46<07:09, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1295/4000 [04:46<07:10, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1296/4000 [04:46<07:09, 6.30it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1297/4000 [04:46<07:09, 6.29it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1298/4000 [04:46<07:08, 6.31it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 1299/4000 [04:46<07:07, 6.31it/s] 32%|β–ˆβ–ˆβ–ˆβ–Ž | 1300/4000 [04:47<07:07, 6.32it/s] {'loss': 0.693, 'grad_norm': 0.47493863105773926, 'learning_rate': 8.074324563590736e-05}
32%|β–ˆβ–ˆβ–ˆβ–Ž | 1300/4000 [04:47<07:07, 6.32it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1301/4000 [04:47<07:08, 6.29it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1302/4000 [04:47<07:08, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1303/4000 [04:47<07:08, 6.29it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1304/4000 [04:47<07:08, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1305/4000 [04:47<07:07, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1306/4000 [04:48<07:07, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1307/4000 [04:48<07:07, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1308/4000 [04:48<07:08, 6.29it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1309/4000 [04:48<07:08, 6.28it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1310/4000 [04:48<07:08, 6.28it/s] {'loss': 0.7053, 'grad_norm': 0.5628791451454163, 'learning_rate': 8.041620373168628e-05}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 1310/4000 [04:48<07:08, 6.28it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1311/4000 [04:48<07:08, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1312/4000 [04:49<07:08, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1313/4000 [04:49<07:08, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1314/4000 [04:49<07:06, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1315/4000 [04:49<07:06, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1316/4000 [04:49<07:06, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1317/4000 [04:49<07:06, 6.30it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1318/4000 [04:50<07:09, 6.24it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1319/4000 [04:50<07:09, 6.25it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1320/4000 [04:50<07:07, 6.26it/s] {'loss': 0.7127, 'grad_norm': 0.5795761942863464, 'learning_rate': 8.008708292034349e-05}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 1320/4000 [04:50<07:07, 6.26it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1321/4000 [04:50<07:09, 6.24it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1322/4000 [04:50<07:08, 6.25it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1323/4000 [04:50<07:06, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1324/4000 [04:50<07:07, 6.26it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1325/4000 [04:51<07:07, 6.26it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1326/4000 [04:51<07:06, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1327/4000 [04:51<07:06, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1328/4000 [04:51<07:05, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1329/4000 [04:51<07:06, 6.26it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1330/4000 [04:51<07:05, 6.27it/s] {'loss': 0.6882, 'grad_norm': 0.46782001852989197, 'learning_rate': 7.975590569684925e-05}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 1330/4000 [04:51<07:05, 6.27it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1331/4000 [04:52<07:07, 6.25it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1332/4000 [04:52<07:07, 6.24it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1333/4000 [04:52<07:07, 6.24it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1334/4000 [04:52<07:06, 6.26it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1335/4000 [04:52<07:10, 6.19it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1336/4000 [04:52<07:24, 5.99it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1337/4000 [04:53<07:26, 5.96it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1338/4000 [04:53<07:27, 5.95it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1339/4000 [04:53<07:26, 5.96it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1340/4000 [04:53<07:26, 5.96it/s] {'loss': 0.6894, 'grad_norm': 0.45206907391548157, 'learning_rate': 7.942269469672687e-05}
34%|β–ˆβ–ˆβ–ˆβ–Ž | 1340/4000 [04:53<07:26, 5.96it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1341/4000 [04:53<07:29, 5.92it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1342/4000 [04:53<07:28, 5.93it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1343/4000 [04:54<07:26, 5.96it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1344/4000 [04:54<07:25, 5.97it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1345/4000 [04:54<07:33, 5.85it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1346/4000 [04:54<07:28, 5.92it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1347/4000 [04:54<07:24, 5.97it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1348/4000 [04:54<07:19, 6.04it/s] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 1349/4000 [04:55<07:15, 6.09it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1350/4000 [04:55<07:10, 6.16it/s] {'loss': 0.6839, 'grad_norm': 0.5340469479560852, 'learning_rate': 7.908747269450558e-05}
34%|β–ˆβ–ˆβ–ˆβ– | 1350/4000 [04:55<07:10, 6.16it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1351/4000 [04:55<07:08, 6.19it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1352/4000 [04:55<07:06, 6.20it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1353/4000 [04:55<07:07, 6.19it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1354/4000 [04:55<07:06, 6.20it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1355/4000 [04:56<07:06, 6.20it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1356/4000 [04:56<07:05, 6.21it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1357/4000 [04:56<07:03, 6.24it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1358/4000 [04:56<07:01, 6.27it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1359/4000 [04:56<07:07, 6.18it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1360/4000 [04:56<07:31, 5.84it/s] {'loss': 0.6956, 'grad_norm': 0.6080896854400635, 'learning_rate': 7.875026260216393e-05}
34%|β–ˆβ–ˆβ–ˆβ– | 1360/4000 [04:56<07:31, 5.84it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1361/4000 [04:57<07:32, 5.83it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1362/4000 [04:57<07:27, 5.90it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1363/4000 [04:57<07:25, 5.92it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1364/4000 [04:57<07:23, 5.95it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1365/4000 [04:57<07:15, 6.05it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1366/4000 [04:57<07:09, 6.13it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1367/4000 [04:58<07:07, 6.16it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1368/4000 [04:58<07:09, 6.12it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1369/4000 [04:58<07:26, 5.89it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1370/4000 [04:58<07:42, 5.68it/s] {'loss': 0.6895, 'grad_norm': 0.5087507367134094, 'learning_rate': 7.841108746756382e-05}
34%|β–ˆβ–ˆβ–ˆβ– | 1370/4000 [04:58<07:42, 5.68it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1371/4000 [04:58<07:38, 5.74it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1372/4000 [04:58<07:26, 5.89it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1373/4000 [04:59<07:19, 5.98it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1374/4000 [04:59<07:12, 6.07it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1375/4000 [04:59<07:09, 6.11it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1376/4000 [04:59<07:13, 6.05it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1377/4000 [04:59<07:13, 6.05it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1378/4000 [04:59<07:12, 6.07it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1379/4000 [05:00<07:09, 6.10it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 1380/4000 [05:00<07:10, 6.09it/s] {'loss': 0.6839, 'grad_norm': 0.4710014760494232, 'learning_rate': 7.806997047287516e-05}
34%|β–ˆβ–ˆβ–ˆβ– | 1380/4000 [05:00<07:10, 6.09it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1381/4000 [05:00<07:08, 6.11it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1382/4000 [05:00<07:06, 6.13it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1383/4000 [05:00<07:05, 6.15it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1384/4000 [05:00<07:02, 6.18it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1385/4000 [05:01<07:04, 6.16it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1386/4000 [05:01<07:04, 6.16it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1387/4000 [05:01<07:05, 6.15it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1388/4000 [05:01<07:02, 6.18it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1389/4000 [05:01<07:01, 6.19it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1390/4000 [05:01<07:02, 6.18it/s] {'loss': 0.6944, 'grad_norm': 0.5395244359970093, 'learning_rate': 7.772693493299138e-05}
35%|β–ˆβ–ˆβ–ˆβ– | 1390/4000 [05:01<07:02, 6.18it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1391/4000 [05:02<07:04, 6.15it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1392/4000 [05:02<07:06, 6.12it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1393/4000 [05:02<07:04, 6.15it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1394/4000 [05:02<07:04, 6.14it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1395/4000 [05:02<07:01, 6.18it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1396/4000 [05:02<06:58, 6.22it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1397/4000 [05:02<06:58, 6.22it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1398/4000 [05:03<07:00, 6.20it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 1399/4000 [05:03<07:02, 6.16it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1400/4000 [05:03<07:06, 6.10it/s] {'loss': 0.6979, 'grad_norm': 0.5339425206184387, 'learning_rate': 7.7382004293936e-05}
35%|β–ˆβ–ˆβ–ˆβ–Œ | 1400/4000 [05:03<07:06, 6.10it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1401/4000 [05:03<07:11, 6.02it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1402/4000 [05:03<07:10, 6.04it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1403/4000 [05:03<07:08, 6.07it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1404/4000 [05:04<07:05, 6.11it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1405/4000 [05:04<07:00, 6.16it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1406/4000 [05:04<07:02, 6.14it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1407/4000 [05:04<07:04, 6.11it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1408/4000 [05:04<07:07, 6.06it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1409/4000 [05:04<07:11, 6.00it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1410/4000 [05:05<07:13, 5.98it/s] {'loss': 0.6793, 'grad_norm': 0.4487476348876953, 'learning_rate': 7.703520213126e-05}
35%|β–ˆβ–ˆβ–ˆβ–Œ | 1410/4000 [05:05<07:13, 5.98it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1411/4000 [05:05<07:12, 5.99it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1412/4000 [05:05<07:12, 5.98it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1413/4000 [05:05<07:09, 6.02it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1414/4000 [05:05<07:10, 6.01it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1415/4000 [05:05<07:12, 5.97it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1416/4000 [05:06<07:14, 5.95it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1417/4000 [05:06<07:15, 5.94it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1418/4000 [05:06<07:17, 5.90it/s] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 1419/4000 [05:06<07:17, 5.90it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1420/4000 [05:06<07:15, 5.93it/s] {'loss': 0.7039, 'grad_norm': 0.5018513798713684, 'learning_rate': 7.66865521484305e-05}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 1420/4000 [05:06<07:15, 5.93it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1421/4000 [05:06<07:16, 5.91it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1422/4000 [05:07<07:15, 5.91it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1423/4000 [05:07<07:17, 5.89it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1424/4000 [05:07<07:17, 5.89it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1425/4000 [05:07<07:14, 5.93it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1426/4000 [05:07<07:09, 5.99it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1427/4000 [05:07<07:09, 5.98it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1428/4000 [05:08<07:12, 5.94it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1429/4000 [05:08<07:11, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1430/4000 [05:08<07:12, 5.94it/s] {'loss': 0.6904, 'grad_norm': 0.497934490442276, 'learning_rate': 7.633607817521074e-05}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 1430/4000 [05:08<07:12, 5.94it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1431/4000 [05:08<07:12, 5.94it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1432/4000 [05:08<07:11, 5.95it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1433/4000 [05:08<07:07, 6.00it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1434/4000 [05:09<07:09, 5.98it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1435/4000 [05:09<07:10, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1436/4000 [05:09<07:12, 5.93it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1437/4000 [05:09<07:11, 5.93it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1438/4000 [05:09<07:09, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1439/4000 [05:10<07:10, 5.94it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1440/4000 [05:10<07:11, 5.94it/s] {'loss': 0.6821, 'grad_norm': 0.5556654334068298, 'learning_rate': 7.598380416603119e-05}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 1440/4000 [05:10<07:11, 5.94it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1441/4000 [05:10<07:09, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1442/4000 [05:10<07:10, 5.94it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1443/4000 [05:10<07:09, 5.95it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1444/4000 [05:10<07:08, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1445/4000 [05:11<07:08, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1446/4000 [05:11<07:08, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1447/4000 [05:11<07:05, 6.00it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1448/4000 [05:11<07:06, 5.99it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 1449/4000 [05:11<07:06, 5.98it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1450/4000 [05:11<07:04, 6.00it/s] {'loss': 0.682, 'grad_norm': 0.5551984906196594, 'learning_rate': 7.562975419835247e-05}
36%|β–ˆβ–ˆβ–ˆβ–‹ | 1450/4000 [05:11<07:04, 6.00it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1451/4000 [05:12<07:04, 6.00it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1452/4000 [05:12<07:07, 5.95it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1453/4000 [05:12<07:08, 5.95it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1454/4000 [05:12<07:07, 5.96it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1455/4000 [05:12<07:05, 5.99it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1456/4000 [05:12<07:04, 6.00it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1457/4000 [05:13<07:03, 6.00it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1458/4000 [05:13<07:05, 5.97it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1459/4000 [05:13<07:05, 5.98it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 1460/4000 [05:13<07:05, 5.96it/s] {'loss': 0.6929, 'grad_norm': 0.5837743878364563, 'learning_rate': 7.527395247101956e-05}
36%|███▋ | 1460/4000 [05:13<07:05, 5.96it/s] 37%|███▋ | 1461/4000 [05:13<07:07, 5.94it/s]
Rank 2, Worker 3: Wait for shard 19 in dataset 0 in 0.00 seconds
Rank 2, Worker 3: Caching shard...
37%|β–ˆβ–ˆβ–ˆβ–‹ | 1462/4000 [05:13<07:14, 5.85it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1463/4000 [05:14<07:19, 5.78it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1464/4000 [05:14<07:17, 5.80it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1465/4000 [05:14<07:15, 5.82it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1466/4000 [05:14<07:14, 5.84it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1467/4000 [05:14<07:14, 5.83it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1468/4000 [05:14<07:15, 5.81it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1469/4000 [05:15<07:11, 5.87it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1470/4000 [05:15<07:09, 5.89it/s] {'loss': 0.6718, 'grad_norm': 0.504239022731781, 'learning_rate': 7.491642330260789e-05}
37%|███▋ | 1470/4000 [05:15<07:09, 5.89it/s]
Rank 0, Worker 0: Wait for shard 60 in dataset 0 in 0.00 seconds
Rank 0, Worker 0: Caching shard...
37%|███▋ | 1471/4000 [05:15<08:01, 5.25it/s]
Rank 2, Worker 1: Wait for shard 1 in dataset 0 in 0.00 seconds
Rank 2, Worker 1: Caching shard...
Rank 3, Worker 1: Wait for shard 53 in dataset 0 in 0.00 seconds
Rank 3, Worker 1: Caching shard...
37%|███▋ | 1472/4000 [05:15<08:04, 5.21it/s] 37%|███▋ | 1473/4000 [05:15<08:08, 5.18it/s]
Rank 1, Worker 3: Wait for shard 17 in dataset 0 in 0.00 seconds
Rank 1, Worker 3: Caching shard...
37%|███▋ | 1474/4000 [05:16<07:57, 5.29it/s]
Rank 3, Worker 4: Wait for shard 5 in dataset 0 in 0.00 seconds
Rank 3, Worker 4: Caching shard...
37%|β–ˆβ–ˆβ–ˆβ–‹ | 1475/4000 [05:16<07:55, 5.31it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1476/4000 [05:16<07:57, 5.29it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1477/4000 [05:16<08:01, 5.24it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1478/4000 [05:16<07:56, 5.29it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1479/4000 [05:16<07:51, 5.35it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1480/4000 [05:17<07:56, 5.29it/s] {'loss': 0.6712, 'grad_norm': 0.4760385751724243, 'learning_rate': 7.45571911297612e-05}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 1480/4000 [05:17<07:56, 5.29it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1481/4000 [05:17<07:55, 5.30it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1482/4000 [05:17<07:52, 5.32it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1483/4000 [05:17<07:58, 5.26it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1484/4000 [05:17<08:01, 5.23it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1485/4000 [05:18<07:59, 5.24it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1486/4000 [05:18<07:57, 5.26it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1487/4000 [05:18<08:16, 5.07it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1488/4000 [05:18<08:02, 5.20it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1489/4000 [05:18<08:00, 5.23it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1490/4000 [05:19<07:59, 5.23it/s] {'loss': 0.6831, 'grad_norm': 0.4641706347465515, 'learning_rate': 7.419628050552131e-05}
37%|███▋ | 1490/4000 [05:19<07:59, 5.23it/s] 37%|███▋ | 1491/4000 [05:19<07:56, 5.27it/s] 37%|███▋ | 1492/4000 [05:19<07:47, 5.36it/s] 37%|███▋ | 1493/4000 [05:19<07:35, 5.50it/s] 37%|███▋ | 1494/4000 [05:19<07:44, 5.40it/s] 37%|███▋ | 1495/4000 [05:20<08:22, 4.99it/s] 37%|███▋ | 1496/4000 [05:20<08:11, 5.09it/s] 37%|███▋ | 1497/4000 [05:20<07:48, 5.34it/s]
Rank 0, Worker 3: Wait for shard 63 in dataset 0 in 0.00 seconds
Rank 0, Worker 3: Caching shard...
37%|β–ˆβ–ˆβ–ˆβ–‹ | 1498/4000 [05:20<07:46, 5.37it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 1499/4000 [05:20<07:42, 5.40it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1500/4000 [05:20<07:45, 5.37it/s] {'loss': 0.6675, 'grad_norm': 0.5901619791984558, 'learning_rate': 7.383371609764999e-05}
38%|███▊ | 1500/4000 [05:20<07:45, 5.37it/s]
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-1500/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-1500
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-1500/wandb_config.json
38%|███▊ | 1501/4000 [05:48<5:52:59, 8.48s/it] 38%|███▊ | 1502/4000 [05:48<4:08:59, 5.98s/it] 38%|███▊ | 1503/4000 [05:49<2:56:11, 4.23s/it] 38%|███▊ | 1504/4000 [05:49<2:05:16, 3.01s/it] 38%|███▊ | 1505/4000 [05:49<1:29:37, 2.16s/it]
Rank 0, Worker 5: Wait for shard 58 in dataset 0 in 0.00 seconds
Rank 0, Worker 5: Caching shard...
Rank 2, Worker 5: Wait for shard 7 in dataset 0 in 0.00 seconds
Rank 2, Worker 5: Caching shard...
38%|███▊ | 1506/4000 [05:49<1:04:52, 1.56s/it]
Rank 1, Worker 0: Wait for shard 45 in dataset 0 in 0.00 seconds
Rank 1, Worker 0: Caching shard...
38%|███▊ | 1507/4000 [05:49<47:40, 1.15s/it]
Rank 0, Worker 1: Wait for shard 28 in dataset 0 in 0.00 seconds
Rank 0, Worker 1: Caching shard...
38%|β–ˆβ–ˆβ–ˆβ–Š | 1508/4000 [05:49<35:37, 1.17it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1509/4000 [05:50<27:05, 1.53it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1510/4000 [05:50<21:00, 1.98it/s] {'loss': 0.6798, 'grad_norm': 0.5062085390090942, 'learning_rate': 7.346952268694288e-05}
38%|███▊ | 1510/4000 [05:50<21:00, 1.98it/s] 38%|███▊ | 1511/4000 [05:50<16:55, 2.45it/s] 38%|███▊ | 1512/4000 [05:50<14:04, 2.95it/s] 38%|███▊ | 1513/4000 [05:50<12:00, 3.45it/s] 38%|███▊ | 1514/4000 [05:50<10:27, 3.96it/s]
Rank 0, Worker 2: Wait for shard 47 in dataset 0 in 0.00 seconds
Rank 0, Worker 2: Caching shard...
Rank 3, Worker 2: Wait for shard 41 in dataset 0 in 0.00 seconds
Rank 3, Worker 2: Caching shard...
Rank 1, Worker 2: Wait for shard 8 in dataset 0 in 0.00 seconds
Rank 1, Worker 2: Caching shard...
38%|β–ˆβ–ˆβ–ˆβ–Š | 1515/4000 [05:51<09:37, 4.30it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1516/4000 [05:51<09:06, 4.55it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1517/4000 [05:51<08:41, 4.76it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1518/4000 [05:51<08:11, 5.05it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1519/4000 [05:51<08:01, 5.15it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1520/4000 [05:52<07:59, 5.17it/s] {'loss': 0.68, 'grad_norm': 0.49795156717300415, 'learning_rate': 7.310372516553585e-05}
38%|███▊ | 1520/4000 [05:52<07:59, 5.17it/s] 38%|███▊ | 1521/4000 [05:52<07:54, 5.23it/s] 38%|███▊ | 1522/4000 [05:52<07:43, 5.34it/s] 38%|███▊ | 1523/4000 [05:52<07:45, 5.32it/s] 38%|███▊ | 1524/4000 [05:52<07:45, 5.32it/s]
Rank 3, Worker 0: Wait for shard 14 in dataset 0 in 0.00 seconds
Rank 3, Worker 0: Caching shard...
38%|███▊ | 1525/4000 [05:53<07:42, 5.35it/s]
Rank 1, Worker 1: Wait for shard 50 in dataset 0 in 0.00 seconds
Rank 1, Worker 1: Caching shard...
38%|███▊ | 1526/4000 [05:53<07:48, 5.28it/s] 38%|███▊ | 1527/4000 [05:53<07:43, 5.33it/s]
Rank 3, Worker 3: Wait for shard 29 in dataset 0 in 0.00 seconds
Rank 3, Worker 3: Caching shard...
38%|███▊ | 1528/4000 [05:53<07:56, 5.19it/s]
Rank 2, Worker 4: Wait for shard 35 in dataset 0 in 0.00 seconds
Rank 2, Worker 4: Caching shard...
38%|β–ˆβ–ˆβ–ˆβ–Š | 1529/4000 [05:53<08:05, 5.09it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1530/4000 [05:54<08:01, 5.13it/s] {'loss': 0.6758, 'grad_norm': 0.4773595929145813, 'learning_rate': 7.273634853520356e-05}
38%|███▊ | 1530/4000 [05:54<08:01, 5.13it/s] 38%|███▊ | 1531/4000 [05:54<07:56, 5.18it/s] 38%|███▊ | 1532/4000 [05:54<07:58, 5.15it/s] 38%|███▊ | 1533/4000 [05:54<08:04, 5.10it/s] 38%|███▊ | 1534/4000 [05:54<08:03, 5.10it/s] 38%|███▊ | 1535/4000 [05:54<08:02, 5.11it/s] 38%|███▊ | 1536/4000 [05:55<08:15, 4.97it/s] 38%|███▊ | 1537/4000 [05:55<08:20, 4.92it/s] 38%|███▊ | 1538/4000 [05:55<08:14, 4.97it/s]
Rank 2, Worker 2: Wait for shard 30 in dataset 0 in 0.00 seconds
Rank 2, Worker 2: Caching shard...
38%|β–ˆβ–ˆβ–ˆβ–Š | 1539/4000 [05:55<08:04, 5.08it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 1540/4000 [05:55<07:52, 5.21it/s] {'loss': 0.687, 'grad_norm': 0.5229724645614624, 'learning_rate': 7.236741790565072e-05}
38%|███▊ | 1540/4000 [05:55<07:52, 5.21it/s]
Rank 1, Worker 4: Wait for shard 40 in dataset 0 in 0.00 seconds
Rank 1, Worker 4: Caching shard...
39%|███▊ | 1541/4000 [05:56<08:04, 5.08it/s]
Rank 3, Worker 5: Wait for shard 31 in dataset 0 in 0.00 seconds
Rank 3, Worker 5: Caching shard...
Rank 1, Worker 5: Wait for shard 27 in dataset 0 in 0.00 seconds
Rank 1, Worker 5: Caching shard...
39%|███▊ | 1542/4000 [05:56<08:31, 4.81it/s]
Rank 2, Worker 0: Wait for shard 52 in dataset 0 in 0.00 seconds
Rank 2, Worker 0: Caching shard...
39%|███▊ | 1543/4000 [05:56<08:31, 4.81it/s] 39%|███▊ | 1544/4000 [05:56<08:45, 4.67it/s] 39%|███▊ | 1545/4000 [05:57<08:38, 4.74it/s] 39%|███▊ | 1546/4000 [05:57<08:38, 4.73it/s]
Rank 0, Worker 4: Wait for shard 12 in dataset 0 in 0.00 seconds
Rank 0, Worker 4: Caching shard...
39%|β–ˆβ–ˆβ–ˆβ–Š | 1547/4000 [05:57<08:55, 4.58it/s] 39%|β–ˆβ–ˆβ–ˆβ–Š | 1548/4000 [05:57<08:58, 4.55it/s] 39%|β–ˆβ–ˆβ–ˆβ–Š | 1549/4000 [05:57<08:50, 4.62it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1550/4000 [05:58<08:47, 4.65it/s] {'loss': 0.6939, 'grad_norm': 0.4959690570831299, 'learning_rate': 7.199695849279576e-05}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 1550/4000 [05:58<08:47, 4.65it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1551/4000 [05:58<08:54, 4.58it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1552/4000 [05:58<09:18, 4.38it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1553/4000 [05:58<09:04, 4.50it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1554/4000 [05:59<08:51, 4.60it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1555/4000 [05:59<09:00, 4.52it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1556/4000 [05:59<09:04, 4.49it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1557/4000 [05:59<09:08, 4.46it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1558/4000 [05:59<08:59, 4.53it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1559/4000 [06:00<08:50, 4.60it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1560/4000 [06:00<09:02, 4.50it/s] {'loss': 0.6905, 'grad_norm': 0.4881291687488556, 'learning_rate': 7.162499561704747e-05}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 1560/4000 [06:00<09:02, 4.50it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1561/4000 [06:00<08:51, 4.59it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1562/4000 [06:00<08:48, 4.62it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1563/4000 [06:01<09:02, 4.50it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1564/4000 [06:01<08:59, 4.52it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1565/4000 [06:01<08:48, 4.61it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1566/4000 [06:01<08:48, 4.60it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1567/4000 [06:01<08:51, 4.58it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1568/4000 [06:02<08:51, 4.57it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1569/4000 [06:02<08:54, 4.55it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1570/4000 [06:02<08:57, 4.52it/s] {'loss': 0.6672, 'grad_norm': 0.48054420948028564, 'learning_rate': 7.125155470157429e-05}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 1570/4000 [06:02<08:57, 4.52it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1571/4000 [06:02<09:21, 4.33it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1572/4000 [06:03<09:14, 4.38it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1573/4000 [06:03<08:46, 4.61it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1574/4000 [06:03<09:31, 4.24it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1575/4000 [06:03<09:12, 4.39it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1576/4000 [06:03<09:16, 4.36it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1577/4000 [06:04<09:07, 4.43it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1578/4000 [06:04<08:53, 4.54it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 1579/4000 [06:04<08:46, 4.60it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1580/4000 [06:04<09:01, 4.47it/s] {'loss': 0.6745, 'grad_norm': 0.47718197107315063, 'learning_rate': 7.087666127056675e-05}
40%|β–ˆβ–ˆβ–ˆβ–‰ | 1580/4000 [06:04<09:01, 4.47it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1581/4000 [06:05<09:07, 4.41it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1582/4000 [06:05<09:02, 4.46it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1583/4000 [06:05<09:06, 4.42it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1584/4000 [06:05<09:11, 4.38it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1585/4000 [06:05<09:00, 4.47it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1586/4000 [06:06<08:57, 4.49it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1587/4000 [06:06<08:57, 4.49it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1588/4000 [06:06<09:05, 4.42it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1589/4000 [06:06<09:29, 4.24it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1590/4000 [06:07<09:24, 4.27it/s] {'loss': 0.6731, 'grad_norm': 0.39660972356796265, 'learning_rate': 7.050034094749286e-05}
40%|β–ˆβ–ˆβ–ˆβ–‰ | 1590/4000 [06:07<09:24, 4.27it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1591/4000 [06:07<09:40, 4.15it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1592/4000 [06:07<09:33, 4.20it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1593/4000 [06:07<09:18, 4.31it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1594/4000 [06:08<09:06, 4.41it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1595/4000 [06:08<08:55, 4.49it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1596/4000 [06:08<08:49, 4.54it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1597/4000 [06:08<08:34, 4.67it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1598/4000 [06:08<08:13, 4.87it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 1599/4000 [06:09<08:01, 4.98it/s] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 1600/4000 [06:09<07:52, 5.08it/s] {'loss': 0.6727, 'grad_norm': 0.4159504771232605, 'learning_rate': 7.012261945334683e-05}
40%|████ | 1610/4000 [06:11<06:57, 5.72it/s] {'loss': 0.6777, 'grad_norm': 0.5238457322120667, 'learning_rate': 6.974352260489103e-05}
40%|████ | 1620/4000 [06:12<06:24, 6.19it/s] {'loss': 0.6842, 'grad_norm': 0.5296777486801147, 'learning_rate': 6.936307631289148e-05}
41%|████ | 1630/4000 [06:14<07:21, 5.37it/s] {'loss': 0.6722, 'grad_norm': 0.5169921517372131, 'learning_rate': 6.898130658034685e-05}
41%|████ | 1640/4000 [06:16<06:17, 6.25it/s] {'loss': 0.6768, 'grad_norm': 0.46622392535209656, 'learning_rate': 6.859823950071127e-05}
41%|████▏ | 1650/4000 [06:17<06:14, 6.28it/s] {'loss': 0.6875, 'grad_norm': 0.4715481698513031, 'learning_rate': 6.821390125611078e-05}
42%|████▏ | 1660/4000 [06:19<06:10, 6.31it/s] {'loss': 0.6647, 'grad_norm': 0.4542713761329651, 'learning_rate': 6.782831811555385e-05}
42%|████▏ | 1670/4000 [06:20<06:10, 6.29it/s] {'loss': 0.6769, 'grad_norm': 0.4589802324771881, 'learning_rate': 6.744151643313597e-05}
42%|████▏ | 1680/4000 [06:22<06:08, 6.29it/s] {'loss': 0.6818, 'grad_norm': 0.46197426319122314, 'learning_rate': 6.705352264623828e-05}
42%|████▏ | 1690/4000 [06:23<06:08, 6.27it/s] {'loss': 0.6723, 'grad_norm': 0.4620882570743561, 'learning_rate': 6.666436327372078e-05}
42%|████▎ | 1700/4000 [06:25<06:09, 6.22it/s] {'loss': 0.6743, 'grad_norm': 0.4430062174797058, 'learning_rate': 6.62740649141096e-05}
43%|████▎ | 1710/4000 [06:27<06:06, 6.24it/s] {'loss': 0.6687, 'grad_norm': 0.49391141533851624, 'learning_rate': 6.588265424377919e-05}
43%|████▎ | 1720/4000 [06:28<06:12, 6.12it/s] {'loss': 0.6715, 'grad_norm': 0.46312108635902405, 'learning_rate': 6.549015801512895e-05}
43%|████▎ | 1730/4000 [06:30<06:10, 6.12it/s] {'loss': 0.6784, 'grad_norm': 0.5987693071365356, 'learning_rate': 6.509660305475468e-05}
44%|████▎ | 1740/4000 [06:32<06:03, 6.21it/s] {'loss': 0.6722, 'grad_norm': 0.5958335995674133, 'learning_rate': 6.47020162616152e-05}
44%|████▍ | 1750/4000 [06:33<06:17, 5.97it/s] {'loss': 0.6666, 'grad_norm': 0.47428029775619507, 'learning_rate': 6.430642460519365e-05}
44%|████▍ | 1760/4000 [06:35<06:41, 5.58it/s] {'loss': 0.6717, 'grad_norm': 0.5168448090553284, 'learning_rate': 6.390985512365426e-05}
44%|████▍ | 1770/4000 [06:37<06:15, 5.93it/s] {'loss': 0.6729, 'grad_norm': 0.5116503238677979, 'learning_rate': 6.351233492199431e-05}
44%|████▍ | 1780/4000 [06:38<06:38, 5.57it/s] {'loss': 0.6729, 'grad_norm': 0.44382596015930176, 'learning_rate': 6.311389117019155e-05}
45%|████▍ | 1790/4000 [06:40<06:21, 5.80it/s] {'loss': 0.67, 'grad_norm': 0.493990421295166, 'learning_rate': 6.271455110134713e-05}
45%|████▌ | 1800/4000 [06:42<06:30, 5.63it/s] {'loss': 0.6675, 'grad_norm': 0.4669109880924225, 'learning_rate': 6.231434200982428e-05}
45%|████▌ | 1810/4000 [06:44<06:50, 5.34it/s] {'loss': 0.6669, 'grad_norm': 0.5045506954193115, 'learning_rate': 6.191329124938285e-05}
46%|████▌ | 1820/4000 [06:46<06:20, 5.73it/s] {'loss': 0.6674, 'grad_norm': 0.44471701979637146, 'learning_rate': 6.15114262313095e-05}
46%|████▌ | 1830/4000 [06:47<06:06, 5.92it/s] {'loss': 0.6627, 'grad_norm': 0.4768633246421814, 'learning_rate': 6.110877442254444e-05}
46%|████▌ | 1840/4000 [06:49<05:57, 6.04it/s] {'loss': 0.6601, 'grad_norm': 0.44661039113998413, 'learning_rate': 6.0705363343803946e-05}
46%|████▋ | 1850/4000 [06:51<05:47, 6.19it/s] {'loss': 0.6664, 'grad_norm': 0.3572286367416382, 'learning_rate': 6.030122056769934e-05}
46%|████▋ | 1860/4000 [06:52<05:40, 6.29it/s] {'loss': 0.6776, 'grad_norm': 0.5288762450218201, 'learning_rate': 5.989637371685257e-05}
47%|████▋ | 1870/4000 [06:54<05:39, 6.28it/s] {'loss': 0.6725, 'grad_norm': 0.5009581446647644, 'learning_rate': 5.949085046200808e-05}
47%|████▋ | 1880/4000 [06:55<05:35, 6.32it/s] {'loss': 0.6632, 'grad_norm': 0.44043007493019104, 'learning_rate': 5.908467852014169e-05}
47%|████▋ | 1890/4000 [06:57<05:33, 6.33it/s] {'loss': 0.6646, 'grad_norm': 0.4002927243709564, 'learning_rate': 5.867788565256607e-05}
48%|████▊ | 1900/4000 [06:59<05:34, 6.27it/s] {'loss': 0.6567, 'grad_norm': 0.49478879570961, 'learning_rate': 5.827049966303335e-05}
48%|████▊ | 1910/4000 [07:00<05:33, 6.27it/s] {'loss': 0.6634, 'grad_norm': 0.4791210889816284, 'learning_rate': 5.786254839583478e-05}
48%|████▊ | 1920/4000 [07:02<05:32, 6.25it/s] {'loss': 0.6534, 'grad_norm': 0.5025086998939514, 'learning_rate': 5.745405973389757e-05}
48%|████▊ | 1930/4000 [07:03<05:29, 6.29it/s] {'loss': 0.6736, 'grad_norm': 0.4516906142234802, 'learning_rate': 5.7045061596879134e-05}
48%|████▊ | 1940/4000 [07:05<05:23, 6.36it/s] {'loss': 0.658, 'grad_norm': 0.4860899746417999, 'learning_rate': 5.6635581939258855e-05}
49%|████▉ | 1950/4000 [07:06<05:22, 6.35it/s] {'loss': 0.6562, 'grad_norm': 0.514963686466217, 'learning_rate': 5.622564874842742e-05}
49%|████▉ | 1960/4000 [07:08<05:23, 6.31it/s] {'loss': 0.6619, 'grad_norm': 0.48472991585731506, 'learning_rate': 5.5815290042773836e-05}
49%|████▉ | 1970/4000 [07:10<05:19, 6.36it/s] {'loss': 0.651, 'grad_norm': 0.5486400723457336, 'learning_rate': 5.540453386977058e-05}
50%|████▉ | 1980/4000 [07:11<05:18, 6.34it/s] {'loss': 0.6531, 'grad_norm': 0.5085767507553101, 'learning_rate': 5.4993408304056425e-05}
50%|████▉ | 1990/4000 [07:13<05:17, 6.32it/s] {'loss': 0.6554, 'grad_norm': 0.5380701422691345, 'learning_rate': 5.458194144551768e-05}
50%|█████ | 2000/4000 [07:14<05:18, 6.27it/s] {'loss': 0.6568, 'grad_norm': 0.48443129658699036, 'learning_rate': 5.417016141736756e-05}
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-2000/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-2000
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-2000/wandb_config.json
50%|█████ | 2010/4000 [07:43<16:12, 2.05it/s] {'loss': 0.65, 'grad_norm': 0.500551164150238, 'learning_rate': 5.375809636422399e-05}
50%|█████ | 2020/4000 [07:45<05:31, 5.97it/s] {'loss': 0.6526, 'grad_norm': 0.45972248911857605, 'learning_rate': 5.334577445018599e-05}
51%|█████ | 2030/4000 [07:46<05:12, 6.31it/s] {'loss': 0.6528, 'grad_norm': 0.48264095187187195, 'learning_rate': 5.293322385690867e-05}
51%|█████ | 2040/4000 [07:48<05:12, 6.27it/s] {'loss': 0.6554, 'grad_norm': 0.5277835726737976, 'learning_rate': 5.252047278167709e-05}
51%|█████▏ | 2050/4000 [07:50<05:08, 6.32it/s] {'loss': 0.6401, 'grad_norm': 0.5141885280609131, 'learning_rate': 5.210754943547893e-05}
52%|█████▏ | 2060/4000 [07:51<05:06, 6.34it/s] {'loss': 0.6523, 'grad_norm': 0.5499096512794495, 'learning_rate': 5.169448204107643e-05}
52%|█████▏ | 2070/4000 [07:53<05:05, 6.32it/s] {'loss': 0.6457, 'grad_norm': 0.4706327021121979, 'learning_rate': 5.128129883107729e-05}
52%|█████▏ | 2080/4000 [07:54<05:04, 6.31it/s] {'loss': 0.6563, 'grad_norm': 0.5084936618804932, 'learning_rate': 5.086802804600505e-05}
52%|█████▏ | 2090/4000 [07:56<05:01, 6.33it/s] {'loss': 0.6661, 'grad_norm': 0.5018666386604309, 'learning_rate': 5.045469793236892e-05}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2090/4000 [07:56<05:01, 6.33it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2091/4000 [07:56<05:02, 6.31it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2092/4000 [07:56<05:02, 6.31it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2093/4000 [07:56<05:01, 6.31it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2094/4000 [07:57<05:02, 6.30it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2095/4000 [07:57<05:01, 6.31it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2096/4000 [07:57<05:01, 6.31it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2097/4000 [07:57<05:00, 6.32it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2098/4000 [07:57<04:59, 6.34it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2099/4000 [07:57<05:00, 6.33it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2100/4000 [07:58<05:01, 6.31it/s] {'loss': 0.6594, 'grad_norm': 0.5113282203674316, 'learning_rate': 5.00413367407331e-05}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2100/4000 [07:58<05:01, 6.31it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2101/4000 [07:58<05:07, 6.18it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2102/4000 [07:58<05:26, 5.81it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2103/4000 [07:58<05:47, 5.47it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2104/4000 [07:58<05:57, 5.31it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2105/4000 [07:59<05:58, 5.28it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2106/4000 [07:59<06:05, 5.18it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2107/4000 [07:59<06:08, 5.13it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2108/4000 [07:59<06:12, 5.07it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2109/4000 [07:59<06:15, 5.04it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2110/4000 [08:00<06:12, 5.07it/s] {'loss': 0.6575, 'grad_norm': 0.5432279109954834, 'learning_rate': 4.9627972723785964e-05}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2110/4000 [08:00<06:12, 5.07it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2111/4000 [08:00<06:14, 5.04it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2112/4000 [08:00<06:18, 4.99it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2113/4000 [08:00<06:15, 5.03it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2114/4000 [08:00<06:14, 5.04it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2115/4000 [08:00<06:08, 5.12it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2116/4000 [08:01<06:05, 5.15it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2117/4000 [08:01<06:04, 5.16it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2118/4000 [08:01<06:03, 5.18it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2119/4000 [08:01<05:59, 5.23it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2120/4000 [08:01<05:43, 5.47it/s] {'loss': 0.6635, 'grad_norm': 0.43599316477775574, 'learning_rate': 4.921463413440898e-05}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2120/4000 [08:01<05:43, 5.47it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2121/4000 [08:02<05:32, 5.66it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2122/4000 [08:02<05:21, 5.85it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2123/4000 [08:02<05:13, 5.99it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2124/4000 [08:02<05:09, 6.07it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2125/4000 [08:02<05:04, 6.16it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2126/4000 [08:02<05:01, 6.21it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2127/4000 [08:03<04:59, 6.26it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2128/4000 [08:03<04:57, 6.29it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2129/4000 [08:03<04:56, 6.31it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2130/4000 [08:03<04:57, 6.29it/s] {'loss': 0.6621, 'grad_norm': 0.5157215595245361, 'learning_rate': 4.8801349223745654e-05}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2130/4000 [08:03<04:57, 6.29it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2131/4000 [08:03<04:56, 6.30it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2132/4000 [08:03<04:56, 6.31it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2133/4000 [08:03<04:55, 6.32it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2134/4000 [08:04<04:54, 6.33it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2135/4000 [08:04<04:54, 6.33it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2136/4000 [08:04<04:55, 6.31it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2137/4000 [08:04<04:55, 6.30it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2138/4000 [08:04<04:54, 6.32it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2139/4000 [08:04<04:54, 6.32it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2140/4000 [08:05<04:54, 6.32it/s] {'loss': 0.6507, 'grad_norm': 0.5271717309951782, 'learning_rate': 4.838814623927067e-05}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2140/4000 [08:05<04:54, 6.32it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2141/4000 [08:05<04:54, 6.31it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2142/4000 [08:05<04:55, 6.29it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2143/4000 [08:05<04:54, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2144/4000 [08:05<04:53, 6.32it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2145/4000 [08:05<04:53, 6.32it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2146/4000 [08:06<04:52, 6.34it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2147/4000 [08:06<04:52, 6.34it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2148/4000 [08:06<04:53, 6.31it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2149/4000 [08:06<04:52, 6.33it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2150/4000 [08:06<04:52, 6.33it/s] {'loss': 0.658, 'grad_norm': 0.6172201037406921, 'learning_rate': 4.797505342285912e-05}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2150/4000 [08:06<04:52, 6.33it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2151/4000 [08:06<04:54, 6.29it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2152/4000 [08:06<04:59, 6.17it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2153/4000 [08:07<05:14, 5.87it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2154/4000 [08:07<05:28, 5.61it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2155/4000 [08:07<05:35, 5.50it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2156/4000 [08:07<05:27, 5.64it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2157/4000 [08:07<05:15, 5.84it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2158/4000 [08:08<05:08, 5.97it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2159/4000 [08:08<05:03, 6.06it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2160/4000 [08:08<05:00, 6.12it/s] {'loss': 0.6563, 'grad_norm': 0.5685967206954956, 'learning_rate': 4.756209900885628e-05}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2160/4000 [08:08<05:00, 6.12it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2161/4000 [08:08<04:57, 6.17it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2162/4000 [08:08<04:55, 6.22it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2163/4000 [08:08<04:53, 6.25it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2164/4000 [08:09<04:52, 6.29it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2165/4000 [08:09<04:51, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2166/4000 [08:09<04:51, 6.28it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2167/4000 [08:09<04:51, 6.29it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2168/4000 [08:09<04:50, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2169/4000 [08:09<04:50, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2170/4000 [08:09<04:51, 6.28it/s] {'loss': 0.6453, 'grad_norm': 0.5275602340698242, 'learning_rate': 4.714931122214781e-05}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2170/4000 [08:09<04:51, 6.28it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2171/4000 [08:10<04:51, 6.26it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2172/4000 [08:10<04:52, 6.26it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2173/4000 [08:10<04:50, 6.29it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2174/4000 [08:10<04:49, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2175/4000 [08:10<04:49, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2176/4000 [08:10<04:49, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2177/4000 [08:11<04:49, 6.30it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2178/4000 [08:11<04:49, 6.29it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2179/4000 [08:11<04:48, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2180/4000 [08:11<04:47, 6.32it/s] {'loss': 0.6433, 'grad_norm': 0.4504585266113281, 'learning_rate': 4.673671827623058e-05}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2180/4000 [08:11<04:47, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2181/4000 [08:11<04:48, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2182/4000 [08:11<04:47, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2183/4000 [08:12<04:46, 6.34it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2184/4000 [08:12<04:47, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2185/4000 [08:12<04:46, 6.34it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2186/4000 [08:12<04:46, 6.34it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2187/4000 [08:12<04:46, 6.33it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2188/4000 [08:12<04:46, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2189/4000 [08:12<04:45, 6.33it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2190/4000 [08:13<04:50, 6.23it/s] {'loss': 0.6324, 'grad_norm': 0.5779574513435364, 'learning_rate': 4.632434837128443e-05}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2190/4000 [08:13<04:50, 6.23it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2191/4000 [08:13<04:49, 6.25it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2192/4000 [08:13<04:47, 6.28it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2193/4000 [08:13<04:47, 6.29it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2194/4000 [08:13<04:46, 6.30it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2195/4000 [08:13<04:45, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2196/4000 [08:14<04:46, 6.30it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2197/4000 [08:14<04:45, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2198/4000 [08:14<04:44, 6.33it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2199/4000 [08:14<04:44, 6.33it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2200/4000 [08:14<04:43, 6.34it/s] {'loss': 0.6315, 'grad_norm': 0.5458188652992249, 'learning_rate': 4.591222969224453e-05}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2200/4000 [08:14<04:43, 6.34it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2201/4000 [08:14<04:44, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2202/4000 [08:15<04:45, 6.29it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2203/4000 [08:15<04:45, 6.29it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2204/4000 [08:15<04:45, 6.29it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2205/4000 [08:15<04:45, 6.30it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2206/4000 [08:15<04:44, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2207/4000 [08:15<04:43, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2208/4000 [08:15<04:44, 6.30it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2209/4000 [08:16<04:43, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2210/4000 [08:16<04:43, 6.31it/s] {'loss': 0.6461, 'grad_norm': 0.4926324784755707, 'learning_rate': 4.550039040687518e-05}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2210/4000 [08:16<04:43, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2211/4000 [08:16<04:44, 6.29it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2212/4000 [08:16<04:44, 6.29it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2213/4000 [08:16<04:44, 6.28it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2214/4000 [08:16<04:45, 6.25it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2215/4000 [08:17<04:44, 6.28it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2216/4000 [08:17<04:43, 6.30it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2217/4000 [08:17<04:42, 6.31it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2218/4000 [08:17<04:41, 6.32it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2219/4000 [08:17<04:41, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2220/4000 [08:17<04:41, 6.32it/s] {'loss': 0.6547, 'grad_norm': 0.495878666639328, 'learning_rate': 4.508885866384446e-05}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2220/4000 [08:17<04:41, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2221/4000 [08:18<04:42, 6.30it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2222/4000 [08:18<04:43, 6.28it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2223/4000 [08:18<04:42, 6.28it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2224/4000 [08:18<04:41, 6.30it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2225/4000 [08:18<04:40, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2226/4000 [08:18<04:41, 6.30it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2227/4000 [08:19<04:40, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2228/4000 [08:19<04:40, 6.31it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2229/4000 [08:19<04:40, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2230/4000 [08:19<04:40, 6.32it/s] {'loss': 0.6362, 'grad_norm': 0.48175400495529175, 'learning_rate': 4.4677662590800355e-05}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2230/4000 [08:19<04:40, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2231/4000 [08:19<04:40, 6.31it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2232/4000 [08:19<04:40, 6.30it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2233/4000 [08:19<04:40, 6.30it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2234/4000 [08:20<04:39, 6.31it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2235/4000 [08:20<04:39, 6.31it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2236/4000 [08:20<04:39, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2237/4000 [08:20<04:38, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2238/4000 [08:20<04:38, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2239/4000 [08:20<04:37, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2240/4000 [08:21<04:37, 6.35it/s] {'loss': 0.6301, 'grad_norm': 0.5608018040657043, 'learning_rate': 4.426683029244825e-05}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2240/4000 [08:21<04:37, 6.35it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2241/4000 [08:21<04:37, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2242/4000 [08:21<04:37, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2243/4000 [08:21<04:37, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2244/4000 [08:21<04:38, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2245/4000 [08:21<04:37, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2246/4000 [08:22<04:36, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2247/4000 [08:22<04:36, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2248/4000 [08:22<04:36, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 2249/4000 [08:22<04:36, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2250/4000 [08:22<04:36, 6.33it/s] {'loss': 0.6295, 'grad_norm': 0.5170360803604126, 'learning_rate': 4.385638984863e-05}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2250/4000 [08:22<04:36, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2251/4000 [08:22<04:36, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2252/4000 [08:22<04:35, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2253/4000 [08:23<04:35, 6.35it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2254/4000 [08:23<04:36, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2255/4000 [08:23<04:35, 6.35it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2256/4000 [08:23<04:36, 6.31it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2257/4000 [08:23<04:35, 6.32it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2258/4000 [08:23<04:34, 6.34it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2259/4000 [08:24<04:34, 6.33it/s] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2260/4000 [08:24<04:35, 6.31it/s] {'loss': 0.6451, 'grad_norm': 0.4240834712982178, 'learning_rate': 4.3446369312404745e-05}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2260/4000 [08:24<04:35, 6.31it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2261/4000 [08:24<04:37, 6.27it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2262/4000 [08:24<04:37, 6.27it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2263/4000 [08:24<04:36, 6.28it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2264/4000 [08:24<04:35, 6.31it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2265/4000 [08:25<04:35, 6.29it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2266/4000 [08:25<04:35, 6.30it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2267/4000 [08:25<04:34, 6.32it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2268/4000 [08:25<04:34, 6.31it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2269/4000 [08:25<04:33, 6.33it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2270/4000 [08:25<04:33, 6.33it/s] {'loss': 0.6609, 'grad_norm': 0.5210784673690796, 'learning_rate': 4.3036796708131474e-05}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2270/4000 [08:25<04:33, 6.33it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2271/4000 [08:25<04:35, 6.28it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2272/4000 [08:26<04:34, 6.30it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2273/4000 [08:26<04:33, 6.32it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2274/4000 [08:26<04:34, 6.29it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2275/4000 [08:26<04:33, 6.31it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2276/4000 [08:26<04:34, 6.27it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2277/4000 [08:26<04:38, 6.20it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2278/4000 [08:27<04:37, 6.20it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2279/4000 [08:27<04:36, 6.22it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2280/4000 [08:27<04:37, 6.20it/s] {'loss': 0.65, 'grad_norm': 0.49699047207832336, 'learning_rate': 4.262770002955363e-05}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2280/4000 [08:27<04:37, 6.20it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2281/4000 [08:27<04:37, 6.20it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2282/4000 [08:27<04:36, 6.22it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2283/4000 [08:27<04:36, 6.22it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2284/4000 [08:28<04:34, 6.25it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2285/4000 [08:28<04:33, 6.27it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2286/4000 [08:28<04:33, 6.26it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2287/4000 [08:28<04:32, 6.29it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2288/4000 [08:28<04:31, 6.31it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2289/4000 [08:28<04:31, 6.29it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2290/4000 [08:28<04:30, 6.32it/s] {'loss': 0.6389, 'grad_norm': 0.5782934427261353, 'learning_rate': 4.221910723788578e-05}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2290/4000 [08:29<04:30, 6.32it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2291/4000 [08:29<04:30, 6.32it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2292/4000 [08:29<04:30, 6.30it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2293/4000 [08:29<04:29, 6.33it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2294/4000 [08:29<04:28, 6.35it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2295/4000 [08:29<04:29, 6.33it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2296/4000 [08:29<04:28, 6.34it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2297/4000 [08:30<04:27, 6.36it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2298/4000 [08:30<04:28, 6.33it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2299/4000 [08:30<04:28, 6.34it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2300/4000 [08:30<04:27, 6.36it/s] {'loss': 0.6498, 'grad_norm': 0.537568986415863, 'learning_rate': 4.1811046259902474e-05}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2300/4000 [08:30<04:27, 6.36it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2301/4000 [08:30<04:30, 6.29it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2302/4000 [08:30<04:29, 6.29it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2303/4000 [08:31<04:28, 6.32it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2304/4000 [08:31<04:28, 6.31it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2305/4000 [08:31<04:27, 6.34it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2306/4000 [08:31<04:26, 6.35it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2307/4000 [08:31<04:27, 6.34it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2308/4000 [08:31<04:26, 6.35it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2309/4000 [08:31<04:26, 6.34it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2310/4000 [08:32<04:28, 6.29it/s] {'loss': 0.6278, 'grad_norm': 0.5546226501464844, 'learning_rate': 4.140354498602952e-05}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2310/4000 [08:32<04:28, 6.29it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2311/4000 [08:32<04:30, 6.23it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2312/4000 [08:32<04:29, 6.27it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2313/4000 [08:32<04:29, 6.26it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2314/4000 [08:32<04:28, 6.28it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2315/4000 [08:32<04:26, 6.32it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2316/4000 [08:33<04:26, 6.32it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2317/4000 [08:33<04:26, 6.31it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2318/4000 [08:33<04:27, 6.29it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2319/4000 [08:33<04:27, 6.27it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2320/4000 [08:33<04:29, 6.24it/s] {'loss': 0.6412, 'grad_norm': 0.5449338555335999, 'learning_rate': 4.099663126843769e-05}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2320/4000 [08:33<04:29, 6.24it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2321/4000 [08:33<04:33, 6.14it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2322/4000 [08:34<04:31, 6.18it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2323/4000 [08:34<04:28, 6.24it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2324/4000 [08:34<04:33, 6.13it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2325/4000 [08:34<04:32, 6.14it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2326/4000 [08:34<04:30, 6.18it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2327/4000 [08:34<04:31, 6.16it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2328/4000 [08:35<04:31, 6.15it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2329/4000 [08:35<04:44, 5.87it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2330/4000 [08:35<04:39, 5.97it/s] {'loss': 0.6245, 'grad_norm': 0.47158190608024597, 'learning_rate': 4.059033291913902e-05}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2330/4000 [08:35<04:39, 5.97it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2331/4000 [08:35<04:36, 6.04it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2332/4000 [08:35<04:34, 6.08it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2333/4000 [08:35<04:49, 5.76it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2334/4000 [08:36<04:42, 5.90it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2335/4000 [08:36<04:55, 5.64it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2336/4000 [08:36<05:05, 5.44it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2337/4000 [08:36<04:54, 5.65it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2338/4000 [08:36<04:46, 5.81it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2339/4000 [08:36<04:40, 5.92it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2340/4000 [08:37<04:36, 6.00it/s] {'loss': 0.6351, 'grad_norm': 0.4968015253543854, 'learning_rate': 4.0184677708086014e-05}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2340/4000 [08:37<04:36, 6.00it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2341/4000 [08:37<04:33, 6.07it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2342/4000 [08:37<04:31, 6.10it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2343/4000 [08:37<04:32, 6.08it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2344/4000 [08:37<04:29, 6.14it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2345/4000 [08:37<04:29, 6.14it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2346/4000 [08:38<04:29, 6.13it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2347/4000 [08:38<04:28, 6.16it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2348/4000 [08:38<04:26, 6.20it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 2349/4000 [08:38<04:27, 6.18it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2350/4000 [08:38<04:25, 6.21it/s] {'loss': 0.6199, 'grad_norm': 0.5768001079559326, 'learning_rate': 3.977969336127348e-05}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2350/4000 [08:38<04:25, 6.21it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2351/4000 [08:38<04:26, 6.18it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2352/4000 [08:39<04:27, 6.16it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2353/4000 [08:39<04:28, 6.13it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2354/4000 [08:39<04:28, 6.12it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2355/4000 [08:39<04:28, 6.12it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2356/4000 [08:39<04:31, 6.04it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2357/4000 [08:39<04:36, 5.95it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2358/4000 [08:40<04:34, 5.98it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2359/4000 [08:40<04:36, 5.94it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2360/4000 [08:40<04:34, 5.98it/s] {'loss': 0.6196, 'grad_norm': 0.5403270125389099, 'learning_rate': 3.937540755884357e-05}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2360/4000 [08:40<04:34, 5.98it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2361/4000 [08:40<04:33, 5.99it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2362/4000 [08:40<04:32, 6.02it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2363/4000 [08:40<04:31, 6.02it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2364/4000 [08:41<04:33, 5.99it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2365/4000 [08:41<04:34, 5.95it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2366/4000 [08:41<04:36, 5.90it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2367/4000 [08:41<04:39, 5.84it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2368/4000 [08:41<04:37, 5.87it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2369/4000 [08:41<04:36, 5.90it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2370/4000 [08:42<04:35, 5.92it/s] {'loss': 0.6335, 'grad_norm': 0.5366491675376892, 'learning_rate': 3.897184793319384e-05}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2370/4000 [08:42<04:35, 5.92it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2371/4000 [08:42<04:36, 5.90it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2372/4000 [08:42<04:35, 5.90it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2373/4000 [08:42<04:37, 5.87it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2374/4000 [08:42<04:35, 5.90it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2375/4000 [08:42<04:44, 5.72it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2376/4000 [08:43<04:40, 5.79it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2377/4000 [08:43<04:39, 5.81it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2378/4000 [08:43<04:41, 5.77it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2379/4000 [08:43<04:55, 5.48it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2380/4000 [08:43<04:51, 5.57it/s] {'loss': 0.626, 'grad_norm': 0.59100341796875, 'learning_rate': 3.856904206708863e-05}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2380/4000 [08:43<04:51, 5.57it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2381/4000 [08:44<04:50, 5.56it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2382/4000 [08:44<04:57, 5.43it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2383/4000 [08:44<04:55, 5.47it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2384/4000 [08:44<04:53, 5.51it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2385/4000 [08:44<04:51, 5.55it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2386/4000 [08:44<04:44, 5.67it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2387/4000 [08:45<04:38, 5.78it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2388/4000 [08:45<04:36, 5.83it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2389/4000 [08:45<04:34, 5.88it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2390/4000 [08:45<04:31, 5.93it/s] {'loss': 0.6129, 'grad_norm': 0.5830090641975403, 'learning_rate': 3.8167017491773847e-05}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2390/4000 [08:45<04:31, 5.93it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2391/4000 [08:45<04:32, 5.91it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2392/4000 [08:45<04:31, 5.92it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2393/4000 [08:46<04:28, 5.98it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2394/4000 [08:46<04:28, 5.98it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2395/4000 [08:46<04:28, 5.97it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2396/4000 [08:46<04:27, 5.99it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2397/4000 [08:46<04:30, 5.92it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2398/4000 [08:46<04:32, 5.88it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2399/4000 [08:47<04:46, 5.59it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2400/4000 [08:47<04:45, 5.61it/s] {'loss': 0.6326, 'grad_norm': 0.551158607006073, 'learning_rate': 3.776580168509516e-05}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2400/4000 [08:47<04:45, 5.61it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2401/4000 [08:47<04:39, 5.72it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2402/4000 [08:47<04:34, 5.82it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2403/4000 [08:47<04:33, 5.83it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2404/4000 [08:48<04:39, 5.71it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2405/4000 [08:48<04:37, 5.74it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2406/4000 [08:48<04:36, 5.77it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2407/4000 [08:48<04:33, 5.83it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2408/4000 [08:48<04:33, 5.82it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2409/4000 [08:48<04:34, 5.79it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2410/4000 [08:49<04:39, 5.70it/s] {'loss': 0.6334, 'grad_norm': 0.6444660425186157, 'learning_rate': 3.736542206962e-05}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2410/4000 [08:49<04:39, 5.70it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2411/4000 [08:49<04:46, 5.55it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2412/4000 [08:49<04:40, 5.66it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2413/4000 [08:49<04:38, 5.70it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2414/4000 [08:49<04:33, 5.80it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2415/4000 [08:49<04:35, 5.76it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2416/4000 [08:50<04:41, 5.62it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2417/4000 [08:50<04:36, 5.73it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2418/4000 [08:50<04:31, 5.82it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2419/4000 [08:50<04:29, 5.87it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2420/4000 [08:50<04:24, 5.97it/s] {'loss': 0.6127, 'grad_norm': 0.5306467413902283, 'learning_rate': 3.696590601076326e-05}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2420/4000 [08:50<04:24, 5.97it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2421/4000 [08:50<04:23, 5.98it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2422/4000 [08:51<04:20, 6.05it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2423/4000 [08:51<04:18, 6.09it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2424/4000 [08:51<04:18, 6.11it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2425/4000 [08:51<04:16, 6.14it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2426/4000 [08:51<04:15, 6.16it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2427/4000 [08:51<04:16, 6.14it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2428/4000 [08:52<04:15, 6.16it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2429/4000 [08:52<04:13, 6.19it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2430/4000 [08:52<04:14, 6.18it/s] {'loss': 0.6015, 'grad_norm': 0.557221531867981, 'learning_rate': 3.656728081491686e-05}
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2430/4000 [08:52<04:14, 6.18it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2431/4000 [08:52<04:14, 6.17it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2432/4000 [08:52<04:13, 6.18it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2433/4000 [08:52<04:13, 6.18it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2434/4000 [08:53<04:13, 6.19it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2435/4000 [08:53<04:12, 6.19it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2436/4000 [08:53<04:13, 6.18it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2437/4000 [08:53<04:11, 6.21it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2438/4000 [08:53<04:11, 6.22it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2439/4000 [08:53<04:12, 6.19it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2440/4000 [08:54<04:12, 6.19it/s] {'loss': 0.6108, 'grad_norm': 0.5166459083557129, 'learning_rate': 3.6169573727583405e-05}
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2440/4000 [08:54<04:12, 6.19it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2441/4000 [08:54<04:11, 6.21it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2442/4000 [08:54<04:10, 6.21it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2443/4000 [08:54<04:09, 6.24it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2444/4000 [08:54<04:07, 6.28it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2445/4000 [08:54<04:08, 6.26it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2446/4000 [08:54<04:06, 6.29it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2447/4000 [08:55<04:05, 6.32it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2448/4000 [08:55<04:06, 6.30it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 2449/4000 [08:55<04:05, 6.31it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2450/4000 [08:55<04:05, 6.32it/s] {'loss': 0.609, 'grad_norm': 0.601004421710968, 'learning_rate': 3.5772811931514036e-05}
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2450/4000 [08:55<04:05, 6.32it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2451/4000 [08:55<04:07, 6.27it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2452/4000 [08:55<04:06, 6.29it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2453/4000 [08:56<04:05, 6.29it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2454/4000 [08:56<04:06, 6.27it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2455/4000 [08:56<04:05, 6.30it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2456/4000 [08:56<04:04, 6.32it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2457/4000 [08:56<04:06, 6.26it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2458/4000 [08:56<04:05, 6.28it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2459/4000 [08:57<04:04, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2460/4000 [08:57<04:04, 6.29it/s] {'loss': 0.5829, 'grad_norm': 0.5955168604850769, 'learning_rate': 3.5377022544850505e-05}
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2460/4000 [08:57<04:04, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2461/4000 [08:57<04:05, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2462/4000 [08:57<04:04, 6.30it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2463/4000 [08:57<04:04, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2464/4000 [08:57<04:03, 6.30it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2465/4000 [08:57<04:04, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2466/4000 [08:58<04:04, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2467/4000 [08:58<04:06, 6.21it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2468/4000 [08:58<04:05, 6.25it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2469/4000 [08:58<04:05, 6.24it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2470/4000 [08:58<04:04, 6.26it/s] {'loss': 0.6138, 'grad_norm': 0.7004075050354004, 'learning_rate': 3.498223261927158e-05}
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2470/4000 [08:58<04:04, 6.26it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2471/4000 [08:58<04:04, 6.26it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2472/4000 [08:59<04:04, 6.25it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2473/4000 [08:59<04:11, 6.07it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2474/4000 [08:59<04:08, 6.13it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2475/4000 [08:59<04:07, 6.17it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2476/4000 [08:59<04:05, 6.21it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2477/4000 [08:59<04:04, 6.24it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2478/4000 [09:00<04:03, 6.26it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2479/4000 [09:00<04:02, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2480/4000 [09:00<04:01, 6.28it/s] {'loss': 0.5839, 'grad_norm': 0.7036697864532471, 'learning_rate': 3.4588469138144295e-05}
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2480/4000 [09:00<04:01, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2481/4000 [09:00<04:03, 6.25it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2482/4000 [09:00<04:02, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2483/4000 [09:00<04:01, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2484/4000 [09:01<04:01, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2485/4000 [09:01<04:00, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2486/4000 [09:01<04:00, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2487/4000 [09:01<04:01, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2488/4000 [09:01<04:00, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2489/4000 [09:01<04:00, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2490/4000 [09:01<04:00, 6.29it/s] {'loss': 0.5833, 'grad_norm': 0.7682510614395142, 'learning_rate': 3.419575901467952e-05}
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2490/4000 [09:01<04:00, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2491/4000 [09:02<04:00, 6.28it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2492/4000 [09:02<03:59, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2493/4000 [09:02<04:00, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2494/4000 [09:02<04:00, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2495/4000 [09:02<03:59, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2496/4000 [09:02<03:59, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2497/4000 [09:03<03:58, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2498/4000 [09:03<03:58, 6.29it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 2499/4000 [09:03<03:59, 6.27it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 2500/4000 [09:03<03:58, 6.28it/s] {'loss': 0.5898, 'grad_norm': 0.627487301826477, 'learning_rate': 3.380412909009254e-05}
 62%|██████▎ | 2500/4000 [09:03<03:58, 6.28it/s]
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-2500/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-2500
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-2500/wandb_config.json
63%|██████▎   | 2501/4000 [09:31<3:31:52, 8.48s/it]
63%|██████▎   | 2510/4000 [09:32<12:18, 2.02it/s] {'loss': 0.5891, 'grad_norm': 0.8526833653450012, 'learning_rate': 3.3413606131768475e-05}
63%|██████▎   | 2520/4000 [09:34<04:10, 5.92it/s] {'loss': 0.5835, 'grad_norm': 0.6904925107955933, 'learning_rate': 3.302421683143279e-05}
63%|██████▎   | 2530/4000 [09:36<03:53, 6.30it/s] {'loss': 0.5705, 'grad_norm': 0.6932260990142822, 'learning_rate': 3.2635987803326896e-05}
64%|██████▎   | 2540/4000 [09:37<03:52, 6.28it/s] {'loss': 0.5594, 'grad_norm': 0.7479665279388428, 'learning_rate': 3.224894558238918e-05}
64%|██████▍   | 2550/4000 [09:39<03:49, 6.31it/s] {'loss': 0.5739, 'grad_norm': 0.7718488574028015, 'learning_rate': 3.18631166224413e-05}
64%|██████▍   | 2560/4000 [09:40<03:50, 6.26it/s] {'loss': 0.5642, 'grad_norm': 0.6979143619537354, 'learning_rate': 3.147852729438017e-05}
64%|██████▍   | 2570/4000 [09:42<03:48, 6.27it/s] {'loss': 0.5722, 'grad_norm': 0.7248340845108032, 'learning_rate': 3.109520388437548e-05}
64%|██████▍   | 2580/4000 [09:44<03:46, 6.27it/s] {'loss': 0.5494, 'grad_norm': 0.6221643686294556, 'learning_rate': 3.0713172592073116e-05}
65%|██████▍   | 2590/4000 [09:45<03:45, 6.26it/s] {'loss': 0.5576, 'grad_norm': 0.6455109715461731, 'learning_rate': 3.0332459528804457e-05}
65%|██████▌   | 2600/4000 [09:47<03:43, 6.27it/s] {'loss': 0.5429, 'grad_norm': 0.9877533316612244, 'learning_rate': 2.9953090715801634e-05}
65%|██████▌   | 2610/4000 [09:48<03:41, 6.28it/s] {'loss': 0.5507, 'grad_norm': 0.7915239930152893, 'learning_rate': 2.9575092082419086e-05}
66%|██████▌   | 2620/4000 [09:50<03:39, 6.28it/s] {'loss': 0.5413, 'grad_norm': 0.7198984622955322, 'learning_rate': 2.9198489464361288e-05}
66%|██████▌   | 2630/4000 [09:52<03:38, 6.27it/s] {'loss': 0.5279, 'grad_norm': 0.6209663152694702, 'learning_rate': 2.8823308601916948e-05}
66%|██████▌   | 2640/4000 [09:53<03:36, 6.28it/s] {'loss': 0.5459, 'grad_norm': 0.6671990752220154, 'learning_rate': 2.8449575138199613e-05}
66%|██████▋   | 2650/4000 [09:55<04:06, 5.48it/s] {'loss': 0.5154, 'grad_norm': 0.7317913770675659, 'learning_rate': 2.807731461739509e-05}
66%|██████▋   | 2660/4000 [09:56<03:35, 6.21it/s] {'loss': 0.5226, 'grad_norm': 0.7610775232315063, 'learning_rate': 2.7706552483015485e-05}
67%|██████▋   | 2670/4000 [09:58<03:31, 6.28it/s] {'loss': 0.4998, 'grad_norm': 0.781760573387146, 'learning_rate': 2.733731407616018e-05}
67%|██████▋   | 2680/4000 [10:00<03:31, 6.24it/s] {'loss': 0.515, 'grad_norm': 0.8036126494407654, 'learning_rate': 2.6969624633783806e-05}
67%|██████▋   | 2690/4000 [10:01<03:31, 6.21it/s] {'loss': 0.5162, 'grad_norm': 0.9693417549133301, 'learning_rate': 2.660350928697134e-05}
68%|██████▊   | 2700/4000 [10:03<03:28, 6.24it/s] {'loss': 0.517, 'grad_norm': 0.7961346507072449, 'learning_rate': 2.6238993059220395e-05}
68%|██████▊   | 2710/4000 [10:05<03:26, 6.26it/s] {'loss': 0.4988, 'grad_norm': 0.7805870771408081, 'learning_rate': 2.5876100864730933e-05}
68%|██████▊   | 2720/4000 [10:06<03:25, 6.23it/s] {'loss': 0.5037, 'grad_norm': 0.7758889198303223, 'learning_rate': 2.5514857506702405e-05}
68%|██████▊   | 2730/4000 [10:08<03:23, 6.24it/s] {'loss': 0.4796, 'grad_norm': 0.737293541431427, 'learning_rate': 2.5155287675638474e-05}
68%|██████▊   | 2740/4000 [10:09<03:21, 6.24it/s] {'loss': 0.4858, 'grad_norm': 0.8677336573600769, 'learning_rate': 2.4797415947659457e-05}
69%|██████▉   | 2750/4000 [10:11<03:19, 6.26it/s] {'loss': 0.4781, 'grad_norm': 0.8488855361938477, 'learning_rate': 2.4441266782822588e-05}
69%|██████▉   | 2760/4000 [10:13<03:18, 6.25it/s] {'loss': 0.4737, 'grad_norm': 0.746828556060791, 'learning_rate': 2.4086864523450183e-05}
69%|██████▉   | 2770/4000 [10:14<03:16, 6.26it/s] {'loss': 0.4832, 'grad_norm': 0.8593442440032959, 'learning_rate': 2.3734233392465903e-05}
70%|██████▉   | 2780/4000 [10:16<03:15, 6.23it/s] {'loss': 0.4797, 'grad_norm': 0.737777590751648, 'learning_rate': 2.3383397491739145e-05}
70%|██████▉   | 2790/4000 [10:17<03:13, 6.25it/s] {'loss': 0.4545, 'grad_norm': 0.7161794304847717, 'learning_rate': 2.3034380800437678e-05}
70%|███████   | 2800/4000 [10:19<03:12, 6.24it/s] {'loss': 0.4664, 'grad_norm': 0.8545267581939697, 'learning_rate': 2.2687207173388743e-05}
70%|███████   | 2810/4000 [10:21<03:11, 6.21it/s] {'loss': 0.4647, 'grad_norm': 0.7654674649238586, 'learning_rate': 2.234190033944858e-05}
70%|███████   | 2820/4000 [10:22<03:09, 6.23it/s] {'loss': 0.4588, 'grad_norm': 0.8628227710723877, 'learning_rate': 2.1998483899880596e-05}
71%|███████   | 2830/4000 [10:24<03:08, 6.20it/s] {'loss': 0.4431, 'grad_norm': 0.9096059799194336, 'learning_rate': 2.1656981326742266e-05}
71%|███████   | 2840/4000 [10:25<03:07, 6.20it/s] {'loss': 0.4565, 'grad_norm': 0.8760948181152344, 'learning_rate': 2.1317415961280824e-05}
71%|███████▏  | 2850/4000 [10:27<03:13, 5.96it/s] {'loss': 0.4248, 'grad_norm': 0.8818371295928955, 'learning_rate': 2.097981101233794e-05}
72%|███████▏  | 2860/4000 [10:29<03:06, 6.13it/s] {'loss': 0.4139, 'grad_norm': 0.9236174821853638, 'learning_rate': 2.0644189554763417e-05}
72%|███████▏  | 2870/4000 [10:30<03:03, 6.16it/s] {'loss': 0.4365, 'grad_norm': 0.8099502921104431, 'learning_rate': 2.0310574527838072e-05}
72%|███████▏  | 2880/4000 [10:32<03:02, 6.14it/s] {'loss': 0.4367, 'grad_norm': 0.9216195344924927, 'learning_rate': 1.9978988733705807e-05}
72%|███████▏  | 2890/4000 [10:34<03:01, 6.13it/s] {'loss': 0.4581, 'grad_norm': 0.8922995328903198, 'learning_rate': 1.9649454835815202e-05}
72%|███████▎  | 2900/4000 [10:35<03:01, 6.07it/s] {'loss': 0.4173, 'grad_norm': 0.8256015181541443, 'learning_rate': 1.932199535737045e-05}
73%|███████▎  | 2910/4000 [10:37<03:00, 6.03it/s] {'loss': 0.4392, 'grad_norm': 0.9429636001586914, 'learning_rate': 1.8996632679791914e-05}
73%|███████▎  | 2920/4000 [10:39<03:03, 5.88it/s] {'loss': 0.3899, 'grad_norm': 1.0027697086334229, 'learning_rate': 1.8673389041186418e-05}
73%|███████▎  | 2930/4000 [10:40<03:06, 5.73it/s] {'loss': 0.4113, 'grad_norm': 0.9124972820281982, 'learning_rate': 1.8352286534827274e-05}
73%|███████▎  | 2937/4000 [10:41<02:59, 5.92it/s]
Rank 2, Worker 3: Wait for shard 6 in dataset 0 in 0.00 seconds
Rank 2, Worker 3: Caching shard...
74%|███████▎  | 2940/4000 [10:42<03:03, 5.77it/s] {'loss': 0.4008, 'grad_norm': 0.8158929944038391, 'learning_rate': 1.803334710764426e-05}
74%|███████▍  | 2950/4000 [10:44<03:13, 5.43it/s] {'loss': 0.392, 'grad_norm': 0.9610748291015625, 'learning_rate': 1.7716592558723556e-05}
74%|███████▍  | 2953/4000 [10:44<03:08, 5.56it/s]
Rank 2, Worker 1: Wait for shard 21 in dataset 0 in 0.00 seconds
Rank 2, Worker 1: Caching shard...
74%|███████▍  | 2960/4000 [10:46<03:13, 5.37it/s] {'loss': 0.3887, 'grad_norm': 0.9714084267616272, 'learning_rate': 1.7402044537817824e-05}
74%|███████▍  | 2962/4000 [10:46<03:19, 5.20it/s]
Rank 3, Worker 4: Wait for shard 5 in dataset 0 in 0.00 seconds
Rank 3, Worker 4: Caching shard...
74%|███████▍  | 2970/4000 [10:48<03:15, 5.27it/s] {'loss': 0.3764, 'grad_norm': 0.9433429837226868, 'learning_rate': 1.7089724543866465e-05}
74%|███████▍  | 2979/4000 [10:49<03:03, 5.56it/s]
Rank 1, Worker 3: Wait for shard 39 in dataset 0 in 0.00 seconds
Rank 1, Worker 3: Caching shard...
74%|███████▍  | 2980/4000 [10:49<03:05, 5.49it/s] {'loss': 0.3971, 'grad_norm': 0.8527407050132751, 'learning_rate': 1.6779653923526188e-05}
75%|███████▍  | 2990/4000 [10:51<03:06, 5.41it/s] {'loss': 0.3891, 'grad_norm': 0.9382534027099609, 'learning_rate': 1.6471853869712023e-05}
75%|███████▍  | 2993/4000 [10:52<03:07, 5.38it/s]
Rank 0, Worker 5: Wait for shard 36 in dataset 0 in 0.00 seconds
Rank 0, Worker 5: Caching shard...
75%|███████▍  | 2994/4000 [10:52<03:07, 5.36it/s]
Rank 3, Worker 0: Wait for shard 62 in dataset 0 in 0.00 seconds
Rank 3, Worker 0: Caching shard...
75%|███████▌  | 3000/4000 [10:53<03:07, 5.33it/s] {'loss': 0.388, 'grad_norm': 0.8641057014465332, 'learning_rate': 1.6166345420148787e-05}
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-3000/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-3000
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-3000/wandb_config.json
75%|███████▌  | 3001/4000 [11:22<2:26:41, 8.81s/it]
75%|███████▌  | 3010/4000 [11:24<09:00, 1.83it/s] {'loss': 0.3764, 'grad_norm': 0.9208613038063049, 'learning_rate': 1.5863149455933158e-05}
75%|███████▌  | 3013/4000 [11:24<05:12, 3.16it/s]
Rank 3, Worker 1: Wait for shard 2 in dataset 0 in 0.00 seconds
Rank 3, Worker 1: Caching shard...
75%|███████▌  | 3015/4000 [11:25<04:11, 3.92it/s]
Rank 3, Worker 3: Wait for shard 26 in dataset 0 in 0.00 seconds
Rank 3, Worker 3: Caching shard...
76%|███████▌  | 3020/4000 [11:26<03:26, 4.74it/s] {'loss': 0.3683, 'grad_norm': 0.9462664127349854, 'learning_rate': 1.5562286700106558e-05}
76%|███████▌  | 3024/4000 [11:27<03:17, 4.94it/s]
Rank 1, Worker 0: Wait for shard 13 in dataset 0 in 0.00 seconds
Rank 1, Worker 0: Caching shard...
Rank 0, Worker 0: Wait for shard 49 in dataset 0 in 0.00 seconds
Rank 0, Worker 0: Caching shard...
76%|███████▌  | 3029/4000 [11:28<03:13, 5.03it/s]
Rank 2, Worker 5: Wait for shard 37 in dataset 0 in 0.00 seconds
Rank 2, Worker 5: Caching shard...
76%|███████▌  | 3030/4000 [11:28<03:09, 5.13it/s] {'loss': 0.3668, 'grad_norm': 0.971708357334137, 'learning_rate': 1.526377771623869e-05}
76%|███████▌  | 3032/4000 [11:28<03:02, 5.30it/s]
Rank 0, Worker 2: Wait for shard 23 in dataset 0 in 0.00 seconds
Rank 0, Worker 2: Caching shard...
Rank 3, Worker 2: Wait for shard 33 in dataset 0 in 0.00 seconds
Rank 3, Worker 2: Caching shard...
Rank 1, Worker 2: Wait for shard 26 in dataset 0 in 0.00 seconds
Rank 1, Worker 2: Caching shard...
76%|███████▌  | 3034/4000 [11:29<03:03, 5.26it/s]
Rank 2, Worker 4: Wait for shard 60 in dataset 0 in 0.00 seconds
Rank 2, Worker 4: Caching shard...
76%|███████▌  | 3040/4000 [11:30<03:04, 5.19it/s] {'loss': 0.3783, 'grad_norm': 0.8841544985771179, 'learning_rate': 1.496764290702209e-05}
76%|███████▌  | 3043/4000 [11:30<03:04, 5.19it/s]
Rank 1, Worker 1: Wait for shard 43 in dataset 0 in 0.00 seconds
Rank 1, Worker 1: Caching shard...
76%|███████▌  | 3044/4000 [11:31<03:03, 5.21it/s]
Rank 2, Worker 2: Wait for shard 16 in dataset 0 in 0.00 seconds
Rank 2, Worker 2: Caching shard...
76%|███████▌  | 3049/4000 [11:32<03:04, 5.15it/s]
Rank 0, Worker 1: Wait for shard 42 in dataset 0 in 0.00 seconds
Rank 0, Worker 1: Caching shard...
76%|███████▋  | 3050/4000 [11:32<03:04, 5.14it/s] {'loss': 0.3456, 'grad_norm': 1.0586179494857788, 'learning_rate': 1.4673902512877585e-05}
76%|███████▋  | 3051/4000 [11:32<03:04, 5.14it/s]
Rank 0, Worker 3: Wait for shard 44 in dataset 0 in 0.00 seconds
Rank 0, Worker 3: Caching shard...
76%|███████▋  | 3052/4000 [11:32<03:05, 5.12it/s]
Rank 0, Worker 4: Wait for shard 11 in dataset 0 in 0.00 seconds
Rank 0, Worker 4: Caching shard...
76%|███████▋  | 3060/4000 [11:34<03:08, 4.99it/s] {'loss': 0.384, 'grad_norm': 0.9688908457756042, 'learning_rate': 1.438257661057093e-05}
77%|███████▋  | 3070/4000 [11:36<03:12, 4.83it/s] {'loss': 0.3646, 'grad_norm': 0.8748918771743774, 'learning_rate': 1.4093685111840566e-05}
77%|███████▋  | 3077/4000 [11:37<03:09, 4.87it/s]
Rank 3, Worker 5: Wait for shard 15 in dataset 0 in 0.00 seconds
Rank 3, Worker 5: Caching shard...
77%|███████▋  | 3080/4000 [11:38<03:17, 4.65it/s] {'loss': 0.3398, 'grad_norm': 0.8363520503044128, 'learning_rate': 1.380724776203668e-05}
77%|███████▋  | 3083/4000 [11:38<03:19, 4.60it/s]
Rank 1, Worker 5: Wait for shard 54 in dataset 0 in 0.00 seconds
Rank 1, Worker 5: Caching shard...
77%|███████▋  | 3084/4000 [11:39<03:19, 4.58it/s]
Rank 2, Worker 0: Wait for shard 15 in dataset 0 in 0.00 seconds
Rank 2, Worker 0: Caching shard...
77%|███████▋  | 3090/4000 [11:40<03:08, 4.83it/s] {'loss': 0.3378, 'grad_norm': 1.029313325881958, 'learning_rate': 1.3523284138771642e-05}
77%|███████▋  | 3094/4000 [11:41<03:13, 4.69it/s]
Rank 1, Worker 4: Wait for shard 25 in dataset 0 in 0.00 seconds
Rank 1, Worker 4: Caching shard...
78%|███████▊  | 3100/4000 [11:42<03:08, 4.78it/s] {'loss': 0.3467, 'grad_norm': 1.0453342199325562, 'learning_rate': 1.3241813650581902e-05}
78%|███████▊  | 3110/4000 [11:44<02:53, 5.12it/s] {'loss': 0.3653, 'grad_norm': 0.8220339417457581, 'learning_rate': 1.2962855535601438e-05}
78%|███████▊  | 3120/4000 [11:46<02:50, 5.17it/s] {'loss': 0.3447, 'grad_norm': 1.1042197942733765, 'learning_rate': 1.2686428860246852e-05}
78%|███████▊  | 3130/4000 [11:48<02:36, 5.54it/s] {'loss': 0.3212, 'grad_norm': 1.0877461433410645, 'learning_rate': 1.241255251791421e-05}
78%|███████▊  | 3140/4000 [11:50<02:33, 5.62it/s] {'loss': 0.3329, 'grad_norm': 1.1951923370361328, 'learning_rate': 1.2141245227687731e-05}
79%|███████▉  | 3150/4000 [11:51<02:30, 5.64it/s] {'loss': 0.3302, 'grad_norm': 0.9668404459953308, 'learning_rate': 1.1872525533060269e-05}
79%|███████▉  | 3160/4000 [11:53<02:21, 5.96it/s] {'loss': 0.3402, 'grad_norm': 0.9291806817054749, 'learning_rate': 1.1606411800666028e-05}
79%|███████▉  | 3170/4000 [11:55<02:14, 6.19it/s] {'loss': 0.3362, 'grad_norm': 1.126897931098938, 'learning_rate': 1.134292221902511e-05}
80%|███████▉  | 3180/4000 [11:56<02:10, 6.27it/s] {'loss': 0.3484, 'grad_norm': 0.9181669354438782, 'learning_rate': 1.1082074797300413e-05}
80%|███████▉  | 3190/4000 [11:58<02:09, 6.24it/s] {'loss': 0.3319, 'grad_norm': 1.1557016372680664, 'learning_rate': 1.0823887364066737e-05}
80%|████████  | 3200/4000 [12:00<02:08, 6.24it/s] {'loss': 0.3319, 'grad_norm': 1.0621029138565063, 'learning_rate': 1.0568377566092164e-05}
80%|████████  | 3210/4000 [12:01<02:12, 5.98it/s] {'loss': 0.3124, 'grad_norm': 0.981543779373169, 'learning_rate': 1.0315562867131983e-05}
80%|████████  | 3220/4000 [12:03<02:06, 6.14it/s] {'loss': 0.3072, 'grad_norm': 0.9941000938415527, 'learning_rate': 1.0065460546735045e-05}
81%|████████  | 3230/4000 [12:05<02:11, 5.86it/s] {'loss': 0.3219, 'grad_norm': 1.1148889064788818, 'learning_rate': 9.818087699062716e-06}
81%|████████  | 3240/4000 [12:06<02:09, 5.85it/s] {'loss': 0.3243, 'grad_norm': 1.0006496906280518, 'learning_rate': 9.57346123172055e-06}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3240/4000 [12:06<02:09, 5.85it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3241/4000 [12:06<02:09, 5.87it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3242/4000 [12:07<02:08, 5.92it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3243/4000 [12:07<02:07, 5.93it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3244/4000 [12:07<02:05, 6.03it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3245/4000 [12:07<02:06, 5.97it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3246/4000 [12:07<02:16, 5.53it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3247/4000 [12:08<02:13, 5.63it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3248/4000 [12:08<02:13, 5.61it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 3249/4000 [12:08<02:10, 5.76it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 3250/4000 [12:08<02:07, 5.86it/s] {'loss': 0.3297, 'grad_norm': 1.200180172920227, 'learning_rate': 9.331597864602632e-06}
82%|████████▏ | 3260/4000 [12:10<02:16, 5.42it/s] {'loss': 0.352, 'grad_norm': 1.2609386444091797, 'learning_rate': 9.09251412874882e-06}
82%|████████▏ | 3270/4000 [12:12<02:05, 5.80it/s] {'loss': 0.3226, 'grad_norm': 0.92691570520401, 'learning_rate': 8.856226365214897e-06}
82%|████████▏ | 3280/4000 [12:13<02:07, 5.63it/s] {'loss': 0.3188, 'grad_norm': 0.9962168335914612, 'learning_rate': 8.622750723955597e-06}
82%|████████▏ | 3290/4000 [12:15<02:12, 5.37it/s] {'loss': 0.3164, 'grad_norm': 0.9542444348335266, 'learning_rate': 8.392103162720883e-06}
82%|████████▎ | 3300/4000 [12:17<02:04, 5.63it/s] {'loss': 0.3147, 'grad_norm': 0.9532768726348877, 'learning_rate': 8.164299445965167e-06}
83%|████████▎ | 3310/4000 [12:19<02:10, 5.29it/s] {'loss': 0.2978, 'grad_norm': 1.0415639877319336, 'learning_rate': 7.939355143769905e-06}
83%|████████▎ | 3320/4000 [12:21<02:11, 5.15it/s] {'loss': 0.3212, 'grad_norm': 1.0933952331542969, 'learning_rate': 7.717285630779341e-06}
83%|████████▎ | 3330/4000 [12:23<02:00, 5.55it/s] {'loss': 0.292, 'grad_norm': 1.0649725198745728, 'learning_rate': 7.498106085149697e-06}
84%|████████▎ | 3340/4000 [12:24<01:52, 5.88it/s] {'loss': 0.2882, 'grad_norm': 1.0602281093597412, 'learning_rate': 7.2818314875117755e-06}
84%|████████▍ | 3350/4000 [12:26<01:47, 6.07it/s] {'loss': 0.2956, 'grad_norm': 0.9819366931915283, 'learning_rate': 7.06847661994704e-06}
84%|████████▍ | 3360/4000 [12:28<01:41, 6.28it/s] {'loss': 0.2941, 'grad_norm': 1.047581672668457, 'learning_rate': 6.85805606497727e-06}
84%|████████▍ | 3370/4000 [12:29<01:41, 6.22it/s] {'loss': 0.2975, 'grad_norm': 0.8465542793273926, 'learning_rate': 6.650584204567889e-06}
84%|████████▍ | 3380/4000 [12:31<01:39, 6.24it/s] {'loss': 0.2934, 'grad_norm': 0.9267279505729675, 'learning_rate': 6.446075219144965e-06}
85%|████████▍ | 3390/4000 [12:32<01:37, 6.26it/s] {'loss': 0.2899, 'grad_norm': 0.9246605038642883, 'learning_rate': 6.244543086625987e-06}
85%|████████▌ | 3400/4000 [12:34<01:36, 6.21it/s] {'loss': 0.2908, 'grad_norm': 1.0072201490402222, 'learning_rate': 6.046001581464505e-06}
85%|████████▌ | 3410/4000 [12:36<01:35, 6.20it/s] {'loss': 0.2871, 'grad_norm': 1.091598629951477, 'learning_rate': 5.850464273708689e-06}
86%|████████▌ | 3420/4000 [12:37<01:33, 6.18it/s] {'loss': 0.3229, 'grad_norm': 1.0161452293395996, 'learning_rate': 5.657944528073745e-06}
86%|████████▌ | 3430/4000 [12:39<01:32, 6.17it/s] {'loss': 0.3004, 'grad_norm': 1.214100956916809, 'learning_rate': 5.468455503028574e-06}
86%|████████▌ | 3440/4000 [12:40<01:30, 6.16it/s] {'loss': 0.2901, 'grad_norm': 0.9175690412521362, 'learning_rate': 5.282010149896327e-06}
86%|████████▋ | 3450/4000 [12:42<01:28, 6.18it/s] {'loss': 0.2948, 'grad_norm': 0.9620636701583862, 'learning_rate': 5.098621211969223e-06}
86%|████████▋ | 3460/4000 [12:44<01:27, 6.18it/s] {'loss': 0.2787, 'grad_norm': 1.1081660985946655, 'learning_rate': 4.918301223637573e-06}
87%|████████▋ | 3470/4000 [12:45<01:26, 6.16it/s] {'loss': 0.2934, 'grad_norm': 1.2562566995620728, 'learning_rate': 4.74106250953304e-06}
87%|████████▋ | 3480/4000 [12:47<01:23, 6.25it/s] {'loss': 0.2649, 'grad_norm': 0.9992050528526306, 'learning_rate': 4.566917183686309e-06}
87%|████████▋ | 3490/4000 [12:49<01:21, 6.26it/s] {'loss': 0.2876, 'grad_norm': 1.1165425777435303, 'learning_rate': 4.3958771486990735e-06}
88%|████████▊ | 3500/4000 [12:50<01:19, 6.27it/s] {'loss': 0.2889, 'grad_norm': 1.1011738777160645, 'learning_rate': 4.22795409493052e-06}
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-3500/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-3500
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-3500/wandb_config.json
88%|████████▊ | 3501/4000 [13:17<1:07:50, 8.16s/it]
88%|████████▊ | 3502/4000 [13:17<47:47, 5.76s/it]
88%|████████▊ | 3503/4000 [13:17<33:46, 4.08s/it]
88%|████████▊ | 3504/4000 [13:17<23:59, 2.90s/it]
88%|████████▊ | 3505/4000 [13:18<17:09, 2.08s/it]
88%|████████▊ | 3506/4000 [13:18<12:22, 1.50s/it]
88%|████████▊ | 3507/4000 [13:18<09:02, 1.10s/it]
88%|████████▊ | 3508/4000 [13:18<06:42, 1.22it/s]
88%|████████▊ | 3509/4000 [13:18<05:05, 1.61it/s]
88%|████████▊ | 3510/4000 [13:18<03:56, 2.07it/s] {'loss': 0.2756, 'grad_norm': 0.9482487440109253, 'learning_rate': 4.06315949969831e-06}
88%|████████▊ | 3511/4000 [13:19<03:09, 2.59it/s]
88%|████████▊ | 3512/4000 [13:19<02:35, 3.14it/s]
88%|████████▊ | 3513/4000 [13:19<02:12, 3.69it/s]
88%|████████▊ | 3514/4000 [13:19<01:55, 4.21it/s]
88%|████████▊ | 3515/4000 [13:19<01:44, 4.66it/s]
88%|████████▊ | 3516/4000 [13:19<01:35, 5.04it/s]
88%|████████▊ | 3517/4000 [13:19<01:30, 5.35it/s]
88%|████████▊ | 3518/4000 [13:20<01:26, 5.60it/s]
88%|████████▊ | 3519/4000 [13:20<01:23, 5.79it/s]
88%|████████▊ | 3520/4000 [13:20<01:21, 5.92it/s] {'loss': 0.2787, 'grad_norm': 0.9160575866699219, 'learning_rate': 3.901504626494135e-06}
88%|████████▊ | 3530/4000 [13:22<01:14, 6.27it/s] {'loss': 0.2635, 'grad_norm': 0.917259156703949, 'learning_rate': 3.7430005242138354e-06}
88%|████████▊ | 3540/4000 [13:23<01:12, 6.32it/s] {'loss': 0.2672, 'grad_norm': 1.1588302850723267, 'learning_rate': 3.587658026402263e-06}
89%|████████▉ | 3550/4000 [13:25<01:11, 6.31it/s] {'loss': 0.3, 'grad_norm': 1.113232135772705, 'learning_rate': 3.435487750512778e-06}
89%|████████▉ | 3560/4000 [13:26<01:11, 6.14it/s] {'loss': 0.2911, 'grad_norm': 1.152016520500183, 'learning_rate': 3.2865000971816107e-06}
89%|████████▉ | 3570/4000 [13:28<01:07, 6.33it/s] {'loss': 0.2597, 'grad_norm': 1.0010101795196533, 'learning_rate': 3.1407052495169566e-06}
90%|████████▉ | 3580/4000 [13:30<01:06, 6.34it/s] {'loss': 0.2736, 'grad_norm': 0.996920108795166, 'learning_rate': 2.9981131724029887e-06}
90%|████████▉ | 3590/4000 [13:31<01:04, 6.33it/s] {'loss': 0.2807, 'grad_norm': 0.8811072707176208, 'learning_rate': 2.858733611818765e-06}
90%|█████████ | 3600/4000 [13:33<01:18, 5.10it/s] {'loss': 0.2812, 'grad_norm': 1.1340476274490356, 'learning_rate': 2.7225760941721136e-06}
90%|█████████ | 3610/4000 [13:35<01:17, 5.06it/s] {'loss': 0.2683, 'grad_norm': 1.1680265665054321, 'learning_rate': 2.589649925648491e-06}
90%|█████████ | 3620/4000 [13:37<01:00, 6.23it/s] {'loss': 0.2584, 'grad_norm': 0.907533586025238, 'learning_rate': 2.459964191574948e-06}
91%|█████████ | 3630/4000 [13:38<00:58, 6.28it/s] {'loss': 0.2632, 'grad_norm': 1.019956111907959, 'learning_rate': 2.3335277557991364e-06}
91%|█████████ | 3640/4000 [13:40<00:57, 6.29it/s] {'loss': 0.2905, 'grad_norm': 1.2022958993911743, 'learning_rate': 2.210349260083494e-06}
91%|█████████▏| 3650/4000 [13:41<00:55, 6.27it/s] {'loss': 0.2518, 'grad_norm': 0.9064944982528687, 'learning_rate': 2.0904371235145827e-06}
92%|█████████▏| 3660/4000 [13:43<00:54, 6.29it/s] {'loss': 0.289, 'grad_norm': 1.0236254930496216, 'learning_rate': 1.9737995419276455e-06}
92%|█████████▏| 3670/4000 [13:45<00:52, 6.28it/s] {'loss': 0.2889, 'grad_norm': 1.0530792474746704, 'learning_rate': 1.8604444873464466e-06}
92%|█████████▏| 3680/4000 [13:46<00:55, 5.79it/s] {'loss': 0.2639, 'grad_norm': 0.8767210245132446, 'learning_rate': 1.7503797074383988e-06}
92%|█████████▏| 3690/4000 [13:48<00:49, 6.26it/s] {'loss': 0.3181, 'grad_norm': 0.9884085059165955, 'learning_rate': 1.6436127249849998e-06}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3690/4000 [13:48<00:49, 6.26it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3691/4000 [13:48<00:49, 6.25it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3692/4000 [13:48<00:49, 6.23it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3693/4000 [13:48<00:49, 6.25it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3694/4000 [13:49<00:48, 6.27it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3695/4000 [13:49<00:48, 6.26it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3696/4000 [13:49<00:48, 6.26it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3697/4000 [13:49<00:48, 6.28it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3698/4000 [13:49<00:48, 6.29it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3699/4000 [13:49<00:47, 6.29it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3700/4000 [13:50<00:47, 6.30it/s] {'loss': 0.276, 'grad_norm': 0.8931695222854614, 'learning_rate': 1.5401508373676764e-06}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3700/4000 [13:50<00:47, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3701/4000 [13:50<00:47, 6.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3702/4000 [13:50<00:47, 6.23it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3703/4000 [13:50<00:47, 6.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3704/4000 [13:50<00:47, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3705/4000 [13:50<00:46, 6.29it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3706/4000 [13:51<00:46, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3707/4000 [13:51<00:46, 6.31it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3708/4000 [13:51<00:46, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3709/4000 [13:51<00:46, 6.28it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3710/4000 [13:51<00:46, 6.28it/s] {'loss': 0.2637, 'grad_norm': 0.9981472492218018, 'learning_rate': 1.4400011160690175e-06}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3710/4000 [13:51<00:46, 6.28it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3711/4000 [13:51<00:46, 6.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3712/4000 [13:51<00:46, 6.18it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3713/4000 [13:52<00:46, 6.21it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3714/4000 [13:52<00:46, 6.19it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3715/4000 [13:52<00:45, 6.23it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3716/4000 [13:52<00:45, 6.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3717/4000 [13:52<00:45, 6.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3718/4000 [13:52<00:44, 6.28it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3719/4000 [13:53<00:44, 6.29it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3720/4000 [13:53<00:44, 6.25it/s] {'loss': 0.3144, 'grad_norm': 1.0214052200317383, 'learning_rate': 1.3431704061894312e-06}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3720/4000 [13:53<00:44, 6.25it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3721/4000 [13:53<00:44, 6.25it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3722/4000 [13:53<00:44, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3723/4000 [13:53<00:44, 6.29it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3724/4000 [13:53<00:43, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3725/4000 [13:54<00:43, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3726/4000 [13:54<00:43, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3727/4000 [13:54<00:43, 6.29it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3728/4000 [13:54<00:43, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3729/4000 [13:54<00:43, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3730/4000 [13:54<00:42, 6.30it/s] {'loss': 0.2505, 'grad_norm': 1.0098587274551392, 'learning_rate': 1.2496653259793268e-06}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3730/4000 [13:54<00:42, 6.30it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3731/4000 [13:54<00:42, 6.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3732/4000 [13:55<00:42, 6.25it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3733/4000 [13:55<00:42, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3734/4000 [13:55<00:42, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3735/4000 [13:55<00:42, 6.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3736/4000 [13:55<00:42, 6.28it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3737/4000 [13:55<00:41, 6.29it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3738/4000 [13:56<00:42, 6.24it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3739/4000 [13:56<00:41, 6.26it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3740/4000 [13:56<00:41, 6.27it/s] {'loss': 0.2711, 'grad_norm': 0.9494035243988037, 'learning_rate': 1.1594922663867135e-06}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3740/4000 [13:56<00:41, 6.27it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3741/4000 [13:56<00:41, 6.26it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3742/4000 [13:56<00:41, 6.27it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3743/4000 [13:56<00:40, 6.27it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3744/4000 [13:57<00:41, 6.22it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3745/4000 [13:57<00:41, 6.19it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3746/4000 [13:57<00:40, 6.21it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3747/4000 [13:57<00:40, 6.24it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3748/4000 [13:57<00:40, 6.24it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 3749/4000 [13:57<00:40, 6.24it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3750/4000 [13:58<00:40, 6.23it/s] {'loss': 0.2434, 'grad_norm': 0.8343875408172607, 'learning_rate': 1.0726573906204463e-06}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3750/4000 [13:58<00:40, 6.23it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3751/4000 [13:58<00:40, 6.21it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3752/4000 [13:58<00:39, 6.23it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3753/4000 [13:58<00:39, 6.23it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3754/4000 [13:58<00:39, 6.23it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3755/4000 [13:58<00:39, 6.26it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3756/4000 [13:59<00:39, 6.23it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3757/4000 [13:59<00:38, 6.24it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3758/4000 [13:59<00:38, 6.27it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3759/4000 [13:59<00:38, 6.26it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3760/4000 [13:59<00:38, 6.26it/s] {'loss': 0.2563, 'grad_norm': 1.033735990524292, 'learning_rate': 9.891666337289273e-07}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3760/4000 [13:59<00:38, 6.26it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3761/4000 [13:59<00:38, 6.24it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3762/4000 [13:59<00:38, 6.14it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3763/4000 [14:00<00:38, 6.14it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3764/4000 [14:00<00:38, 6.18it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3765/4000 [14:00<00:37, 6.21it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3766/4000 [14:00<00:37, 6.21it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3767/4000 [14:00<00:37, 6.21it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3768/4000 [14:00<00:37, 6.17it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3769/4000 [14:01<00:37, 6.19it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3770/4000 [14:01<00:37, 6.19it/s] {'loss': 0.2771, 'grad_norm': 1.0040981769561768, 'learning_rate': 9.090257021944881e-07}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3770/4000 [14:01<00:37, 6.19it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3771/4000 [14:01<00:37, 6.18it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3772/4000 [14:01<00:36, 6.20it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3773/4000 [14:01<00:36, 6.22it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3774/4000 [14:01<00:36, 6.19it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3775/4000 [14:02<00:36, 6.20it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3776/4000 [14:02<00:36, 6.17it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3777/4000 [14:02<00:36, 6.18it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3778/4000 [14:02<00:35, 6.18it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3779/4000 [14:02<00:35, 6.15it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3780/4000 [14:02<00:36, 6.09it/s] {'loss': 0.2475, 'grad_norm': 1.004023551940918, 'learning_rate': 8.3224007354335e-07}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3780/4000 [14:02<00:36, 6.09it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3781/4000 [14:03<00:36, 6.07it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3782/4000 [14:03<00:35, 6.09it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3783/4000 [14:03<00:35, 6.11it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3784/4000 [14:03<00:35, 6.13it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3785/4000 [14:03<00:34, 6.17it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3786/4000 [14:03<00:34, 6.17it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3787/4000 [14:04<00:34, 6.20it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3788/4000 [14:04<00:34, 6.18it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3789/4000 [14:04<00:34, 6.15it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3790/4000 [14:04<00:33, 6.19it/s] {'loss': 0.2655, 'grad_norm': 1.2295361757278442, 'learning_rate': 7.588149959712243e-07}
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3790/4000 [14:04<00:33, 6.19it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3791/4000 [14:04<00:33, 6.20it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3792/4000 [14:04<00:33, 6.19it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3793/4000 [14:04<00:33, 6.21it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3794/4000 [14:05<00:33, 6.24it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3795/4000 [14:05<00:32, 6.21it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3796/4000 [14:05<00:32, 6.23it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3797/4000 [14:05<00:32, 6.21it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3798/4000 [14:05<00:32, 6.18it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 3799/4000 [14:05<00:32, 6.21it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3800/4000 [14:06<00:32, 6.21it/s] {'loss': 0.2518, 'grad_norm': 0.8794856667518616, 'learning_rate': 6.887554879846326e-07}
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3800/4000 [14:06<00:32, 6.21it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3801/4000 [14:06<00:32, 6.20it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3802/4000 [14:06<00:31, 6.20it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3803/4000 [14:06<00:31, 6.19it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3804/4000 [14:06<00:31, 6.16it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3805/4000 [14:06<00:31, 6.14it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3806/4000 [14:07<00:31, 6.15it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3807/4000 [14:07<00:31, 6.17it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3808/4000 [14:07<00:31, 6.17it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3809/4000 [14:07<00:30, 6.21it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3810/4000 [14:07<00:30, 6.13it/s] {'loss': 0.2711, 'grad_norm': 0.8361702561378479, 'learning_rate': 6.220663380578861e-07}
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3810/4000 [14:07<00:30, 6.13it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3811/4000 [14:07<00:31, 6.06it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3812/4000 [14:08<00:30, 6.08it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3813/4000 [14:08<00:30, 6.13it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3814/4000 [14:08<00:30, 6.13it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3815/4000 [14:08<00:30, 6.14it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3816/4000 [14:08<00:29, 6.14it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3817/4000 [14:08<00:29, 6.13it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3818/4000 [14:09<00:29, 6.14it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3819/4000 [14:09<00:29, 6.15it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3820/4000 [14:09<00:29, 6.14it/s] {'loss': 0.2922, 'grad_norm': 0.9113193154335022, 'learning_rate': 5.58752104305793e-07}
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3820/4000 [14:09<00:29, 6.14it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3821/4000 [14:09<00:29, 6.13it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3822/4000 [14:09<00:29, 6.10it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3823/4000 [14:09<00:28, 6.15it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3824/4000 [14:10<00:28, 6.13it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3825/4000 [14:10<00:28, 6.18it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3826/4000 [14:10<00:28, 6.20it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3827/4000 [14:10<00:28, 6.14it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3828/4000 [14:10<00:28, 6.09it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3829/4000 [14:10<00:28, 6.10it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3830/4000 [14:11<00:27, 6.10it/s] {'loss': 0.2743, 'grad_norm': 0.888146698474884, 'learning_rate': 4.988171141721232e-07}
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3830/4000 [14:11<00:27, 6.10it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3831/4000 [14:11<00:27, 6.13it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3832/4000 [14:11<00:27, 6.16it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3833/4000 [14:11<00:27, 6.13it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3834/4000 [14:11<00:27, 6.08it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3835/4000 [14:11<00:27, 6.08it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3836/4000 [14:11<00:26, 6.08it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3837/4000 [14:12<00:26, 6.11it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3838/4000 [14:12<00:26, 6.18it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3839/4000 [14:12<00:26, 6.09it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3840/4000 [14:12<00:26, 6.06it/s] {'loss': 0.2662, 'grad_norm': 1.067385196685791, 'learning_rate': 4.4226546413383974e-07}
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3840/4000 [14:12<00:26, 6.06it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3841/4000 [14:12<00:26, 6.00it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3842/4000 [14:12<00:26, 6.00it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3843/4000 [14:13<00:26, 6.03it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3844/4000 [14:13<00:25, 6.04it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3845/4000 [14:13<00:25, 6.02it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3846/4000 [14:13<00:25, 6.02it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3847/4000 [14:13<00:25, 6.00it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3848/4000 [14:13<00:25, 5.94it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 3849/4000 [14:14<00:25, 5.91it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3850/4000 [14:14<00:25, 5.88it/s] {'loss': 0.2653, 'grad_norm': 0.9755203127861023, 'learning_rate': 3.891010194211009e-07}
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3850/4000 [14:14<00:25, 5.88it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3851/4000 [14:14<00:25, 5.83it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3852/4000 [14:14<00:25, 5.84it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3853/4000 [14:14<00:25, 5.83it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3854/4000 [14:15<00:25, 5.82it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3855/4000 [14:15<00:24, 5.81it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3856/4000 [14:15<00:24, 5.81it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3857/4000 [14:15<00:24, 5.82it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3858/4000 [14:15<00:24, 5.82it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3859/4000 [14:15<00:24, 5.83it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3860/4000 [14:16<00:24, 5.81it/s] {'loss': 0.2491, 'grad_norm': 0.9530524015426636, 'learning_rate': 3.393274137530877e-07}
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3860/4000 [14:16<00:24, 5.81it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3861/4000 [14:16<00:23, 5.80it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3862/4000 [14:16<00:23, 5.83it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3863/4000 [14:16<00:23, 5.87it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3864/4000 [14:16<00:23, 5.89it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3865/4000 [14:16<00:22, 5.90it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3866/4000 [14:17<00:22, 5.89it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3867/4000 [14:17<00:22, 5.89it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3868/4000 [14:17<00:22, 5.87it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3869/4000 [14:17<00:22, 5.86it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3870/4000 [14:17<00:22, 5.88it/s] {'loss': 0.2624, 'grad_norm': 0.9093199968338013, 'learning_rate': 2.9294804908962525e-07}
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3870/4000 [14:17<00:22, 5.88it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3871/4000 [14:17<00:21, 5.87it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3872/4000 [14:18<00:21, 5.87it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3873/4000 [14:18<00:21, 5.88it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3874/4000 [14:18<00:21, 5.88it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3875/4000 [14:18<00:21, 5.92it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3876/4000 [14:18<00:20, 5.91it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3877/4000 [14:18<00:20, 5.91it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3878/4000 [14:19<00:20, 5.92it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3879/4000 [14:19<00:20, 5.91it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3880/4000 [14:19<00:20, 5.92it/s] {'loss': 0.2705, 'grad_norm': 0.9156750440597534, 'learning_rate': 2.499660953986849e-07}
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3880/4000 [14:19<00:20, 5.92it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3881/4000 [14:19<00:20, 5.92it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3882/4000 [14:19<00:20, 5.65it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3883/4000 [14:19<00:20, 5.63it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3884/4000 [14:20<00:20, 5.65it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3885/4000 [14:20<00:20, 5.73it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3886/4000 [14:20<00:19, 5.79it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3887/4000 [14:20<00:19, 5.84it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3888/4000 [14:20<00:19, 5.79it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3889/4000 [14:21<00:19, 5.83it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3890/4000 [14:21<00:18, 5.84it/s] {'loss': 0.2604, 'grad_norm': 0.721141517162323, 'learning_rate': 2.103844904397023e-07}
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3890/4000 [14:21<00:18, 5.84it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3891/4000 [14:21<00:18, 5.83it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3892/4000 [14:21<00:18, 5.86it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3893/4000 [14:21<00:18, 5.86it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3894/4000 [14:21<00:18, 5.85it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3895/4000 [14:22<00:17, 5.89it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3896/4000 [14:22<00:17, 5.89it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3897/4000 [14:22<00:17, 5.91it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3898/4000 [14:22<00:17, 5.92it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 3899/4000 [14:22<00:17, 5.91it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3900/4000 [14:22<00:16, 5.93it/s] {'loss': 0.2369, 'grad_norm': 0.925932765007019, 'learning_rate': 1.742059395627993e-07}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3900/4000 [14:22<00:16, 5.93it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3901/4000 [14:23<00:16, 5.94it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3902/4000 [14:23<00:16, 5.97it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3903/4000 [14:23<00:16, 5.96it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3904/4000 [14:23<00:16, 5.95it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3905/4000 [14:23<00:15, 5.94it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3906/4000 [14:23<00:15, 5.95it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3907/4000 [14:24<00:15, 5.97it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3908/4000 [14:24<00:15, 6.01it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3909/4000 [14:24<00:15, 5.91it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3910/4000 [14:24<00:15, 5.83it/s] {'loss': 0.2985, 'grad_norm': 1.0057785511016846, 'learning_rate': 1.414329155238703e-07}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3910/4000 [14:24<00:15, 5.83it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3911/4000 [14:24<00:15, 5.86it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3912/4000 [14:24<00:14, 5.92it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3913/4000 [14:25<00:14, 5.96it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3914/4000 [14:25<00:14, 6.02it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3915/4000 [14:25<00:14, 6.06it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3916/4000 [14:25<00:13, 6.09it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3917/4000 [14:26<00:23, 3.46it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3918/4000 [14:26<00:21, 3.77it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3919/4000 [14:26<00:19, 4.13it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3920/4000 [14:26<00:17, 4.53it/s] {'loss': 0.2799, 'grad_norm': 0.9957014322280884, 'learning_rate': 1.1206765831557886e-07}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3920/4000 [14:26<00:17, 4.53it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3921/4000 [14:26<00:16, 4.85it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3922/4000 [14:27<00:16, 4.74it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3923/4000 [14:27<00:15, 5.10it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3924/4000 [14:27<00:14, 5.39it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3925/4000 [14:27<00:13, 5.59it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3926/4000 [14:27<00:12, 5.80it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3927/4000 [14:27<00:12, 5.93it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3928/4000 [14:28<00:11, 6.03it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3929/4000 [14:28<00:11, 6.10it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3930/4000 [14:28<00:11, 6.15it/s] {'loss': 0.243, 'grad_norm': 1.024829626083374, 'learning_rate': 8.611217501423574e-08}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3930/4000 [14:28<00:11, 6.15it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3931/4000 [14:28<00:11, 6.19it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3932/4000 [14:28<00:10, 6.24it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3933/4000 [14:28<00:10, 6.26it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3934/4000 [14:29<00:10, 6.27it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3935/4000 [14:29<00:10, 6.29it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3936/4000 [14:29<00:10, 6.28it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3937/4000 [14:29<00:10, 6.29it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3938/4000 [14:29<00:09, 6.31it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3939/4000 [14:29<00:09, 6.31it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3940/4000 [14:29<00:09, 6.31it/s] {'loss': 0.2567, 'grad_norm': 0.8754726648330688, 'learning_rate': 6.356823964266401e-08}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3940/4000 [14:29<00:09, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3941/4000 [14:30<00:09, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3942/4000 [14:30<00:09, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3943/4000 [14:30<00:09, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3944/4000 [14:30<00:08, 6.33it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3945/4000 [14:30<00:08, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3946/4000 [14:30<00:08, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3947/4000 [14:31<00:08, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3948/4000 [14:31<00:08, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 3949/4000 [14:31<00:08, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3950/4000 [14:31<00:07, 6.34it/s] {'loss': 0.2581, 'grad_norm': 0.9914542436599731, 'learning_rate': 4.4437393048885056e-08}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3950/4000 [14:31<00:07, 6.34it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3951/4000 [14:31<00:07, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3952/4000 [14:31<00:07, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3953/4000 [14:32<00:07, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3954/4000 [14:32<00:07, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3955/4000 [14:32<00:07, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3956/4000 [14:32<00:06, 6.33it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3957/4000 [14:32<00:06, 6.25it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3958/4000 [14:32<00:06, 6.24it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3959/4000 [14:32<00:06, 6.24it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3960/4000 [14:33<00:06, 6.26it/s] {'loss': 0.2811, 'grad_norm': 0.8681328892707825, 'learning_rate': 2.8720942800858353e-08}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3960/4000 [14:33<00:06, 6.26it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3961/4000 [14:33<00:06, 6.27it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3962/4000 [14:33<00:06, 6.28it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3963/4000 [14:33<00:05, 6.29it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3964/4000 [14:33<00:05, 6.28it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3965/4000 [14:33<00:05, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3966/4000 [14:34<00:05, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3967/4000 [14:34<00:05, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3968/4000 [14:34<00:05, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3969/4000 [14:34<00:04, 6.33it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3970/4000 [14:34<00:04, 6.32it/s] {'loss': 0.2493, 'grad_norm': 0.8622571229934692, 'learning_rate': 1.6419963097080715e-08}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3970/4000 [14:34<00:04, 6.32it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3971/4000 [14:34<00:04, 6.29it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3972/4000 [14:35<00:04, 6.29it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3973/4000 [14:35<00:04, 6.31it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3974/4000 [14:35<00:04, 6.28it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3975/4000 [14:35<00:03, 6.29it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3976/4000 [14:35<00:03, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3977/4000 [14:35<00:03, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3978/4000 [14:35<00:03, 6.30it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3979/4000 [14:36<00:03, 6.30it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3980/4000 [14:36<00:03, 6.31it/s] {'loss': 0.2487, 'grad_norm': 0.9945972561836243, 'learning_rate': 7.535294693172822e-09}
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3980/4000 [14:36<00:03, 6.31it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3981/4000 [14:36<00:03, 6.30it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3982/4000 [14:36<00:02, 6.30it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3983/4000 [14:36<00:02, 6.31it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3984/4000 [14:36<00:02, 6.30it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3985/4000 [14:37<00:02, 6.31it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3986/4000 [14:37<00:02, 6.32it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3987/4000 [14:37<00:02, 6.31it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3988/4000 [14:37<00:01, 6.32it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3989/4000 [14:37<00:01, 6.34it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3990/4000 [14:37<00:01, 6.31it/s] {'loss': 0.2912, 'grad_norm': 1.0543948411941528, 'learning_rate': 2.0675448444251732e-09}
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3990/4000 [14:37<00:01, 6.31it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3991/4000 [14:38<00:01, 6.24it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3992/4000 [14:38<00:01, 6.28it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3993/4000 [14:38<00:01, 6.28it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3994/4000 [14:38<00:00, 6.27it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3995/4000 [14:38<00:00, 6.30it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3996/4000 [14:38<00:00, 6.29it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3997/4000 [14:39<00:00, 6.30it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3998/4000 [14:39<00:00, 6.31it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3999/4000 [14:39<00:00, 6.32it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4000/4000 [14:39<00:00, 6.32it/s] {'loss': 0.2391, 'grad_norm': 1.032217025756836, 'learning_rate': 1.7087264264636914e-11}
/home/ubuntu/Isaac-GR00T/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Copying experiment config directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/experiment_cfg to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-4000/experiment_cfg
Copying processor directory /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/processor to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-4000
Copying wandb_config.json from /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/wandb_config.json to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938/checkpoint-4000/wandb_config.json
{'train_runtime': 894.7924, 'train_samples_per_second': 71.525, 'train_steps_per_second': 4.47, 'train_loss': 0.6054690551757812}
100%|██████████| 4000/4000 [14:54<00:00, 4.47it/s]
05/27/2026 10:45:14 - INFO - Model saved to /home/ubuntu/groot-files/checkpoints/g1_finetune-20260527-102938
05/27/2026 10:45:14 - INFO - Training completed!
wandb:
wandb: 🚀 View run g1_finetune-20260527-102938 at: 
wandb: Find logs at: wandb/run-20260527_102947-11o59yla/logs