🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Loading unsloth/Qwen3-4B-Instruct-2507...
INFO 04-25 09:59:05 [vllm_utils.py:724] Unsloth: Patching vLLM v1 graph capture
==((====))==  Unsloth 2026.4.8: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.15.1.
   \\   /|    NVIDIA L40S. Num GPUs = 1. Max memory: 44.392 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: FlashInfer requires JIT compilation but nvcc (CUDA compiler) is not found.
  vLLM will use FLASH_ATTN attention + PyTorch sampler instead (works fine).
  To enable FlashInfer, install the missing tools:
    nvcc  - install the CUDA toolkit or set CUDA_HOME to your CUDA installation
    ninja - pip install ninja
  To silence this warning: set UNSLOTH_VLLM_NO_FLASHINFER=1
Unsloth: vLLM loading unsloth/Qwen3-4B-Instruct-2507 with actual GPU utilization = 89.06%
Unsloth: Your GPU has CUDA compute capability 8.9 with VRAM = 44.39 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 4096. Num Sequences = 96.
Unsloth: vLLM's KV Cache can use up to 32.5 GB. Also swap space = 6 GB.
Unsloth: Not an error, but `use_cudagraph` is not supported in vLLM.config.CompilationConfig. Skipping.
Unsloth: Not an error, but `use_inductor` is not supported in vLLM.config.CompilationConfig. Skipping.
WARNING 04-25 09:59:07 [compilation.py:762] Level is deprecated and will be removed in the next release,either 0.12.0 or 0.11.2 whichever is soonest.Use mode instead.If both level and mode are given,only mode will be used.
Unsloth: Not an error, but `device` is not supported in vLLM. Skipping.
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
  PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
  return self.serializer.to_python(
INFO 04-25 09:59:07 [utils.py:261] non-default args: {'dtype': torch.bfloat16, 'max_model_len': 4096, 'enable_prefix_caching': True, 'swap_space': 6, 'gpu_memory_utilization': 0.8906117106477057, 'max_num_batched_tokens': 8192, 'max_num_seqs': 96, 'max_logprobs': 0, 'disable_log_stats': True, 'enable_lora': True, 'enable_chunked_prefill': True, 'compilation_config': {'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': [], 'splitting_ops': None, 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': None, 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': None, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}, 'model': 'unsloth/Qwen3-4B-Instruct-2507'}
WARNING 04-25 09:59:07 [arg_utils.py:1220] The global random seed is set to 0. Since VLLM_ENABLE_V1_MULTIPROCESSING is set to False, this may affect the random state of the Python process that launched vLLM.
INFO 04-25 09:59:14 [model.py:541] Resolved architecture: Qwen3ForCausalLM
INFO 04-25 09:59:14 [model.py:1561] Using max model len 4096
INFO 04-25 09:59:15 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 04-25 09:59:15 [vllm.py:624] Asynchronous scheduling is enabled.


generation_config.json: 100%|██████████| 237/237 [00:00<00:00, 1.79MB/s]
tokenizer_config.json: 100%|██████████| 9.65k/9.65k [00:00<00:00, 60.1MB/s]
vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 53.7MB/s]
merges.txt: 100%|██████████| 1.67M/1.67M [00:00<00:00, 83.0MB/s]
tokenizer.json: 100%|██████████| 11.4M/11.4M [00:00<00:00, 44.7MB/s]
added_tokens.json: 100%|██████████| 707/707 [00:00<00:00, 7.32MB/s]
special_tokens_map.json: 100%|██████████| 614/614 [00:00<00:00, 3.17MB/s]
chat_template.jinja: 100%|██████████| 4.04k/4.04k [00:00<00:00, 43.4MB/s]
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
  PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
  return self.serializer.to_python(
INFO 04-25 09:59:16 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='unsloth/Qwen3-4B-Instruct-2507', speculative_config=None, tokenizer='unsloth/Qwen3-4B-Instruct-2507', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=unsloth/Qwen3-4B-Instruct-2507, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 
'compile_ranges_split_points': [8192], 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 192, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}
INFO 04-25 09:59:16 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.113.93.102:50843 backend=nccl
INFO 04-25 09:59:16 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
INFO 04-25 09:59:16 [gpu_model_runner.py:4033] Starting to load model unsloth/Qwen3-4B-Instruct-2507...
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via `pip install torch-c-dlpack-ext`
  warnings.warn(
INFO 04-25 09:59:19 [cuda.py:364] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')


model.safetensors.index.json: 100%|██████████| 32.9k/32.9k [00:00<00:00, 120MB/s]
model-00001-of-00002.safetensors: 100%|██████████| 4.97G/4.97G [00:04<00:00, 1.24GB/s]
model-00002-of-00002.safetensors: 100%|██████████| 3.08G/3.08G [00:04<00:00, 640MB/s]
INFO 04-25 09:59:29 [weight_utils.py:527] Time spent downloading weights for unsloth/Qwen3-4B-Instruct-2507: 8.877664 seconds


Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  2.75it/s]

INFO 04-25 09:59:29 [default_loader.py:291] Loading weights took 0.74 seconds
INFO 04-25 09:59:29 [punica_selector.py:20] Using PunicaWrapperGPU.
INFO 04-25 09:59:30 [gpu_model_runner.py:4130] Model loading took 7.67 GiB memory and 12.958485 seconds
INFO 04-25 09:59:42 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/f6f5a6d496/rank_0_0/backbone for vLLM's torch.compile
INFO 04-25 09:59:42 [backends.py:872] Dynamo bytecode transform time: 11.11 s


Unsloth: Compiling kernels: 100%|██████████| 5/5 [00:00<00:00,  5.25it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_4]
INFO 04-25 09:59:52 [backends.py:302] Cache the graph of compile range (1, 8192) for later use


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 17.98it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 800.42it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 824.33it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 804.41it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 708.10it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 818.24it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 805.47it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 803.44it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 822.97it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 813.71it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 816.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 791.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 821.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 838.67it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 815.76it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 826.56it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 806.86it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 835.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 822.85it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 816.22it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 817.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 796.12it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 804.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 833.79it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 813.53it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 823.34it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 803.81it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.89it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 829.08it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 840.16it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 827.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 831.40it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


Unsloth: Compiling kernels: 100%|██████████| 3/3 [00:00<00:00, 20.50it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2] 
INFO 04-25 10:00:01 [backends.py:319] Compiling a graph for compile range (1, 8192) takes 12.73 s
INFO 04-25 10:00:01 [monitor.py:34] torch.compile takes 23.84 s in total
INFO 04-25 10:00:02 [gpu_worker.py:356] Available KV cache memory: 31.08 GiB
INFO 04-25 10:00:02 [kv_cache_utils.py:1307] GPU KV cache size: 226,336 tokens
INFO 04-25 10:00:02 [kv_cache_utils.py:1312] Maximum concurrency for 4,096 tokens per request: 55.26x
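The concurrency figure above follows from the two preceding numbers. A minimal sketch of that arithmetic, using only values copied from these log lines (the variable names are illustrative and not part of vLLM):

```python
# Values copied from the gpu_worker / kv_cache_utils log lines above.
kv_cache_gib = 31.08          # "Available KV cache memory: 31.08 GiB"
kv_cache_tokens = 226_336     # "GPU KV cache size: 226,336 tokens"
tokens_per_request = 4_096    # "Maximum concurrency for 4,096 tokens per request"

# How many full-length requests fit in the KV cache at once.
max_concurrency = kv_cache_tokens / tokens_per_request

# Implied KV cache footprint of a single token, derived from the logged totals.
bytes_per_token = kv_cache_gib * 2**30 / kv_cache_tokens

print(f"max concurrency: {max_concurrency:.2f}x")
print(f"KV cache per token: {bytes_per_token / 1024:.1f} KiB")
```

The per-token footprint is only an estimate derived from the rounded GiB value in the log; it is not a number vLLM reports directly.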
INFO 04-25 10:00:02 [vllm_utils.py:729] Unsloth: Running patched vLLM v1 `capture_model`.


WARNING 04-25 10:00:03 [utils.py:268] Using default LoRA kernel configs


Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 54/54 [00:12<00:00,  4.27it/s]


Capturing CUDA graphs (decode, FULL): 100%|██████████| 30/30 [00:02<00:00, 10.98it/s]
INFO 04-25 10:00:18 [gpu_model_runner.py:5063] Graph capturing finished in 15 secs, took 0.69 GiB
INFO 04-25 10:00:18 [vllm_utils.py:736] Unsloth: Patched vLLM v1 graph capture finished in 15 secs.
INFO 04-25 10:00:19 [core.py:272] init engine (profile, create kv cache, warmup model) took 48.93 seconds
INFO 04-25 10:00:21 [llm.py:343] Supported tasks: ('generate',)
Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm']


Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 46.31it/s]
Some weights of Qwen3ForCausalLM were not initialized from the model checkpoint at unsloth/Qwen3-4B-Instruct-2507 and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Performing substitution for additional_keys=set()
Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'cross_attn_input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'cross_attn_post_attention_layernorm', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm']
unsloth/Qwen3-4B-Instruct-2507 does not have a padding token! Will use pad_token = <|PAD_TOKEN|>.
Unsloth 2026.4.8 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
VRAM allocated: 41.84 GB

══ SFT warm-start — sft_traces/traces_v2.jsonl ══
  120 SFT examples loaded (chat format in `text`)


Unsloth: Tokenizing ["text"] (num_proc=12): 100%|██████████| 120/120 [00:02<00:00, 46.73 examples/s]
🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 120 | Num Epochs = 10 | Total steps = 150
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
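The banner figures above are mutually consistent. A minimal sketch of the arithmetic, using only numbers printed in the banner; the ceil-based step count is an assumption that happens to match this run and may not reflect the trainer's rounding in other configurations:

```python
import math

# Values copied from the Unsloth training banner above.
num_examples = 120
num_epochs = 10
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
trainable_params = 33_030_144
total_params = 4_055_498_240

# Effective batch size per optimizer step: 2 x 4 x 1.
total_batch = per_device_batch * grad_accum_steps * num_gpus

# Optimizer steps per epoch and over the whole run (assumed ceil rounding).
steps_per_epoch = math.ceil(num_examples / total_batch)
total_steps = steps_per_epoch * num_epochs

# Share of parameters that receive gradients (the LoRA adapters).
trainable_pct = 100 * trainable_params / total_params

print(total_batch, steps_per_epoch, total_steps, f"{trainable_pct:.2f}%")
```

With 15 optimizer steps per epoch, a metrics line logged every 5 steps advances the `epoch` field by one third each time, which matches the 0.33, 0.67, 1.0 progression in the training output.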


Unsloth: Will smartly offload gradients to save VRAM!


  3%|▎         | 5/150 [00:09<04:02,  1.67s/it]
{'loss': 3.6266, 'grad_norm': 2.663548231124878, 'learning_rate': 2.5e-05, 'epoch': 0.33}

  4%|▍         | 6/150 [00:11<03:41,  1.54s/it][A

  5%|▍         | 7/150 [00:12<03:28,  1.46s/it][A

  5%|▌         | 8/150 [00:13<03:18,  1.40s/it][A

  6%|▌         | 9/150 [00:14<03:12,  1.36s/it][A

  7%|▋         | 10/150 [00:16<03:07,  1.34s/it][A

                                                
{'loss': 3.3225, 'grad_norm': 2.001558542251587, 'learning_rate': 4.9647887323943665e-05, 'epoch': 0.67}
{'loss': 2.7371, 'grad_norm': 0.9380167722702026, 'learning_rate': 4.788732394366197e-05, 'epoch': 1.0}
{'loss': 2.365, 'grad_norm': 0.8978179693222046, 'learning_rate': 4.6126760563380286e-05, 'epoch': 1.33}
{'loss': 2.0451, 'grad_norm': 0.9256548285484314, 'learning_rate': 4.436619718309859e-05, 'epoch': 1.67}
{'loss': 1.7249, 'grad_norm': 0.8666767477989197, 'learning_rate': 4.26056338028169e-05, 'epoch': 2.0}
{'loss': 1.4079, 'grad_norm': 1.2116891145706177, 'learning_rate': 4.0845070422535214e-05, 'epoch': 2.33}
{'loss': 1.1155, 'grad_norm': 0.8696402311325073, 'learning_rate': 3.908450704225352e-05, 'epoch': 2.67}
{'loss': 0.9477, 'grad_norm': 0.5664961338043213, 'learning_rate': 3.7323943661971835e-05, 'epoch': 3.0}
{'loss': 0.8914, 'grad_norm': 0.4789012372493744, 'learning_rate': 3.556338028169014e-05, 'epoch': 3.33}
{'loss': 0.8417, 'grad_norm': 0.3655957579612732, 'learning_rate': 3.380281690140845e-05, 'epoch': 3.67}
{'loss': 0.8088, 'grad_norm': 0.36159124970436096, 'learning_rate': 3.204225352112676e-05, 'epoch': 4.0}
{'loss': 0.7978, 'grad_norm': 0.3379436433315277, 'learning_rate': 3.028169014084507e-05, 'epoch': 4.33}
{'loss': 0.7577, 'grad_norm': 0.3583666682243347, 'learning_rate': 2.8521126760563384e-05, 'epoch': 4.67}
{'loss': 0.7794, 'grad_norm': 0.33592215180397034, 'learning_rate': 2.676056338028169e-05, 'epoch': 5.0}
{'loss': 0.7684, 'grad_norm': 0.3456568121910095, 'learning_rate': 2.5e-05, 'epoch': 5.33}
{'loss': 0.7243, 'grad_norm': 0.33662667870521545, 'learning_rate': 2.323943661971831e-05, 'epoch': 5.67}
{'loss': 0.7285, 'grad_norm': 0.3644108772277832, 'learning_rate': 2.147887323943662e-05, 'epoch': 6.0}
{'loss': 0.7192, 'grad_norm': 0.35359156131744385, 'learning_rate': 1.971830985915493e-05, 'epoch': 6.33}
{'loss': 0.7025, 'grad_norm': 0.3457960784435272, 'learning_rate': 1.7957746478873243e-05, 'epoch': 6.67}
{'loss': 0.7215, 'grad_norm': 0.3716900646686554, 'learning_rate': 1.619718309859155e-05, 'epoch': 7.0}
{'loss': 0.6965, 'grad_norm': 0.35728198289871216, 'learning_rate': 1.443661971830986e-05, 'epoch': 7.33}
{'loss': 0.701, 'grad_norm': 0.3863743245601654, 'learning_rate': 1.267605633802817e-05, 'epoch': 7.67}
{'loss': 0.691, 'grad_norm': 0.38696053624153137, 'learning_rate': 1.0915492957746478e-05, 'epoch': 8.0}
{'loss': 0.6836, 'grad_norm': 0.3782326579093933, 'learning_rate': 9.15492957746479e-06, 'epoch': 8.33}
{'loss': 0.6819, 'grad_norm': 0.3920275866985321, 'learning_rate': 7.394366197183099e-06, 'epoch': 8.67}
{'loss': 0.6833, 'grad_norm': 0.37108415365219116, 'learning_rate': 5.6338028169014084e-06, 'epoch': 9.0}
{'loss': 0.6688, 'grad_norm': 0.3897058367729187, 'learning_rate': 3.873239436619718e-06, 'epoch': 9.33}
{'loss': 0.6744, 'grad_norm': 0.3871634006500244, 'learning_rate': 2.112676056338028e-06, 'epoch': 9.67}
{'loss': 0.6839, 'grad_norm': 0.40198108553886414, 'learning_rate': 3.5211267605633803e-07, 'epoch': 10.0}
{'train_runtime': 198.1987, 'train_samples_per_second': 6.055, 'train_steps_per_second': 0.757, 'train_loss': 1.1565947977701823, 'epoch': 10.0}
100%|██████████| 150/150 [03:18<00:00,  1.32s/it]
  SFT done in 3.3 min

══ Pre-GRPO hold-out eval (SFT-only) ══

  [diagnostic] seed=100 raw completion (first 500 chars):
  <tool_call>
  1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the immediate 3-year cumulative real return is heavily penalized. The 12-quarter lockup forces a defensive tilt. 3rd-order: The fragmentation of global supply chains acts as a massive structural headwind for TECH and GREEN. The base case 
  [parse_action result]: metadata={} weights=[0.0, 0.4, 0.0, 0.2, 0.4] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='fragmentation'

── Hold-out eval (5/5 valid) ──
  mean regret: -0.2516
  beat baseline: 0/5

══ GRPO Phase 1: 4Q episodes, 50 iters, rewards=['format', 'regret'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 200 | Num Epochs = 1 | Total steps = 50
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


WARNING 04-25 10:04:33 [input_processor.py:287] vLLM has deprecated support for supporting different tokenizers for different LoRAs. By default, vLLM uses base model's tokenizer. If you are using a LoRA with its own tokenizer, consider specifying `--tokenizer [lora_path]` to use the LoRA tokenizer.
Unsloth: Will smartly offload gradients to save VRAM!
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 1996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 4044.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 6092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 8140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 10224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 12248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 14296.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 16348.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 18396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 20444.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 22492.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 24544.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 26592.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 28640.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 30688.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 32736.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 34784.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 36832.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 38884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 40908.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 40%|████      | 20/50 [00:25<00:20,  1.46it/s][A

 42%|████▏     | 21/50 [00:26<00:18,  1.55it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 42904.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 44988.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


 44%|████▍     | 22/50 [00:26<00:18,  1.55it/s][A

 46%|████▌     | 23/50 [00:27<00:16,  1.60it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 46984.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 49008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 48%|████▊     | 24/50 [00:28<00:16,  1.60it/s][A

 50%|█████     | 25/50 [00:29<00:17,  1.44it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 51092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 53176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


 52%|█████▏    | 26/50 [00:29<00:16,  1.44it/s][A

 54%|█████▍    | 27/50 [00:30<00:15,  1.52it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 55172.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 57224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 56%|█████▌    | 28/50 [00:30<00:14,  1.52it/s][A

 58%|█████▊    | 29/50 [00:31<00:13,  1.58it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 59308.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 61356.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


 60%|██████    | 30/50 [00:32<00:12,  1.58it/s][A

 62%|██████▏   | 31/50 [00:32<00:11,  1.63it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 63440.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 65488.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 64%|██████▍   | 32/50 [00:33<00:11,  1.63it/s][A

 66%|██████▌   | 33/50 [00:33<00:10,  1.67it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 67512.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 69564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


 68%|██████▊   | 34/50 [00:34<00:09,  1.67it/s][A

 70%|███████   | 35/50 [00:34<00:08,  1.70it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 71560.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 73608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18}


 72%|███████▏  | 36/50 [00:35<00:08,  1.70it/s][A

 74%|███████▍  | 37/50 [00:37<00:09,  1.35it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 75692.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 77740.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.19}


 76%|███████▌  | 38/50 [00:37<00:08,  1.35it/s][A

 78%|███████▊  | 39/50 [00:38<00:07,  1.45it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 79764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 81812.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}


 80%|████████  | 40/50 [00:38<00:06,  1.45it/s][A

 82%|████████▏ | 41/50 [00:39<00:05,  1.53it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 83808.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 85804.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21}


 84%|████████▍ | 42/50 [00:39<00:05,  1.53it/s][A

 86%|████████▌ | 43/50 [00:40<00:04,  1.59it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 87888.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 89884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.22}


 88%|████████▊ | 44/50 [00:41<00:03,  1.59it/s][A

 90%|█████████ | 45/50 [00:41<00:03,  1.64it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 91936.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 94020.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}


 92%|█████████▏| 46/50 [00:42<00:02,  1.64it/s][A

 94%|█████████▍| 47/50 [00:42<00:01,  1.68it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 96016.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 98064.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24}


 96%|█████████▌| 48/50 [00:43<00:01,  1.68it/s][A

 98%|█████████▊| 49/50 [00:44<00:00,  1.48it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 100116.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 102168.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.25}


100%|██████████| 50/50 [00:44<00:00,  1.48it/s][A

                                               
{'train_runtime': 45.5636, 'train_samples_per_second': 4.389, 'train_steps_per_second': 1.097, 'train_loss': 0.0, 'epoch': 0.25}


100%|██████████| 50/50 [00:45<00:00,  1.48it/s][A
100%|██████████| 50/50 [00:45<00:00,  1.10it/s]
  Phase 1 done in 0.8 min

  [diagnostic] seed=100 raw completion (first 500 chars):
  <tool_call>
  1st-order: EV adoption surges, directly driving demand for GREEN energy and EV supply chains. 2nd-order: As EVs displace ICE vehicles, OIL demand faces structural headwinds over the 12-quarter cycle, forcing a long-term rotation away from fossil fuels. 3rd-order: The massive capital deployment into EV infrastructure acts as a massive liquidity pump, supporting TECH and REAL_ESTATE valuations. Base-rate: Today's news strongly signals a structural transition away from OIL and a green b
  [parse_action result]: metadata={} weights=[0.35, 0.05, 0.45, 0.1, 0.05] infra_commit=0.15 carbon_offset_buy=0.0 put_hedge=0.0 tech_bet='green_leaps'

── Hold-out eval (5/5 valid) ──
  mean regret: -0.0037
  beat baseline: 4/5

══ GRPO Phase 2: 8Q episodes, 100 iters, rewards=['format', 'regret', 'sharpe', 'drawdown'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 600 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 6 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


  0%|          | 0/100 [00:00<?, ?it/s]
Unsloth: Will smartly offload gradients to save VRAM!


  1%|          | 1/100 [00:05<08:43,  5.29s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 2994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 6066.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


  2%|▏         | 2/100 [00:06<08:38,  5.29s/it][A

  3%|▎         | 3/100 [00:06<03:06,  1.92s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 9270.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 12342.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


  4%|▍         | 4/100 [00:07<03:04,  1.92s/it][A

  5%|▌         | 5/100 [00:08<02:05,  1.32s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 15576.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 18714.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


  6%|▌         | 6/100 [00:09<02:04,  1.32s/it][A

  7%|▋         | 7/100 [00:09<01:39,  1.07s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 21702.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 24738.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


  8%|▊         | 8/100 [00:10<01:38,  1.07s/it][A

  9%|▉         | 9/100 [00:11<01:25,  1.06it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 27726.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 30972.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


 10%|█         | 10/100 [00:11<01:24,  1.06it/s][A

 11%|█         | 11/100 [00:12<01:18,  1.14it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 34098.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.944444444444445e-06, 'num_tokens': 37176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


 12%|█▏        | 12/100 [00:13<01:17,  1.14it/s][A

 13%|█▎        | 13/100 [00:14<01:13,  1.19it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 40380.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


 13%|█▎        | 13/100 [00:28<01:13,  1.19it/s][A

 14%|█▍        | 14/100 [00:53<11:54,  8.31s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.833333333333333e-06, 'num_tokens': 43382.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 46454.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


 15%|█▌        | 15/100 [00:54<11:46,  8.31s/it][A

 16%|█▌        | 16/100 [00:55<07:52,  5.62s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 49448.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 52526.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


 17%|█▋        | 17/100 [00:56<07:46,  5.62s/it][A

 18%|█▊        | 18/100 [00:56<05:26,  3.98s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.611111111111112e-06, 'num_tokens': 55730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 58976.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


 19%|█▉        | 19/100 [00:57<05:22,  3.98s/it][A

 20%|██        | 20/100 [00:58<03:54,  2.93s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 62048.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 65282.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 21%|██        | 21/100 [00:59<03:51,  2.93s/it][A

 22%|██▏       | 22/100 [00:59<02:54,  2.24s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.388888888888889e-06, 'num_tokens': 68354.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 71588.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 23%|██▎       | 23/100 [01:00<02:52,  2.24s/it][A

 24%|██▍       | 24/100 [01:01<02:16,  1.79s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.277777777777778e-06, 'num_tokens': 74660.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 77786.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 25%|██▌       | 25/100 [01:02<02:14,  1.79s/it][A

 26%|██▌       | 26/100 [01:03<01:55,  1.56s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 80858.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 84062.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 27%|██▋       | 27/100 [01:04<01:53,  1.56s/it][A

 28%|██▊       | 28/100 [01:05<01:34,  1.31s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.055555555555556e-06, 'num_tokens': 87140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 90212.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


 29%|██▉       | 29/100 [01:05<01:32,  1.31s/it][A

 30%|███       | 30/100 [01:06<01:19,  1.13s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.944444444444445e-06, 'num_tokens': 93248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 96236.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


 31%|███       | 31/100 [01:07<01:18,  1.13s/it][A

 32%|███▏      | 32/100 [01:07<01:08,  1.01s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.833333333333334e-06, 'num_tokens': 99482.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 102608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


 33%|███▎      | 33/100 [01:08<01:07,  1.01s/it][A

 34%|███▍      | 34/100 [01:09<01:01,  1.08it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7222222222222225e-06, 'num_tokens': 105596.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 108668.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


 35%|███▌      | 35/100 [01:10<01:00,  1.08it/s]
 36%|███▌      | 36/100 [01:10<00:55,  1.15it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 111656.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 114650.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


 37%|███▋      | 37/100 [01:11<00:54,  1.15it/s]
 38%|███▊      | 38/100 [01:12<00:51,  1.21it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 117728.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 120764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


 39%|███▉      | 39/100 [01:13<00:50,  1.21it/s]
 40%|████      | 40/100 [01:13<00:48,  1.24it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3888888888888893e-06, 'num_tokens': 123998.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 127034.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


 41%|████      | 41/100 [01:14<00:47,  1.24it/s]
 42%|████▏     | 42/100 [01:15<00:45,  1.28it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.277777777777778e-06, 'num_tokens': 130160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 133316.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


 43%|████▎     | 43/100 [01:16<00:44,  1.28it/s]
 44%|████▍     | 44/100 [01:16<00:43,  1.29it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1666666666666667e-06, 'num_tokens': 136550.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 139622.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


 45%|████▌     | 45/100 [01:17<00:42,  1.29it/s]
 46%|████▌     | 46/100 [01:18<00:41,  1.31it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 142826.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 145814.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


 47%|████▋     | 47/100 [01:19<00:40,  1.31it/s]
 48%|████▊     | 48/100 [01:19<00:39,  1.32it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.944444444444445e-06, 'num_tokens': 149060.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 152054.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


 49%|████▉     | 49/100 [01:20<00:38,  1.32it/s]
 50%|█████     | 50/100 [01:21<00:37,  1.33it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8333333333333335e-06, 'num_tokens': 155288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 158426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


 51%|█████     | 51/100 [01:22<00:36,  1.33it/s]
 52%|█████▏    | 52/100 [01:23<00:40,  1.20it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7222222222222224e-06, 'num_tokens': 161414.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 164552.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


 53%|█████▎    | 53/100 [01:24<00:39,  1.20it/s]
 54%|█████▍    | 54/100 [01:24<00:37,  1.23it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6111111111111113e-06, 'num_tokens': 167756.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 170828.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


 55%|█████▌    | 55/100 [01:25<00:36,  1.23it/s]
 56%|█████▌    | 56/100 [01:26<00:34,  1.27it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 173900.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 177056.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 57%|█████▋    | 57/100 [01:27<00:33,  1.27it/s]
 58%|█████▊    | 58/100 [01:27<00:32,  1.28it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.388888888888889e-06, 'num_tokens': 180302.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 183290.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 59%|█████▉    | 59/100 [01:28<00:31,  1.28it/s]
 60%|██████    | 60/100 [01:29<00:31,  1.28it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.277777777777778e-06, 'num_tokens': 186326.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 189398.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 61%|██████    | 61/100 [01:30<00:30,  1.28it/s]
 62%|██████▏   | 62/100 [01:30<00:29,  1.31it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.166666666666667e-06, 'num_tokens': 192470.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 195608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 63%|██████▎   | 63/100 [01:31<00:28,  1.31it/s]
 64%|██████▍   | 64/100 [01:32<00:27,  1.31it/s]

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0555555555555555e-06, 'num_tokens': 198734.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 201806.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


 65%|██████▌   | 65/100 [01:33<00:26,  1.31it/s][A

 66%|██████▌   | 66/100 [01:33<00:25,  1.33it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 204794.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 207830.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


 67%|██████▋   | 67/100 [01:34<00:24,  1.33it/s][A

 68%|██████▊   | 68/100 [01:35<00:23,  1.34it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8333333333333333e-06, 'num_tokens': 210902.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 213974.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 69%|██████▉   | 69/100 [01:36<00:23,  1.34it/s][A

 70%|███████   | 70/100 [01:36<00:22,  1.35it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7222222222222224e-06, 'num_tokens': 217046.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 220292.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 71%|███████   | 71/100 [01:37<00:21,  1.35it/s][A

 72%|███████▏  | 72/100 [01:38<00:20,  1.34it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6111111111111113e-06, 'num_tokens': 223280.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 226358.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 73%|███████▎  | 73/100 [01:39<00:20,  1.34it/s][A

 74%|███████▍  | 74/100 [01:39<00:19,  1.34it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 229496.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 232730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 75%|███████▌  | 75/100 [01:40<00:18,  1.34it/s][A

 76%|███████▌  | 76/100 [01:41<00:19,  1.20it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 235802.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


 76%|███████▌  | 76/100 [01:52<00:19,  1.20it/s][A

 77%|███████▋  | 77/100 [02:19<02:47,  7.28s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 238797.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2777777777777779e-06, 'num_tokens': 241935.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


 78%|███████▊  | 78/100 [02:19<02:40,  7.28s/it][A

 79%|███████▉  | 79/100 [02:20<01:46,  5.09s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 244923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1666666666666668e-06, 'num_tokens': 247995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


 80%|████████  | 80/100 [02:21<01:41,  5.09s/it][A

 81%|████████  | 81/100 [02:22<01:09,  3.68s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 251067.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0555555555555557e-06, 'num_tokens': 254145.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 82%|████████▏ | 82/100 [02:22<01:06,  3.68s/it][A

 83%|████████▎ | 83/100 [02:23<00:46,  2.75s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 257181.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.444444444444445e-07, 'num_tokens': 260253.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 84%|████████▍ | 84/100 [02:24<00:44,  2.75s/it][A

 85%|████████▌ | 85/100 [02:25<00:31,  2.13s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 263247.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 266451.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 86%|████████▌ | 86/100 [02:26<00:29,  2.13s/it][A

 87%|████████▋ | 87/100 [02:26<00:22,  1.73s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 269439.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.222222222222222e-07, 'num_tokens': 272427.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


 88%|████████▊ | 88/100 [02:27<00:20,  1.73s/it][A

 89%|████████▉ | 89/100 [02:28<00:15,  1.43s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 275499.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.111111111111112e-07, 'num_tokens': 278655.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


 90%|█████████ | 90/100 [02:29<00:14,  1.43s/it][A

 91%|█████████ | 91/100 [02:29<00:10,  1.22s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 281691.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 284925.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


 92%|█████████▏| 92/100 [02:30<00:09,  1.22s/it][A

 93%|█████████▎| 93/100 [02:31<00:07,  1.08s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 287913.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.8888888888888895e-07, 'num_tokens': 290985.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 94%|█████████▍| 94/100 [02:32<00:06,  1.08s/it][A

 95%|█████████▌| 95/100 [02:32<00:04,  1.02it/s][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 294123.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 297357.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 96%|█████████▌| 96/100 [02:33<00:03,  1.02it/s][A

 96%|█████████▌| 96/100 [02:44<00:03,  1.02it/s][A

 97%|█████████▋| 97/100 [03:05<00:16,  5.61s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 300361.0, 'completions/mean_length': 3.6666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 17.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.6666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 17.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.6666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666668e-07, 'num_tokens': 303517.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 98%|█████████▊| 98/100 [03:06<00:11,  5.61s/it][A

 99%|█████████▉| 99/100 [03:06<00:04,  4.15s/it][A

                                                
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 306721.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555556e-08, 'num_tokens': 309859.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


100%|██████████| 100/100 [03:07<00:00,  4.15s/it][A

                                                 
[A{'train_runtime': 188.2628, 'train_samples_per_second': 3.187, 'train_steps_per_second': 0.531, 'train_loss': 0.0, 'epoch': 0.17}


100%|██████████| 100/100 [03:08<00:00,  4.15s/it][A
100%|██████████| 100/100 [03:08<00:00,  1.88s/it]
  Phase 2 done in 3.1 min

  [diagnostic] seed=100 raw completion (first 500 chars):
  <tool_call>
  1st-order: Insurers exiting Florida and California triggers a massive flight-to-safety, driving 10-year Treasuries down and freezing municipal bonds. 2nd-order: The freeze in municipal bonds directly crushes the yield curve, making long-duration BONDS a dead asset over the next 12 quarters. 3rd-order: The physical loss of insurance capital in the Gulf Coast and Bay Area will eventually trigger a broader real estate market correction, severely hurting REAL_ESTATE. Base case: Deflation
  [parse_action result]: metadata={} weights=[0.2, 0.05, 0.05, 0.0, 0.7] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'

── Hold-out eval (5/5 valid) ──
  mean regret: -0.0391
  beat baseline: 2/5

══ GRPO Phase 3: 12Q episodes, 80 iters, rewards=['format', 'regret', 'sharpe', 'drawdown', 'carbon'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 480 | Num Epochs = 1 | Total steps = 80
O^O/ \_/ \    Batch size per device = 6 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


  0%|          | 0/80 [00:00<?, ?it/s]
Unsloth: Will smartly offload gradients to save VRAM!


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 3216.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


  1%|▏         | 1/80 [00:00<01:04,  1.22it/s][A

  2%|▎         | 2/80 [00:01<01:00,  1.28it/s][A

                                              
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 6288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 9426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


  4%|▍         | 3/80 [00:02<01:00,  1.28it/s][A

  5%|▌         | 4/80 [00:03<00:58,  1.31it/s][A

                                              
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 12564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 15810.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


  6%|▋         | 5/80 [00:03<00:57,  1.31it/s][A

  8%|▊         | 6/80 [00:11<02:47,  2.27s/it][A

                                              
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 18807.0, 'completions/mean_length': 69.16667175292969, 'completions/min_length': 1.0, 'completions/max_length': 400.0, 'completions/clipped_ratio': 0.16666666666666663, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 11.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 69.16667175292969, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 22041.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


  9%|▉         | 7/80 [00:11<02:45,  2.27s/it][A

 10%|█         | 8/80 [00:12<02:00,  1.67s/it][A

                                              
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 25287.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 28413.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


 11%|█▏        | 9/80 [00:13<01:58,  1.67s/it][A

 12%|█▎        | 10/80 [00:14<01:33,  1.34s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.930555555555556e-06, 'num_tokens': 31629.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111111e-06, 'num_tokens': 34863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


 14%|█▍        | 11/80 [00:14<01:32,  1.34s/it][A

 15%|█▌        | 12/80 [00:15<01:17,  1.13s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.791666666666668e-06, 'num_tokens': 37851.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 40923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


 16%|█▋        | 13/80 [00:16<01:16,  1.13s/it][A

 18%|█▊        | 14/80 [00:17<01:06,  1.00s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.652777777777779e-06, 'num_tokens': 44061.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.583333333333333e-06, 'num_tokens': 47295.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


 19%|█▉        | 15/80 [00:17<01:05,  1.00s/it][A

 20%|██        | 16/80 [00:18<00:58,  1.09it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5138888888888895e-06, 'num_tokens': 50283.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 53361.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 21%|██▏       | 17/80 [00:19<00:57,  1.09it/s][A

 22%|██▎       | 18/80 [00:20<00:53,  1.16it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 56577.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.305555555555556e-06, 'num_tokens': 59565.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 24%|██▍       | 19/80 [00:20<00:52,  1.16it/s][A

 25%|██▌       | 20/80 [00:21<00:52,  1.15it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.236111111111111e-06, 'num_tokens': 62571.0, 'completions/mean_length': 4.0, 'completions/min_length': 1.0, 'completions/max_length': 19.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 4.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 19.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 4.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 65787.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


 26%|██▋       | 21/80 [00:23<00:51,  1.15it/s][A

 28%|██▊       | 22/80 [00:23<00:53,  1.08it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.097222222222222e-06, 'num_tokens': 69003.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.027777777777779e-06, 'num_tokens': 72141.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


 29%|██▉       | 23/80 [00:24<00:52,  1.08it/s][A

 30%|███       | 24/80 [00:25<00:48,  1.15it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.958333333333333e-06, 'num_tokens': 75213.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 78447.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


 31%|███▏      | 25/80 [00:26<00:47,  1.15it/s][A

 32%|███▎      | 26/80 [00:27<00:45,  1.19it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.819444444444444e-06, 'num_tokens': 81675.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 84813.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


 34%|███▍      | 27/80 [00:27<00:44,  1.19it/s][A

 35%|███▌      | 28/80 [00:28<00:43,  1.20it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.680555555555556e-06, 'num_tokens': 87808.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 90880.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


 36%|███▋      | 29/80 [00:29<00:42,  1.20it/s][A

 38%|███▊      | 30/80 [00:30<00:40,  1.25it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5416666666666673e-06, 'num_tokens': 94006.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4722222222222224e-06, 'num_tokens': 97252.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


 39%|███▉      | 31/80 [00:30<00:39,  1.25it/s][A

 40%|████      | 32/80 [00:31<00:37,  1.26it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4027777777777783e-06, 'num_tokens': 100468.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 103672.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


 41%|████▏     | 33/80 [00:32<00:37,  1.26it/s][A

 42%|████▎     | 34/80 [00:33<00:37,  1.22it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2638888888888892e-06, 'num_tokens': 106689.0, 'completions/mean_length': 5.833333492279053, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 5.833333492279053, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 5.833333492279053, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1944444444444443e-06, 'num_tokens': 109761.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


 44%|████▍     | 35/80 [00:34<00:36,  1.22it/s][A

 44%|████▍     | 35/80 [00:48<00:36,  1.22it/s][A

 45%|████▌     | 36/80 [01:04<03:50,  5.24s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 112761.0, 'completions/mean_length': 3.0, 'completions/min_length': 1.0, 'completions/max_length': 13.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 13.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 115995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


 46%|████▋     | 37/80 [01:05<03:45,  5.24s/it][A

 48%|████▊     | 38/80 [01:05<02:43,  3.89s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.986111111111111e-06, 'num_tokens': 119073.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.916666666666667e-06, 'num_tokens': 122319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


 49%|████▉     | 39/80 [01:06<02:39,  3.89s/it][A

 50%|█████     | 40/80 [01:07<01:58,  2.95s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8472222222222224e-06, 'num_tokens': 125523.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 128559.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


 51%|█████▏    | 41/80 [01:08<01:55,  2.95s/it][A

 52%|█████▎    | 42/80 [01:09<01:30,  2.37s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7083333333333334e-06, 'num_tokens': 131547.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6388888888888893e-06, 'num_tokens': 134751.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


 54%|█████▍    | 43/80 [01:10<01:27,  2.37s/it][A

 55%|█████▌    | 44/80 [01:11<01:07,  1.89s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5694444444444443e-06, 'num_tokens': 137967.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 141171.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


 56%|█████▋    | 45/80 [01:11<01:06,  1.89s/it][A

 57%|█████▊    | 46/80 [01:12<00:52,  1.55s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4305555555555557e-06, 'num_tokens': 144309.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.361111111111111e-06, 'num_tokens': 147609.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 59%|█████▉    | 47/80 [01:13<00:51,  1.55s/it][A

 60%|██████    | 48/80 [01:14<00:42,  1.31s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2916666666666666e-06, 'num_tokens': 150849.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 154065.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


 61%|██████▏   | 49/80 [01:14<00:40,  1.31s/it][A

 62%|██████▎   | 50/80 [01:15<00:34,  1.15s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.152777777777778e-06, 'num_tokens': 157311.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333334e-06, 'num_tokens': 160551.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


 64%|██████▍   | 51/80 [01:16<00:33,  1.15s/it][A

 65%|██████▌   | 52/80 [01:17<00:28,  1.03s/it][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0138888888888893e-06, 'num_tokens': 163623.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 166863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


 66%|██████▋   | 53/80 [01:17<00:27,  1.03s/it][A

 68%|██████▊   | 54/80 [01:18<00:24,  1.06it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 170079.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8055555555555557e-06, 'num_tokens': 173319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


 69%|██████▉   | 55/80 [01:19<00:23,  1.06it/s][A

 70%|███████   | 56/80 [01:20<00:21,  1.12it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7361111111111112e-06, 'num_tokens': 176457.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 179595.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 71%|███████▏  | 57/80 [01:20<00:20,  1.12it/s][A

 72%|███████▎  | 58/80 [01:21<00:18,  1.18it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5972222222222221e-06, 'num_tokens': 182799.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.527777777777778e-06, 'num_tokens': 185937.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


 74%|███████▍  | 59/80 [01:22<00:17,  1.18it/s][A

 75%|███████▌  | 60/80 [01:23<00:16,  1.22it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4583333333333335e-06, 'num_tokens': 189009.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 192255.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


 76%|███████▋  | 61/80 [01:24<00:15,  1.22it/s][A

 78%|███████▊  | 62/80 [01:25<00:15,  1.13it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3194444444444446e-06, 'num_tokens': 195483.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 198555.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


 79%|███████▉  | 63/80 [01:25<00:15,  1.13it/s][A

 80%|████████  | 64/80 [01:26<00:13,  1.19it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1805555555555556e-06, 'num_tokens': 201633.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 204636.0, 'completions/mean_length': 3.5, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.5, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.5, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 81%|████████▏ | 65/80 [01:27<00:12,  1.19it/s][A

 82%|████████▎ | 66/80 [01:28<00:11,  1.18it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0416666666666667e-06, 'num_tokens': 207708.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.722222222222224e-07, 'num_tokens': 211008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 84%|████████▍ | 67/80 [01:29<00:11,  1.18it/s][A

 85%|████████▌ | 68/80 [01:29<00:09,  1.22it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.027777777777779e-07, 'num_tokens': 213996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 216998.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


 86%|████████▋ | 69/80 [01:30<00:09,  1.22it/s][A

 88%|████████▊ | 70/80 [01:31<00:08,  1.20it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.63888888888889e-07, 'num_tokens': 220124.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444446e-07, 'num_tokens': 223160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


 89%|████████▉ | 71/80 [01:32<00:07,  1.20it/s][A

 90%|█████████ | 72/80 [01:33<00:06,  1.24it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 226238.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 229478.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


 91%|█████████▏| 73/80 [01:33<00:05,  1.24it/s][A

 92%|█████████▎| 74/80 [01:34<00:04,  1.26it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111112e-07, 'num_tokens': 232706.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.1666666666666667e-07, 'num_tokens': 235844.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 94%|█████████▍| 75/80 [01:35<00:03,  1.26it/s][A

 95%|█████████▌| 76/80 [01:36<00:03,  1.28it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.472222222222223e-07, 'num_tokens': 238922.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 241994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 96%|█████████▋| 77/80 [01:36<00:02,  1.28it/s][A

 98%|█████████▊| 78/80 [01:37<00:01,  1.29it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333333e-07, 'num_tokens': 245240.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888888e-07, 'num_tokens': 248396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


 99%|█████████▉| 79/80 [01:38<00:00,  1.29it/s][A

100%|██████████| 80/80 [01:39<00:00,  1.30it/s][A

                                               
[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444444e-08, 'num_tokens': 251534.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


                                               
[A{'train_runtime': 100.0827, 'train_samples_per_second': 4.796, 'train_steps_per_second': 0.799, 'train_loss': 0.0, 'epoch': 0.17}


100%|██████████| 80/80 [01:40<00:00,  1.30it/s][A
100%|██████████| 80/80 [01:40<00:00,  1.25s/it]
  Phase 3 done in 1.7 min

  [diagnostic] seed=100 raw completion (first 500 chars):
  <tool_call>
  1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the massive overcapacity in the oil sector will be rapidly absorbed by industrial demand, driving a structural inflationary squeeze. This stagflationary regime will crush BONDS and compress REAL_ESTATE valuations. 3rd-order: The forced lo
  [parse_action result]: metadata={} weights=[0.09523809523809523, 0.42857142857142855, 0.047619047619047616, 0.09523809523809523, 0.3333333333333333] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'

── Hold-out eval (5/5 valid) ──
  mean regret: -0.0941
  beat baseline: 3/5
Found HuggingFace hub cache directory: /tmp/CarbonAlpha/hf_cache/hub
Checking cache directory for required files...


Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`:   0%|          | 0/2 [00:00<?, ?it/s][A

Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 100%|██████████| 2/2 [00:01<00:00,  1.37it/s]
Successfully copied all 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files:   0%|          | 0/2 [00:00<?, ?it/s][A
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 60787.01it/s]


Unsloth: Merging weights into 16bit:   0%|          | 0/2 [00:00<?, ?it/s][A

Unsloth: Merging weights into 16bit:  50%|█████     | 1/2 [00:31<00:31, 31.55s/it][A

Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:55<00:00, 26.86s/it][A
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:55<00:00, 27.56s/it]
Unsloth: Merge process complete. Saved to `/tmp/CarbonAlpha/checkpoints/final_merged`

Saved LoRA adapters to /tmp/CarbonAlpha/checkpoints/final_merged
[rank0]:[W425 10:13:19.025103781 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())