v2_phase_all_run2: training log

	🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
	🦥 Unsloth Zoo will now patch everything to make training faster!
	Loading unsloth/Qwen3-4B-Instruct-2507...
	INFO 04-25 09:59:05 [vllm_utils.py:724] Unsloth: Patching vLLM v1 graph capture
	==((====))==  Unsloth 2026.4.8: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.15.1.
	   \\   /|    NVIDIA L40S. Num GPUs = 1. Max memory: 44.392 GB. Platform: Linux.
	O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.1
	\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
	 "-____-"     Free license: http://github.com/unslothai/unsloth
	Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
	Unsloth: FlashInfer requires JIT compilation but nvcc (CUDA compiler) is not found.
	vLLM will use FLASH_ATTN attention + PyTorch sampler instead (works fine).
	To enable FlashInfer, install the missing tools:
	nvcc - install the CUDA toolkit or set CUDA_HOME to your CUDA installation
	ninja - pip install ninja
	To silence this warning: set UNSLOTH_VLLM_NO_FLASHINFER=1
	Unsloth: vLLM loading unsloth/Qwen3-4B-Instruct-2507 with actual GPU utilization = 89.06%
	Unsloth: Your GPU has CUDA compute capability 8.9 with VRAM = 44.39 GB.
	Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 4096. Num Sequences = 96.
	Unsloth: vLLM's KV Cache can use up to 32.5 GB. Also swap space = 6 GB.
	Unsloth: Not an error, but `use_cudagraph` is not supported in vLLM.config.CompilationConfig. Skipping.
	Unsloth: Not an error, but `use_inductor` is not supported in vLLM.config.CompilationConfig. Skipping.
	WARNING 04-25 09:59:07 [compilation.py:762] Level is deprecated and will be removed in the next release,either 0.12.0 or 0.11.2 whichever is soonest.Use mode instead.If both level and mode are given,only mode will be used.
	Unsloth: Not an error, but `device` is not supported in vLLM. Skipping.
	/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
	PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
	return self.serializer.to_python(
	INFO 04-25 09:59:07 [utils.py:261] non-default args: {'dtype': torch.bfloat16, 'max_model_len': 4096, 'enable_prefix_caching': True, 'swap_space': 6, 'gpu_memory_utilization': 0.8906117106477057, 'max_num_batched_tokens': 8192, 'max_num_seqs': 96, 'max_logprobs': 0, 'disable_log_stats': True, 'enable_lora': True, 'enable_chunked_prefill': True, 'compilation_config': {'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': [], 'splitting_ops': None, 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': None, 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': None, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}, 'model': 'unsloth/Qwen3-4B-Instruct-2507'}
	WARNING 04-25 09:59:07 [arg_utils.py:1220] The global random seed is set to 0. Since VLLM_ENABLE_V1_MULTIPROCESSING is set to False, this may affect the random state of the Python process that launched vLLM.
	INFO 04-25 09:59:14 [model.py:541] Resolved architecture: Qwen3ForCausalLM
	INFO 04-25 09:59:14 [model.py:1561] Using max model len 4096
	INFO 04-25 09:59:15 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=8192.
	INFO 04-25 09:59:15 [vllm.py:624] Asynchronous scheduling is enabled.


	generation_config.json: 100%|██████████| 237/237 [00:00<00:00, 1.79MB/s]


	tokenizer_config.json: 100%|██████████| 9.65k/9.65k [00:00<00:00, 60.1MB/s]


	vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 53.7MB/s]


	merges.txt: 100%|██████████| 1.67M/1.67M [00:00<00:00, 83.0MB/s]


	tokenizer.json: 100%|██████████| 11.4M/11.4M [00:00<00:00, 44.7MB/s]


	added_tokens.json: 100%|██████████| 707/707 [00:00<00:00, 7.32MB/s]


	special_tokens_map.json: 100%|██████████| 614/614 [00:00<00:00, 3.17MB/s]


	chat_template.jinja: 100%|██████████| 4.04k/4.04k [00:00<00:00, 43.4MB/s]
	/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
	PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
	return self.serializer.to_python(
	INFO 04-25 09:59:16 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='unsloth/Qwen3-4B-Instruct-2507', speculative_config=None, tokenizer='unsloth/Qwen3-4B-Instruct-2507', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=unsloth/Qwen3-4B-Instruct-2507, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 
'compile_ranges_split_points': [8192], 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 192, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}
	INFO 04-25 09:59:16 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.113.93.102:50843 backend=nccl
	INFO 04-25 09:59:16 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
	INFO 04-25 09:59:16 [gpu_model_runner.py:4033] Starting to load model unsloth/Qwen3-4B-Instruct-2507...
	/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
	We recommend installing via `pip install torch-c-dlpack-ext`
	warnings.warn(
	INFO 04-25 09:59:19 [cuda.py:364] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')


	model.safetensors.index.json: 100%|██████████| 32.9k/32.9k [00:00<00:00, 120MB/s]


	model-00001-of-00002.safetensors: 100%|██████████| 4.97G/4.97G [00:04<00:00, 1.24GB/s]


	model-00002-of-00002.safetensors: 100%|██████████| 3.08G/3.08G [00:04<00:00, 640MB/s]
	INFO 04-25 09:59:29 [weight_utils.py:527] Time spent downloading weights for unsloth/Qwen3-4B-Instruct-2507: 8.877664 seconds


	Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 2.75it/s]

	INFO 04-25 09:59:29 [default_loader.py:291] Loading weights took 0.74 seconds
	INFO 04-25 09:59:29 [punica_selector.py:20] Using PunicaWrapperGPU.
	INFO 04-25 09:59:30 [gpu_model_runner.py:4130] Model loading took 7.67 GiB memory and 12.958485 seconds
	INFO 04-25 09:59:42 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/f6f5a6d496/rank_0_0/backbone for vLLM's torch.compile
	INFO 04-25 09:59:42 [backends.py:872] Dynamo bytecode transform time: 11.11 s


	Unsloth: Compiling kernels: 100%|██████████| 5/5 [00:00<00:00, 5.25it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_4]
	INFO 04-25 09:59:52 [backends.py:302] Cache the graph of compile range (1, 8192) for later use


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 17.98it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 800.42it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 824.33it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 804.41it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 708.10it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 818.24it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 805.47it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 803.44it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 822.97it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 813.71it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 816.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 791.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 821.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 838.67it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 815.76it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 826.56it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 806.86it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 828.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 835.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 822.85it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 816.22it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 817.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 796.12it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 804.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 833.79it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 813.53it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 823.34it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 803.81it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 828.89it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 829.08it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 840.16it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 827.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 7/7 [00:00<00:00, 831.40it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]


	Unsloth: Compiling kernels: 100%\|██████████\| 3/3 [00:00<00:00, 20.50it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
	INFO 04-25 10:00:01 [backends.py:319] Compiling a graph for compile range (1, 8192) takes 12.73 s
	INFO 04-25 10:00:01 [monitor.py:34] torch.compile takes 23.84 s in total
	INFO 04-25 10:00:02 [gpu_worker.py:356] Available KV cache memory: 31.08 GiB
	INFO 04-25 10:00:02 [kv_cache_utils.py:1307] GPU KV cache size: 226,336 tokens
	INFO 04-25 10:00:02 [kv_cache_utils.py:1312] Maximum concurrency for 4,096 tokens per request: 55.26x
	INFO 04-25 10:00:02 [vllm_utils.py:729] Unsloth: Running patched vLLM v1 `capture_model`.


	WARNING 04-25 10:00:03 [utils.py:268] Using default LoRA kernel configs
	Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%\|██████████\| 54/54 [00:12<00:00, 4.27it/s]


	Capturing CUDA graphs (decode, FULL): 100%\|██████████\| 30/30 [00:02<00:00, 10.98it/s]
	INFO 04-25 10:00:18 [gpu_model_runner.py:5063] Graph capturing finished in 15 secs, took 0.69 GiB
	INFO 04-25 10:00:18 [vllm_utils.py:736] Unsloth: Patched vLLM v1 graph capture finished in 15 secs.
	INFO 04-25 10:00:19 [core.py:272] init engine (profile, create kv cache, warmup model) took 48.93 seconds
	INFO 04-25 10:00:21 [llm.py:343] Supported tasks: ('generate',)
	Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm']


	Loading checkpoint shards: 100%\|██████████\| 2/2 [00:00<00:00, 46.31it/s]
	Some weights of Qwen3ForCausalLM were not initialized from the model checkpoint at unsloth/Qwen3-4B-Instruct-2507 and are newly initialized: ['lm_head.weight']
	You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
	Performing substitution for additional_keys=set()
	Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'cross_attn_input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'cross_attn_post_attention_layernorm', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm']
	unsloth/Qwen3-4B-Instruct-2507 does not have a padding token! Will use pad_token = <\|PAD_TOKEN\|>.
	Unsloth 2026.4.8 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
	VRAM allocated: 41.84 GB

	══ SFT warm-start — sft_traces/traces_v2.jsonl ══
	120 SFT examples loaded (chat format in `text`)


	Unsloth: Tokenizing ["text"] (num_proc=12): 100%\|██████████\| 120/120 [00:02<00:00, 46.73 examples/s]
	🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
	==((====))== Unsloth - 2x faster free finetuning \| Num GPUs used = 1
	\\ /\| Num examples = 120 \| Num Epochs = 10 \| Total steps = 150
	O^O/ \_/ \ Batch size per device = 2 \| Gradient accumulation steps = 4
	\ / Data Parallel GPUs = 1 \| Total batch size (2 x 4 x 1) = 8
	"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


	Unsloth: Will smartly offload gradients to save VRAM!
	3%\|▎ \| 5/150 [00:09<04:02, 1.67s/it]
	{'loss': 3.6266, 'grad_norm': 2.663548231124878, 'learning_rate': 2.5e-05, 'epoch': 0.33}

	7%\|▋ \| 10/150 [00:16<03:07, 1.34s/it]
	{'loss': 3.3225, 'grad_norm': 2.001558542251587, 'learning_rate': 4.9647887323943665e-05, 'epoch': 0.67}

	10%\|█ \| 15/150 [00:22<02:55, 1.30s/it]
	{'loss': 2.7371, 'grad_norm': 0.9380167722702026, 'learning_rate': 4.788732394366197e-05, 'epoch': 1.0}

	13%\|█▎ \| 20/150 [00:29<02:48, 1.30s/it]
	{'loss': 2.365, 'grad_norm': 0.8978179693222046, 'learning_rate': 4.6126760563380286e-05, 'epoch': 1.33}

	17%\|█▋ \| 25/150 [00:35<02:39, 1.28s/it]
	{'loss': 2.0451, 'grad_norm': 0.9256548285484314, 'learning_rate': 4.436619718309859e-05, 'epoch': 1.67}

	17%\|█▋ \| 26/150 [00:36<02:38, 1.28s/it][A

	18%\|█▊ \| 27/150 [00:38<02:38, 1.29s/it][A

	19%\|█▊ \| 28/150 [00:39<02:36, 1.29s/it][A

	19%\|█▉ \| 29/150 [00:40<02:35, 1.28s/it][A

	20%\|██ \| 30/150 [00:41<02:34, 1.28s/it][A


	[A{'loss': 1.7249, 'grad_norm': 0.8666767477989197, 'learning_rate': 4.26056338028169e-05, 'epoch': 2.0}


	21%\|██ \| 31/150 [00:43<02:32, 1.28s/it][A

	21%\|██▏ \| 32/150 [00:44<02:32, 1.29s/it][A

	22%\|██▏ \| 33/150 [00:45<02:30, 1.29s/it][A

	23%\|██▎ \| 34/150 [00:47<02:29, 1.29s/it][A

	23%\|██▎ \| 35/150 [00:48<02:28, 1.29s/it][A


	[A{'loss': 1.4079, 'grad_norm': 1.2116891145706177, 'learning_rate': 4.0845070422535214e-05, 'epoch': 2.33}


	24%\|██▍ \| 36/150 [00:49<02:26, 1.29s/it][A

	25%\|██▍ \| 37/150 [00:50<02:24, 1.28s/it][A

	25%\|██▌ \| 38/150 [00:52<02:24, 1.29s/it][A

	26%\|██▌ \| 39/150 [00:53<02:22, 1.29s/it][A

	27%\|██▋ \| 40/150 [00:54<02:21, 1.29s/it][A


	[A{'loss': 1.1155, 'grad_norm': 0.8696402311325073, 'learning_rate': 3.908450704225352e-05, 'epoch': 2.67}


	27%\|██▋ \| 41/150 [00:56<02:20, 1.29s/it][A

	28%\|██▊ \| 42/150 [00:57<02:19, 1.29s/it][A

	29%\|██▊ \| 43/150 [00:58<02:17, 1.29s/it][A

	29%\|██▉ \| 44/150 [01:00<02:17, 1.29s/it][A

	30%\|███ \| 45/150 [01:01<02:14, 1.29s/it][A


	[A{'loss': 0.9477, 'grad_norm': 0.5664961338043213, 'learning_rate': 3.7323943661971835e-05, 'epoch': 3.0}


	31%\|███ \| 46/150 [01:02<02:13, 1.28s/it][A

	31%\|███▏ \| 47/150 [01:03<02:11, 1.28s/it][A

	32%\|███▏ \| 48/150 [01:05<02:10, 1.28s/it][A

	33%\|███▎ \| 49/150 [01:06<02:09, 1.29s/it][A

	33%\|███▎ \| 50/150 [01:07<02:09, 1.30s/it][A


	[A{'loss': 0.8914, 'grad_norm': 0.4789012372493744, 'learning_rate': 3.556338028169014e-05, 'epoch': 3.33}


	34%\|███▍ \| 51/150 [01:09<02:07, 1.29s/it][A

	35%\|███▍ \| 52/150 [01:10<02:06, 1.29s/it][A

	35%\|███▌ \| 53/150 [01:11<02:04, 1.29s/it][A

	36%\|███▌ \| 54/150 [01:12<02:03, 1.29s/it][A

	37%\|███▋ \| 55/150 [01:14<02:02, 1.29s/it][A


	[A{'loss': 0.8417, 'grad_norm': 0.3655957579612732, 'learning_rate': 3.380281690140845e-05, 'epoch': 3.67}


	37%\|███▋ \| 56/150 [01:15<02:00, 1.29s/it][A

	38%\|███▊ \| 57/150 [01:16<01:59, 1.29s/it][A

	39%\|███▊ \| 58/150 [01:18<01:59, 1.30s/it][A

	39%\|███▉ \| 59/150 [01:19<01:58, 1.30s/it][A

	40%\|████ \| 60/150 [01:20<01:56, 1.30s/it][A


	[A{'loss': 0.8088, 'grad_norm': 0.36159124970436096, 'learning_rate': 3.204225352112676e-05, 'epoch': 4.0}


	41%\|████ \| 61/150 [01:21<01:56, 1.30s/it][A

	41%\|████▏ \| 62/150 [01:23<01:54, 1.30s/it][A

	42%\|████▏ \| 63/150 [01:24<01:53, 1.30s/it][A

	43%\|████▎ \| 64/150 [01:25<01:51, 1.30s/it][A

	43%\|████▎ \| 65/150 [01:27<01:50, 1.30s/it][A


	[A{'loss': 0.7978, 'grad_norm': 0.3379436433315277, 'learning_rate': 3.028169014084507e-05, 'epoch': 4.33}


	44%\|████▍ \| 66/150 [01:28<01:49, 1.30s/it][A

	45%\|████▍ \| 67/150 [01:29<01:47, 1.30s/it][A

	45%\|████▌ \| 68/150 [01:31<01:46, 1.29s/it][A

	46%\|████▌ \| 69/150 [01:32<01:44, 1.29s/it][A

	47%\|████▋ \| 70/150 [01:33<01:43, 1.29s/it][A


	[A{'loss': 0.7577, 'grad_norm': 0.3583666682243347, 'learning_rate': 2.8521126760563384e-05, 'epoch': 4.67}


	47%\|████▋ \| 71/150 [01:34<01:42, 1.29s/it][A

	48%\|████▊ \| 72/150 [01:36<01:41, 1.30s/it][A

	49%\|████▊ \| 73/150 [01:37<01:40, 1.31s/it][A

	49%\|████▉ \| 74/150 [01:38<01:39, 1.30s/it][A

	50%\|█████ \| 75/150 [01:40<01:37, 1.30s/it][A


	[A{'loss': 0.7794, 'grad_norm': 0.33592215180397034, 'learning_rate': 2.676056338028169e-05, 'epoch': 5.0}


	51%\|█████ \| 76/150 [01:41<01:36, 1.30s/it][A

	51%\|█████▏ \| 77/150 [01:42<01:35, 1.30s/it][A

	52%\|█████▏ \| 78/150 [01:44<01:33, 1.30s/it][A

	53%\|█████▎ \| 79/150 [01:45<01:32, 1.30s/it][A

	53%\|█████▎ \| 80/150 [01:46<01:30, 1.30s/it][A


	[A{'loss': 0.7684, 'grad_norm': 0.3456568121910095, 'learning_rate': 2.5e-05, 'epoch': 5.33}


	54%\|█████▍ \| 81/150 [01:47<01:29, 1.30s/it][A

	55%\|█████▍ \| 82/150 [01:49<01:28, 1.30s/it][A

	55%\|█████▌ \| 83/150 [01:50<01:26, 1.29s/it][A

	56%\|█████▌ \| 84/150 [01:51<01:25, 1.29s/it][A

	57%\|█████▋ \| 85/150 [01:53<01:23, 1.29s/it][A


	[A{'loss': 0.7243, 'grad_norm': 0.33662667870521545, 'learning_rate': 2.323943661971831e-05, 'epoch': 5.67}


	57%\|█████▋ \| 86/150 [01:54<01:22, 1.29s/it][A

	58%\|█████▊ \| 87/150 [01:55<01:20, 1.28s/it][A

	59%\|█████▊ \| 88/150 [01:56<01:19, 1.29s/it][A

	59%\|█████▉ \| 89/150 [01:58<01:18, 1.29s/it][A

	60%\|██████ \| 90/150 [01:59<01:17, 1.29s/it][A


	[A{'loss': 0.7285, 'grad_norm': 0.3644108772277832, 'learning_rate': 2.147887323943662e-05, 'epoch': 6.0}


	61%\|██████ \| 91/150 [02:00<01:16, 1.29s/it][A

	61%\|██████▏ \| 92/150 [02:02<01:15, 1.30s/it][A

	62%\|██████▏ \| 93/150 [02:03<01:13, 1.29s/it][A

	63%\|██████▎ \| 94/150 [02:04<01:12, 1.29s/it][A

	63%\|██████▎ \| 95/150 [02:05<01:10, 1.29s/it][A


	[A{'loss': 0.7192, 'grad_norm': 0.35359156131744385, 'learning_rate': 1.971830985915493e-05, 'epoch': 6.33}


	64%\|██████▍ \| 96/150 [02:07<01:10, 1.30s/it][A

	65%\|██████▍ \| 97/150 [02:08<01:08, 1.30s/it][A

	65%\|██████▌ \| 98/150 [02:09<01:07, 1.30s/it][A

	66%\|██████▌ \| 99/150 [02:11<01:06, 1.30s/it][A

	67%\|██████▋ \| 100/150 [02:12<01:05, 1.30s/it][A


	[A{'loss': 0.7025, 'grad_norm': 0.3457960784435272, 'learning_rate': 1.7957746478873243e-05, 'epoch': 6.67}


	67%\|██████▋ \| 101/150 [02:13<01:03, 1.30s/it][A

	68%\|██████▊ \| 102/150 [02:15<01:02, 1.30s/it][A

	69%\|██████▊ \| 103/150 [02:16<01:00, 1.29s/it][A

	69%\|██████▉ \| 104/150 [02:17<00:59, 1.30s/it][A

	70%\|███████ \| 105/150 [02:18<00:58, 1.30s/it][A


	[A{'loss': 0.7215, 'grad_norm': 0.3716900646686554, 'learning_rate': 1.619718309859155e-05, 'epoch': 7.0}


	71%\|███████ \| 106/150 [02:20<00:57, 1.30s/it][A

	71%\|███████▏ \| 107/150 [02:21<00:56, 1.30s/it][A

	72%\|███████▏ \| 108/150 [02:22<00:54, 1.30s/it][A

	73%\|███████▎ \| 109/150 [02:24<00:53, 1.30s/it][A

	73%\|███████▎ \| 110/150 [02:25<00:51, 1.30s/it][A


	[A{'loss': 0.6965, 'grad_norm': 0.35728198289871216, 'learning_rate': 1.443661971830986e-05, 'epoch': 7.33}


	74%\|███████▍ \| 111/150 [02:26<00:50, 1.30s/it][A

	75%\|███████▍ \| 112/150 [02:28<00:49, 1.30s/it][A

	75%\|███████▌ \| 113/150 [02:29<00:48, 1.30s/it][A

	76%\|███████▌ \| 114/150 [02:30<00:46, 1.30s/it][A

	77%\|███████▋ \| 115/150 [02:31<00:45, 1.30s/it][A


	[A{'loss': 0.701, 'grad_norm': 0.3863743245601654, 'learning_rate': 1.267605633802817e-05, 'epoch': 7.67}


	77%\|███████▋ \| 116/150 [02:33<00:44, 1.30s/it][A

	78%\|███████▊ \| 117/150 [02:34<00:42, 1.30s/it][A

	79%\|███████▊ \| 118/150 [02:35<00:41, 1.30s/it][A

	79%\|███████▉ \| 119/150 [02:37<00:40, 1.31s/it][A

	80%\|████████ \| 120/150 [02:38<00:39, 1.31s/it][A


	[A{'loss': 0.691, 'grad_norm': 0.38696053624153137, 'learning_rate': 1.0915492957746478e-05, 'epoch': 8.0}


	81%\|████████ \| 121/150 [02:39<00:38, 1.32s/it][A

	81%\|████████▏ \| 122/150 [02:41<00:36, 1.31s/it][A

	82%\|████████▏ \| 123/150 [02:42<00:35, 1.31s/it][A

	83%\|████████▎ \| 124/150 [02:43<00:33, 1.31s/it][A

	83%\|████████▎ \| 125/150 [02:45<00:32, 1.30s/it][A


	[A{'loss': 0.6836, 'grad_norm': 0.3782326579093933, 'learning_rate': 9.15492957746479e-06, 'epoch': 8.33}


	84%\|████████▍ \| 126/150 [02:46<00:31, 1.30s/it][A

	85%\|████████▍ \| 127/150 [02:47<00:30, 1.31s/it][A

	85%\|████████▌ \| 128/150 [02:48<00:28, 1.30s/it][A

	86%\|████████▌ \| 129/150 [02:50<00:27, 1.30s/it][A

	87%\|████████▋ \| 130/150 [02:51<00:26, 1.30s/it][A


	[A{'loss': 0.6819, 'grad_norm': 0.3920275866985321, 'learning_rate': 7.394366197183099e-06, 'epoch': 8.67}


	87%\|████████▋ \| 131/150 [02:52<00:24, 1.30s/it][A

	88%\|████████▊ \| 132/150 [02:54<00:23, 1.29s/it][A

	89%\|████████▊ \| 133/150 [02:55<00:22, 1.30s/it][A

	89%\|████████▉ \| 134/150 [02:56<00:20, 1.30s/it][A

	90%\|█████████ \| 135/150 [02:58<00:19, 1.30s/it][A


	[A{'loss': 0.6833, 'grad_norm': 0.37108415365219116, 'learning_rate': 5.6338028169014084e-06, 'epoch': 9.0}


	91%\|█████████ \| 136/150 [02:59<00:18, 1.30s/it][A

	91%\|█████████▏\| 137/150 [03:00<00:16, 1.30s/it][A

	92%\|█████████▏\| 138/150 [03:01<00:15, 1.30s/it][A

	93%\|█████████▎\| 139/150 [03:03<00:14, 1.29s/it][A

	93%\|█████████▎\| 140/150 [03:04<00:12, 1.29s/it][A


	[A{'loss': 0.6688, 'grad_norm': 0.3897058367729187, 'learning_rate': 3.873239436619718e-06, 'epoch': 9.33}


	94%\|█████████▍\| 141/150 [03:05<00:11, 1.29s/it][A

	95%\|█████████▍\| 142/150 [03:07<00:10, 1.29s/it][A

	95%\|█████████▌\| 143/150 [03:08<00:09, 1.29s/it][A

	96%\|█████████▌\| 144/150 [03:09<00:07, 1.30s/it][A

	97%\|█████████▋\| 145/150 [03:10<00:06, 1.30s/it][A


	[A{'loss': 0.6744, 'grad_norm': 0.3871634006500244, 'learning_rate': 2.112676056338028e-06, 'epoch': 9.67}


	97%\|█████████▋\| 146/150 [03:12<00:05, 1.30s/it][A

	98%\|█████████▊\| 147/150 [03:13<00:03, 1.30s/it][A

	99%\|█████████▊\| 148/150 [03:14<00:02, 1.33s/it][A

	99%\|█████████▉\| 149/150 [03:16<00:01, 1.32s/it][A

	100%\|██████████\| 150/150 [03:17<00:00, 1.31s/it][A


	[A{'loss': 0.6839, 'grad_norm': 0.40198108553886414, 'learning_rate': 3.5211267605633803e-07, 'epoch': 10.0}



	[A{'train_runtime': 198.1987, 'train_samples_per_second': 6.055, 'train_steps_per_second': 0.757, 'train_loss': 1.1565947977701823, 'epoch': 10.0}
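The summary figures above can be cross-checked against the rest of the log with a few lines of arithmetic. This is a minimal sketch: every number is copied from this log, while the samples-per-step and samples-per-epoch figures are inferences from those numbers, not values the log states.

```python
# Values copied from the train summary line of this log.
train_runtime_s = 198.1987
samples_per_second = 6.055
reported_train_loss = 1.1565947977701823
total_steps = 150
epochs = 10

# The 30 per-interval losses logged every 5 steps, in order.
logged_losses = [
    3.6266, 3.3225, 2.7371, 2.365, 2.0451, 1.7249, 1.4079, 1.1155, 0.9477, 0.8914,
    0.8417, 0.8088, 0.7978, 0.7577, 0.7794, 0.7684, 0.7243, 0.7285, 0.7192, 0.7025,
    0.7215, 0.6965, 0.701, 0.691, 0.6836, 0.6819, 0.6833, 0.6688, 0.6744, 0.6839,
]

steps_per_second = total_steps / train_runtime_s
total_samples = samples_per_second * train_runtime_s
# Inferred, not logged: samples consumed per optimizer step and per epoch.
samples_per_step = total_samples / total_steps
samples_per_epoch = total_samples / epochs
mean_logged_loss = sum(logged_losses) / len(logged_losses)

print(round(steps_per_second, 3))
print(round(samples_per_step), round(samples_per_epoch))
print(abs(mean_logged_loss - reported_train_loss) < 1e-4)
```

The reported `train_loss` matches the plain average of the logged interval losses to within rounding, so it summarizes the whole run rather than the final loss (about 0.68).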


	100%\|██████████\| 150/150 [03:18<00:00, 1.31s/it][A
	100%\|██████████\| 150/150 [03:18<00:00, 1.32s/it]
	SFT done in 3.3 min
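
The `learning_rate` values logged during SFT are consistent with a linear warmup followed by a linear decay to zero. The sketch below reproduces them under three assumptions inferred from the numbers, none of which the log states: a peak of 5e-05, 8 warmup steps, and a one-step offset between the logged step and the schedule step.

```python
def linear_lr(step, peak=5e-5, warmup=8, total=150):
    """Linear warmup to `peak`, then linear decay to 0 at `total`.

    peak, warmup and total are assumptions fitted to the logged values.
    """
    if step < warmup:
        return peak * step / warmup
    return peak * (total - step) / (total - warmup)

# Assumed offset: the value logged at step s matches the schedule at s - 1.
for logged_step in (5, 10, 15, 80, 150):
    print(logged_step, linear_lr(logged_step - 1))
```

Under these assumptions the schedule returns to 2.5e-05 at logged step 80, which agrees with the value recorded at epoch 5.33.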

	══ Pre-GRPO hold-out eval (SFT-only) ══

	[diagnostic] seed=100 raw completion (first 500 chars):
	<tool_call>
	1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the immediate 3-year cumulative real return is heavily penalized. The 12-quarter lockup forces a defensive tilt. 3rd-order: The fragmentation of global supply chains acts as a massive structural headwind for TECH and GREEN. The base case
	[parse_action result]: metadata={} weights=[0.0, 0.4, 0.0, 0.2, 0.4] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='fragmentation'

	── Hold-out eval (5/5 valid) ──
	mean regret: -0.2516
	beat baseline: 0/5

	══ GRPO Phase 1: 4Q episodes, 50 iters, rewards=['format', 'regret'] ══
	Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
	Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
	==((====))== Unsloth - 2x faster free finetuning \| Num GPUs used = 1
	\\ /\| Num examples = 200 \| Num Epochs = 1 \| Total steps = 50
	O^O/ \_/ \ Batch size per device = 4 \| Gradient accumulation steps = 1
	\ / Data Parallel GPUs = 1 \| Total batch size (4 x 1 x 1) = 4
	"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


	0%\| \| 0/50 [00:00<?, ?it/s][A
	WARNING 04-25 10:04:33 [input_processor.py:287] vLLM has deprecated support for supporting different tokenizers for different LoRAs. By default, vLLM uses base model's tokenizer. If you are using a LoRA with its own tokenizer, consider specifying `--tokenizer [lora_path]` to use the LoRA tokenizer.
	Unsloth: Will smartly offload gradients to save VRAM!


	2%\|▏ \| 1/50 [00:14<11:30, 14.08s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 1996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
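
In this step every completion is one token long and earns the same reward (-0.5), so `reward_std` is 0.0 and `frac_reward_zero_std` is 1.0. With a group-relative advantage of the kind GRPO uses, identical rewards within a group give zero advantages, which would explain the zero `loss` and `grad_norm`. A minimal sketch, assuming the common mean/std normalization; the group size of 4, the epsilon, and the use of the population standard deviation are illustrative choices, not taken from this log:

```python
import statistics

def group_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt group (GRPO-style sketch)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# All completions scored -0.5, as in the logged step (group size assumed).
flat = group_advantages([-0.5, -0.5, -0.5, -0.5])
# A group with differing rewards yields a non-zero learning signal.
mixed = group_advantages([-0.5, 0.0, -0.5, -0.5])

print(flat)
print(mixed)
```

When all advantages are zero, the policy-gradient term contributes nothing, so a run stuck in this state does not update the model regardless of the learning rate.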



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 4044.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	4%\|▍ \| 2/50 [00:14<11:15, 14.08s/it][A

	6%\|▌ \| 3/50 [00:15<03:16, 4.18s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 6092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 8140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	8%\|▊ \| 4/50 [00:16<03:12, 4.18s/it][A

	10%\|█ \| 5/50 [00:16<01:44, 2.33s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 10224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 12248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	12%\|█▏ \| 6/50 [00:17<01:42, 2.33s/it][A

	14%\|█▍ \| 7/50 [00:17<01:07, 1.58s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 14296.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 16348.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	16%\|█▌ \| 8/50 [00:18<01:06, 1.58s/it][A

	18%\|█▊ \| 9/50 [00:18<00:49, 1.20s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 18396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 20444.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	20%\|██ \| 10/50 [00:19<00:48, 1.20s/it][A

	22%\|██▏ \| 11/50 [00:20<00:38, 1.02it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 22492.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 24544.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	24%\|██▍ \| 12/50 [00:20<00:37, 1.02it/s][A

	26%\|██▌ \| 13/50 [00:21<00:34, 1.07it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 26592.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 28640.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	28%\|██▊ \| 14/50 [00:22<00:33, 1.07it/s][A

	30%\|███ \| 15/50 [00:22<00:28, 1.23it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 30688.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 32736.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	32%\|███▏ \| 16/50 [00:23<00:27, 1.23it/s][A

	34%\|███▍ \| 17/50 [00:24<00:24, 1.36it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 34784.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 36832.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	36%\|███▌ \| 18/50 [00:24<00:23, 1.36it/s][A

	38%\|███▊ \| 19/50 [00:25<00:21, 1.46it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 38884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 40908.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	40%\|████ \| 20/50 [00:25<00:20, 1.46it/s][A

	42%\|████▏ \| 21/50 [00:26<00:18, 1.55it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 42904.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 44988.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	44%\|████▍ \| 22/50 [00:26<00:18, 1.55it/s][A

	46%\|████▌ \| 23/50 [00:27<00:16, 1.60it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 46984.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 49008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	48%\|████▊ \| 24/50 [00:28<00:16, 1.60it/s][A

	50%\|█████ \| 25/50 [00:29<00:17, 1.44it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 51092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 53176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	52%\|█████▏ \| 26/50 [00:29<00:16, 1.44it/s][A

	54%\|█████▍ \| 27/50 [00:30<00:15, 1.52it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 55172.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 57224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	56%\|█████▌ \| 28/50 [00:30<00:14, 1.52it/s][A

	58%\|█████▊ \| 29/50 [00:31<00:13, 1.58it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 59308.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 61356.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	60%\|██████ \| 30/50 [00:32<00:12, 1.58it/s][A

	62%\|██████▏ \| 31/50 [00:32<00:11, 1.63it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 63440.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 65488.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	64%\|██████▍ \| 32/50 [00:33<00:11, 1.63it/s][A

	66%\|██████▌ \| 33/50 [00:33<00:10, 1.67it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 67512.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 69564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


	68%\|██████▊ \| 34/50 [00:34<00:09, 1.67it/s][A

	70%\|███████ \| 35/50 [00:34<00:08, 1.70it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 71560.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 73608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18}


	72%\|███████▏ \| 36/50 [00:35<00:08, 1.70it/s][A

	74%\|███████▍ \| 37/50 [00:37<00:09, 1.35it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 75692.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 77740.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.19}


	76%\|███████▌ \| 38/50 [00:37<00:08, 1.35it/s][A

	78%\|███████▊ \| 39/50 [00:38<00:07, 1.45it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 79764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 81812.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}


	80%\|████████ \| 40/50 [00:38<00:06, 1.45it/s][A

	82%\|████████▏ \| 41/50 [00:39<00:05, 1.53it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 83808.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 85804.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21}


	84%\|████████▍ \| 42/50 [00:39<00:05, 1.53it/s][A

	86%\|████████▌ \| 43/50 [00:40<00:04, 1.59it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 87888.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 89884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.22}


	88%\|████████▊ \| 44/50 [00:41<00:03, 1.59it/s][A

	90%\|█████████ \| 45/50 [00:41<00:03, 1.64it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 91936.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 94020.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}


	92%\|█████████▏\| 46/50 [00:42<00:02, 1.64it/s][A

	94%\|█████████▍\| 47/50 [00:42<00:01, 1.68it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 96016.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 98064.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24}


	96%\|█████████▌\| 48/50 [00:43<00:01, 1.68it/s][A

	98%\|█████████▊\| 49/50 [00:44<00:00, 1.48it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 100116.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 102168.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.25}


	100%\|██████████\| 50/50 [00:44<00:00, 1.48it/s][A


	[A{'train_runtime': 45.5636, 'train_samples_per_second': 4.389, 'train_steps_per_second': 1.097, 'train_loss': 0.0, 'epoch': 0.25}


	100%\|██████████\| 50/50 [00:45<00:00, 1.48it/s][A
	100%\|██████████\| 50/50 [00:45<00:00, 1.10it/s]
	Phase 1 done in 0.8 min

	[diagnostic] seed=100 raw completion (first 500 chars):
	<tool_call>
	1st-order: EV adoption surges, directly driving demand for GREEN energy and EV supply chains. 2nd-order: As EVs displace ICE vehicles, OIL demand faces structural headwinds over the 12-quarter cycle, forcing a long-term rotation away from fossil fuels. 3rd-order: The massive capital deployment into EV infrastructure acts as a massive liquidity pump, supporting TECH and REAL_ESTATE valuations. Base-rate: Today's news strongly signals a structural transition away from OIL and a green b
	[parse_action result]: metadata={} weights=[0.35, 0.05, 0.45, 0.1, 0.05] infra_commit=0.15 carbon_offset_buy=0.0 put_hedge=0.0 tech_bet='green_leaps'

	── Hold-out eval (5/5 valid) ──
	mean regret: -0.0037
	beat baseline: 4/5

	══ GRPO Phase 2: 8Q episodes, 100 iters, rewards=['format', 'regret', 'sharpe', 'drawdown'] ══
	Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
	Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
	==((====))== Unsloth - 2x faster free finetuning \| Num GPUs used = 1
	\\ /\| Num examples = 600 \| Num Epochs = 1 \| Total steps = 100
	O^O/ \_/ \ Batch size per device = 6 \| Gradient accumulation steps = 1
	\ / Data Parallel GPUs = 1 \| Total batch size (6 x 1 x 1) = 6
	"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


	0%\| \| 0/100 [00:00<?, ?it/s]
	Unsloth: Will smartly offload gradients to save VRAM!


	1%\| \| 1/100 [00:05<08:43, 5.29s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 2994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 6066.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


	2%\|▏ \| 2/100 [00:06<08:38, 5.29s/it][A

	3%\|▎ \| 3/100 [00:06<03:06, 1.92s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 9270.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 12342.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	4%\|▍ \| 4/100 [00:07<03:04, 1.92s/it][A

	5%\|▌ \| 5/100 [00:08<02:05, 1.32s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 15576.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 18714.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	6%\|▌ \| 6/100 [00:09<02:04, 1.32s/it][A

	7%\|▋ \| 7/100 [00:09<01:39, 1.07s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 21702.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 24738.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	8%\|▊ \| 8/100 [00:10<01:38, 1.07s/it][A

	9%\|▉ \| 9/100 [00:11<01:25, 1.06it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 27726.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 30972.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	10%\|█ \| 10/100 [00:11<01:24, 1.06it/s][A

	11%\|█ \| 11/100 [00:12<01:18, 1.14it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 34098.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.944444444444445e-06, 'num_tokens': 37176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	12%\|█▏ \| 12/100 [00:13<01:17, 1.14it/s][A

	13%\|█▎ \| 13/100 [00:14<01:13, 1.19it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 40380.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	13%\|█▎ \| 13/100 [00:28<01:13, 1.19it/s][A

	14%\|█▍ \| 14/100 [00:53<11:54, 8.31s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.833333333333333e-06, 'num_tokens': 43382.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 46454.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	15%\|█▌ \| 15/100 [00:54<11:46, 8.31s/it][A

	16%\|█▌ \| 16/100 [00:55<07:52, 5.62s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 49448.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 52526.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	17%\|█▋ \| 17/100 [00:56<07:46, 5.62s/it][A

	18%\|█▊ \| 18/100 [00:56<05:26, 3.98s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.611111111111112e-06, 'num_tokens': 55730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 58976.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	19%\|█▉ \| 19/100 [00:57<05:22, 3.98s/it][A

	20%\|██ \| 20/100 [00:58<03:54, 2.93s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 62048.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 65282.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	21%\|██ \| 21/100 [00:59<03:51, 2.93s/it][A

	22%\|██▏ \| 22/100 [00:59<02:54, 2.24s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.388888888888889e-06, 'num_tokens': 68354.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 71588.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	23%\|██▎ \| 23/100 [01:00<02:52, 2.24s/it][A

	24%\|██▍ \| 24/100 [01:01<02:16, 1.79s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.277777777777778e-06, 'num_tokens': 74660.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 77786.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	25%\|██▌ \| 25/100 [01:02<02:14, 1.79s/it][A

	26%\|██▌ \| 26/100 [01:03<01:55, 1.56s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 80858.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 84062.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	27%\|██▋ \| 27/100 [01:04<01:53, 1.56s/it][A

	28%\|██▊ \| 28/100 [01:05<01:34, 1.31s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.055555555555556e-06, 'num_tokens': 87140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 90212.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	29%\|██▉ \| 29/100 [01:05<01:32, 1.31s/it][A

	30%\|███ \| 30/100 [01:06<01:19, 1.13s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.944444444444445e-06, 'num_tokens': 93248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 96236.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	31%\|███ \| 31/100 [01:07<01:18, 1.13s/it][A

	32%\|███▏ \| 32/100 [01:07<01:08, 1.01s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.833333333333334e-06, 'num_tokens': 99482.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 102608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	33%\|███▎ \| 33/100 [01:08<01:07, 1.01s/it][A

	34%\|███▍ \| 34/100 [01:09<01:01, 1.08it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7222222222222225e-06, 'num_tokens': 105596.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 108668.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	35%\|███▌ \| 35/100 [01:10<01:00, 1.08it/s][A

	36%\|███▌ \| 36/100 [01:10<00:55, 1.15it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 111656.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 114650.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	37%\|███▋ \| 37/100 [01:11<00:54, 1.15it/s][A

	38%\|███▊ \| 38/100 [01:12<00:51, 1.21it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 117728.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 120764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	39%\|███▉ \| 39/100 [01:13<00:50, 1.21it/s][A

	40%\|████ \| 40/100 [01:13<00:48, 1.24it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3888888888888893e-06, 'num_tokens': 123998.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 127034.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	41%\|████ \| 41/100 [01:14<00:47, 1.24it/s][A

	42%\|████▏ \| 42/100 [01:15<00:45, 1.28it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.277777777777778e-06, 'num_tokens': 130160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 133316.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	43%\|████▎ \| 43/100 [01:16<00:44, 1.28it/s][A

	44%\|████▍ \| 44/100 [01:16<00:43, 1.29it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1666666666666667e-06, 'num_tokens': 136550.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 139622.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	45%\|████▌ \| 45/100 [01:17<00:42, 1.29it/s][A

	46%\|████▌ \| 46/100 [01:18<00:41, 1.31it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 142826.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 145814.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	47%\|████▋ \| 47/100 [01:19<00:40, 1.31it/s][A

	48%\|████▊ \| 48/100 [01:19<00:39, 1.32it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.944444444444445e-06, 'num_tokens': 149060.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 152054.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	49%\|████▉ \| 49/100 [01:20<00:38, 1.32it/s][A

	50%\|█████ \| 50/100 [01:21<00:37, 1.33it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8333333333333335e-06, 'num_tokens': 155288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 158426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	51%\|█████ \| 51/100 [01:22<00:36, 1.33it/s][A

	52%\|█████▏ \| 52/100 [01:23<00:40, 1.20it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7222222222222224e-06, 'num_tokens': 161414.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 164552.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	53%\|█████▎ \| 53/100 [01:24<00:39, 1.20it/s][A

	54%\|█████▍ \| 54/100 [01:24<00:37, 1.23it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6111111111111113e-06, 'num_tokens': 167756.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 170828.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	55%\|█████▌ \| 55/100 [01:25<00:36, 1.23it/s][A

	56%\|█████▌ \| 56/100 [01:26<00:34, 1.27it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 173900.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 177056.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	57%\|█████▋ \| 57/100 [01:27<00:33, 1.27it/s][A

	58%\|█████▊ \| 58/100 [01:27<00:32, 1.28it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.388888888888889e-06, 'num_tokens': 180302.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 183290.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	59%\|█████▉ \| 59/100 [01:28<00:31, 1.28it/s][A

	60%\|██████ \| 60/100 [01:29<00:31, 1.28it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.277777777777778e-06, 'num_tokens': 186326.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 189398.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	61%\|██████ \| 61/100 [01:30<00:30, 1.28it/s][A

	62%\|██████▏ \| 62/100 [01:30<00:29, 1.31it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.166666666666667e-06, 'num_tokens': 192470.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 195608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	63%\|██████▎ \| 63/100 [01:31<00:28, 1.31it/s][A

	64%\|██████▍ \| 64/100 [01:32<00:27, 1.31it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0555555555555555e-06, 'num_tokens': 198734.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 201806.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	65%\|██████▌ \| 65/100 [01:33<00:26, 1.31it/s][A

	66%\|██████▌ \| 66/100 [01:33<00:25, 1.33it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 204794.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 207830.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	67%\|██████▋ \| 67/100 [01:34<00:24, 1.33it/s][A

	68%\|██████▊ \| 68/100 [01:35<00:23, 1.34it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8333333333333333e-06, 'num_tokens': 210902.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 213974.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	69%\|██████▉ \| 69/100 [01:36<00:23, 1.34it/s][A

	70%\|███████ \| 70/100 [01:36<00:22, 1.35it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7222222222222224e-06, 'num_tokens': 217046.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 220292.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	71%\|███████ \| 71/100 [01:37<00:21, 1.35it/s][A

	72%\|███████▏ \| 72/100 [01:38<00:20, 1.34it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6111111111111113e-06, 'num_tokens': 223280.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 226358.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	73%\|███████▎ \| 73/100 [01:39<00:20, 1.34it/s][A

	74%\|███████▍ \| 74/100 [01:39<00:19, 1.34it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 229496.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 232730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	75%\|███████▌ \| 75/100 [01:40<00:18, 1.34it/s][A

	76%\|███████▌ \| 76/100 [01:41<00:19, 1.20it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 235802.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	76%\|███████▌ \| 76/100 [01:52<00:19, 1.20it/s][A

	77%\|███████▋ \| 77/100 [02:19<02:47, 7.28s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 238797.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2777777777777779e-06, 'num_tokens': 241935.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	78%\|███████▊ \| 78/100 [02:19<02:40, 7.28s/it][A

	79%\|███████▉ \| 79/100 [02:20<01:46, 5.09s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 244923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1666666666666668e-06, 'num_tokens': 247995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	80%\|████████ \| 80/100 [02:21<01:41, 5.09s/it][A

	81%\|████████ \| 81/100 [02:22<01:09, 3.68s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 251067.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0555555555555557e-06, 'num_tokens': 254145.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	82%\|████████▏ \| 82/100 [02:22<01:06, 3.68s/it][A

	83%\|████████▎ \| 83/100 [02:23<00:46, 2.75s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 257181.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.444444444444445e-07, 'num_tokens': 260253.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	84%\|████████▍ \| 84/100 [02:24<00:44, 2.75s/it][A

	85%\|████████▌ \| 85/100 [02:25<00:31, 2.13s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 263247.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 266451.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	86%\|████████▌ \| 86/100 [02:26<00:29, 2.13s/it][A

	87%\|████████▋ \| 87/100 [02:26<00:22, 1.73s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 269439.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.222222222222222e-07, 'num_tokens': 272427.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	88%\|████████▊ \| 88/100 [02:27<00:20, 1.73s/it][A

	89%\|████████▉ \| 89/100 [02:28<00:15, 1.43s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 275499.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.111111111111112e-07, 'num_tokens': 278655.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	90%\|█████████ \| 90/100 [02:29<00:14, 1.43s/it][A

	91%\|█████████ \| 91/100 [02:29<00:10, 1.22s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 281691.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 284925.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	92%\|█████████▏\| 92/100 [02:30<00:09, 1.22s/it][A

	93%\|█████████▎\| 93/100 [02:31<00:07, 1.08s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 287913.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.8888888888888895e-07, 'num_tokens': 290985.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	94%\|█████████▍\| 94/100 [02:32<00:06, 1.08s/it][A

	95%\|█████████▌\| 95/100 [02:32<00:04, 1.02it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 294123.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 297357.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	96%\|█████████▌\| 96/100 [02:33<00:03, 1.02it/s][A

	96%\|█████████▌\| 96/100 [02:44<00:03, 1.02it/s][A

	97%\|█████████▋\| 97/100 [03:05<00:16, 5.61s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 300361.0, 'completions/mean_length': 3.6666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 17.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.6666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 17.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.6666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666668e-07, 'num_tokens': 303517.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	98%\|█████████▊\| 98/100 [03:06<00:11, 5.61s/it][A

	99%\|█████████▉\| 99/100 [03:06<00:04, 4.15s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 306721.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555556e-08, 'num_tokens': 309859.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}


	100%\|██████████\| 100/100 [03:07<00:00, 4.15s/it][A


	[A{'train_runtime': 188.2628, 'train_samples_per_second': 3.187, 'train_steps_per_second': 0.531, 'train_loss': 0.0, 'epoch': 0.17}


	100%\|██████████\| 100/100 [03:08<00:00, 4.15s/it][A
	100%\|██████████\| 100/100 [03:08<00:00, 1.88s/it]
	Phase 2 done in 3.1 min

	[diagnostic] seed=100 raw completion (first 500 chars):
	<tool_call>
	1st-order: Insurers exiting Florida and California triggers a massive flight-to-safety, driving 10-year Treasuries down and freezing municipal bonds. 2nd-order: The freeze in municipal bonds directly crushes the yield curve, making long-duration BONDS a dead asset over the next 12 quarters. 3rd-order: The physical loss of insurance capital in the Gulf Coast and Bay Area will eventually trigger a broader real estate market correction, severely hurting REAL_ESTATE. Base case: Deflation
	[parse_action result]: metadata={} weights=[0.2, 0.05, 0.05, 0.0, 0.7] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'

	── Hold-out eval (5/5 valid) ──
	mean regret: -0.0391
	beat baseline: 2/5

	══ GRPO Phase 3: 12Q episodes, 80 iters, rewards=['format', 'regret', 'sharpe', 'drawdown', 'carbon'] ══
	Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
	Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
	==((====))== Unsloth - 2x faster free finetuning \| Num GPUs used = 1
	\\ /\| Num examples = 480 \| Num Epochs = 1 \| Total steps = 80
	O^O/ \_/ \ Batch size per device = 6 \| Gradient accumulation steps = 1
	\ / Data Parallel GPUs = 1 \| Total batch size (6 x 1 x 1) = 6
	"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)


	0%\| \| 0/80 [00:00<?, ?it/s][A
	Unsloth: Will smartly offload gradients to save VRAM!



	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 3216.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


	1%\|▏ \| 1/80 [00:00<01:04, 1.22it/s][A

	2%\|▎ \| 2/80 [00:01<01:00, 1.28it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 6288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 9426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	4%\|▍ \| 3/80 [00:02<01:00, 1.28it/s][A

	5%\|▌ \| 4/80 [00:03<00:58, 1.31it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 12564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 15810.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	6%\|▋ \| 5/80 [00:03<00:57, 1.31it/s][A

	8%\|▊ \| 6/80 [00:11<02:47, 2.27s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 18807.0, 'completions/mean_length': 69.16667175292969, 'completions/min_length': 1.0, 'completions/max_length': 400.0, 'completions/clipped_ratio': 0.16666666666666663, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 11.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 69.16667175292969, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 22041.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}


	9%\|▉ \| 7/80 [00:11<02:45, 2.27s/it][A

	10%\|█ \| 8/80 [00:12<02:00, 1.67s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 25287.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 28413.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	11%\|█▏ \| 9/80 [00:13<01:58, 1.67s/it][A

	12%\|█▎ \| 10/80 [00:14<01:33, 1.34s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.930555555555556e-06, 'num_tokens': 31629.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111111e-06, 'num_tokens': 34863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}


	14%\|█▍ \| 11/80 [00:14<01:32, 1.34s/it][A

	15%\|█▌ \| 12/80 [00:15<01:17, 1.13s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.791666666666668e-06, 'num_tokens': 37851.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 40923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	16%\|█▋ \| 13/80 [00:16<01:16, 1.13s/it][A

	18%\|█▊ \| 14/80 [00:17<01:06, 1.00s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.652777777777779e-06, 'num_tokens': 44061.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.583333333333333e-06, 'num_tokens': 47295.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	19%\|█▉ \| 15/80 [00:17<01:05, 1.00s/it][A

	20%\|██ \| 16/80 [00:18<00:58, 1.09it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5138888888888895e-06, 'num_tokens': 50283.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 53361.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	21%\|██▏ \| 17/80 [00:19<00:57, 1.09it/s][A

	22%\|██▎ \| 18/80 [00:20<00:53, 1.16it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 56577.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.305555555555556e-06, 'num_tokens': 59565.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	24%\|██▍ \| 19/80 [00:20<00:52, 1.16it/s][A

	25%\|██▌ \| 20/80 [00:21<00:52, 1.15it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.236111111111111e-06, 'num_tokens': 62571.0, 'completions/mean_length': 4.0, 'completions/min_length': 1.0, 'completions/max_length': 19.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 4.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 19.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 4.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 65787.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}


	26%\|██▋ \| 21/80 [00:23<00:51, 1.15it/s][A

	28%\|██▊ \| 22/80 [00:23<00:53, 1.08it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.097222222222222e-06, 'num_tokens': 69003.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.027777777777779e-06, 'num_tokens': 72141.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	29%\|██▉ \| 23/80 [00:24<00:52, 1.08it/s][A

	30%\|███ \| 24/80 [00:25<00:48, 1.15it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.958333333333333e-06, 'num_tokens': 75213.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 78447.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	31%\|███▏ \| 25/80 [00:26<00:47, 1.15it/s][A

	32%\|███▎ \| 26/80 [00:27<00:45, 1.19it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.819444444444444e-06, 'num_tokens': 81675.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 84813.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	34%\|███▍ \| 27/80 [00:27<00:44, 1.19it/s][A

	35%\|███▌ \| 28/80 [00:28<00:43, 1.20it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.680555555555556e-06, 'num_tokens': 87808.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 90880.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	36%\|███▋ \| 29/80 [00:29<00:42, 1.20it/s][A

	38%\|███▊ \| 30/80 [00:30<00:40, 1.25it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5416666666666673e-06, 'num_tokens': 94006.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4722222222222224e-06, 'num_tokens': 97252.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}


	39%\|███▉ \| 31/80 [00:30<00:39, 1.25it/s][A

	40%\|████ \| 32/80 [00:31<00:37, 1.26it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4027777777777783e-06, 'num_tokens': 100468.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 103672.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	41%\|████▏ \| 33/80 [00:32<00:37, 1.26it/s][A

	42%\|████▎ \| 34/80 [00:33<00:37, 1.22it/s][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2638888888888892e-06, 'num_tokens': 106689.0, 'completions/mean_length': 5.833333492279053, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 5.833333492279053, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 5.833333492279053, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1944444444444443e-06, 'num_tokens': 109761.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	44%\|████▍ \| 35/80 [00:34<00:36, 1.22it/s][A

	44%\|████▍ \| 35/80 [00:48<00:36, 1.22it/s][A

	45%\|████▌ \| 36/80 [01:04<03:50, 5.24s/it][A


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 112761.0, 'completions/mean_length': 3.0, 'completions/min_length': 1.0, 'completions/max_length': 13.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 13.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 115995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	46%\|████▋ \| 37/80 [01:05<03:45, 5.24s/it]

	48%\|████▊ \| 38/80 [01:05<02:43, 3.89s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.986111111111111e-06, 'num_tokens': 119073.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.916666666666667e-06, 'num_tokens': 122319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	49%\|████▉ \| 39/80 [01:06<02:39, 3.89s/it]

	50%\|█████ \| 40/80 [01:07<01:58, 2.95s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8472222222222224e-06, 'num_tokens': 125523.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 128559.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	51%\|█████▏ \| 41/80 [01:08<01:55, 2.95s/it]

	52%\|█████▎ \| 42/80 [01:09<01:30, 2.37s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7083333333333334e-06, 'num_tokens': 131547.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6388888888888893e-06, 'num_tokens': 134751.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	54%\|█████▍ \| 43/80 [01:10<01:27, 2.37s/it]

	55%\|█████▌ \| 44/80 [01:11<01:07, 1.89s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5694444444444443e-06, 'num_tokens': 137967.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 141171.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}


	56%\|█████▋ \| 45/80 [01:11<01:06, 1.89s/it]

	57%\|█████▊ \| 46/80 [01:12<00:52, 1.55s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4305555555555557e-06, 'num_tokens': 144309.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.361111111111111e-06, 'num_tokens': 147609.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	59%\|█████▉ \| 47/80 [01:13<00:51, 1.55s/it]

	60%\|██████ \| 48/80 [01:14<00:42, 1.31s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2916666666666666e-06, 'num_tokens': 150849.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 154065.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	61%\|██████▏ \| 49/80 [01:14<00:40, 1.31s/it]

	62%\|██████▎ \| 50/80 [01:15<00:34, 1.15s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.152777777777778e-06, 'num_tokens': 157311.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333334e-06, 'num_tokens': 160551.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	64%\|██████▍ \| 51/80 [01:16<00:33, 1.15s/it]

	65%\|██████▌ \| 52/80 [01:17<00:28, 1.03s/it]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0138888888888893e-06, 'num_tokens': 163623.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 166863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	66%\|██████▋ \| 53/80 [01:17<00:27, 1.03s/it]

	68%\|██████▊ \| 54/80 [01:18<00:24, 1.06it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 170079.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8055555555555557e-06, 'num_tokens': 173319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}


	69%\|██████▉ \| 55/80 [01:19<00:23, 1.06it/s]

	70%\|███████ \| 56/80 [01:20<00:21, 1.12it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7361111111111112e-06, 'num_tokens': 176457.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 179595.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	71%\|███████▏ \| 57/80 [01:20<00:20, 1.12it/s]

	72%\|███████▎ \| 58/80 [01:21<00:18, 1.18it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5972222222222221e-06, 'num_tokens': 182799.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.527777777777778e-06, 'num_tokens': 185937.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	74%\|███████▍ \| 59/80 [01:22<00:17, 1.18it/s]

	75%\|███████▌ \| 60/80 [01:23<00:16, 1.22it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4583333333333335e-06, 'num_tokens': 189009.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 192255.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	76%\|███████▋ \| 61/80 [01:24<00:15, 1.22it/s]

	78%\|███████▊ \| 62/80 [01:25<00:15, 1.13it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3194444444444446e-06, 'num_tokens': 195483.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 198555.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	79%\|███████▉ \| 63/80 [01:25<00:15, 1.13it/s]

	80%\|████████ \| 64/80 [01:26<00:13, 1.19it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1805555555555556e-06, 'num_tokens': 201633.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 204636.0, 'completions/mean_length': 3.5, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.5, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.5, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	81%\|████████▏ \| 65/80 [01:27<00:12, 1.19it/s]

	82%\|████████▎ \| 66/80 [01:28<00:11, 1.18it/s]


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0416666666666667e-06, 'num_tokens': 207708.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	[A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.722222222222224e-07, 'num_tokens': 211008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	84%\|████████▍ \| 67/80 [01:29<00:11, 1.18it/s]

	85%\|████████▌ \| 68/80 [01:29<00:09, 1.22it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.027777777777779e-07, 'num_tokens': 213996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}




	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 216998.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}


	86%\|████████▋ \| 69/80 [01:30<00:09, 1.22it/s]

	88%\|████████▊ \| 70/80 [01:31<00:08, 1.20it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.63888888888889e-07, 'num_tokens': 220124.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}




	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444446e-07, 'num_tokens': 223160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	89%\|████████▉ \| 71/80 [01:32<00:07, 1.20it/s]

	90%\|█████████ \| 72/80 [01:33<00:06, 1.24it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 226238.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}




	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 229478.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}


	91%\|█████████▏\| 73/80 [01:33<00:05, 1.24it/s]

	92%\|█████████▎\| 74/80 [01:34<00:04, 1.26it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111112e-07, 'num_tokens': 232706.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}




	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.1666666666666667e-07, 'num_tokens': 235844.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	94%\|█████████▍\| 75/80 [01:35<00:03, 1.26it/s]

	95%\|█████████▌\| 76/80 [01:36<00:03, 1.28it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.472222222222223e-07, 'num_tokens': 238922.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}




	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 241994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	96%\|█████████▋\| 77/80 [01:36<00:02, 1.28it/s]

	98%\|█████████▊\| 78/80 [01:37<00:01, 1.29it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333333e-07, 'num_tokens': 245240.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}




	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888888e-07, 'num_tokens': 248396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}


	99%\|█████████▉\| 79/80 [01:38<00:00, 1.29it/s]

	100%\|██████████\| 80/80 [01:39<00:00, 1.30it/s]


	{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444444e-08, 'num_tokens': 251534.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}




	{'train_runtime': 100.0827, 'train_samples_per_second': 4.796, 'train_steps_per_second': 0.799, 'train_loss': 0.0, 'epoch': 0.17}


	100%\|██████████\| 80/80 [01:40<00:00, 1.30it/s]
	100%\|██████████\| 80/80 [01:40<00:00, 1.25s/it]
	Phase 3 done in 1.7 min

	[diagnostic] seed=100 raw completion (first 500 chars):
	<tool_call>
	1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the massive overcapacity in the oil sector will be rapidly absorbed by industrial demand, driving a structural inflationary squeeze. This stagflationary regime will crush BONDS and compress REAL_ESTATE valuations. 3rd-order: The forced lo
	[parse_action result]: metadata={} weights=[0.09523809523809523, 0.42857142857142855, 0.047619047619047616, 0.09523809523809523, 0.3333333333333333] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'

	── Hold-out eval (5/5 valid) ──
	mean regret: -0.0941
	beat baseline: 3/5
	Found HuggingFace hub cache directory: /tmp/CarbonAlpha/hf_cache/hub
	Checking cache directory for required files...


	Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 0%\| \| 0/2 [00:00<?, ?it/s]

	Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 100%\|██████████\| 2/2 [00:01<00:00, 1.37it/s]
	Successfully copied all 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`
	Checking cache directory for required files...
	Cache check failed: tokenizer.model not found in local cache.
	Not all required files found in cache. Will proceed with downloading.


	Unsloth: Preparing safetensor model files: 0%\| \| 0/2 [00:00<?, ?it/s]
	Unsloth: Preparing safetensor model files: 100%\|██████████\| 2/2 [00:00<00:00, 60787.01it/s]


	Unsloth: Merging weights into 16bit: 0%\| \| 0/2 [00:00<?, ?it/s]

	Unsloth: Merging weights into 16bit: 50%\|█████ \| 1/2 [00:31<00:31, 31.55s/it]

	Unsloth: Merging weights into 16bit: 100%\|██████████\| 2/2 [00:55<00:00, 26.86s/it]
	Unsloth: Merging weights into 16bit: 100%\|██████████\| 2/2 [00:55<00:00, 27.56s/it]
	Unsloth: Merge process complete. Saved to `/tmp/CarbonAlpha/checkpoints/final_merged`

	Saved LoRA adapters to /tmp/CarbonAlpha/checkpoints/final_merged
	[rank0]:[W425 10:13:19.025103781 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())