v2_phase_all_run2: training log
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Loading unsloth/Qwen3-4B-Instruct-2507...
INFO 04-25 09:59:05 [vllm_utils.py:724] Unsloth: Patching vLLM v1 graph capture
==((====))==  Unsloth 2026.4.8: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.15.1.
   \\   /|    NVIDIA L40S. Num GPUs = 1. Max memory: 44.392 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: FlashInfer requires JIT compilation but nvcc (CUDA compiler) is not found.
vLLM will use FLASH_ATTN attention + PyTorch sampler instead (works fine).
To enable FlashInfer, install the missing tools:
nvcc - install the CUDA toolkit or set CUDA_HOME to your CUDA installation
ninja - pip install ninja
To silence this warning: set UNSLOTH_VLLM_NO_FLASHINFER=1
Unsloth: vLLM loading unsloth/Qwen3-4B-Instruct-2507 with actual GPU utilization = 89.06%
Unsloth: Your GPU has CUDA compute capability 8.9 with VRAM = 44.39 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 4096. Num Sequences = 96.
Unsloth: vLLM's KV Cache can use up to 32.5 GB. Also swap space = 6 GB.
Unsloth: Not an error, but `use_cudagraph` is not supported in vLLM.config.CompilationConfig. Skipping.
Unsloth: Not an error, but `use_inductor` is not supported in vLLM.config.CompilationConfig. Skipping.
WARNING 04-25 09:59:07 [compilation.py:762] Level is deprecated and will be removed in the next release, either 0.12.0 or 0.11.2, whichever is soonest. Use mode instead. If both level and mode are given, only mode will be used.
Unsloth: Not an error, but `device` is not supported in vLLM. Skipping.
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
return self.serializer.to_python(
INFO 04-25 09:59:07 [utils.py:261] non-default args: {'dtype': torch.bfloat16, 'max_model_len': 4096, 'enable_prefix_caching': True, 'swap_space': 6, 'gpu_memory_utilization': 0.8906117106477057, 'max_num_batched_tokens': 8192, 'max_num_seqs': 96, 'max_logprobs': 0, 'disable_log_stats': True, 'enable_lora': True, 'enable_chunked_prefill': True, 'compilation_config': {'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': [], 'splitting_ops': None, 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': None, 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': None, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}, 'model': 'unsloth/Qwen3-4B-Instruct-2507'}
WARNING 04-25 09:59:07 [arg_utils.py:1220] The global random seed is set to 0. Since VLLM_ENABLE_V1_MULTIPROCESSING is set to False, this may affect the random state of the Python process that launched vLLM.
INFO 04-25 09:59:14 [model.py:541] Resolved architecture: Qwen3ForCausalLM
INFO 04-25 09:59:14 [model.py:1561] Using max model len 4096
INFO 04-25 09:59:15 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 04-25 09:59:15 [vllm.py:624] Asynchronous scheduling is enabled.
generation_config.json: 100%|██████████| 237/237 [00:00<00:00, 1.79MB/s]
tokenizer_config.json: 100%|██████████| 9.65k/9.65k [00:00<00:00, 60.1MB/s]
vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 53.7MB/s]
merges.txt: 100%|██████████| 1.67M/1.67M [00:00<00:00, 83.0MB/s]
tokenizer.json: 100%|██████████| 11.4M/11.4M [00:00<00:00, 44.7MB/s]
added_tokens.json: 100%|██████████| 707/707 [00:00<00:00, 7.32MB/s]
special_tokens_map.json: 100%|██████████| 614/614 [00:00<00:00, 3.17MB/s]
chat_template.jinja: 100%|██████████| 4.04k/4.04k [00:00<00:00, 43.4MB/s]
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
return self.serializer.to_python(
INFO 04-25 09:59:16 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='unsloth/Qwen3-4B-Instruct-2507', speculative_config=None, tokenizer='unsloth/Qwen3-4B-Instruct-2507', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=unsloth/Qwen3-4B-Instruct-2507, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 
'compile_ranges_split_points': [8192], 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 192, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}
INFO 04-25 09:59:16 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.113.93.102:50843 backend=nccl
INFO 04-25 09:59:16 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
INFO 04-25 09:59:16 [gpu_model_runner.py:4033] Starting to load model unsloth/Qwen3-4B-Instruct-2507...
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via `pip install torch-c-dlpack-ext`
warnings.warn(
INFO 04-25 09:59:19 [cuda.py:364] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
model.safetensors.index.json: 100%|██████████| 32.9k/32.9k [00:00<00:00, 120MB/s]
model-00001-of-00002.safetensors: 100%|██████████| 4.97G/4.97G [00:04<00:00, 1.24GB/s]
model-00002-of-00002.safetensors: 100%|██████████| 3.08G/3.08G [00:04<00:00, 640MB/s]
INFO 04-25 09:59:29 [weight_utils.py:527] Time spent downloading weights for unsloth/Qwen3-4B-Instruct-2507: 8.877664 seconds
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 2.75it/s]
INFO 04-25 09:59:29 [default_loader.py:291] Loading weights took 0.74 seconds
INFO 04-25 09:59:29 [punica_selector.py:20] Using PunicaWrapperGPU.
INFO 04-25 09:59:30 [gpu_model_runner.py:4130] Model loading took 7.67 GiB memory and 12.958485 seconds
INFO 04-25 09:59:42 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/f6f5a6d496/rank_0_0/backbone for vLLM's torch.compile
INFO 04-25 09:59:42 [backends.py:872] Dynamo bytecode transform time: 11.11 s
Unsloth: Compiling kernels: 100%|██████████| 5/5 [00:00<00:00, 5.25it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_4]
INFO 04-25 09:59:52 [backends.py:302] Cache the graph of compile range (1, 8192) for later use
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 477.82it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 480.28it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 523.70it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 103.71it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 54.18it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 40.69it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 17.98it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 859.49it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 820.24it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 853.43it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 822.33it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 810.74it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 805.00it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 800.42it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 869.65it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 840.96it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 874.36it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 851.94it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 842.30it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 830.75it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 824.33it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 809.87it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 802.05it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 848.08it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 829.77it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 818.91it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 808.51it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 804.41it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 625.92it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 669.91it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 719.43it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 712.77it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 701.93it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 704.02it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 708.10it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 841.05it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 817.13it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 859.49it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 841.55it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 830.29it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 823.00it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 818.24it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 804.12it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 815.22it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 854.64it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 834.31it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 825.26it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 817.87it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 805.47it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 848.02it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 844.60it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 882.39it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 857.99it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 843.42it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 835.16it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 815.38it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 821.77it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 856.74it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 832.91it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 822.64it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 812.40it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 803.44it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 883.20it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 860.72it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 886.06it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 855.28it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 837.45it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 828.18it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 822.97it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 860.19it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 838.69it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 879.80it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 847.51it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 830.23it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 821.20it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 813.71it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 880.79it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 846.31it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 875.82it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 845.63it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 830.88it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 821.55it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 816.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 806.13it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 809.55it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 850.20it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 828.18it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 807.03it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 796.26it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 791.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s]
Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0]
Unsloth: Compiling kernels: 14%|β–ˆβ– | 1/7 [00:00<00:00, 837.02it/s, triton_poi_fused_mul_silu_slice_1] 
Unsloth: Compiling kernels: 29%|β–ˆβ–ˆβ–Š | 2/7 [00:00<00:00, 838.78it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
Unsloth: Compiling kernels: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 3/7 [00:00<00:00, 872.60it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3]
Unsloth: Compiling kernels: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 4/7 [00:00<00:00, 850.17it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4]
Unsloth: Compiling kernels: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 5/7 [00:00<00:00, 836.92it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5]
Unsloth: Compiling kernels: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6/7 [00:00<00:00, 827.03it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:00<00:00, 821.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 838.67it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 815.76it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 826.56it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 806.86it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 835.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 822.85it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 816.22it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 817.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 796.12it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 804.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 833.79it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 813.53it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 823.34it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 803.81it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.89it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 829.08it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 840.16it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 827.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 831.40it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6]
Unsloth: Compiling kernels: 100%|██████████| 3/3 [00:00<00:00, 20.50it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2]
INFO 04-25 10:00:01 [backends.py:319] Compiling a graph for compile range (1, 8192) takes 12.73 s
INFO 04-25 10:00:01 [monitor.py:34] torch.compile takes 23.84 s in total
INFO 04-25 10:00:02 [gpu_worker.py:356] Available KV cache memory: 31.08 GiB
INFO 04-25 10:00:02 [kv_cache_utils.py:1307] GPU KV cache size: 226,336 tokens
INFO 04-25 10:00:02 [kv_cache_utils.py:1312] Maximum concurrency for 4,096 tokens per request: 55.26x
INFO 04-25 10:00:02 [vllm_utils.py:729] Unsloth: Running patched vLLM v1 `capture_model`.
WARNING 04-25 10:00:03 [utils.py:268] Using default LoRA kernel configs
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 54/54 [00:12<00:00, 4.27it/s]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 30/30 [00:02<00:00, 10.98it/s]
INFO 04-25 10:00:18 [gpu_model_runner.py:5063] Graph capturing finished in 15 secs, took 0.69 GiB
INFO 04-25 10:00:18 [vllm_utils.py:736] Unsloth: Patched vLLM v1 graph capture finished in 15 secs.
INFO 04-25 10:00:19 [core.py:272] init engine (profile, create kv cache, warmup model) took 48.93 seconds
INFO 04-25 10:00:21 [llm.py:343] Supported tasks: ('generate',)
Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm']
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 46.31it/s]
Some weights of Qwen3ForCausalLM were not initialized from the model checkpoint at unsloth/Qwen3-4B-Instruct-2507 and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Performing substitution for additional_keys=set()
Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'cross_attn_input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'cross_attn_post_attention_layernorm', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm']
unsloth/Qwen3-4B-Instruct-2507 does not have a padding token! Will use pad_token = <|PAD_TOKEN|>.
Unsloth 2026.4.8 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
VRAM allocated: 41.84 GB
══ SFT warm-start - sft_traces/traces_v2.jsonl ══
120 SFT examples loaded (chat format in `text`)
Unsloth: Tokenizing ["text"] (num_proc=12): 100%|██████████| 120/120 [00:02<00:00, 46.73 examples/s]
🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 120 | Num Epochs = 10 | Total steps = 150
O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 4
\ / Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
Unsloth: Will smartly offload gradients to save VRAM!
3%|▎ | 5/150 [00:09<04:02, 1.67s/it]
{'loss': 3.6266, 'grad_norm': 2.663548231124878, 'learning_rate': 2.5e-05, 'epoch': 0.33}
7%|▋ | 10/150 [00:16<03:07, 1.34s/it]
{'loss': 3.3225, 'grad_norm': 2.001558542251587, 'learning_rate': 4.9647887323943665e-05, 'epoch': 0.67}
10%|█ | 15/150 [00:22<02:55, 1.30s/it]
{'loss': 2.7371, 'grad_norm': 0.9380167722702026, 'learning_rate': 4.788732394366197e-05, 'epoch': 1.0}
{'loss': 2.365, 'grad_norm': 0.8978179693222046, 'learning_rate': 4.6126760563380286e-05, 'epoch': 1.33}
{'loss': 2.0451, 'grad_norm': 0.9256548285484314, 'learning_rate': 4.436619718309859e-05, 'epoch': 1.67}
{'loss': 1.7249, 'grad_norm': 0.8666767477989197, 'learning_rate': 4.26056338028169e-05, 'epoch': 2.0}
{'loss': 1.4079, 'grad_norm': 1.2116891145706177, 'learning_rate': 4.0845070422535214e-05, 'epoch': 2.33}
{'loss': 1.1155, 'grad_norm': 0.8696402311325073, 'learning_rate': 3.908450704225352e-05, 'epoch': 2.67}
{'loss': 0.9477, 'grad_norm': 0.5664961338043213, 'learning_rate': 3.7323943661971835e-05, 'epoch': 3.0}
{'loss': 0.8914, 'grad_norm': 0.4789012372493744, 'learning_rate': 3.556338028169014e-05, 'epoch': 3.33}
{'loss': 0.8417, 'grad_norm': 0.3655957579612732, 'learning_rate': 3.380281690140845e-05, 'epoch': 3.67}
{'loss': 0.8088, 'grad_norm': 0.36159124970436096, 'learning_rate': 3.204225352112676e-05, 'epoch': 4.0}
{'loss': 0.7978, 'grad_norm': 0.3379436433315277, 'learning_rate': 3.028169014084507e-05, 'epoch': 4.33}
{'loss': 0.7577, 'grad_norm': 0.3583666682243347, 'learning_rate': 2.8521126760563384e-05, 'epoch': 4.67}
{'loss': 0.7794, 'grad_norm': 0.33592215180397034, 'learning_rate': 2.676056338028169e-05, 'epoch': 5.0}
{'loss': 0.7684, 'grad_norm': 0.3456568121910095, 'learning_rate': 2.5e-05, 'epoch': 5.33}
{'loss': 0.7243, 'grad_norm': 0.33662667870521545, 'learning_rate': 2.323943661971831e-05, 'epoch': 5.67}
{'loss': 0.7285, 'grad_norm': 0.3644108772277832, 'learning_rate': 2.147887323943662e-05, 'epoch': 6.0}
{'loss': 0.7192, 'grad_norm': 0.35359156131744385, 'learning_rate': 1.971830985915493e-05, 'epoch': 6.33}
{'loss': 0.7025, 'grad_norm': 0.3457960784435272, 'learning_rate': 1.7957746478873243e-05, 'epoch': 6.67}
{'loss': 0.7215, 'grad_norm': 0.3716900646686554, 'learning_rate': 1.619718309859155e-05, 'epoch': 7.0}
{'loss': 0.6965, 'grad_norm': 0.35728198289871216, 'learning_rate': 1.443661971830986e-05, 'epoch': 7.33}
{'loss': 0.701, 'grad_norm': 0.3863743245601654, 'learning_rate': 1.267605633802817e-05, 'epoch': 7.67}
{'loss': 0.691, 'grad_norm': 0.38696053624153137, 'learning_rate': 1.0915492957746478e-05, 'epoch': 8.0}
{'loss': 0.6836, 'grad_norm': 0.3782326579093933, 'learning_rate': 9.15492957746479e-06, 'epoch': 8.33}
{'loss': 0.6819, 'grad_norm': 0.3920275866985321, 'learning_rate': 7.394366197183099e-06, 'epoch': 8.67}
{'loss': 0.6833, 'grad_norm': 0.37108415365219116, 'learning_rate': 5.6338028169014084e-06, 'epoch': 9.0}
{'loss': 0.6688, 'grad_norm': 0.3897058367729187, 'learning_rate': 3.873239436619718e-06, 'epoch': 9.33}
{'loss': 0.6744, 'grad_norm': 0.3871634006500244, 'learning_rate': 2.112676056338028e-06, 'epoch': 9.67}
{'loss': 0.6839, 'grad_norm': 0.40198108553886414, 'learning_rate': 3.5211267605633803e-07, 'epoch': 10.0}
{'train_runtime': 198.1987, 'train_samples_per_second': 6.055, 'train_steps_per_second': 0.757, 'train_loss': 1.1565947977701823, 'epoch': 10.0}
100%|██████████| 150/150 [03:18<00:00, 1.32s/it]
SFT done in 3.3 min
══ Pre-GRPO hold-out eval (SFT-only) ══
[diagnostic] seed=100 raw completion (first 500 chars):
<tool_call>
1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the immediate 3-year cumulative real return is heavily penalized. The 12-quarter lockup forces a defensive tilt. 3rd-order: The fragmentation of global supply chains acts as a massive structural headwind for TECH and GREEN. The base case
[parse_action result]: metadata={} weights=[0.0, 0.4, 0.0, 0.2, 0.4] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='fragmentation'
── Hold-out eval (5/5 valid) ──
mean regret: -0.2516
beat baseline: 0/5
══ GRPO Phase 1: 4Q episodes, 50 iters, rewards=['format', 'regret'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 200 | Num Epochs = 1 | Total steps = 50
O^O/ \_/ \ Batch size per device = 4 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4
"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
WARNING 04-25 10:04:33 [input_processor.py:287] vLLM has deprecated support for supporting different tokenizers for different LoRAs. By default, vLLM uses base model's tokenizer. If you are using a LoRA with its own tokenizer, consider specifying `--tokenizer [lora_path]` to use the LoRA tokenizer.
Unsloth: Will smartly offload gradients to save VRAM!
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 1996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 4044.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 6092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 8140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 10224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 12248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 14296.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 16348.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 18396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 20444.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 22492.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 24544.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 26592.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 28640.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 30688.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 32736.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 34784.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 36832.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 38884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 40908.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 42904.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 44988.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
44%|████▍ | 22/50 [00:26<00:18, 1.55it/s]
46%|████▌ | 23/50 [00:27<00:16, 1.60it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 46984.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
46%|████▌ | 23/50 [00:27<00:16, 1.60it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 49008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
48%|████▊ | 24/50 [00:28<00:16, 1.60it/s]
50%|█████ | 25/50 [00:29<00:17, 1.44it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 51092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
50%|█████ | 25/50 [00:29<00:17, 1.44it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 53176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
52%|█████▏ | 26/50 [00:29<00:16, 1.44it/s]
54%|█████▍ | 27/50 [00:30<00:15, 1.52it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 55172.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
54%|█████▍ | 27/50 [00:30<00:15, 1.52it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 57224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
56%|█████▌ | 28/50 [00:30<00:14, 1.52it/s]
58%|█████▊ | 29/50 [00:31<00:13, 1.58it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 59308.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
58%|█████▊ | 29/50 [00:31<00:13, 1.58it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 61356.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
60%|██████ | 30/50 [00:32<00:12, 1.58it/s]
62%|██████▏ | 31/50 [00:32<00:11, 1.63it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 63440.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
62%|██████▏ | 31/50 [00:32<00:11, 1.63it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 65488.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
64%|██████▍ | 32/50 [00:33<00:11, 1.63it/s]
66%|██████▌ | 33/50 [00:33<00:10, 1.67it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 67512.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}
66%|██████▌ | 33/50 [00:33<00:10, 1.67it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 69564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}
68%|██████▊ | 34/50 [00:34<00:09, 1.67it/s]
70%|███████ | 35/50 [00:34<00:08, 1.70it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 71560.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}
70%|███████ | 35/50 [00:34<00:08, 1.70it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 73608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18}
72%|███████▏ | 36/50 [00:35<00:08, 1.70it/s]
74%|███████▍ | 37/50 [00:37<00:09, 1.35it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 75692.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18}
74%|███████▍ | 37/50 [00:37<00:09, 1.35it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 77740.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.19}
76%|███████▌ | 38/50 [00:37<00:08, 1.35it/s]
78%|███████▊ | 39/50 [00:38<00:07, 1.45it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 79764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}
78%|███████▊ | 39/50 [00:38<00:07, 1.45it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 81812.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}
80%|████████ | 40/50 [00:38<00:06, 1.45it/s]
82%|████████▏ | 41/50 [00:39<00:05, 1.53it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 83808.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2}
82%|████████▏ | 41/50 [00:39<00:05, 1.53it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 85804.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21}
84%|████████▍ | 42/50 [00:39<00:05, 1.53it/s]
86%|████████▌ | 43/50 [00:40<00:04, 1.59it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 87888.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21}
86%|████████▌ | 43/50 [00:40<00:04, 1.59it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 89884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.22}
88%|████████▊ | 44/50 [00:41<00:03, 1.59it/s]
90%|█████████ | 45/50 [00:41<00:03, 1.64it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 91936.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}
90%|█████████ | 45/50 [00:41<00:03, 1.64it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 94020.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}
92%|█████████▏| 46/50 [00:42<00:02, 1.64it/s]
94%|█████████▍| 47/50 [00:42<00:01, 1.68it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 96016.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23}
94%|█████████▍| 47/50 [00:42<00:01, 1.68it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 98064.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24}
96%|█████████▌| 48/50 [00:43<00:01, 1.68it/s]
98%|█████████▊| 49/50 [00:44<00:00, 1.48it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 100116.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24}
98%|█████████▊| 49/50 [00:44<00:00, 1.48it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 102168.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.25}
100%|██████████| 50/50 [00:44<00:00, 1.48it/s]
{'train_runtime': 45.5636, 'train_samples_per_second': 4.389, 'train_steps_per_second': 1.097, 'train_loss': 0.0, 'epoch': 0.25}
100%|██████████| 50/50 [00:45<00:00, 1.48it/s]
100%|██████████| 50/50 [00:45<00:00, 1.10it/s]
Phase 1 done in 0.8 min
[diagnostic] seed=100 raw completion (first 500 chars):
<tool_call>
1st-order: EV adoption surges, directly driving demand for GREEN energy and EV supply chains. 2nd-order: As EVs displace ICE vehicles, OIL demand faces structural headwinds over the 12-quarter cycle, forcing a long-term rotation away from fossil fuels. 3rd-order: The massive capital deployment into EV infrastructure acts as a massive liquidity pump, supporting TECH and REAL_ESTATE valuations. Base-rate: Today's news strongly signals a structural transition away from OIL and a green b
[parse_action result]: metadata={} weights=[0.35, 0.05, 0.45, 0.1, 0.05] infra_commit=0.15 carbon_offset_buy=0.0 put_hedge=0.0 tech_bet='green_leaps'
── Hold-out eval (5/5 valid) ──
mean regret: -0.0037
beat baseline: 4/5
══ GRPO Phase 2: 8Q episodes, 100 iters, rewards=['format', 'regret', 'sharpe', 'drawdown'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 600 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \ Batch size per device = 6 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6
"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
0%| | 0/100 [00:00<?, ?it/s]Unsloth: Will smartly offload gradients to save VRAM!
1%| | 1/100 [00:05<08:43, 5.29s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 2994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}
1%| | 1/100 [00:05<08:43, 5.29s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 6066.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}
2%|▏ | 2/100 [00:06<08:38, 5.29s/it]
3%|▎ | 3/100 [00:06<03:06, 1.92s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 9270.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
3%|▎ | 3/100 [00:06<03:06, 1.92s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 12342.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
4%|▍ | 4/100 [00:07<03:04, 1.92s/it]
5%|▌ | 5/100 [00:08<02:05, 1.32s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 15576.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
5%|▌ | 5/100 [00:08<02:05, 1.32s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 18714.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
6%|▌ | 6/100 [00:09<02:04, 1.32s/it]
7%|▋ | 7/100 [00:09<01:39, 1.07s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 21702.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
7%|▋ | 7/100 [00:09<01:39, 1.07s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 24738.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
8%|▊ | 8/100 [00:10<01:38, 1.07s/it]
9%|▉ | 9/100 [00:11<01:25, 1.06it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 27726.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
9%|▉ | 9/100 [00:11<01:25, 1.06it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 30972.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
10%|█ | 10/100 [00:11<01:24, 1.06it/s]
11%|█ | 11/100 [00:12<01:18, 1.14it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 34098.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
11%|█ | 11/100 [00:12<01:18, 1.14it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.944444444444445e-06, 'num_tokens': 37176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
12%|█▏ | 12/100 [00:13<01:17, 1.14it/s]
13%|█▎ | 13/100 [00:14<01:13, 1.19it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 40380.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
13%|█▎ | 13/100 [00:14<01:13, 1.19it/s]
13%|█▎ | 13/100 [00:28<01:13, 1.19it/s]
14%|█▍ | 14/100 [00:53<11:54, 8.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.833333333333333e-06, 'num_tokens': 43382.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
14%|█▍ | 14/100 [00:53<11:54, 8.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 46454.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
15%|█▌ | 15/100 [00:54<11:46, 8.31s/it]
16%|█▌ | 16/100 [00:55<07:52, 5.62s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 49448.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
16%|█▌ | 16/100 [00:55<07:52, 5.62s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 52526.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
17%|█▋ | 17/100 [00:56<07:46, 5.62s/it]
18%|█▊ | 18/100 [00:56<05:26, 3.98s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.611111111111112e-06, 'num_tokens': 55730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
18%|█▊ | 18/100 [00:56<05:26, 3.98s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 58976.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
19%|█▉ | 19/100 [00:57<05:22, 3.98s/it]
20%|██ | 20/100 [00:58<03:54, 2.93s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 62048.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
20%|██ | 20/100 [00:58<03:54, 2.93s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 65282.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
21%|██ | 21/100 [00:59<03:51, 2.93s/it]
22%|██▏ | 22/100 [00:59<02:54, 2.24s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.388888888888889e-06, 'num_tokens': 68354.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
22%|██▏ | 22/100 [00:59<02:54, 2.24s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 71588.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
23%|██▎ | 23/100 [01:00<02:52, 2.24s/it]
24%|██▍ | 24/100 [01:01<02:16, 1.79s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.277777777777778e-06, 'num_tokens': 74660.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
24%|██▍ | 24/100 [01:01<02:16, 1.79s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 77786.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
25%|██▌ | 25/100 [01:02<02:14, 1.79s/it]
26%|██▌ | 26/100 [01:03<01:55, 1.56s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 80858.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
26%|██▌ | 26/100 [01:03<01:55, 1.56s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 84062.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
27%|██▋ | 27/100 [01:04<01:53, 1.56s/it]
28%|██▊ | 28/100 [01:05<01:34, 1.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.055555555555556e-06, 'num_tokens': 87140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
28%|██▊ | 28/100 [01:05<01:34, 1.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 90212.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
29%|██▉ | 29/100 [01:05<01:32, 1.31s/it]
30%|███ | 30/100 [01:06<01:19, 1.13s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.944444444444445e-06, 'num_tokens': 93248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
30%|███ | 30/100 [01:06<01:19, 1.13s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 96236.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
31%|███ | 31/100 [01:07<01:18, 1.13s/it]
32%|███▏ | 32/100 [01:07<01:08, 1.01s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.833333333333334e-06, 'num_tokens': 99482.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
32%|███▏ | 32/100 [01:07<01:08, 1.01s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 102608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
33%|███▎ | 33/100 [01:08<01:07, 1.01s/it]
34%|███▍ | 34/100 [01:09<01:01, 1.08it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7222222222222225e-06, 'num_tokens': 105596.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
34%|███▍ | 34/100 [01:09<01:01, 1.08it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 108668.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
35%|███▌ | 35/100 [01:10<01:00, 1.08it/s]
36%|███▌ | 36/100 [01:10<00:55, 1.15it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 111656.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
36%|███▌ | 36/100 [01:10<00:55, 1.15it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 114650.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
37%|███▋ | 37/100 [01:11<00:54, 1.15it/s]
38%|███▊ | 38/100 [01:12<00:51, 1.21it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 117728.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
38%|███▊ | 38/100 [01:12<00:51, 1.21it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 120764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
39%|███▉ | 39/100 [01:13<00:50, 1.21it/s]
40%|████ | 40/100 [01:13<00:48, 1.24it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3888888888888893e-06, 'num_tokens': 123998.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 127034.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
41%|████ | 41/100 [01:14<00:47, 1.24it/s]
42%|████▏ | 42/100 [01:15<00:45, 1.28it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.277777777777778e-06, 'num_tokens': 130160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 133316.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
43%|████▎ | 43/100 [01:16<00:44, 1.28it/s]
44%|████▍ | 44/100 [01:16<00:43, 1.29it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1666666666666667e-06, 'num_tokens': 136550.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 139622.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
45%|████▌ | 45/100 [01:17<00:42, 1.29it/s]
46%|████▌ | 46/100 [01:18<00:41, 1.31it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 142826.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 145814.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
47%|████▋ | 47/100 [01:19<00:40, 1.31it/s]
48%|████▊ | 48/100 [01:19<00:39, 1.32it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.944444444444445e-06, 'num_tokens': 149060.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 152054.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
49%|████▉ | 49/100 [01:20<00:38, 1.32it/s]
50%|█████ | 50/100 [01:21<00:37, 1.33it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8333333333333335e-06, 'num_tokens': 155288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 158426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
51%|█████ | 51/100 [01:22<00:36, 1.33it/s]
52%|█████▏ | 52/100 [01:23<00:40, 1.20it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7222222222222224e-06, 'num_tokens': 161414.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 164552.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
53%|█████▎ | 53/100 [01:24<00:39, 1.20it/s]
54%|█████▍ | 54/100 [01:24<00:37, 1.23it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6111111111111113e-06, 'num_tokens': 167756.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 170828.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
55%|█████▌ | 55/100 [01:25<00:36, 1.23it/s]
56%|█████▌ | 56/100 [01:26<00:34, 1.27it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 173900.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 177056.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
57%|█████▋ | 57/100 [01:27<00:33, 1.27it/s]
58%|█████▊ | 58/100 [01:27<00:32, 1.28it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.388888888888889e-06, 'num_tokens': 180302.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 183290.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
59%|█████▉ | 59/100 [01:28<00:31, 1.28it/s]
60%|██████ | 60/100 [01:29<00:31, 1.28it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.277777777777778e-06, 'num_tokens': 186326.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 189398.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
61%|██████ | 61/100 [01:30<00:30, 1.28it/s]
62%|██████▏ | 62/100 [01:30<00:29, 1.31it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.166666666666667e-06, 'num_tokens': 192470.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 195608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
63%|██████▎ | 63/100 [01:31<00:28, 1.31it/s]
64%|██████▍ | 64/100 [01:32<00:27, 1.31it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0555555555555555e-06, 'num_tokens': 198734.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 201806.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
65%|██████▌ | 65/100 [01:33<00:26, 1.31it/s]
66%|██████▌ | 66/100 [01:33<00:25, 1.33it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 204794.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 207830.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
67%|██████▋ | 67/100 [01:34<00:24, 1.33it/s]
68%|██████▊ | 68/100 [01:35<00:23, 1.34it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8333333333333333e-06, 'num_tokens': 210902.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 213974.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
69%|██████▉ | 69/100 [01:36<00:23, 1.34it/s]
70%|███████ | 70/100 [01:36<00:22, 1.35it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7222222222222224e-06, 'num_tokens': 217046.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 220292.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
71%|███████ | 71/100 [01:37<00:21, 1.35it/s]
72%|███████▏ | 72/100 [01:38<00:20, 1.34it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6111111111111113e-06, 'num_tokens': 223280.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 226358.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
73%|███████▎ | 73/100 [01:39<00:20, 1.34it/s]
74%|███████▍ | 74/100 [01:39<00:19, 1.34it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 229496.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
74%|███████▍ | 74/100 [01:39<00:19, 1.34it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 232730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
75%|███████▌ | 75/100 [01:40<00:18, 1.34it/s]
76%|███████▌ | 76/100 [01:41<00:19, 1.20it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 235802.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
76%|███████▌ | 76/100 [01:41<00:19, 1.20it/s]
76%|███████▌ | 76/100 [01:52<00:19, 1.20it/s]
77%|███████▋ | 77/100 [02:19<02:47, 7.28s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 238797.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
77%|███████▋ | 77/100 [02:19<02:47, 7.28s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2777777777777779e-06, 'num_tokens': 241935.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
78%|███████▊ | 78/100 [02:19<02:40, 7.28s/it]
79%|███████▉ | 79/100 [02:20<01:46, 5.09s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 244923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
79%|███████▉ | 79/100 [02:20<01:46, 5.09s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1666666666666668e-06, 'num_tokens': 247995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
80%|████████ | 80/100 [02:21<01:41, 5.09s/it]
81%|████████ | 81/100 [02:22<01:09, 3.68s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 251067.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
81%|████████ | 81/100 [02:22<01:09, 3.68s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0555555555555557e-06, 'num_tokens': 254145.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
82%|████████▏ | 82/100 [02:22<01:06, 3.68s/it]
83%|████████▎ | 83/100 [02:23<00:46, 2.75s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 257181.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
83%|████████▎ | 83/100 [02:23<00:46, 2.75s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.444444444444445e-07, 'num_tokens': 260253.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
84%|████████▍ | 84/100 [02:24<00:44, 2.75s/it]
85%|████████▌ | 85/100 [02:25<00:31, 2.13s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 263247.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
85%|████████▌ | 85/100 [02:25<00:31, 2.13s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 266451.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
86%|████████▌ | 86/100 [02:26<00:29, 2.13s/it]
87%|████████▋ | 87/100 [02:26<00:22, 1.73s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 269439.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
87%|████████▋ | 87/100 [02:26<00:22, 1.73s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.222222222222222e-07, 'num_tokens': 272427.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
88%|████████▊ | 88/100 [02:27<00:20, 1.73s/it]
89%|████████▉ | 89/100 [02:28<00:15, 1.43s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 275499.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
89%|████████▉ | 89/100 [02:28<00:15, 1.43s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.111111111111112e-07, 'num_tokens': 278655.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
90%|█████████ | 90/100 [02:29<00:14, 1.43s/it]
91%|█████████ | 91/100 [02:29<00:10, 1.22s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 281691.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
91%|█████████ | 91/100 [02:29<00:10, 1.22s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 284925.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
92%|█████████▏| 92/100 [02:30<00:09, 1.22s/it]
93%|█████████▎| 93/100 [02:31<00:07, 1.08s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 287913.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
93%|█████████▎| 93/100 [02:31<00:07, 1.08s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.8888888888888895e-07, 'num_tokens': 290985.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
94%|█████████▍| 94/100 [02:32<00:06, 1.08s/it]
95%|█████████▌| 95/100 [02:32<00:04, 1.02it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 294123.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
95%|█████████▌| 95/100 [02:32<00:04, 1.02it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 297357.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
96%|█████████▌| 96/100 [02:33<00:03, 1.02it/s]
96%|█████████▌| 96/100 [02:44<00:03, 1.02it/s]
97%|█████████▋| 97/100 [03:05<00:16, 5.61s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 300361.0, 'completions/mean_length': 3.6666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 17.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.6666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 17.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.6666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
97%|█████████▋| 97/100 [03:05<00:16, 5.61s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666668e-07, 'num_tokens': 303517.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
98%|█████████▊| 98/100 [03:06<00:11, 5.61s/it]
99%|█████████▉| 99/100 [03:06<00:04, 4.15s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 306721.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}
99%|█████████▉| 99/100 [03:06<00:04, 4.15s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555556e-08, 'num_tokens': 309859.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}
100%|██████████| 100/100 [03:07<00:00, 4.15s/it]
{'train_runtime': 188.2628, 'train_samples_per_second': 3.187, 'train_steps_per_second': 0.531, 'train_loss': 0.0, 'epoch': 0.17}
100%|██████████| 100/100 [03:08<00:00, 4.15s/it]
100%|██████████| 100/100 [03:08<00:00, 1.88s/it]
Phase 2 done in 3.1 min
[diagnostic] seed=100 raw completion (first 500 chars):
<tool_call>
1st-order: Insurers exiting Florida and California triggers a massive flight-to-safety, driving 10-year Treasuries down and freezing municipal bonds. 2nd-order: The freeze in municipal bonds directly crushes the yield curve, making long-duration BONDS a dead asset over the next 12 quarters. 3rd-order: The physical loss of insurance capital in the Gulf Coast and Bay Area will eventually trigger a broader real estate market correction, severely hurting REAL_ESTATE. Base case: Deflation
[parse_action result]: metadata={} weights=[0.2, 0.05, 0.05, 0.0, 0.7] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'
── Hold-out eval (5/5 valid) ──
mean regret: -0.0391
beat baseline: 2/5
══ GRPO Phase 3: 12Q episodes, 80 iters, rewards=['format', 'regret', 'sharpe', 'drawdown', 'carbon'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 480 | Num Epochs = 1 | Total steps = 80
O^O/ \_/ \ Batch size per device = 6 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6
"-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
0%| | 0/80 [00:00<?, ?it/s]
Unsloth: Will smartly offload gradients to save VRAM!
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 3216.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}
1%|▏ | 1/80 [00:00<01:04, 1.22it/s]
2%|▎ | 2/80 [00:01<01:00, 1.28it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 6288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}
2%|▎ | 2/80 [00:01<01:00, 1.28it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 9426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
4%|▍ | 3/80 [00:02<01:00, 1.28it/s]
5%|▌ | 4/80 [00:03<00:58, 1.31it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 12564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
5%|▌ | 4/80 [00:03<00:58, 1.31it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 15810.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
6%|▋ | 5/80 [00:03<00:57, 1.31it/s]
8%|▊ | 6/80 [00:11<02:47, 2.27s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 18807.0, 'completions/mean_length': 69.16667175292969, 'completions/min_length': 1.0, 'completions/max_length': 400.0, 'completions/clipped_ratio': 0.16666666666666663, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 11.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 69.16667175292969, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
8%|▊ | 6/80 [00:11<02:47, 2.27s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 22041.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01}
9%|▉ | 7/80 [00:11<02:45, 2.27s/it]
10%|█ | 8/80 [00:12<02:00, 1.67s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 25287.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
10%|█ | 8/80 [00:12<02:00, 1.67s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 28413.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
11%|█▏ | 9/80 [00:13<01:58, 1.67s/it]
12%|█▎ | 10/80 [00:14<01:33, 1.34s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.930555555555556e-06, 'num_tokens': 31629.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
12%|█▎ | 10/80 [00:14<01:33, 1.34s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111111e-06, 'num_tokens': 34863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02}
14%|█▍ | 11/80 [00:14<01:32, 1.34s/it]
15%|█▌ | 12/80 [00:15<01:17, 1.13s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.791666666666668e-06, 'num_tokens': 37851.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
15%|█▌ | 12/80 [00:15<01:17, 1.13s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 40923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
16%|█▋ | 13/80 [00:16<01:16, 1.13s/it]
18%|█▊ | 14/80 [00:17<01:06, 1.00s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.652777777777779e-06, 'num_tokens': 44061.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
18%|█▊ | 14/80 [00:17<01:06, 1.00s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.583333333333333e-06, 'num_tokens': 47295.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
19%|█▉ | 15/80 [00:17<01:05, 1.00s/it]
20%|██ | 16/80 [00:18<00:58, 1.09it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5138888888888895e-06, 'num_tokens': 50283.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03}
20%|██ | 16/80 [00:18<00:58, 1.09it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 53361.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
21%|██▏ | 17/80 [00:19<00:57, 1.09it/s]
22%|██▎ | 18/80 [00:20<00:53, 1.16it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 56577.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
22%|██▎ | 18/80 [00:20<00:53, 1.16it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.305555555555556e-06, 'num_tokens': 59565.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
24%|██▍ | 19/80 [00:20<00:52, 1.16it/s]
25%|██▌ | 20/80 [00:21<00:52, 1.15it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.236111111111111e-06, 'num_tokens': 62571.0, 'completions/mean_length': 4.0, 'completions/min_length': 1.0, 'completions/max_length': 19.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 4.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 19.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 4.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
25%|██▌ | 20/80 [00:21<00:52, 1.15it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 65787.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04}
26%|██▋ | 21/80 [00:23<00:51, 1.15it/s]
28%|██▊ | 22/80 [00:23<00:53, 1.08it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.097222222222222e-06, 'num_tokens': 69003.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
28%|██▊ | 22/80 [00:23<00:53, 1.08it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.027777777777779e-06, 'num_tokens': 72141.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
29%|██▉ | 23/80 [00:24<00:52, 1.08it/s]
30%|███ | 24/80 [00:25<00:48, 1.15it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.958333333333333e-06, 'num_tokens': 75213.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
30%|███ | 24/80 [00:25<00:48, 1.15it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 78447.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
31%|███▏ | 25/80 [00:26<00:47, 1.15it/s]
32%|███▎ | 26/80 [00:27<00:45, 1.19it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.819444444444444e-06, 'num_tokens': 81675.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05}
32%|███▎ | 26/80 [00:27<00:45, 1.19it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 84813.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
34%|███▍ | 27/80 [00:27<00:44, 1.19it/s]
35%|███▌ | 28/80 [00:28<00:43, 1.20it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.680555555555556e-06, 'num_tokens': 87808.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
35%|███▌ | 28/80 [00:28<00:43, 1.20it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 90880.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
36%|███▋ | 29/80 [00:29<00:42, 1.20it/s]
38%|███▊ | 30/80 [00:30<00:40, 1.25it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5416666666666673e-06, 'num_tokens': 94006.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
38%|███▊ | 30/80 [00:30<00:40, 1.25it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4722222222222224e-06, 'num_tokens': 97252.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06}
39%|███▉ | 31/80 [00:30<00:39, 1.25it/s]
40%|████ | 32/80 [00:31<00:37, 1.26it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4027777777777783e-06, 'num_tokens': 100468.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
40%|████ | 32/80 [00:31<00:37, 1.26it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 103672.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
41%|████▏ | 33/80 [00:32<00:37, 1.26it/s]
42%|████▎ | 34/80 [00:33<00:37, 1.22it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2638888888888892e-06, 'num_tokens': 106689.0, 'completions/mean_length': 5.833333492279053, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 5.833333492279053, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 5.833333492279053, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
42%|████▎ | 34/80 [00:33<00:37, 1.22it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1944444444444443e-06, 'num_tokens': 109761.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
44%|████▍ | 35/80 [00:34<00:36, 1.22it/s]
44%|████▍ | 35/80 [00:48<00:36, 1.22it/s]
45%|████▌ | 36/80 [01:04<03:50, 5.24s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 112761.0, 'completions/mean_length': 3.0, 'completions/min_length': 1.0, 'completions/max_length': 13.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 13.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07}
45%|████▌ | 36/80 [01:04<03:50, 5.24s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 115995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
46%|████▋ | 37/80 [01:05<03:45, 5.24s/it]
48%|████▊ | 38/80 [01:05<02:43, 3.89s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.986111111111111e-06, 'num_tokens': 119073.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
48%|████▊ | 38/80 [01:05<02:43, 3.89s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.916666666666667e-06, 'num_tokens': 122319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
49%|████▉ | 39/80 [01:06<02:39, 3.89s/it]
50%|█████ | 40/80 [01:07<01:58, 2.95s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8472222222222224e-06, 'num_tokens': 125523.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08}
50%|█████ | 40/80 [01:07<01:58, 2.95s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 128559.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
51%|█████▏ | 41/80 [01:08<01:55, 2.95s/it]
52%|█████▎ | 42/80 [01:09<01:30, 2.37s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7083333333333334e-06, 'num_tokens': 131547.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
52%|█████▎ | 42/80 [01:09<01:30, 2.37s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6388888888888893e-06, 'num_tokens': 134751.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
54%|█████▍ | 43/80 [01:10<01:27, 2.37s/it]
55%|█████▌ | 44/80 [01:11<01:07, 1.89s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5694444444444443e-06, 'num_tokens': 137967.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
55%|█████▌ | 44/80 [01:11<01:07, 1.89s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 141171.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09}
56%|█████▋ | 45/80 [01:11<01:06, 1.89s/it]
57%|█████▊ | 46/80 [01:12<00:52, 1.55s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4305555555555557e-06, 'num_tokens': 144309.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
57%|█████▊ | 46/80 [01:12<00:52, 1.55s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.361111111111111e-06, 'num_tokens': 147609.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
59%|█████▉ | 47/80 [01:13<00:51, 1.55s/it]
60%|██████ | 48/80 [01:14<00:42, 1.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2916666666666666e-06, 'num_tokens': 150849.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
60%|██████ | 48/80 [01:14<00:42, 1.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 154065.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
61%|██████▏ | 49/80 [01:14<00:40, 1.31s/it]
62%|██████▎ | 50/80 [01:15<00:34, 1.15s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.152777777777778e-06, 'num_tokens': 157311.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1}
62%|██████▎ | 50/80 [01:15<00:34, 1.15s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333334e-06, 'num_tokens': 160551.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
64%|██████▍ | 51/80 [01:16<00:33, 1.15s/it]
65%|██████▌ | 52/80 [01:17<00:28, 1.03s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0138888888888893e-06, 'num_tokens': 163623.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
65%|██████▌ | 52/80 [01:17<00:28, 1.03s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 166863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
66%|██████▋ | 53/80 [01:17<00:27, 1.03s/it]
68%|██████▊ | 54/80 [01:18<00:24, 1.06it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 170079.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
68%|██████▊ | 54/80 [01:18<00:24, 1.06it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8055555555555557e-06, 'num_tokens': 173319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11}
69%|██████▉ | 55/80 [01:19<00:23, 1.06it/s]
70%|███████ | 56/80 [01:20<00:21, 1.12it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7361111111111112e-06, 'num_tokens': 176457.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
70%|███████ | 56/80 [01:20<00:21, 1.12it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 179595.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
71%|███████▏ | 57/80 [01:20<00:20, 1.12it/s]
72%|███████▎ | 58/80 [01:21<00:18, 1.18it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5972222222222221e-06, 'num_tokens': 182799.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
72%|███████▎ | 58/80 [01:21<00:18, 1.18it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.527777777777778e-06, 'num_tokens': 185937.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
74%|███████▍ | 59/80 [01:22<00:17, 1.18it/s]
75%|███████▌ | 60/80 [01:23<00:16, 1.22it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4583333333333335e-06, 'num_tokens': 189009.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12}
75%|███████▌ | 60/80 [01:23<00:16, 1.22it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 192255.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
76%|███████▋ | 61/80 [01:24<00:15, 1.22it/s]
78%|███████▊ | 62/80 [01:25<00:15, 1.13it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3194444444444446e-06, 'num_tokens': 195483.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
78%|███████▊ | 62/80 [01:25<00:15, 1.13it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 198555.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
79%|███████▉ | 63/80 [01:25<00:15, 1.13it/s]
80%|████████ | 64/80 [01:26<00:13, 1.19it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1805555555555556e-06, 'num_tokens': 201633.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13}
80%|████████ | 64/80 [01:26<00:13, 1.19it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 204636.0, 'completions/mean_length': 3.5, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.5, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.5, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
81%|████████▏ | 65/80 [01:27<00:12, 1.19it/s]
82%|████████▎ | 66/80 [01:28<00:11, 1.18it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0416666666666667e-06, 'num_tokens': 207708.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.722222222222224e-07, 'num_tokens': 211008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
84%|████████▍ | 67/80 [01:29<00:11, 1.18it/s]
85%|████████▌ | 68/80 [01:29<00:09, 1.22it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.027777777777779e-07, 'num_tokens': 213996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 216998.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14}
86%|████████▋ | 69/80 [01:30<00:09, 1.22it/s]
88%|████████▊ | 70/80 [01:31<00:08, 1.20it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.63888888888889e-07, 'num_tokens': 220124.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444446e-07, 'num_tokens': 223160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
89%|████████▉ | 71/80 [01:32<00:07, 1.20it/s]
90%|█████████ | 72/80 [01:33<00:06, 1.24it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 226238.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 229478.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
91%|█████████▏| 73/80 [01:33<00:05, 1.24it/s]
92%|█████████▎| 74/80 [01:34<00:04, 1.26it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111112e-07, 'num_tokens': 232706.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.1666666666666667e-07, 'num_tokens': 235844.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
94%|█████████▍| 75/80 [01:35<00:03, 1.26it/s]
95%|█████████▌| 76/80 [01:36<00:03, 1.28it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.472222222222223e-07, 'num_tokens': 238922.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 241994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
96%|█████████▋| 77/80 [01:36<00:02, 1.28it/s]
98%|█████████▊| 78/80 [01:37<00:01, 1.29it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333333e-07, 'num_tokens': 245240.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888888e-07, 'num_tokens': 248396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16}
99%|█████████▉| 79/80 [01:38<00:00, 1.29it/s]
100%|██████████| 80/80 [01:39<00:00, 1.30it/s]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444444e-08, 'num_tokens': 251534.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17}
{'train_runtime': 100.0827, 'train_samples_per_second': 4.796, 'train_steps_per_second': 0.799, 'train_loss': 0.0, 'epoch': 0.17}
100%|██████████| 80/80 [01:40<00:00, 1.30it/s]
100%|██████████| 80/80 [01:40<00:00, 1.25s/it]
Phase 3 done in 1.7 min
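Note on the Phase 3 metrics above: every logged step reports reward_std = 0.0 and frac_reward_zero_std = 1.0 together with loss = 0.0 and grad_norm = 0.0. That pattern is what group-relative advantage normalization produces when all completions in a group receive the same reward. The sketch below is illustrative only and is not the trainer's code: the function name `group_advantages`, the epsilon value, and the group size of 6 are assumptions (6 is inferred from train_samples_per_second / train_steps_per_second, roughly 4.796 / 0.799).

```python
import numpy as np

def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: center by the group mean, scale by the group std.

    `eps` is an assumed small constant that keeps the division finite
    when the group std is zero.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical group of 6 completions that all score -0.5, as in the log.
flat = group_advantages([-0.5] * 6)

# Hypothetical group where one completion scores differently.
mixed = group_advantages([-0.5, -0.5, 0.2, -0.5, -0.5, -0.5])

print("identical rewards ->", flat)   # all zeros: no learning signal
print("one differing     ->", mixed)  # non-zero, sums to ~0
```

With identical rewards every advantage is zero, so the policy-gradient term contributes nothing for that group. This is consistent with the zero loss and zero gradient norm logged here, although the log alone does not confirm the cause.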
[diagnostic] seed=100 raw completion (first 500 chars):
<tool_call>
1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the massive overcapacity in the oil sector will be rapidly absorbed by industrial demand, driving a structural inflationary squeeze. This stagflationary regime will crush BONDS and compress REAL_ESTATE valuations. 3rd-order: The forced lo
[parse_action result]: metadata={} weights=[0.09523809523809523, 0.42857142857142855, 0.047619047619047616, 0.09523809523809523, 0.3333333333333333] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'
── Hold-out eval (5/5 valid) ──
mean regret: -0.0941
beat baseline: 3/5
Found HuggingFace hub cache directory: /tmp/CarbonAlpha/hf_cache/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 0%| | 0/2 [00:00<?, ?it/s]
Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 100%|██████████| 2/2 [00:01<00:00, 1.37it/s]
Successfully copied all 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 0%| | 0/2 [00:00<?, ?it/s]
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 60787.01it/s]
Unsloth: Merging weights into 16bit: 0%| | 0/2 [00:00<?, ?it/s]
Unsloth: Merging weights into 16bit: 50%|█████ | 1/2 [00:31<00:31, 31.55s/it]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:55<00:00, 26.86s/it]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:55<00:00, 27.56s/it]
Unsloth: Merge process complete. Saved to `/tmp/CarbonAlpha/checkpoints/final_merged`
Saved LoRA adapters to /tmp/CarbonAlpha/checkpoints/final_merged
[rank0]:[W425 10:13:19.025103781 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())