Instructions to use 77ethers/CarbonAlpha with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
| 🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning. | |
| 🦥 Unsloth Zoo will now patch everything to make training faster! | |
| Loading unsloth/Qwen3-4B-Instruct-2507... | |
| INFO 04-25 09:59:05 [vllm_utils.py:724] Unsloth: Patching vLLM v1 graph capture | |
| ==((====))== Unsloth 2026.4.8: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.15.1. | |
| \\ /| NVIDIA L40S. Num GPUs = 1. Max memory: 44.392 GB. Platform: Linux. | |
| O^O/ \_/ \ Torch: 2.9.1+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.1 | |
| \ / Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False] | |
| "-____-" Free license: http://github.com/unslothai/unsloth | |
| Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored! | |
| Unsloth: FlashInfer requires JIT compilation but nvcc (CUDA compiler) is not found. | |
| vLLM will use FLASH_ATTN attention + PyTorch sampler instead (works fine). | |
| To enable FlashInfer, install the missing tools: | |
| nvcc - install the CUDA toolkit or set CUDA_HOME to your CUDA installation | |
| ninja - pip install ninja | |
| To silence this warning: set UNSLOTH_VLLM_NO_FLASHINFER=1 | |
| Unsloth: vLLM loading unsloth/Qwen3-4B-Instruct-2507 with actual GPU utilization = 89.06% | |
| Unsloth: Your GPU has CUDA compute capability 8.9 with VRAM = 44.39 GB. | |
| Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 4096. Num Sequences = 96. | |
| Unsloth: vLLM's KV Cache can use up to 32.5 GB. Also swap space = 6 GB. | |
| Unsloth: Not an error, but `use_cudagraph` is not supported in vLLM.config.CompilationConfig. Skipping. | |
| Unsloth: Not an error, but `use_inductor` is not supported in vLLM.config.CompilationConfig. Skipping. | |
| WARNING 04-25 09:59:07 [compilation.py:762] Level is deprecated and will be removed in the next release,either 0.12.0 or 0.11.2 whichever is soonest.Use mode instead.If both level and mode are given,only mode will be used. | |
| Unsloth: Not an error, but `device` is not supported in vLLM. Skipping. | |
| /root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings: | |
| PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int]) | |
| return self.serializer.to_python( | |
| INFO 04-25 09:59:07 [utils.py:261] non-default args: {'dtype': torch.bfloat16, 'max_model_len': 4096, 'enable_prefix_caching': True, 'swap_space': 6, 'gpu_memory_utilization': 0.8906117106477057, 'max_num_batched_tokens': 8192, 'max_num_seqs': 96, 'max_logprobs': 0, 'disable_log_stats': True, 'enable_lora': True, 'enable_chunked_prefill': True, 'compilation_config': {'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': [], 'splitting_ops': None, 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': None, 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': None, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}, 'model': 'unsloth/Qwen3-4B-Instruct-2507'} | |
| WARNING 04-25 09:59:07 [arg_utils.py:1220] The global random seed is set to 0. Since VLLM_ENABLE_V1_MULTIPROCESSING is set to False, this may affect the random state of the Python process that launched vLLM. | |
| INFO 04-25 09:59:14 [model.py:541] Resolved architecture: Qwen3ForCausalLM | |
| INFO 04-25 09:59:14 [model.py:1561] Using max model len 4096 | |
| INFO 04-25 09:59:15 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=8192. | |
| INFO 04-25 09:59:15 [vllm.py:624] Asynchronous scheduling is enabled. | |
| generation_config.json: 100%|██████████| 237/237 [00:00<00:00, 1.79MB/s] | |
| tokenizer_config.json: 100%|██████████| 9.65k/9.65k [00:00<00:00, 60.1MB/s] | |
| vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 53.7MB/s] | |
| merges.txt: 100%|██████████| 1.67M/1.67M [00:00<00:00, 83.0MB/s] | |
| tokenizer.json: 100%|██████████| 11.4M/11.4M [00:00<00:00, 44.7MB/s] | |
| added_tokens.json: 100%|██████████| 707/707 [00:00<00:00, 7.32MB/s] | |
| special_tokens_map.json: 100%|██████████| 614/614 [00:00<00:00, 3.17MB/s] | |
| chat_template.jinja: 100%|██████████| 4.04k/4.04k [00:00<00:00, 43.4MB/s] | |
| /root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings: | |
| PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int]) | |
| return self.serializer.to_python( | |
| INFO 04-25 09:59:16 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='unsloth/Qwen3-4B-Instruct-2507', speculative_config=None, tokenizer='unsloth/Qwen3-4B-Instruct-2507', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=unsloth/Qwen3-4B-Instruct-2507, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 
'compile_ranges_split_points': [8192], 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 192, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []} | |
| INFO 04-25 09:59:16 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.113.93.102:50843 backend=nccl | |
| INFO 04-25 09:59:16 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A | |
| INFO 04-25 09:59:16 [gpu_model_runner.py:4033] Starting to load model unsloth/Qwen3-4B-Instruct-2507... | |
| /root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. | |
| We recommend installing via `pip install torch-c-dlpack-ext` | |
| warnings.warn( | |
| INFO 04-25 09:59:19 [cuda.py:364] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') | |
| model.safetensors.index.json: 100%|██████████| 32.9k/32.9k [00:00<00:00, 120MB/s] | |
| model-00001-of-00002.safetensors: 100%|██████████| 4.97G/4.97G [00:04<00:00, 1.24GB/s] | |
| model-00002-of-00002.safetensors: 100%|██████████| 3.08G/3.08G [00:04<00:00, 640MB/s] | |
| INFO 04-25 09:59:29 [weight_utils.py:527] Time spent downloading weights for unsloth/Qwen3-4B-Instruct-2507: 8.877664 seconds | |
| Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 2.75it/s] | |
| INFO 04-25 09:59:29 [default_loader.py:291] Loading weights took 0.74 seconds | |
| INFO 04-25 09:59:29 [punica_selector.py:20] Using PunicaWrapperGPU. | |
| INFO 04-25 09:59:30 [gpu_model_runner.py:4130] Model loading took 7.67 GiB memory and 12.958485 seconds | |
| INFO 04-25 09:59:42 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/f6f5a6d496/rank_0_0/backbone for vLLM's torch.compile | |
| INFO 04-25 09:59:42 [backends.py:872] Dynamo bytecode transform time: 11.11 s | |
| Unsloth: Compiling kernels: 100%|██████████| 5/5 [00:00<00:00, 5.25it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_4] | |
| INFO 04-25 09:59:52 [backends.py:302] Cache the graph of compile range (1, 8192) for later use | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 477.82it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 480.28it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 523.70it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 103.71it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 54.18it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 40.69it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 17.98it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 859.49it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 820.24it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 853.43it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 822.33it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 810.74it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 805.00it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 800.42it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 869.65it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 840.96it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 874.36it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 851.94it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 842.30it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 830.75it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 824.33it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 809.87it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 802.05it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 848.08it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 829.77it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 818.91it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 808.51it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 804.41it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 625.92it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 669.91it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 719.43it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 712.77it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 701.93it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 704.02it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 708.10it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 841.05it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 817.13it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 859.49it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 841.55it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 830.29it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 823.00it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 818.24it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 804.12it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 815.22it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 854.64it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 834.31it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 825.26it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 817.87it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 805.47it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 848.02it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 844.60it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 882.39it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 857.99it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 843.42it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 835.16it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 815.38it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 821.77it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 856.74it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 832.91it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 822.64it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 812.40it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 803.44it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 883.20it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 860.72it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 886.06it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 855.28it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 837.45it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 828.18it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 822.97it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 860.19it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 838.69it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 879.80it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 847.51it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 830.23it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 821.20it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 813.71it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 880.79it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 846.31it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 875.82it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 845.63it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 830.88it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 821.55it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 816.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 806.13it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 809.55it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 850.20it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 828.18it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 807.03it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 796.26it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 791.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s][A | |
| Unsloth: Compiling kernels: 0%| | 0/7 [00:00<?, ?it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_0][A | |
| Unsloth: Compiling kernels: 14%|ββ | 1/7 [00:00<00:00, 837.02it/s, triton_poi_fused_mul_silu_slice_1] [A | |
| Unsloth: Compiling kernels: 29%|βββ | 2/7 [00:00<00:00, 838.78it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2][A | |
| Unsloth: Compiling kernels: 43%|βββββ | 3/7 [00:00<00:00, 872.60it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_3][A | |
| Unsloth: Compiling kernels: 57%|ββββββ | 4/7 [00:00<00:00, 850.17it/s, triton_red_fused__to_copy_mean_pow_split_with_sizes_view_4][A | |
| Unsloth: Compiling kernels: 71%|ββββββββ | 5/7 [00:00<00:00, 836.92it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_5][A | |
| Unsloth: Compiling kernels: 86%|βββββββββ | 6/7 [00:00<00:00, 827.03it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6][A | |
| Unsloth: Compiling kernels: 100%|ββββββββββ| 7/7 [00:00<00:00, 821.31it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 838.67it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 815.76it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.73it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 826.56it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 806.86it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 835.38it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 822.85it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 816.22it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 817.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 796.12it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 804.83it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 833.79it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 813.53it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 823.34it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 803.81it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 828.89it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 829.08it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 840.16it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 827.96it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 831.40it/s, triton_poi_fused__to_copy_add_index_select_mean_mul_pow_rsqrt_split_split_with_sizes_sub_unsqueeze_view_6] | |
| Unsloth: Compiling kernels: 100%|██████████| 3/3 [00:00<00:00, 20.50it/s, triton_red_fused__to_copy_add_mean_mul_pow_rsqrt_2] | |
| INFO 04-25 10:00:01 [backends.py:319] Compiling a graph for compile range (1, 8192) takes 12.73 s | |
| INFO 04-25 10:00:01 [monitor.py:34] torch.compile takes 23.84 s in total | |
| INFO 04-25 10:00:02 [gpu_worker.py:356] Available KV cache memory: 31.08 GiB | |
| INFO 04-25 10:00:02 [kv_cache_utils.py:1307] GPU KV cache size: 226,336 tokens | |
| INFO 04-25 10:00:02 [kv_cache_utils.py:1312] Maximum concurrency for 4,096 tokens per request: 55.26x | |
| INFO 04-25 10:00:02 [vllm_utils.py:729] Unsloth: Running patched vLLM v1 `capture_model`. | |
| WARNING 04-25 10:00:03 [utils.py:268] Using default LoRA kernel configs | |
| Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 54/54 [00:12<00:00, 4.27it/s] | |
| Capturing CUDA graphs (decode, FULL): 100%|██████████| 30/30 [00:02<00:00, 10.98it/s] | |
| INFO 04-25 10:00:18 [gpu_model_runner.py:5063] Graph capturing finished in 15 secs, took 0.69 GiB | |
| INFO 04-25 10:00:18 [vllm_utils.py:736] Unsloth: Patched vLLM v1 graph capture finished in 15 secs. | |
| INFO 04-25 10:00:19 [core.py:272] init engine (profile, create kv cache, warmup model) took 48.93 seconds | |
| INFO 04-25 10:00:21 [llm.py:343] Supported tasks: ('generate',) | |
| Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm'] | |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 46.31it/s] | |
| Some weights of Qwen3ForCausalLM were not initialized from the model checkpoint at unsloth/Qwen3-4B-Instruct-2507 and are newly initialized: ['lm_head.weight'] | |
| You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. | |
| Performing substitution for additional_keys=set() | |
| Unsloth: Just some info: will skip parsing ['layer_norm2', 'q_norm', 'post_attention_layernorm', 'norm', 'input_layernorm', 'cross_attn_input_layernorm', 'ffn_norm', 'post_layernorm', 'norm1', 'cross_attn_post_attention_layernorm', 'layer_norm1', 'attention_norm', 'post_feedforward_layernorm', 'k_norm', 'norm2', 'pre_feedforward_layernorm'] | |
| unsloth/Qwen3-4B-Instruct-2507 does not have a padding token! Will use pad_token = <|PAD_TOKEN|>. | |
| Unsloth 2026.4.8 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers. | |
| VRAM allocated: 41.84 GB | |
| ── SFT warm-start - sft_traces/traces_v2.jsonl ── | |
| 120 SFT examples loaded (chat format in `text`) | |
| Unsloth: Tokenizing ["text"] (num_proc=12): 100%|██████████| 120/120 [00:02<00:00, 46.73 examples/s] | |
| 🦥 Unsloth: Padding-free auto-enabled, enabling faster training. | |
| ==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1 | |
| \\ /| Num examples = 120 | Num Epochs = 10 | Total steps = 150 | |
| O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 4 | |
| \ / Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8 | |
| "-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained) | |
| 0%|          | 0/150 [00:00<?, ?it/s] | |
| Unsloth: Will smartly offload gradients to save VRAM! | |
| 1%|          | 1/150 [00:04<10:16, 4.14s/it] | |
| 1%|▏         | 2/150 [00:05<06:02, 2.45s/it] | |
| 2%|▏         | 3/150 [00:06<04:42, 1.92s/it] | |
| 3%|▎         | 4/150 [00:08<04:26, 1.82s/it] | |
| 3%|▎         | 5/150 [00:09<04:02, 1.67s/it] | |
| {'loss': 3.6266, 'grad_norm': 2.663548231124878, 'learning_rate': 2.5e-05, 'epoch': 0.33} | |
| 4%|▍         | 6/150 [00:11<03:41, 1.54s/it] | |
| 5%|▍         | 7/150 [00:12<03:28, 1.46s/it] | |
| 5%|▌         | 8/150 [00:13<03:18, 1.40s/it] | |
| 6%|▌         | 9/150 [00:14<03:12, 1.36s/it] | |
| 7%|▋         | 10/150 [00:16<03:07, 1.34s/it] | |
| {'loss': 3.3225, 'grad_norm': 2.001558542251587, 'learning_rate': 4.9647887323943665e-05, 'epoch': 0.67} | |
| 7%|β | 11/150 [00:17<03:04, 1.33s/it][A | |
| 8%|β | 12/150 [00:18<03:00, 1.31s/it][A | |
| 9%|β | 13/150 [00:20<02:58, 1.30s/it][A | |
| 9%|β | 14/150 [00:21<02:56, 1.30s/it][A | |
| 10%|β | 15/150 [00:22<02:55, 1.30s/it][A | |
| [A{'loss': 2.7371, 'grad_norm': 0.9380167722702026, 'learning_rate': 4.788732394366197e-05, 'epoch': 1.0} | |
| 10%|β | 15/150 [00:22<02:55, 1.30s/it][A | |
| 11%|β | 16/150 [00:23<02:53, 1.30s/it][A | |
| 11%|ββ | 17/150 [00:25<02:52, 1.30s/it][A | |
| 12%|ββ | 18/150 [00:26<02:50, 1.29s/it][A | |
| 13%|ββ | 19/150 [00:27<02:49, 1.29s/it][A | |
| 13%|ββ | 20/150 [00:29<02:48, 1.30s/it][A | |
| [A{'loss': 2.365, 'grad_norm': 0.8978179693222046, 'learning_rate': 4.6126760563380286e-05, 'epoch': 1.33} | |
| 13%|ββ | 20/150 [00:29<02:48, 1.30s/it][A | |
| 14%|ββ | 21/150 [00:30<02:47, 1.29s/it][A | |
| 15%|ββ | 22/150 [00:31<02:45, 1.29s/it][A | |
| 15%|ββ | 23/150 [00:32<02:43, 1.29s/it][A | |
| 16%|ββ | 24/150 [00:34<02:41, 1.28s/it][A | |
| 17%|ββ | 25/150 [00:35<02:39, 1.28s/it][A | |
| [A{'loss': 2.0451, 'grad_norm': 0.9256548285484314, 'learning_rate': 4.436619718309859e-05, 'epoch': 1.67} | |
| 17%|ββ | 25/150 [00:35<02:39, 1.28s/it][A | |
| 17%|ββ | 26/150 [00:36<02:38, 1.28s/it][A | |
| 18%|ββ | 27/150 [00:38<02:38, 1.29s/it][A | |
| 19%|ββ | 28/150 [00:39<02:36, 1.29s/it][A | |
| 19%|ββ | 29/150 [00:40<02:35, 1.28s/it][A | |
| 20%|ββ | 30/150 [00:41<02:34, 1.28s/it][A | |
| [A{'loss': 1.7249, 'grad_norm': 0.8666767477989197, 'learning_rate': 4.26056338028169e-05, 'epoch': 2.0} | |
| 20%|ββ | 30/150 [00:41<02:34, 1.28s/it][A | |
| 21%|ββ | 31/150 [00:43<02:32, 1.28s/it][A | |
| 21%|βββ | 32/150 [00:44<02:32, 1.29s/it][A | |
| 22%|βββ | 33/150 [00:45<02:30, 1.29s/it][A | |
| 23%|βββ | 34/150 [00:47<02:29, 1.29s/it][A | |
| 23%|βββ | 35/150 [00:48<02:28, 1.29s/it][A | |
| [A{'loss': 1.4079, 'grad_norm': 1.2116891145706177, 'learning_rate': 4.0845070422535214e-05, 'epoch': 2.33} | |
| 23%|βββ | 35/150 [00:48<02:28, 1.29s/it][A | |
| 24%|βββ | 36/150 [00:49<02:26, 1.29s/it][A | |
| 25%|βββ | 37/150 [00:50<02:24, 1.28s/it][A | |
| 25%|βββ | 38/150 [00:52<02:24, 1.29s/it][A | |
| 26%|βββ | 39/150 [00:53<02:22, 1.29s/it][A | |
| 27%|βββ | 40/150 [00:54<02:21, 1.29s/it][A | |
| [A{'loss': 1.1155, 'grad_norm': 0.8696402311325073, 'learning_rate': 3.908450704225352e-05, 'epoch': 2.67} | |
| 27%|βββ | 40/150 [00:54<02:21, 1.29s/it][A | |
| 27%|βββ | 41/150 [00:56<02:20, 1.29s/it][A | |
| 28%|βββ | 42/150 [00:57<02:19, 1.29s/it][A | |
| 29%|βββ | 43/150 [00:58<02:17, 1.29s/it][A | |
| 29%|βββ | 44/150 [01:00<02:17, 1.29s/it][A | |
| 30%|βββ | 45/150 [01:01<02:14, 1.29s/it][A | |
| [A{'loss': 0.9477, 'grad_norm': 0.5664961338043213, 'learning_rate': 3.7323943661971835e-05, 'epoch': 3.0} | |
| 30%|βββ | 45/150 [01:01<02:14, 1.29s/it][A | |
| 31%|βββ | 46/150 [01:02<02:13, 1.28s/it][A | |
| 31%|ββββ | 47/150 [01:03<02:11, 1.28s/it][A | |
| 32%|ββββ | 48/150 [01:05<02:10, 1.28s/it][A | |
| 33%|ββββ | 49/150 [01:06<02:09, 1.29s/it][A | |
| 33%|ββββ | 50/150 [01:07<02:09, 1.30s/it][A | |
| [A{'loss': 0.8914, 'grad_norm': 0.4789012372493744, 'learning_rate': 3.556338028169014e-05, 'epoch': 3.33} | |
| 33%|ββββ | 50/150 [01:07<02:09, 1.30s/it][A | |
| 34%|ββββ | 51/150 [01:09<02:07, 1.29s/it][A | |
| 35%|ββββ | 52/150 [01:10<02:06, 1.29s/it][A | |
| 35%|ββββ | 53/150 [01:11<02:04, 1.29s/it][A | |
| 36%|ββββ | 54/150 [01:12<02:03, 1.29s/it][A | |
| 37%|ββββ | 55/150 [01:14<02:02, 1.29s/it][A | |
| [A{'loss': 0.8417, 'grad_norm': 0.3655957579612732, 'learning_rate': 3.380281690140845e-05, 'epoch': 3.67} | |
| 37%|ββββ | 55/150 [01:14<02:02, 1.29s/it][A | |
| 37%|ββββ | 56/150 [01:15<02:00, 1.29s/it][A | |
| 38%|ββββ | 57/150 [01:16<01:59, 1.29s/it][A | |
| 39%|ββββ | 58/150 [01:18<01:59, 1.30s/it][A | |
| 39%|ββββ | 59/150 [01:19<01:58, 1.30s/it][A | |
| 40%|ββββ | 60/150 [01:20<01:56, 1.30s/it][A | |
| [A{'loss': 0.8088, 'grad_norm': 0.36159124970436096, 'learning_rate': 3.204225352112676e-05, 'epoch': 4.0} | |
| 40%|ββββ | 60/150 [01:20<01:56, 1.30s/it][A | |
| 41%|ββββ | 61/150 [01:21<01:56, 1.30s/it][A | |
| 41%|βββββ | 62/150 [01:23<01:54, 1.30s/it][A | |
| 42%|βββββ | 63/150 [01:24<01:53, 1.30s/it][A | |
| 43%|βββββ | 64/150 [01:25<01:51, 1.30s/it][A | |
| 43%|βββββ | 65/150 [01:27<01:50, 1.30s/it][A | |
| [A{'loss': 0.7978, 'grad_norm': 0.3379436433315277, 'learning_rate': 3.028169014084507e-05, 'epoch': 4.33} | |
| 43%|βββββ | 65/150 [01:27<01:50, 1.30s/it][A | |
| 44%|βββββ | 66/150 [01:28<01:49, 1.30s/it][A | |
| 45%|βββββ | 67/150 [01:29<01:47, 1.30s/it][A | |
| 45%|βββββ | 68/150 [01:31<01:46, 1.29s/it][A | |
| 46%|βββββ | 69/150 [01:32<01:44, 1.29s/it][A | |
| 47%|βββββ | 70/150 [01:33<01:43, 1.29s/it][A | |
| [A{'loss': 0.7577, 'grad_norm': 0.3583666682243347, 'learning_rate': 2.8521126760563384e-05, 'epoch': 4.67} | |
| 47%|βββββ | 70/150 [01:33<01:43, 1.29s/it][A | |
| 47%|βββββ | 71/150 [01:34<01:42, 1.29s/it][A | |
| 48%|βββββ | 72/150 [01:36<01:41, 1.30s/it][A | |
| 49%|βββββ | 73/150 [01:37<01:40, 1.31s/it][A | |
| 49%|βββββ | 74/150 [01:38<01:39, 1.30s/it][A | |
| 50%|βββββ | 75/150 [01:40<01:37, 1.30s/it][A | |
| [A{'loss': 0.7794, 'grad_norm': 0.33592215180397034, 'learning_rate': 2.676056338028169e-05, 'epoch': 5.0} | |
| 50%|βββββ | 75/150 [01:40<01:37, 1.30s/it][A | |
| 51%|βββββ | 76/150 [01:41<01:36, 1.30s/it][A | |
| 51%|ββββββ | 77/150 [01:42<01:35, 1.30s/it][A | |
| 52%|ββββββ | 78/150 [01:44<01:33, 1.30s/it][A | |
| 53%|ββββββ | 79/150 [01:45<01:32, 1.30s/it][A | |
| 53%|ββββββ | 80/150 [01:46<01:30, 1.30s/it][A | |
| [A{'loss': 0.7684, 'grad_norm': 0.3456568121910095, 'learning_rate': 2.5e-05, 'epoch': 5.33} | |
| 53%|ββββββ | 80/150 [01:46<01:30, 1.30s/it][A | |
| 54%|ββββββ | 81/150 [01:47<01:29, 1.30s/it][A | |
| 55%|ββββββ | 82/150 [01:49<01:28, 1.30s/it][A | |
| 55%|ββββββ | 83/150 [01:50<01:26, 1.29s/it][A | |
| 56%|ββββββ | 84/150 [01:51<01:25, 1.29s/it][A | |
| 57%|ββββββ | 85/150 [01:53<01:23, 1.29s/it][A | |
| [A{'loss': 0.7243, 'grad_norm': 0.33662667870521545, 'learning_rate': 2.323943661971831e-05, 'epoch': 5.67} | |
| 57%|ββββββ | 85/150 [01:53<01:23, 1.29s/it][A | |
| 57%|ββββββ | 86/150 [01:54<01:22, 1.29s/it][A | |
| 58%|ββββββ | 87/150 [01:55<01:20, 1.28s/it][A | |
| 59%|ββββββ | 88/150 [01:56<01:19, 1.29s/it][A | |
| 59%|ββββββ | 89/150 [01:58<01:18, 1.29s/it][A | |
| 60%|ββββββ | 90/150 [01:59<01:17, 1.29s/it][A | |
| [A{'loss': 0.7285, 'grad_norm': 0.3644108772277832, 'learning_rate': 2.147887323943662e-05, 'epoch': 6.0} | |
| 60%|ββββββ | 90/150 [01:59<01:17, 1.29s/it][A | |
| 61%|ββββββ | 91/150 [02:00<01:16, 1.29s/it][A | |
| 61%|βββββββ | 92/150 [02:02<01:15, 1.30s/it][A | |
| 62%|βββββββ | 93/150 [02:03<01:13, 1.29s/it][A | |
| 63%|βββββββ | 94/150 [02:04<01:12, 1.29s/it][A | |
| 63%|βββββββ | 95/150 [02:05<01:10, 1.29s/it][A | |
| [A{'loss': 0.7192, 'grad_norm': 0.35359156131744385, 'learning_rate': 1.971830985915493e-05, 'epoch': 6.33} | |
| 63%|βββββββ | 95/150 [02:05<01:10, 1.29s/it][A | |
| 64%|βββββββ | 96/150 [02:07<01:10, 1.30s/it][A | |
| 65%|βββββββ | 97/150 [02:08<01:08, 1.30s/it][A | |
| 65%|βββββββ | 98/150 [02:09<01:07, 1.30s/it][A | |
| 66%|βββββββ | 99/150 [02:11<01:06, 1.30s/it][A | |
| 67%|βββββββ | 100/150 [02:12<01:05, 1.30s/it][A | |
| [A{'loss': 0.7025, 'grad_norm': 0.3457960784435272, 'learning_rate': 1.7957746478873243e-05, 'epoch': 6.67} | |
| 67%|βββββββ | 100/150 [02:12<01:05, 1.30s/it][A | |
| 67%|βββββββ | 101/150 [02:13<01:03, 1.30s/it][A | |
| 68%|βββββββ | 102/150 [02:15<01:02, 1.30s/it][A | |
| 69%|βββββββ | 103/150 [02:16<01:00, 1.29s/it][A | |
| 69%|βββββββ | 104/150 [02:17<00:59, 1.30s/it][A | |
| 70%|βββββββ | 105/150 [02:18<00:58, 1.30s/it][A | |
| [A{'loss': 0.7215, 'grad_norm': 0.3716900646686554, 'learning_rate': 1.619718309859155e-05, 'epoch': 7.0} | |
| 70%|βββββββ | 105/150 [02:18<00:58, 1.30s/it][A | |
| 71%|βββββββ | 106/150 [02:20<00:57, 1.30s/it][A | |
| 71%|ββββββββ | 107/150 [02:21<00:56, 1.30s/it][A | |
| 72%|ββββββββ | 108/150 [02:22<00:54, 1.30s/it][A | |
| 73%|ββββββββ | 109/150 [02:24<00:53, 1.30s/it][A | |
| 73%|ββββββββ | 110/150 [02:25<00:51, 1.30s/it][A | |
| [A{'loss': 0.6965, 'grad_norm': 0.35728198289871216, 'learning_rate': 1.443661971830986e-05, 'epoch': 7.33} | |
| 73%|ββββββββ | 110/150 [02:25<00:51, 1.30s/it][A | |
| 74%|ββββββββ | 111/150 [02:26<00:50, 1.30s/it][A | |
| 75%|ββββββββ | 112/150 [02:28<00:49, 1.30s/it][A | |
| 75%|ββββββββ | 113/150 [02:29<00:48, 1.30s/it][A | |
| 76%|ββββββββ | 114/150 [02:30<00:46, 1.30s/it][A | |
| 77%|ββββββββ | 115/150 [02:31<00:45, 1.30s/it][A | |
| [A{'loss': 0.701, 'grad_norm': 0.3863743245601654, 'learning_rate': 1.267605633802817e-05, 'epoch': 7.67} | |
| 77%|ββββββββ | 115/150 [02:31<00:45, 1.30s/it][A | |
| 77%|ββββββββ | 116/150 [02:33<00:44, 1.30s/it][A | |
| 78%|ββββββββ | 117/150 [02:34<00:42, 1.30s/it][A | |
| 79%|ββββββββ | 118/150 [02:35<00:41, 1.30s/it][A | |
| 79%|ββββββββ | 119/150 [02:37<00:40, 1.31s/it][A | |
| 80%|ββββββββ | 120/150 [02:38<00:39, 1.31s/it][A | |
| [A{'loss': 0.691, 'grad_norm': 0.38696053624153137, 'learning_rate': 1.0915492957746478e-05, 'epoch': 8.0} | |
| 80%|ββββββββ | 120/150 [02:38<00:39, 1.31s/it][A | |
| 81%|ββββββββ | 121/150 [02:39<00:38, 1.32s/it][A | |
| 81%|βββββββββ | 122/150 [02:41<00:36, 1.31s/it][A | |
| 82%|βββββββββ | 123/150 [02:42<00:35, 1.31s/it][A | |
| 83%|βββββββββ | 124/150 [02:43<00:33, 1.31s/it][A | |
| 83%|βββββββββ | 125/150 [02:45<00:32, 1.30s/it][A | |
| [A{'loss': 0.6836, 'grad_norm': 0.3782326579093933, 'learning_rate': 9.15492957746479e-06, 'epoch': 8.33} | |
| 83%|βββββββββ | 125/150 [02:45<00:32, 1.30s/it][A | |
| 84%|βββββββββ | 126/150 [02:46<00:31, 1.30s/it][A | |
| 85%|βββββββββ | 127/150 [02:47<00:30, 1.31s/it][A | |
| 85%|βββββββββ | 128/150 [02:48<00:28, 1.30s/it][A | |
| 86%|βββββββββ | 129/150 [02:50<00:27, 1.30s/it][A | |
| 87%|βββββββββ | 130/150 [02:51<00:26, 1.30s/it][A | |
| [A{'loss': 0.6819, 'grad_norm': 0.3920275866985321, 'learning_rate': 7.394366197183099e-06, 'epoch': 8.67} | |
| 87%|βββββββββ | 130/150 [02:51<00:26, 1.30s/it][A | |
| 87%|βββββββββ | 131/150 [02:52<00:24, 1.30s/it][A | |
| 88%|βββββββββ | 132/150 [02:54<00:23, 1.29s/it][A | |
| 89%|βββββββββ | 133/150 [02:55<00:22, 1.30s/it][A | |
| 89%|βββββββββ | 134/150 [02:56<00:20, 1.30s/it][A | |
| 90%|βββββββββ | 135/150 [02:58<00:19, 1.30s/it][A | |
| [A{'loss': 0.6833, 'grad_norm': 0.37108415365219116, 'learning_rate': 5.6338028169014084e-06, 'epoch': 9.0} | |
| 90%|βββββββββ | 135/150 [02:58<00:19, 1.30s/it][A | |
| 91%|βββββββββ | 136/150 [02:59<00:18, 1.30s/it][A | |
| 91%|ββββββββββ| 137/150 [03:00<00:16, 1.30s/it][A | |
| 92%|ββββββββββ| 138/150 [03:01<00:15, 1.30s/it][A | |
| 93%|ββββββββββ| 139/150 [03:03<00:14, 1.29s/it][A | |
| 93%|ββββββββββ| 140/150 [03:04<00:12, 1.29s/it][A | |
| [A{'loss': 0.6688, 'grad_norm': 0.3897058367729187, 'learning_rate': 3.873239436619718e-06, 'epoch': 9.33} | |
| 93%|ββββββββββ| 140/150 [03:04<00:12, 1.29s/it][A | |
| 94%|ββββββββββ| 141/150 [03:05<00:11, 1.29s/it][A | |
| 95%|ββββββββββ| 142/150 [03:07<00:10, 1.29s/it][A | |
| 95%|ββββββββββ| 143/150 [03:08<00:09, 1.29s/it][A | |
| 96%|ββββββββββ| 144/150 [03:09<00:07, 1.30s/it][A | |
| 97%|ββββββββββ| 145/150 [03:10<00:06, 1.30s/it][A | |
| [A{'loss': 0.6744, 'grad_norm': 0.3871634006500244, 'learning_rate': 2.112676056338028e-06, 'epoch': 9.67} | |
| 97%|ββββββββββ| 145/150 [03:10<00:06, 1.30s/it][A | |
| 97%|ββββββββββ| 146/150 [03:12<00:05, 1.30s/it][A | |
| 98%|ββββββββββ| 147/150 [03:13<00:03, 1.30s/it][A | |
| 99%|ββββββββββ| 148/150 [03:14<00:02, 1.33s/it][A | |
| 99%|ββββββββββ| 149/150 [03:16<00:01, 1.32s/it][A | |
| 100%|ββββββββββ| 150/150 [03:17<00:00, 1.31s/it][A | |
| [A{'loss': 0.6839, 'grad_norm': 0.40198108553886414, 'learning_rate': 3.5211267605633803e-07, 'epoch': 10.0} | |
| 100%|ββββββββββ| 150/150 [03:17<00:00, 1.31s/it][A | |
| [A{'train_runtime': 198.1987, 'train_samples_per_second': 6.055, 'train_steps_per_second': 0.757, 'train_loss': 1.1565947977701823, 'epoch': 10.0} | |
| 100%|ββββββββββ| 150/150 [03:18<00:00, 1.31s/it][A | |
| 100%|ββββββββββ| 150/150 [03:18<00:00, 1.32s/it] | |
| SFT done in 3.3 min | |
| ── Pre-GRPO hold-out eval (SFT-only) ── | |
| [diagnostic] seed=100 raw completion (first 500 chars): | |
| <tool_call> | |
| 1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the immediate 3-year cumulative real return is heavily penalized. The 12-quarter lockup forces a defensive tilt. 3rd-order: The fragmentation of global supply chains acts as a massive structural headwind for TECH and GREEN. The base case | |
| [parse_action result]: metadata={} weights=[0.0, 0.4, 0.0, 0.2, 0.4] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='fragmentation' | |
| ── Hold-out eval (5/5 valid) ── | |
| mean regret: -0.2516 | |
| beat baseline: 0/5 | |
| ── GRPO Phase 1: 4Q episodes, 50 iters, rewards=['format', 'regret'] ── | |
| Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it. | |
| Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it. | |
| ==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1 | |
| \\ /| Num examples = 200 | Num Epochs = 1 | Total steps = 50 | |
| O^O/ \_/ \ Batch size per device = 4 | Gradient accumulation steps = 1 | |
| \ / Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4 | |
| "-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained) | |
| WARNING 04-25 10:04:33 [input_processor.py:287] vLLM has deprecated support for supporting different tokenizers for different LoRAs. By default, vLLM uses base model's tokenizer. If you are using a LoRA with its own tokenizer, consider specifying `--tokenizer [lora_path]` to use the LoRA tokenizer. | |
| Unsloth: Will smartly offload gradients to save VRAM! | |
| 2%|β | 1/50 [00:14<11:30, 14.08s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 1996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 2%|β | 1/50 [00:14<11:30, 14.08s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 4044.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 4%|β | 2/50 [00:14<11:15, 14.08s/it][A | |
| 6%|β | 3/50 [00:15<03:16, 4.18s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 6092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 6%|β | 3/50 [00:15<03:16, 4.18s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 8140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| 8%|β | 4/50 [00:16<03:12, 4.18s/it][A | |
| 10%|β | 5/50 [00:16<01:44, 2.33s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 10224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 10%|β | 5/50 [00:16<01:44, 2.33s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 12248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 12%|ββ | 6/50 [00:17<01:42, 2.33s/it][A | |
| 14%|ββ | 7/50 [00:17<01:07, 1.58s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 14296.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 14%|ββ | 7/50 [00:17<01:07, 1.58s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 16348.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 16%|ββ | 8/50 [00:18<01:06, 1.58s/it][A | |
| 18%|ββ | 9/50 [00:18<00:49, 1.20s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 18396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 18%|ββ | 9/50 [00:18<00:49, 1.20s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 20444.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| 20%|ββ | 10/50 [00:19<00:48, 1.20s/it][A | |
| 22%|βββ | 11/50 [00:20<00:38, 1.02it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 22492.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 22%|βββ | 11/50 [00:20<00:38, 1.02it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 24544.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 24%|βββ | 12/50 [00:20<00:37, 1.02it/s][A | |
| 26%|βββ | 13/50 [00:21<00:34, 1.07it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 26592.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 26%|βββ | 13/50 [00:21<00:34, 1.07it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 28640.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 28%|βββ | 14/50 [00:22<00:33, 1.07it/s][A | |
| 30%|βββ | 15/50 [00:22<00:28, 1.23it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 30688.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 30%|βββ | 15/50 [00:22<00:28, 1.23it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 32736.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| 32%|ββββ | 16/50 [00:23<00:27, 1.23it/s][A | |
| 34%|ββββ | 17/50 [00:24<00:24, 1.36it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 34784.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 34%|ββββ | 17/50 [00:24<00:24, 1.36it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 36832.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 36%|ββββ | 18/50 [00:24<00:23, 1.36it/s][A | |
| 38%|ββββ | 19/50 [00:25<00:21, 1.46it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 38884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 38%|ββββ | 19/50 [00:25<00:21, 1.46it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 40908.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 40%|ββββ | 20/50 [00:25<00:20, 1.46it/s][A | |
| 42%|βββββ | 21/50 [00:26<00:18, 1.55it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 42904.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 42%|βββββ | 21/50 [00:26<00:18, 1.55it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 44988.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| 44%|████▍ | 22/50 [00:26<00:18, 1.55it/s] | |
| 46%|████▌ | 23/50 [00:27<00:16, 1.60it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 46984.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 49008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 48%|████▊ | 24/50 [00:28<00:16, 1.60it/s] | |
| 50%|█████ | 25/50 [00:29<00:17, 1.44it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 51092.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 53176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| 52%|█████▏ | 26/50 [00:29<00:16, 1.44it/s] | |
| 54%|█████▍ | 27/50 [00:30<00:15, 1.52it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 55172.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 57224.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 56%|█████▌ | 28/50 [00:30<00:14, 1.52it/s] | |
| 58%|█████▊ | 29/50 [00:31<00:13, 1.58it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 59308.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 61356.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| 60%|██████ | 30/50 [00:32<00:12, 1.58it/s] | |
| 62%|██████▏ | 31/50 [00:32<00:11, 1.63it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 63440.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 65488.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 64%|██████▍ | 32/50 [00:33<00:11, 1.63it/s] | |
| 66%|██████▌ | 33/50 [00:33<00:10, 1.67it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 67512.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 69564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17} | |
| 68%|██████▊ | 34/50 [00:34<00:09, 1.67it/s] | |
| 70%|███████ | 35/50 [00:34<00:08, 1.70it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 71560.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 73608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18} | |
| 72%|███████▏ | 36/50 [00:35<00:08, 1.70it/s] | |
| 74%|███████▍ | 37/50 [00:37<00:09, 1.35it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 75692.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.18} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 77740.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.19} | |
| 76%|███████▌ | 38/50 [00:37<00:08, 1.35it/s] | |
| 78%|███████▊ | 39/50 [00:38<00:07, 1.45it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 79764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 81812.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2} | |
| 80%|████████ | 40/50 [00:38<00:06, 1.45it/s] | |
| 82%|████████▏ | 41/50 [00:39<00:05, 1.53it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 83808.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.2} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 85804.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21} | |
| 84%|████████▍ | 42/50 [00:39<00:05, 1.53it/s] | |
| 86%|████████▌ | 43/50 [00:40<00:04, 1.59it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 87888.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.21} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 89884.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.22} | |
| 88%|████████▊ | 44/50 [00:41<00:03, 1.59it/s] | |
| 90%|█████████ | 45/50 [00:41<00:03, 1.64it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 91936.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 94020.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23} | |
| 92%|█████████▏| 46/50 [00:42<00:02, 1.64it/s] | |
| 94%|█████████▍| 47/50 [00:42<00:01, 1.68it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 96016.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.23} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 98064.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24} | |
| 96%|█████████▌| 48/50 [00:43<00:01, 1.68it/s] | |
| 98%|█████████▊| 49/50 [00:44<00:00, 1.48it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 100116.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.24} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 102168.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase1/mean': 0.0, 'rewards/r_format_phase1/std': 0.0, 'rewards/r_regret_phase1/mean': -0.5, 'rewards/r_regret_phase1/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.25} | |
| 100%|██████████| 50/50 [00:44<00:00, 1.48it/s] | |
| [A{'train_runtime': 45.5636, 'train_samples_per_second': 4.389, 'train_steps_per_second': 1.097, 'train_loss': 0.0, 'epoch': 0.25} | |
| 100%|██████████| 50/50 [00:45<00:00, 1.48it/s] | |
| 100%|██████████| 50/50 [00:45<00:00, 1.10it/s] | |
| Phase 1 done in 0.8 min | |
| [diagnostic] seed=100 raw completion (first 500 chars): | |
| <tool_call> | |
| 1st-order: EV adoption surges, directly driving demand for GREEN energy and EV supply chains. 2nd-order: As EVs displace ICE vehicles, OIL demand faces structural headwinds over the 12-quarter cycle, forcing a long-term rotation away from fossil fuels. 3rd-order: The massive capital deployment into EV infrastructure acts as a massive liquidity pump, supporting TECH and REAL_ESTATE valuations. Base-rate: Today's news strongly signals a structural transition away from OIL and a green b | |
| [parse_action result]: metadata={} weights=[0.35, 0.05, 0.45, 0.1, 0.05] infra_commit=0.15 carbon_offset_buy=0.0 put_hedge=0.0 tech_bet='green_leaps' | |
| ── Hold-out eval (5/5 valid) ── | |
| mean regret: -0.0037 | |
| beat baseline: 4/5 | |
| ── GRPO Phase 2: 8Q episodes, 100 iters, rewards=['format', 'regret', 'sharpe', 'drawdown'] ── | |
| Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it. | |
| Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it. | |
| ==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1 | |
| \\ /| Num examples = 600 | Num Epochs = 1 | Total steps = 100 | |
| O^O/ \_/ \ Batch size per device = 6 | Gradient accumulation steps = 1 | |
| \ / Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6 | |
| "-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained) | |
| 0%| | 0/100 [00:00<?, ?it/s] | |
| Unsloth: Will smartly offload gradients to save VRAM! | |
| 1%| | 1/100 [00:05<08:43, 5.29s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 2994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 6066.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0} | |
| 2%|▏ | 2/100 [00:06<08:38, 5.29s/it] | |
| 3%|▎ | 3/100 [00:06<03:06, 1.92s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 9270.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 12342.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 4%|▍ | 4/100 [00:07<03:04, 1.92s/it] | |
| 5%|▌ | 5/100 [00:08<02:05, 1.32s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 15576.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 18714.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 6%|▌ | 6/100 [00:09<02:04, 1.32s/it] | |
| 7%|▋ | 7/100 [00:09<01:39, 1.07s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 21702.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 24738.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 8%|β | 8/100 [00:10<01:38, 1.07s/it][A | |
| 9%|β | 9/100 [00:11<01:25, 1.06it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 27726.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 30972.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| 10%|β | 10/100 [00:11<01:24, 1.06it/s][A | |
| 11%|β | 11/100 [00:12<01:18, 1.14it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 34098.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.944444444444445e-06, 'num_tokens': 37176.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| 12%|ββ | 12/100 [00:13<01:17, 1.14it/s][A | |
| 13%|ββ | 13/100 [00:14<01:13, 1.19it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.888888888888889e-06, 'num_tokens': 40380.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| 13%|ββ | 13/100 [00:28<01:13, 1.19it/s][A | |
| 14%|ββ | 14/100 [00:53<11:54, 8.31s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.833333333333333e-06, 'num_tokens': 43382.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.777777777777778e-06, 'num_tokens': 46454.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 15%|ββ | 15/100 [00:54<11:46, 8.31s/it][A | |
| 16%|ββ | 16/100 [00:55<07:52, 5.62s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 49448.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.666666666666667e-06, 'num_tokens': 52526.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 17%|ββ | 17/100 [00:56<07:46, 5.62s/it][A | |
| 18%|ββ | 18/100 [00:56<05:26, 3.98s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.611111111111112e-06, 'num_tokens': 55730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.555555555555556e-06, 'num_tokens': 58976.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 19%|ββ | 19/100 [00:57<05:22, 3.98s/it][A | |
| 20%|ββ | 20/100 [00:58<03:54, 2.93s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5e-06, 'num_tokens': 62048.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 65282.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 21%|ββ | 21/100 [00:59<03:51, 2.93s/it][A | |
| 22%|βββ | 22/100 [00:59<02:54, 2.24s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.388888888888889e-06, 'num_tokens': 68354.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.333333333333334e-06, 'num_tokens': 71588.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 23%|βββ | 23/100 [01:00<02:52, 2.24s/it][A | |
| 24%|βββ | 24/100 [01:01<02:16, 1.79s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.277777777777778e-06, 'num_tokens': 74660.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.222222222222223e-06, 'num_tokens': 77786.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 25%|βββ | 25/100 [01:02<02:14, 1.79s/it][A | |
| 26%|βββ | 26/100 [01:03<01:55, 1.56s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 80858.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.111111111111111e-06, 'num_tokens': 84062.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 27%|βββ | 27/100 [01:04<01:53, 1.56s/it][A | |
| 28%|βββ | 28/100 [01:05<01:34, 1.31s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.055555555555556e-06, 'num_tokens': 87140.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 90212.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| 29%|βββ | 29/100 [01:05<01:32, 1.31s/it][A | |
| 30%|βββ | 30/100 [01:06<01:19, 1.13s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.944444444444445e-06, 'num_tokens': 93248.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 96236.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| 31%|βββ | 31/100 [01:07<01:18, 1.13s/it][A | |
| 32%|ββββ | 32/100 [01:07<01:08, 1.01s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.833333333333334e-06, 'num_tokens': 99482.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.777777777777778e-06, 'num_tokens': 102608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 33%|ββββ | 33/100 [01:08<01:07, 1.01s/it][A | |
| 34%|ββββ | 34/100 [01:09<01:01, 1.08it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7222222222222225e-06, 'num_tokens': 105596.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6666666666666666e-06, 'num_tokens': 108668.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 35%|ββββ | 35/100 [01:10<01:00, 1.08it/s][A | |
| 36%|ββββ | 36/100 [01:10<00:55, 1.15it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 111656.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.555555555555556e-06, 'num_tokens': 114650.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 37%|ββββ | 37/100 [01:11<00:54, 1.15it/s][A | |
| 38%|ββββ | 38/100 [01:12<00:51, 1.21it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5e-06, 'num_tokens': 117728.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.444444444444445e-06, 'num_tokens': 120764.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 39%|ββββ | 39/100 [01:13<00:50, 1.21it/s][A | |
| 40%|ββββ | 40/100 [01:13<00:48, 1.24it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3888888888888893e-06, 'num_tokens': 123998.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 127034.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 41%|████      | 41/100 [01:14<00:47, 1.24it/s] | |
| 42%|████▏     | 42/100 [01:15<00:45, 1.28it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.277777777777778e-06, 'num_tokens': 130160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2222222222222227e-06, 'num_tokens': 133316.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 43%|████▎     | 43/100 [01:16<00:44, 1.28it/s] | |
| 44%|████▍     | 44/100 [01:16<00:43, 1.29it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1666666666666667e-06, 'num_tokens': 136550.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1111111111111116e-06, 'num_tokens': 139622.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 45%|████▌     | 45/100 [01:17<00:42, 1.29it/s] | |
| 46%|████▌     | 46/100 [01:18<00:41, 1.31it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 142826.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3e-06, 'num_tokens': 145814.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| 47%|████▋     | 47/100 [01:19<00:40, 1.31it/s] | |
| 48%|████▊     | 48/100 [01:19<00:39, 1.32it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.944444444444445e-06, 'num_tokens': 149060.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.888888888888889e-06, 'num_tokens': 152054.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| 49%|████▉     | 49/100 [01:20<00:38, 1.32it/s] | |
| 50%|█████     | 50/100 [01:21<00:37, 1.33it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8333333333333335e-06, 'num_tokens': 155288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 158426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 51%|█████     | 51/100 [01:22<00:36, 1.33it/s] | |
| 52%|█████▏    | 52/100 [01:23<00:40, 1.20it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7222222222222224e-06, 'num_tokens': 161414.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.666666666666667e-06, 'num_tokens': 164552.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 53%|█████▎    | 53/100 [01:24<00:39, 1.20it/s] | |
| 54%|█████▍    | 54/100 [01:24<00:37, 1.23it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6111111111111113e-06, 'num_tokens': 167756.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5555555555555557e-06, 'num_tokens': 170828.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 55%|█████▌    | 55/100 [01:25<00:36, 1.23it/s] | |
| 56%|█████▌    | 56/100 [01:26<00:34, 1.27it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 173900.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4444444444444447e-06, 'num_tokens': 177056.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 57%|█████▋    | 57/100 [01:27<00:33, 1.27it/s] | |
| 58%|█████▊    | 58/100 [01:27<00:32, 1.28it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.388888888888889e-06, 'num_tokens': 180302.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.3333333333333336e-06, 'num_tokens': 183290.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 59%|█████▉    | 59/100 [01:28<00:31, 1.28it/s] | |
| 60%|██████    | 60/100 [01:29<00:31, 1.28it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.277777777777778e-06, 'num_tokens': 186326.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 189398.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 61%|██████    | 61/100 [01:30<00:30, 1.28it/s] | |
| 62%|██████▏   | 62/100 [01:30<00:29, 1.31it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.166666666666667e-06, 'num_tokens': 192470.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.1111111111111114e-06, 'num_tokens': 195608.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 63%|██████▎   | 63/100 [01:31<00:28, 1.31it/s] | |
| 64%|██████▍   | 64/100 [01:32<00:27, 1.31it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0555555555555555e-06, 'num_tokens': 198734.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000003e-06, 'num_tokens': 201806.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| 65%|██████▌   | 65/100 [01:33<00:26, 1.31it/s] | |
| 66%|██████▌   | 66/100 [01:33<00:25, 1.33it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 204794.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.888888888888889e-06, 'num_tokens': 207830.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| 67%|██████▋   | 67/100 [01:34<00:24, 1.33it/s] | |
| 68%|██████▊   | 68/100 [01:35<00:23, 1.34it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8333333333333333e-06, 'num_tokens': 210902.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.777777777777778e-06, 'num_tokens': 213974.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 69%|██████▉   | 69/100 [01:36<00:23, 1.34it/s] | |
| 70%|███████   | 70/100 [01:36<00:22, 1.35it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7222222222222224e-06, 'num_tokens': 217046.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 220292.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 71%|███████   | 71/100 [01:37<00:21, 1.35it/s] | |
| 72%|███████▏  | 72/100 [01:38<00:20, 1.34it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6111111111111113e-06, 'num_tokens': 223280.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5555555555555558e-06, 'num_tokens': 226358.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 73%|ββββββββ | 73/100 [01:39<00:20, 1.34it/s][A | |
| 74%|ββββββββ | 74/100 [01:39<00:19, 1.34it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5e-06, 'num_tokens': 229496.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4444444444444445e-06, 'num_tokens': 232730.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 75%|ββββββββ | 75/100 [01:40<00:18, 1.34it/s][A | |
| 76%|ββββββββ | 76/100 [01:41<00:19, 1.20it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 235802.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| 76%|ββββββββ | 76/100 [01:52<00:19, 1.20it/s][A | |
| 77%|ββββββββ | 77/100 [02:19<02:47, 7.28s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3333333333333334e-06, 'num_tokens': 238797.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2777777777777779e-06, 'num_tokens': 241935.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| 78%|ββββββββ | 78/100 [02:19<02:40, 7.28s/it][A | |
| 79%|ββββββββ | 79/100 [02:20<01:46, 5.09s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.2222222222222223e-06, 'num_tokens': 244923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1666666666666668e-06, 'num_tokens': 247995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| 80%|ββββββββ | 80/100 [02:21<01:41, 5.09s/it][A | |
| 81%|ββββββββ | 81/100 [02:22<01:09, 3.68s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 251067.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0555555555555557e-06, 'num_tokens': 254145.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 82%|βββββββββ | 82/100 [02:22<01:06, 3.68s/it][A | |
| 83%|βββββββββ | 83/100 [02:23<00:46, 2.75s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0000000000000002e-06, 'num_tokens': 257181.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.444444444444445e-07, 'num_tokens': 260253.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 84%|βββββββββ | 84/100 [02:24<00:44, 2.75s/it][A | |
| 85%|βββββββββ | 85/100 [02:25<00:31, 2.13s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.88888888888889e-07, 'num_tokens': 263247.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 266451.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 86%|βββββββββ | 86/100 [02:26<00:29, 2.13s/it][A | |
| 87%|βββββββββ | 87/100 [02:26<00:22, 1.73s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.777777777777779e-07, 'num_tokens': 269439.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.222222222222222e-07, 'num_tokens': 272427.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| 88%|βββββββββ | 88/100 [02:27<00:20, 1.73s/it][A | |
| 89%|βββββββββ | 89/100 [02:28<00:15, 1.43s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.666666666666667e-07, 'num_tokens': 275499.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.111111111111112e-07, 'num_tokens': 278655.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| 90%|βββββββββ | 90/100 [02:29<00:14, 1.43s/it][A | |
| 91%|βββββββββ | 91/100 [02:29<00:10, 1.22s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 281691.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.000000000000001e-07, 'num_tokens': 284925.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| 92%|ββββββββββ| 92/100 [02:30<00:09, 1.22s/it][A | |
| 93%|ββββββββββ| 93/100 [02:31<00:07, 1.08s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444445e-07, 'num_tokens': 287913.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.8888888888888895e-07, 'num_tokens': 290985.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 94%|ββββββββββ| 94/100 [02:32<00:06, 1.08s/it][A | |
| 95%|ββββββββββ| 95/100 [02:32<00:04, 1.02it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333335e-07, 'num_tokens': 294123.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 297357.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 96%|ββββββββββ| 96/100 [02:33<00:03, 1.02it/s][A | |
| 96%|ββββββββββ| 96/100 [02:44<00:03, 1.02it/s][A | |
| 97%|ββββββββββ| 97/100 [03:05<00:16, 5.61s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222224e-07, 'num_tokens': 300361.0, 'completions/mean_length': 3.6666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 17.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.6666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 17.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.6666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666668e-07, 'num_tokens': 303517.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 98%|ββββββββββ| 98/100 [03:06<00:11, 5.61s/it][A | |
| 99%|ββββββββββ| 99/100 [03:06<00:04, 4.15s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1111111111111112e-07, 'num_tokens': 306721.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555556e-08, 'num_tokens': 309859.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase2/mean': 0.0, 'rewards/r_format_phase2/std': 0.0, 'rewards/r_regret_phase2/mean': -0.5, 'rewards/r_regret_phase2/std': 0.0, 'rewards/r_sharpe_phase2/mean': 0.0, 'rewards/r_sharpe_phase2/std': 0.0, 'rewards/r_drawdown_phase2/mean': 0.0, 'rewards/r_drawdown_phase2/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17} | |
| 100%|ββββββββββ| 100/100 [03:07<00:00, 4.15s/it][A | |
| [A{'train_runtime': 188.2628, 'train_samples_per_second': 3.187, 'train_steps_per_second': 0.531, 'train_loss': 0.0, 'epoch': 0.17} | |
| 100%|ββββββββββ| 100/100 [03:08<00:00, 4.15s/it][A | |
| 100%|ββββββββββ| 100/100 [03:08<00:00, 1.88s/it] | |
| Phase 2 done in 3.1 min | |
| [diagnostic] seed=100 raw completion (first 500 chars): | |
| <tool_call> | |
| 1st-order: Insurers exiting Florida and California triggers a massive flight-to-safety, driving 10-year Treasuries down and freezing municipal bonds. 2nd-order: The freeze in municipal bonds directly crushes the yield curve, making long-duration BONDS a dead asset over the next 12 quarters. 3rd-order: The physical loss of insurance capital in the Gulf Coast and Bay Area will eventually trigger a broader real estate market correction, severely hurting REAL_ESTATE. Base case: Deflation | |
| [parse_action result]: metadata={} weights=[0.2, 0.05, 0.05, 0.0, 0.7] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary' | |
| ── Hold-out eval (5/5 valid) ── | |
| mean regret: -0.0391 | |
| beat baseline: 2/5 | |
| ── GRPO Phase 3: 12Q episodes, 80 iters, rewards=['format', 'regret', 'sharpe', 'drawdown', 'carbon'] ── | |
| Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it. | |
| Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it. | |
| ==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1 | |
| \\ /| Num examples = 480 | Num Epochs = 1 | Total steps = 80 | |
| O^O/ \_/ \ Batch size per device = 6 | Gradient accumulation steps = 1 | |
| \ / Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6 | |
| "-____-" Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained) | |
| 0%| | 0/80 [00:00<?, ?it/s] | |
| Unsloth: Will smartly offload gradients to save VRAM! | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0, 'num_tokens': 3216.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0} | |
| 1%|β | 1/80 [00:00<01:04, 1.22it/s][A | |
| 2%|β | 2/80 [00:01<01:00, 1.28it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 6288.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 9426.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 4%|β | 3/80 [00:02<01:00, 1.28it/s][A | |
| 5%|β | 4/80 [00:03<00:58, 1.31it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 12564.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 15810.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 6%|β | 5/80 [00:03<00:57, 1.31it/s][A | |
| 8%|β | 6/80 [00:11<02:47, 2.27s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 18807.0, 'completions/mean_length': 69.16667175292969, 'completions/min_length': 1.0, 'completions/max_length': 400.0, 'completions/clipped_ratio': 0.16666666666666663, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 11.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 69.16667175292969, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 22041.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.01} | |
| 9%|β | 7/80 [00:11<02:45, 2.27s/it][A | |
| 10%|β | 8/80 [00:12<02:00, 1.67s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 25287.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5e-06, 'num_tokens': 28413.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| 11%|ββ | 9/80 [00:13<01:58, 1.67s/it][A | |
| 12%|ββ | 10/80 [00:14<01:33, 1.34s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.930555555555556e-06, 'num_tokens': 31629.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111111e-06, 'num_tokens': 34863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.02} | |
| 14%|ββ | 11/80 [00:14<01:32, 1.34s/it][A | |
| 15%|ββ | 12/80 [00:15<01:17, 1.13s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.791666666666668e-06, 'num_tokens': 37851.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.722222222222222e-06, 'num_tokens': 40923.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 16%|ββ | 13/80 [00:16<01:16, 1.13s/it][A | |
| 18%|ββ | 14/80 [00:17<01:06, 1.00s/it][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.652777777777779e-06, 'num_tokens': 44061.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.583333333333333e-06, 'num_tokens': 47295.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| 19%|ββ | 15/80 [00:17<01:05, 1.00s/it][A | |
| 20%|ββ | 16/80 [00:18<00:58, 1.09it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.5138888888888895e-06, 'num_tokens': 50283.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.03} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.444444444444444e-06, 'num_tokens': 53361.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 21%|βββ | 17/80 [00:19<00:57, 1.09it/s][A | |
| 22%|βββ | 18/80 [00:20<00:53, 1.16it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.3750000000000005e-06, 'num_tokens': 56577.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.305555555555556e-06, 'num_tokens': 59565.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 24%|βββ | 19/80 [00:20<00:52, 1.16it/s][A | |
| 25%|βββ | 20/80 [00:21<00:52, 1.15it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.236111111111111e-06, 'num_tokens': 62571.0, 'completions/mean_length': 4.0, 'completions/min_length': 1.0, 'completions/max_length': 19.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 4.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 19.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 4.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.166666666666667e-06, 'num_tokens': 65787.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.04} | |
| 26%|βββ | 21/80 [00:23<00:51, 1.15it/s][A | |
| 28%|βββ | 22/80 [00:23<00:53, 1.08it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.097222222222222e-06, 'num_tokens': 69003.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.027777777777779e-06, 'num_tokens': 72141.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| 29%|βββ | 23/80 [00:24<00:52, 1.08it/s][A | |
| 30%|βββ | 24/80 [00:25<00:48, 1.15it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.958333333333333e-06, 'num_tokens': 75213.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.88888888888889e-06, 'num_tokens': 78447.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| 31%|ββββ | 25/80 [00:26<00:47, 1.15it/s][A | |
| 32%|ββββ | 26/80 [00:27<00:45, 1.19it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.819444444444444e-06, 'num_tokens': 81675.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.05} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.7500000000000005e-06, 'num_tokens': 84813.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 34%|ββββ | 27/80 [00:27<00:44, 1.19it/s][A | |
| 35%|ββββ | 28/80 [00:28<00:43, 1.20it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.680555555555556e-06, 'num_tokens': 87808.0, 'completions/mean_length': 2.1666667461395264, 'completions/min_length': 1.0, 'completions/max_length': 8.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 2.1666667461395264, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 8.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 2.1666667461395264, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.6111111111111115e-06, 'num_tokens': 90880.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 36%|ββββ | 29/80 [00:29<00:42, 1.20it/s][A | |
| 38%|ββββ | 30/80 [00:30<00:40, 1.25it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.5416666666666673e-06, 'num_tokens': 94006.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4722222222222224e-06, 'num_tokens': 97252.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.06} | |
| 39%|ββββ | 31/80 [00:30<00:39, 1.25it/s][A | |
| 40%|ββββ | 32/80 [00:31<00:37, 1.26it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.4027777777777783e-06, 'num_tokens': 100468.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.3333333333333333e-06, 'num_tokens': 103672.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 41%|βββββ | 33/80 [00:32<00:37, 1.26it/s][A | |
| 42%|βββββ | 34/80 [00:33<00:37, 1.22it/s][A | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.2638888888888892e-06, 'num_tokens': 106689.0, 'completions/mean_length': 5.833333492279053, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 5.833333492279053, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 5.833333492279053, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.1944444444444443e-06, 'num_tokens': 109761.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| 44%| 35/80 [00:34<00:36, 1.22it/s] | |
| 44%| 35/80 [00:48<00:36, 1.22it/s] | |
| 45%| 36/80 [01:04<03:50, 5.24s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.125e-06, 'num_tokens': 112761.0, 'completions/mean_length': 3.0, 'completions/min_length': 1.0, 'completions/max_length': 13.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 13.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.07} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.055555555555556e-06, 'num_tokens': 115995.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| 46%| 37/80 [01:05<03:45, 5.24s/it] | |
| 48%| 38/80 [01:05<02:43, 3.89s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.986111111111111e-06, 'num_tokens': 119073.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.916666666666667e-06, 'num_tokens': 122319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| 49%| 39/80 [01:06<02:39, 3.89s/it] | |
| 50%| 40/80 [01:07<01:58, 2.95s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.8472222222222224e-06, 'num_tokens': 125523.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.08} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777783e-06, 'num_tokens': 128559.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 51%| 41/80 [01:08<01:55, 2.95s/it] | |
| 52%| 42/80 [01:09<01:30, 2.37s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7083333333333334e-06, 'num_tokens': 131547.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.6388888888888893e-06, 'num_tokens': 134751.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 54%| 43/80 [01:10<01:27, 2.37s/it] | |
| 55%| 44/80 [01:11<01:07, 1.89s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5694444444444443e-06, 'num_tokens': 137967.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.5e-06, 'num_tokens': 141171.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.09} | |
| 56%| 45/80 [01:11<01:06, 1.89s/it] | |
| 57%| 46/80 [01:12<00:52, 1.55s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.4305555555555557e-06, 'num_tokens': 144309.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.361111111111111e-06, 'num_tokens': 147609.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 59%| 47/80 [01:13<00:51, 1.55s/it] | |
| 60%| 48/80 [01:14<00:42, 1.31s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2916666666666666e-06, 'num_tokens': 150849.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.222222222222222e-06, 'num_tokens': 154065.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| 61%| 49/80 [01:14<00:40, 1.31s/it] | |
| 62%| 50/80 [01:15<00:34, 1.15s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.152777777777778e-06, 'num_tokens': 157311.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.1} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333334e-06, 'num_tokens': 160551.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| 64%| 51/80 [01:16<00:33, 1.15s/it] | |
| 65%| 52/80 [01:17<00:28, 1.03s/it] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0138888888888893e-06, 'num_tokens': 163623.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.944444444444445e-06, 'num_tokens': 166863.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| 66%| 53/80 [01:17<00:27, 1.03s/it] | |
| 68%| 54/80 [01:18<00:24, 1.06it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8750000000000003e-06, 'num_tokens': 170079.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.8055555555555557e-06, 'num_tokens': 173319.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.11} | |
| 69%| 55/80 [01:19<00:23, 1.06it/s] | |
| 70%| 56/80 [01:20<00:21, 1.12it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.7361111111111112e-06, 'num_tokens': 176457.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.6666666666666667e-06, 'num_tokens': 179595.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 71%| 57/80 [01:20<00:20, 1.12it/s] | |
| 72%| 58/80 [01:21<00:18, 1.18it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.5972222222222221e-06, 'num_tokens': 182799.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.527777777777778e-06, 'num_tokens': 185937.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| 74%| 59/80 [01:22<00:17, 1.18it/s] | |
| 75%| 60/80 [01:23<00:16, 1.22it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.4583333333333335e-06, 'num_tokens': 189009.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.12} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888892e-06, 'num_tokens': 192255.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| 76%| 61/80 [01:24<00:15, 1.22it/s] | |
| 78%| 62/80 [01:25<00:15, 1.13it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3194444444444446e-06, 'num_tokens': 195483.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.25e-06, 'num_tokens': 198555.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| 79%| 63/80 [01:25<00:15, 1.13it/s] | |
| 80%| 64/80 [01:26<00:13, 1.19it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.1805555555555556e-06, 'num_tokens': 201633.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.13} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.111111111111111e-06, 'num_tokens': 204636.0, 'completions/mean_length': 3.5, 'completions/min_length': 1.0, 'completions/max_length': 16.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.5, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 16.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.5, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 81%|█████████ | 65/80 [01:27<00:12, 1.19it/s] | |
| 82%|█████████ | 66/80 [01:28<00:11, 1.18it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.0416666666666667e-06, 'num_tokens': 207708.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.722222222222224e-07, 'num_tokens': 211008.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 84%|█████████ | 67/80 [01:29<00:11, 1.18it/s] | |
| 85%|█████████ | 68/80 [01:29<00:09, 1.22it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 9.027777777777779e-07, 'num_tokens': 213996.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 8.333333333333333e-07, 'num_tokens': 216998.0, 'completions/mean_length': 3.3333334922790527, 'completions/min_length': 1.0, 'completions/max_length': 15.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 3.3333334922790527, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 15.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 3.3333334922790527, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.14} | |
| 86%|█████████ | 69/80 [01:30<00:09, 1.22it/s] | |
| 88%|█████████ | 70/80 [01:31<00:08, 1.20it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 7.63888888888889e-07, 'num_tokens': 220124.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444446e-07, 'num_tokens': 223160.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| 89%|█████████ | 71/80 [01:32<00:07, 1.20it/s] | |
| 90%|█████████ | 72/80 [01:33<00:06, 1.24it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.25e-07, 'num_tokens': 226238.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 5.555555555555555e-07, 'num_tokens': 229478.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| 91%|██████████| 73/80 [01:33<00:05, 1.24it/s] | |
| 92%|██████████| 74/80 [01:34<00:04, 1.26it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.861111111111112e-07, 'num_tokens': 232706.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.15} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 4.1666666666666667e-07, 'num_tokens': 235844.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 94%|██████████| 75/80 [01:35<00:03, 1.26it/s] | |
| 95%|██████████| 76/80 [01:36<00:03, 1.28it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.472222222222223e-07, 'num_tokens': 238922.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.7777777777777776e-07, 'num_tokens': 241994.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 96%|██████████| 77/80 [01:36<00:02, 1.28it/s] | |
| 98%|██████████| 78/80 [01:37<00:01, 1.29it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0833333333333333e-07, 'num_tokens': 245240.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 1.3888888888888888e-07, 'num_tokens': 248396.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.16} | |
| 99%|██████████| 79/80 [01:38<00:00, 1.29it/s] | |
| 100%|██████████| 80/80 [01:39<00:00, 1.30it/s] | |
| [A{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 6.944444444444444e-08, 'num_tokens': 251534.0, 'completions/mean_length': 1.0, 'completions/min_length': 1.0, 'completions/max_length': 1.0, 'completions/clipped_ratio': 0.0, 'completions/mean_terminated_length': 1.0, 'completions/min_terminated_length': 1.0, 'completions/max_terminated_length': 1.0, 'rewards/r_format_phase3/mean': 0.0, 'rewards/r_format_phase3/std': 0.0, 'rewards/r_regret_phase3/mean': -0.5, 'rewards/r_regret_phase3/std': 0.0, 'rewards/r_sharpe_phase3/mean': 0.0, 'rewards/r_sharpe_phase3/std': 0.0, 'rewards/r_drawdown_phase3/mean': 0.0, 'rewards/r_drawdown_phase3/std': 0.0, 'rewards/r_carbon_phase3/mean': 0.0, 'rewards/r_carbon_phase3/std': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'completion_length': 1.0, 'kl': 0.0, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.17} | |
| [A{'train_runtime': 100.0827, 'train_samples_per_second': 4.796, 'train_steps_per_second': 0.799, 'train_loss': 0.0, 'epoch': 0.17} | |
| 100%|██████████| 80/80 [01:40<00:00, 1.30it/s] | |
| 100%|██████████| 80/80 [01:40<00:00, 1.25s/it] | |
| Phase 3 done in 1.7 min | |
| [diagnostic] seed=100 raw completion (first 500 chars): | |
| <tool_call> | |
| 1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months. 2nd-order: As global supply chains fracture, the massive overcapacity in the oil sector will be rapidly absorbed by industrial demand, driving a structural inflationary squeeze. This stagflationary regime will crush BONDS and compress REAL_ESTATE valuations. 3rd-order: The forced lo | |
| [parse_action result]: metadata={} weights=[0.09523809523809523, 0.42857142857142855, 0.047619047619047616, 0.09523809523809523, 0.3333333333333333] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary' | |
| ── Hold-out eval (5/5 valid) ── | |
| mean regret: -0.0941 | |
| beat baseline: 3/5 | |
| Found HuggingFace hub cache directory: /tmp/CarbonAlpha/hf_cache/hub | |
| Checking cache directory for required files... | |
| Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 0%| | 0/2 [00:00<?, ?it/s] | |
| Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 100%|██████████| 2/2 [00:01<00:00, 1.37it/s] | |
| Successfully copied all 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged` | |
| Checking cache directory for required files... | |
| Cache check failed: tokenizer.model not found in local cache. | |
| Not all required files found in cache. Will proceed with downloading. | |
| Unsloth: Preparing safetensor model files: 0%| | 0/2 [00:00<?, ?it/s] | |
| Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 60787.01it/s] | |
| Unsloth: Merging weights into 16bit: 0%| | 0/2 [00:00<?, ?it/s] | |
| Unsloth: Merging weights into 16bit: 50%|█████ | 1/2 [00:31<00:31, 31.55s/it] | |
| Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:55<00:00, 26.86s/it] | |
| Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:55<00:00, 27.56s/it] | |
| Unsloth: Merge process complete. Saved to `/tmp/CarbonAlpha/checkpoints/final_merged` | |
| Saved LoRA adapters to /tmp/CarbonAlpha/checkpoints/final_merged | |
| [rank0]:[W425 10:13:19.025103781 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
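
The per-step metric dictionaries logged during Phase 3 above all report `'reward_std': 0.0`, `'frac_reward_zero_std': 1.0`, `'loss': 0.0`, and `'grad_norm': 0.0`. The metric names resemble those logged by a GRPO-style trainer; assuming advantages are computed relative to each group's mean reward, identical rewards within a group would leave no learning signal, which is consistent with the zero loss and gradient norm. The sketch below is one possible way to scan such log lines for that condition. The helper names (`parse_metrics`, `has_no_learning_signal`, `summarize`) are hypothetical and are not part of the CarbonAlpha code or of any library.

```python
import ast
import re

# Matches the {...} dictionary literal that the trainer prints on a metrics line.
_DICT_RE = re.compile(r"\{.*\}")


def parse_metrics(line):
    """Return the metrics dict embedded in a log line, or None if there is none."""
    match = _DICT_RE.search(line)
    if match is None:
        return None
    try:
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return None
    return parsed if isinstance(parsed, dict) else None


def has_no_learning_signal(metrics):
    """Hypothetical check: every group had identical rewards and the gradient was zero."""
    return (
        metrics.get("frac_reward_zero_std") == 1.0
        and metrics.get("reward_std") == 0.0
        and metrics.get("grad_norm") == 0.0
    )


def summarize(lines):
    """Return (number of per-step metric lines, number of those with no learning signal)."""
    steps = [m for m in map(parse_metrics, lines) if m is not None and "reward_std" in m]
    flat = sum(has_no_learning_signal(m) for m in steps)
    return len(steps), flat


if __name__ == "__main__":
    # Abbreviated sample lines in the same shape as the log above.
    sample = [
        "| [A{'loss': 0.0, 'grad_norm': 0.0, 'reward': -0.5, 'reward_std': 0.0, 'frac_reward_zero_std': 1.0, 'epoch': 0.16} | |",
        "| 98%| 78/80 [01:37<00:01, 1.29it/s][A | |",
        "| [A{'train_runtime': 100.0827, 'train_loss': 0.0, 'epoch': 0.17} | |",
    ]
    total, flat = summarize(sample)
    print(f"{flat}/{total} logged steps had zero reward variance and zero gradient")
```

Progress-bar lines and the final `train_runtime` summary are skipped because they contain no `reward_std` key, so only per-step metric lines are counted.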