🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Loading unsloth/Qwen3-4B-Instruct-2507...
INFO 04-25 09:59:05 [vllm_utils.py:724] Unsloth: Patching vLLM v1 graph capture
==((====))==  Unsloth 2026.4.8: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.15.1.
   \\   /|    NVIDIA L40S. Num GPUs = 1. Max memory: 44.392 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: FlashInfer requires JIT compilation but nvcc (CUDA compiler) is not found.
vLLM will use FLASH_ATTN attention + PyTorch sampler instead (works fine).
To enable FlashInfer, install the missing tools:
nvcc - install the CUDA toolkit or set CUDA_HOME to your CUDA installation
ninja - pip install ninja
To silence this warning: set UNSLOTH_VLLM_NO_FLASHINFER=1
Unsloth: vLLM loading unsloth/Qwen3-4B-Instruct-2507 with actual GPU utilization = 89.06%
Unsloth: Your GPU has CUDA compute capability 8.9 with VRAM = 44.39 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 4096. Num Sequences = 96.
Unsloth: vLLM's KV Cache can use up to 32.5 GB. Also swap space = 6 GB.
Unsloth: Not an error, but `use_cudagraph` is not supported in vLLM.config.CompilationConfig. Skipping.
Unsloth: Not an error, but `use_inductor` is not supported in vLLM.config.CompilationConfig. Skipping.
WARNING 04-25 09:59:07 [compilation.py:762] Level is deprecated and will be removed in the next release,either 0.12.0 or 0.11.2 whichever is soonest.Use mode instead.If both level and mode are given,only mode will be used.
Unsloth: Not an error, but `device` is not supported in vLLM. Skipping.
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/pydantic/type_adapter.py:607: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected `enum` - serialized value may not be as expected [field_name='mode', input_value=3, input_type=int])
return self.serializer.to_python(
INFO 04-25 09:59:07 [utils.py:261] non-default args: {'dtype': torch.bfloat16, 'max_model_len': 4096, 'enable_prefix_caching': True, 'swap_space': 6, 'gpu_memory_utilization': 0.8906117106477057, 'max_num_batched_tokens': 8192, 'max_num_seqs': 96, 'max_logprobs': 0, 'disable_log_stats': True, 'enable_lora': True, 'enable_chunked_prefill': True, 'compilation_config': {'level': 3, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': [], 'splitting_ops': None, 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': None, 'inductor_compile_config': {'epilogue_fusion': True, 'max_autotune': False, 'shape_padding': True, 'trace.enabled': False, 'triton.cudagraphs': False, 'debug': False, 'dce': True, 'memory_planning': True, 'coordinate_descent_tuning': False, 'trace.graph_diagram': False, 'compile_threads': 8, 'group_fusion': True, 'disable_progress': False, 'verbose_progress': True, 'triton.multi_kernel': 0, 'triton.use_block_ptr': True, 'triton.enable_persistent_tma_matmul': True, 'triton.autotune_at_compile_time': False, 'triton.cooperative_reductions': False, 'cuda.compile_opt_level': '-O2', 'cuda.enable_cuda_lto': True, 'combo_kernels': False, 'benchmark_combo_kernel': True, 'combo_kernel_foreach_dynamic_shapes': True, 'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': None, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}, 'model': 'unsloth/Qwen3-4B-Instruct-2507'}
WARNING 04-25 09:59:07 [arg_utils.py:1220] The global random seed is set to 0. Since VLLM_ENABLE_V1_MULTIPROCESSING is set to False, this may affect the random state of the Python process that launched vLLM.
INFO 04-25 09:59:14 [model.py:541] Resolved architecture: Qwen3ForCausalLM
INFO 04-25 09:59:14 [model.py:1561] Using max model len 4096
INFO 04-25 09:59:15 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 04-25 09:59:15 [vllm.py:624] Asynchronous scheduling is enabled.
generation_config.json: 0%| | 0.00/237 [00:00, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 192, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}
INFO 04-25 09:59:16 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.113.93.102:50843 backend=nccl
INFO 04-25 09:59:16 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
INFO 04-25 09:59:16 [gpu_model_runner.py:4033] Starting to load model unsloth/Qwen3-4B-Instruct-2507...
/root/.cache/uv/environments-v2/hf-train-2a0e45940eaf9e50/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via `pip install torch-c-dlpack-ext`
warnings.warn(
INFO 04-25 09:59:19 [cuda.py:364] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
model.safetensors.index.json: 0%| | 0.00/32.9k [00:00. Unsloth 2026.4.8 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
VRAM allocated: 41.84 GB
══ SFT warm-start — sft_traces/traces_v2.jsonl ══
120 SFT examples loaded (chat format in `text`)
Unsloth: Tokenizing ["text"] (num_proc=12): 0%| | 0/120 [00:00 1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months.
2nd-order: As global supply chains fracture, the immediate 3-year cumulative real return is heavily penalized. The 12-quarter lockup forces a defensive tilt.
3rd-order: The fragmentation of global supply chains acts as a massive structural headwind for TECH and GREEN. The base case
[parse_action result]: metadata={} weights=[0.0, 0.4, 0.0, 0.2, 0.4] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='fragmentation'
── Hold-out eval (5/5 valid) ──
mean regret: -0.2516
beat baseline: 0/5
══ GRPO Phase 1: 4Q episodes, 50 iters, rewards=['format', 'regret'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 200 | Num Epochs = 1 | Total steps = 50
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
0%| | 0/50 [00:00 1st-order: EV adoption surges, directly driving demand for GREEN energy and EV supply chains.
2nd-order: As EVs displace ICE vehicles, OIL demand faces structural headwinds over the 12-quarter cycle, forcing a long-term rotation away from fossil fuels.
3rd-order: The massive capital deployment into EV infrastructure acts as a massive liquidity pump, supporting TECH and REAL_ESTATE valuations.
Base-rate: Today's news strongly signals a structural transition away from OIL and a green b
[parse_action result]: metadata={} weights=[0.35, 0.05, 0.45, 0.1, 0.05] infra_commit=0.15 carbon_offset_buy=0.0 put_hedge=0.0 tech_bet='green_leaps'
── Hold-out eval (5/5 valid) ──
mean regret: -0.0037
beat baseline: 4/5
══ GRPO Phase 2: 8Q episodes, 100 iters, rewards=['format', 'regret', 'sharpe', 'drawdown'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 600 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 6 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
0%| | 0/100 [00:00 1st-order: Insurers exiting Florida and California triggers a massive flight-to-safety, driving 10-year Treasuries down and freezing municipal bonds.
2nd-order: The freeze in municipal bonds directly crushes the yield curve, making long-duration BONDS a dead asset over the next 12 quarters.
3rd-order: The physical loss of insurance capital in the Gulf Coast and Bay Area will eventually trigger a broader real estate market correction, severely hurting REAL_ESTATE.
Base case: Deflation
[parse_action result]: metadata={} weights=[0.2, 0.05, 0.05, 0.0, 0.7] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'
── Hold-out eval (5/5 valid) ──
mean regret: -0.0391
beat baseline: 2/5
══ GRPO Phase 3: 12Q episodes, 80 iters, rewards=['format', 'regret', 'sharpe', 'drawdown', 'carbon'] ══
Unsloth: The DAPO paper recommends `mask_truncated_completions = True` - we will set it.
Unsloth: The DAPO paper recommends `epsilon_high = 0.28` - we will set it.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 480 | Num Epochs = 1 | Total steps = 80
O^O/ \_/ \    Batch size per device = 6 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (6 x 1 x 1) = 6
 "-____-"     Trainable parameters = 33,030,144 of 4,055,498,240 (0.81% trained)
0%| | 0/80 [00:00 1st-order: China's export restrictions and US semiconductor controls directly choke the supply chain for critical green tech components, severely constraining GREEN and TECH growth for the next 18 months.
2nd-order: As global supply chains fracture, the massive overcapacity in the oil sector will be rapidly absorbed by industrial demand, driving a structural inflationary squeeze. This stagflationary regime will crush BONDS and compress REAL_ESTATE valuations.
3rd-order: The forced lo
[parse_action result]: metadata={} weights=[0.09523809523809523, 0.42857142857142855, 0.047619047619047616, 0.09523809523809523, 0.3333333333333333] infra_commit=0.0 carbon_offset_buy=0.0 put_hedge=0.03 tech_bet='inflationary'
── Hold-out eval (5/5 valid) ──
mean regret: -0.0941
beat baseline: 3/5
Found HuggingFace hub cache directory: /tmp/CarbonAlpha/hf_cache/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `/tmp/CarbonAlpha/checkpoints/final_merged`: 0%| | 0/2 [00:00