======================================================================
MEMORY ROUTING AGENT - FULL TRAINING PIPELINE
======================================================================
Experiment: memory_routing_v1
Output: training/experiments/memory_routing_v1_20251124_165000
Base model: meta-llama/Llama-3.1-8B
LoRA rank: 32
======================================================================
PHASE 1: SUPERVISED FINE-TUNING
======================================================================
Train: 800, Test: 200
Learning rate: 2.86e-04
Step 0: train_loss=3.4228, test_loss=2.6279, time=3.3s
Step 1: train_loss=2.5284, time=34.7s
Step 2: train_loss=2.0672, time=4.1s
Step 3: train_loss=1.7094, time=4.3s
Step 4: train_loss=1.5843, time=2.5s
Step 5: train_loss=1.4973, time=3.0s
Step 6: train_loss=1.3900, time=4.6s
Step 7: train_loss=1.4226, time=24.7s
Step 8: train_loss=1.3094, time=2.6s
Step 9: train_loss=1.3240, time=3.4s
Step 10: train_loss=1.1783, test_loss=1.1197, time=2.9s
Step 11: train_loss=1.1683, time=3.0s
Step 12: train_loss=1.2817, time=3.1s
Step 13: train_loss=0.9658, time=2.4s
Step 14: train_loss=0.8791, time=34.4s
Step 15: train_loss=0.7782, time=33.0s
Step 16: train_loss=0.7206, time=3.1s
Step 17: train_loss=0.6524, time=2.4s
Step 18: train_loss=0.5603, time=2.9s
Step 19: train_loss=0.5045, time=4.4s
Step 20: train_loss=0.4175, test_loss=0.3288, time=2.7s
Step 21: train_loss=0.3219, time=2.2s
Step 22: train_loss=0.3643, time=2.4s
Step 23: train_loss=0.3799, time=2.1s
Step 24: train_loss=0.3603, time=2.4s
Step 25: train_loss=0.5269, time=1.9s
Step 26: train_loss=0.3044, time=29.7s
Step 27: train_loss=0.2869, time=3.5s
Step 28: train_loss=0.2994, time=4.4s
Step 29: train_loss=0.3266, time=2.2s
Step 30: train_loss=0.3303, test_loss=0.2598, time=2.3s
Step 31: train_loss=0.2958, time=1.8s
Step 32: train_loss=0.3050, time=2.0s
Step 33: train_loss=0.3092, time=33.7s
Step 34: train_loss=0.2802, time=2.1s
Step 35: train_loss=0.3087, time=2.0s
Step 36: train_loss=0.3042, time=2.0s
Step 37: train_loss=0.4495, time=3.2s
Step 38: train_loss=0.2939, time=2.0s
Step 39: train_loss=0.2473, time=2.0s
Step 40: train_loss=0.2092, test_loss=0.2544, time=2.8s
Step 41: train_loss=0.2836, time=2.9s
Step 42: train_loss=0.2363, time=2.0s
Step 43: train_loss=0.2641, time=2.1s
Step 44: train_loss=0.2647, time=2.2s
Step 45: train_loss=0.2634, time=3.5s
Step 46: train_loss=0.2576, time=2.7s
Step 47: train_loss=0.2471, time=2.5s
Step 48: train_loss=0.2778, time=2.7s
Step 49: train_loss=0.2875, time=7.9s
Step 50: train_loss=0.4188, test_loss=0.2334, time=2.2s
Step 51: train_loss=0.2511, time=2.7s
Step 52: train_loss=0.1968, time=28.9s
Step 53: train_loss=0.2182, time=2.8s
Step 54: train_loss=0.2473, time=34.8s
Step 55: train_loss=0.2404, time=2.6s
Step 56: train_loss=0.2247, time=2.5s
Step 57: train_loss=0.2161, time=2.2s
Step 58: train_loss=0.2167, time=1.9s
Step 59: train_loss=0.2116, time=2.1s
Step 60: train_loss=0.2304, test_loss=0.2018, time=3.1s
Step 61: train_loss=0.2512, time=2.8s
Step 62: train_loss=0.2886, time=2.0s
Step 63: train_loss=0.2893, time=1.9s
Step 64: train_loss=0.2319, time=2.0s
Step 65: train_loss=0.1766, time=1.9s
Step 66: train_loss=0.2583, time=2.3s
Step 67: train_loss=0.2068, time=3.1s
Step 68: train_loss=0.2338, time=2.5s
Step 69: train_loss=0.2009, time=2.0s
Step 70: train_loss=0.1942, test_loss=0.1832, time=2.6s
Step 71: train_loss=0.2030, time=2.2s
Step 72: train_loss=0.1983, time=24.0s
Step 73: train_loss=0.2216, time=2.8s
Step 74: train_loss=0.2449, time=2.7s
Step 75: train_loss=0.3014, time=2.8s
Step 76: train_loss=0.2157, time=2.8s
Step 77: train_loss=0.2117, time=16.5s
Step 78: train_loss=0.2102, time=32.4s
Step 79: train_loss=0.2355, time=2.1s
Step 80: train_loss=0.2199, test_loss=0.1973, time=2.3s
Step 81: train_loss=0.2125, time=3.6s
Step 82: train_loss=0.2148, time=2.2s
Step 83: train_loss=0.1887, time=2.5s
Step 84: train_loss=0.1713, time=31.9s
Step 85: train_loss=0.2361, time=2.3s
Step 86: train_loss=0.1958, time=35.1s
Step 87: train_loss=0.2396, time=2.3s
Step 88: train_loss=0.2032, time=32.1s
Step 89: train_loss=0.1682, time=82.7s
Step 90: train_loss=0.1952, test_loss=0.1960, time=2.6s
Step 91: train_loss=0.2146, time=2.3s
Step 92: train_loss=0.1845, time=28.6s
Step 93: train_loss=0.2103, time=3.3s
Step 94: train_loss=0.1943, time=3.3s
Step 95: train_loss=0.1729, time=3.1s
Step 96: train_loss=0.1698, time=2.8s
Step 97: train_loss=0.2020, time=3.2s
Step 98: train_loss=0.1963, time=3.6s
Step 99: train_loss=0.2097, test_loss=0.1150, time=3.1s

Saving final SFT checkpoint...
SFT State checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/weights/sft_final
SFT Sampler checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/sampler_weights/sft_final_sampler

--- Evaluating: SFT Model ---
Evaluated 50/200
Evaluated 100/200
Evaluated 150/200
Evaluated 200/200
Any Match: 87.0%
Exact Match: 39.0%
F1: 69.2%
Mean Reward: 0.772

======================================================================
PHASE 2: REINFORCEMENT LEARNING
======================================================================
Training examples: 800
RL iterations: 15
Batch size: 32, Group size: 8
Loading SFT checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/weights/sft_final

--- Iteration 1/15 ---
Reward: 0.872 ± 0.192, Acc: 100.0%, Format: 100.0%
--- Iteration 2/15 ---
Reward: 0.842 ± 0.235, Acc: 100.0%, Format: 100.0%
--- Iteration 3/15 ---
Reward: 0.823 ± 0.247, Acc: 100.0%, Format: 100.0%
--- Iteration 4/15 ---
Reward: 0.901 ± 0.158, Acc: 100.0%, Format: 100.0%
--- Iteration 5/15 ---
Reward: 0.852 ± 0.214, Acc: 100.0%, Format: 100.0%
--- Iteration 6/15 ---
Reward: 0.843 ± 0.251, Acc: 99.6%, Format: 99.6%
--- Iteration 7/15 ---
Reward: 0.859 ± 0.214, Acc: 100.0%, Format: 100.0%
--- Iteration 8/15 ---
Reward: 0.899 ± 0.159, Acc: 100.0%, Format: 100.0%
--- Iteration 9/15 ---
Reward: 0.870 ± 0.175, Acc: 100.0%, Format: 100.0%
--- Iteration 10/15 ---
Reward: 0.866 ± 0.234, Acc: 99.6%, Format: 99.6%
--- Iteration 11/15 ---
Reward: 0.845 ± 0.238, Acc: 100.0%, Format: 100.0%
--- Iteration 12/15 ---
Reward: 0.908 ± 0.148, Acc: 100.0%, Format: 100.0%
--- Iteration 13/15 ---
Reward: 0.838 ± 0.234, Acc: 100.0%, Format: 100.0%
--- Iteration 14/15 ---
Reward: 0.899 ± 0.143, Acc: 100.0%, Format: 100.0%
--- Iteration 15/15 ---
Reward: 0.895 ± 0.147, Acc: 100.0%, Format: 100.0%

Saving final RL checkpoint...
RL checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/sampler_weights/rl_final

--- Evaluating: RL Model ---
Evaluated 50/200
Evaluated 100/200
Evaluated 150/200
Evaluated 200/200
Any Match: 90.0%
Exact Match: 42.5%
F1: 72.3%
Mean Reward: 0.792

======================================================================
TRAINING COMPLETE
======================================================================
Results saved to: training/experiments/memory_routing_v1_20251124_165000/results.json
Final Model: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/sampler_weights/rl_final

Comparison:
  SFT - F1: 69.2%, Any Match: 87.0%
  RL  - F1: 72.3%, Any Match: 90.0%
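Note on the evaluation lines: "Exact Match" and "F1" in logs like this are usually SQuAD-style string metrics, i.e. whole-answer match and token-overlap F1 between the model output and the reference. A minimal sketch of that computation (the normalization rules and function names here are assumptions, not taken from this pipeline's code):

```python
from collections import Counter

def normalize(s: str) -> list[str]:
    # Lowercase and split on whitespace; real evaluators often also
    # strip punctuation and articles (an assumption here).
    return s.lower().split()

def exact_match(pred: str, gold: str) -> bool:
    # True only if the normalized answers are identical.
    return normalize(pred) == normalize(gold)

def token_f1(pred: str, gold: str) -> float:
    # Token-overlap F1: harmonic mean of precision and recall
    # over the multiset of shared tokens.
    p, g = normalize(pred), normalize(gold)
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

The reported percentages would then be these per-example scores averaged over the 200 test examples.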
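The "Reward: mean ± std" lines in Phase 2, together with a group size of 8, are consistent with group-based RL in the GRPO style: each prompt is sampled several times and each sample's advantage is its reward standardized within its group. A hedged sketch of that advantage step only (names assumed; this pipeline's actual RL objective is not shown in the log):

```python
import statistics

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # One group = several sampled completions of the same prompt
    # (group size 8 in the run above). Each advantage is the reward
    # standardized against the group's mean and std.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Standardizing within the group makes the update insensitive to per-prompt reward scale, which matters when most groups already score near the 0.85-0.90 mean reward seen above.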