| Retrying due to status code 502. text= | |
| ====================================================================== | |
| MEMORY ROUTING AGENT - FULL TRAINING PIPELINE | |
| ====================================================================== | |
| Experiment: memory_routing_v1 | |
| Output: training/experiments/memory_routing_v1_20251124_165000 | |
| Base model: meta-llama/Llama-3.1-8B | |
| LoRA rank: 32 | |
| ====================================================================== | |
| PHASE 1: SUPERVISED FINE-TUNING | |
| ====================================================================== | |
| Train: 800, Test: 200 | |
| Learning rate: 2.86e-04 | |
| Step 0: train_loss=3.4228, test_loss=2.6279, time=3.3s | |
| Step 1: train_loss=2.5284, time=34.7s | |
| Step 2: train_loss=2.0672, time=4.1s | |
| Step 3: train_loss=1.7094, time=4.3s | |
| Step 4: train_loss=1.5843, time=2.5s | |
| Step 5: train_loss=1.4973, time=3.0s | |
| Step 6: train_loss=1.3900, time=4.6s | |
| Step 7: train_loss=1.4226, time=24.7s | |
| Step 8: train_loss=1.3094, time=2.6s | |
| Step 9: train_loss=1.3240, time=3.4s | |
| Step 10: train_loss=1.1783, test_loss=1.1197, time=2.9s | |
| Step 11: train_loss=1.1683, time=3.0s | |
| Step 12: train_loss=1.2817, time=3.1s | |
| Step 13: train_loss=0.9658, time=2.4s | |
| Step 14: train_loss=0.8791, time=34.4s | |
| Step 15: train_loss=0.7782, time=33.0s | |
| Step 16: train_loss=0.7206, time=3.1s | |
| Step 17: train_loss=0.6524, time=2.4s | |
| Step 18: train_loss=0.5603, time=2.9s | |
| Step 19: train_loss=0.5045, time=4.4s | |
| Step 20: train_loss=0.4175, test_loss=0.3288, time=2.7s | |
| Step 21: train_loss=0.3219, time=2.2s | |
| Step 22: train_loss=0.3643, time=2.4s | |
| Step 23: train_loss=0.3799, time=2.1s | |
| Step 24: train_loss=0.3603, time=2.4s | |
| Step 25: train_loss=0.5269, time=1.9s | |
| Step 26: train_loss=0.3044, time=29.7s | |
| Step 27: train_loss=0.2869, time=3.5s | |
| Step 28: train_loss=0.2994, time=4.4s | |
| Step 29: train_loss=0.3266, time=2.2s | |
| Step 30: train_loss=0.3303, test_loss=0.2598, time=2.3s | |
| Step 31: train_loss=0.2958, time=1.8s | |
| Step 32: train_loss=0.3050, time=2.0s | |
| Step 33: train_loss=0.3092, time=33.7s | |
| Step 34: train_loss=0.2802, time=2.1s | |
| Step 35: train_loss=0.3087, time=2.0s | |
| Step 36: train_loss=0.3042, time=2.0s | |
| Step 37: train_loss=0.4495, time=3.2s | |
| Step 38: train_loss=0.2939, time=2.0s | |
| Step 39: train_loss=0.2473, time=2.0s | |
| Step 40: train_loss=0.2092, test_loss=0.2544, time=2.8s | |
| Step 41: train_loss=0.2836, time=2.9s | |
| Step 42: train_loss=0.2363, time=2.0s | |
| Step 43: train_loss=0.2641, time=2.1s | |
| Step 44: train_loss=0.2647, time=2.2s | |
| Step 45: train_loss=0.2634, time=3.5s | |
| Step 46: train_loss=0.2576, time=2.7s | |
| Step 47: train_loss=0.2471, time=2.5s | |
| Step 48: train_loss=0.2778, time=2.7s | |
| Step 49: train_loss=0.2875, time=7.9s | |
| Step 50: train_loss=0.4188, test_loss=0.2334, time=2.2s | |
| Step 51: train_loss=0.2511, time=2.7s | |
| Step 52: train_loss=0.1968, time=28.9s | |
| Step 53: train_loss=0.2182, time=2.8s | |
| Step 54: train_loss=0.2473, time=34.8s | |
| Step 55: train_loss=0.2404, time=2.6s | |
| Step 56: train_loss=0.2247, time=2.5s | |
| Step 57: train_loss=0.2161, time=2.2s | |
| Step 58: train_loss=0.2167, time=1.9s | |
| Step 59: train_loss=0.2116, time=2.1s | |
| Step 60: train_loss=0.2304, test_loss=0.2018, time=3.1s | |
| Step 61: train_loss=0.2512, time=2.8s | |
| Step 62: train_loss=0.2886, time=2.0s | |
| Step 63: train_loss=0.2893, time=1.9s | |
| Step 64: train_loss=0.2319, time=2.0s | |
| Step 65: train_loss=0.1766, time=1.9s | |
| Step 66: train_loss=0.2583, time=2.3s | |
| Step 67: train_loss=0.2068, time=3.1s | |
| Step 68: train_loss=0.2338, time=2.5s | |
| Step 69: train_loss=0.2009, time=2.0s | |
| Step 70: train_loss=0.1942, test_loss=0.1832, time=2.6s | |
| Step 71: train_loss=0.2030, time=2.2s | |
| Step 72: train_loss=0.1983, time=24.0s | |
| Step 73: train_loss=0.2216, time=2.8s | |
| Step 74: train_loss=0.2449, time=2.7s | |
| Step 75: train_loss=0.3014, time=2.8s | |
| Step 76: train_loss=0.2157, time=2.8s | |
| Step 77: train_loss=0.2117, time=16.5s | |
| Step 78: train_loss=0.2102, time=32.4s | |
| Step 79: train_loss=0.2355, time=2.1s | |
| Step 80: train_loss=0.2199, test_loss=0.1973, time=2.3s | |
| Step 81: train_loss=0.2125, time=3.6s | |
| Step 82: train_loss=0.2148, time=2.2s | |
| Step 83: train_loss=0.1887, time=2.5s | |
| Step 84: train_loss=0.1713, time=31.9s | |
| Step 85: train_loss=0.2361, time=2.3s | |
| Step 86: train_loss=0.1958, time=35.1s | |
| Step 87: train_loss=0.2396, time=2.3s | |
| Step 88: train_loss=0.2032, time=32.1s | |
| Step 89: train_loss=0.1682, time=82.7s | |
| Step 90: train_loss=0.1952, test_loss=0.1960, time=2.6s | |
| Step 91: train_loss=0.2146, time=2.3s | |
| Step 92: train_loss=0.1845, time=28.6s | |
| Step 93: train_loss=0.2103, time=3.3s | |
| Step 94: train_loss=0.1943, time=3.3s | |
| Step 95: train_loss=0.1729, time=3.1s | |
| Step 96: train_loss=0.1698, time=2.8s | |
| Step 97: train_loss=0.2020, time=3.2s | |
| Step 98: train_loss=0.1963, time=3.6s | |
| Step 99: train_loss=0.2097, test_loss=0.1150, time=3.1s | |
| Saving final SFT checkpoint... | |
| SFT State checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/weights/sft_final | |
| SFT Sampler checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/sampler_weights/sft_final_sampler | |
| --- Evaluating: SFT Model --- | |
| Evaluated 50/200 | |
| Evaluated 100/200 | |
| Evaluated 150/200 | |
| Evaluated 200/200 | |
| Any Match: 87.0% | |
| Exact Match: 39.0% | |
| F1: 69.2% | |
| Mean Reward: 0.772 | |
| ====================================================================== | |
| PHASE 2: REINFORCEMENT LEARNING | |
| ====================================================================== | |
| Training examples: 800 | |
| RL iterations: 15 | |
| Batch size: 32, Group size: 8 | |
| Loading SFT checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/weights/sft_final | |
| --- Iteration 1/15 --- | |
| Reward: 0.872 ± 0.192, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 2/15 --- | |
| Reward: 0.842 ± 0.235, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 3/15 --- | |
| Reward: 0.823 ± 0.247, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 4/15 --- | |
| Reward: 0.901 ± 0.158, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 5/15 --- | |
| Reward: 0.852 ± 0.214, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 6/15 --- | |
| Reward: 0.843 ± 0.251, Acc: 99.6%, Format: 99.6% | |
| --- Iteration 7/15 --- | |
| Reward: 0.859 ± 0.214, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 8/15 --- | |
| Reward: 0.899 ± 0.159, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 9/15 --- | |
| Reward: 0.870 ± 0.175, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 10/15 --- | |
| Reward: 0.866 ± 0.234, Acc: 99.6%, Format: 99.6% | |
| --- Iteration 11/15 --- | |
| Reward: 0.845 ± 0.238, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 12/15 --- | |
| Reward: 0.908 ± 0.148, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 13/15 --- | |
| Reward: 0.838 ± 0.234, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 14/15 --- | |
| Reward: 0.899 ± 0.143, Acc: 100.0%, Format: 100.0% | |
| --- Iteration 15/15 --- | |
| Reward: 0.895 ± 0.147, Acc: 100.0%, Format: 100.0% | |
| Saving final RL checkpoint... | |
| RL checkpoint: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/sampler_weights/rl_final | |
| --- Evaluating: RL Model --- | |
| Evaluated 50/200 | |
| Evaluated 100/200 | |
| Evaluated 150/200 | |
| Evaluated 200/200 | |
| Any Match: 90.0% | |
| Exact Match: 42.5% | |
| F1: 72.3% | |
| Mean Reward: 0.792 | |
| ====================================================================== | |
| TRAINING COMPLETE | |
| ====================================================================== | |
| Results saved to: training/experiments/memory_routing_v1_20251124_165000/results.json | |
| Final Model: tinker://b6c9686e-b64d-5cd9-b9e5-a882b0f69d6a:train:0/sampler_weights/rl_final | |
| Comparison: | |
| SFT - F1: 69.2%, Any Match: 87.0% | |
| RL - F1: 72.3%, Any Match: 90.0% | |
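
The per-step lines in the log above follow a regular pattern (`Step N: train_loss=..., [test_loss=...,] time=...s`), so they can be pulled into structured records for plotting or for spotting slow steps. The following is a minimal sketch, not part of the pipeline itself: the function name `parse_steps`, the record keys, and the 20-second "slow step" threshold are all illustrative assumptions.

```python
import re

# Pattern inferred from the log lines above; test_loss appears only on evaluation steps.
STEP_RE = re.compile(
    r"Step (?P<step>\d+): train_loss=(?P<train>[\d.]+)"
    r"(?:, test_loss=(?P<test>[\d.]+))?, time=(?P<time>[\d.]+)s"
)


def parse_steps(log_text):
    """Return one dict per 'Step N: ...' line found in log_text (hypothetical helper)."""
    records = []
    for m in STEP_RE.finditer(log_text):
        records.append({
            "step": int(m["step"]),
            "train_loss": float(m["train"]),
            "test_loss": float(m["test"]) if m["test"] else None,
            "time_s": float(m["time"]),
        })
    return records


# A few lines copied from the log, including the surrounding pipe characters.
sample = """\
| Step 0: train_loss=3.4228, test_loss=2.6279, time=3.3s | |
| Step 1: train_loss=2.5284, time=34.7s | |
| Step 10: train_loss=1.1783, test_loss=1.1197, time=2.9s | |
"""

records = parse_steps(sample)

# Steps that also report a held-out loss.
eval_steps = [r["step"] for r in records if r["test_loss"] is not None]

# Steps slower than an arbitrary 20 s threshold (assumption, chosen for illustration).
slow_steps = [r["step"] for r in records if r["time_s"] > 20.0]

print(eval_steps, slow_steps)
```

Run on the full log, the same records could be used to separate the occasional long steps from the typical steps of a few seconds before drawing conclusions about throughput.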