======================================================================
MEMORY ROUTING AGENT - TRAINING PIPELINE v2
======================================================================
Log directory: training/logs/run_20251124_200256
Model: meta-llama/Llama-3.1-8B
RL Groups: 64, Group Size: 32
Train: 1800, Test: 201
======================================================================
PHASE 1: SUPERVISED FINE-TUNING
======================================================================
Learning rate: 2.86e-04 (LoRA-adjusted)
Steps: 100, Batch size: 32
Gradient accumulation: 1
Effective batch size: 32
Early stopping patience: 5 evals
Total completion tokens: 11,541.0
(LoRA works well when completion tokens < LoRA params)
[SFT 0] Loss: 5.4671 | Test: 3.6343 | Time: 32.6s
[SFT 1] Loss: 3.7487 | Test: N/A | Time: 2.0s
[SFT 2] Loss: 2.4716 | Test: N/A | Time: 1.9s
[SFT 3] Loss: 2.1727 | Test: N/A | Time: 1.8s
[SFT 4] Loss: 2.2810 | Test: N/A | Time: 2.1s
[SFT 5] Loss: 1.8691 | Test: N/A | Time: 1.8s
[SFT 6] Loss: 1.8894 | Test: N/A | Time: 1.8s
[SFT 7] Loss: 1.5066 | Test: N/A | Time: 2.5s
[SFT 8] Loss: 1.5398 | Test: N/A | Time: 39.8s
[SFT 9] Loss: 1.7029 | Test: N/A | Time: 23.8s
[SFT 10] Loss: 1.4991 | Test: 1.2472 | Time: 3.8s
[SFT 11] Loss: 1.2880 | Test: N/A | Time: 38.3s
[SFT 12] Loss: 1.1976 | Test: N/A | Time: 2.1s
[SFT 13] Loss: 1.1008 | Test: N/A | Time: 1.7s
[SFT 14] Loss: 1.0307 | Test: N/A | Time: 1.8s
[SFT 15] Loss: 0.9700 | Test: N/A | Time: 1.9s
[SFT 16] Loss: 0.9220 | Test: N/A | Time: 1.6s
[SFT 17] Loss: 0.6043 | Test: N/A | Time: 1.7s
[SFT 18] Loss: 0.4576 | Test: N/A | Time: 3.3s
[SFT 19] Loss: 0.3646 | Test: N/A | Time: 6.0s
[SFT 20] Loss: 0.3698 | Test: 0.3547 | Time: 2.9s
[SFT 21] Loss: 0.3075 | Test: N/A | Time: 2.1s
[SFT 22] Loss: 0.3561 | Test: N/A | Time: 1.9s
[SFT 23] Loss: 0.3464 | Test: N/A | Time: 1.8s
[SFT 24] Loss: 0.4513 | Test: N/A | Time: 35.8s
[SFT 25] Loss: 0.3381 | Test: N/A | Time: 2.0s
[SFT 26] Loss: 0.4228 | Test: N/A | Time: 1.9s
[SFT 27] Loss: 0.3424 | Test: N/A | Time: 2.1s
[SFT 28] Loss: 0.4407 | Test: N/A | Time: 2.0s
[SFT 29] Loss: 0.3198 | Test: N/A | Time: 1.7s
[SFT 30] Loss: 0.3410 | Test: 0.2509 | Time: 4.1s
[SFT 31] Loss: 0.3987 | Test: N/A | Time: 2.2s
[SFT 32] Loss: 0.2976 | Test: N/A | Time: 39.7s
[SFT 33] Loss: 0.3058 | Test: N/A | Time: 24.5s
[SFT 34] Loss: 0.3336 | Test: N/A | Time: 8.6s
[SFT 35] Loss: 0.2664 | Test: N/A | Time: 31.4s
[SFT 36] Loss: 0.3167 | Test: N/A | Time: 8.5s
[SFT 37] Loss: 0.1997 | Test: N/A | Time: 2.6s
[SFT 38] Loss: 0.3690 | Test: N/A | Time: 3.7s
[SFT 39] Loss: 0.2222 | Test: N/A | Time: 2.3s
[SFT 40] Loss: 0.2838 | Test: 0.2286 | Time: 13.0s
[SFT 41] Loss: 0.2845 | Test: N/A | Time: 31.8s
[SFT 42] Loss: 0.3012 | Test: N/A | Time: 2.4s
[SFT 43] Loss: 0.2602 | Test: N/A | Time: 32.2s
[SFT 44] Loss: 0.2745 | Test: N/A | Time: 3.1s
[SFT 45] Loss: 0.3184 | Test: N/A | Time: 3.1s
[SFT 46] Loss: 0.3594 | Test: N/A | Time: 1.8s
[SFT 47] Loss: 0.3876 | Test: N/A | Time: 57.8s
[SFT 48] Loss: 0.2056 | Test: N/A | Time: 2.0s
[SFT 49] Loss: 0.3571 | Test: N/A | Time: 1.8s
[SFT 50] Loss: 0.2431 | Test: 0.1731 | Time: 1.8s
[SFT 51] Loss: 0.2366 | Test: N/A | Time: 29.2s
[SFT 52] Loss: 0.2144 | Test: N/A | Time: 1.9s
[SFT 53] Loss: 0.3431 | Test: N/A | Time: 1.9s
[SFT 54] Loss: 0.1824 | Test: N/A | Time: 2.0s
[SFT 55] Loss: 0.2290 | Test: N/A | Time: 1.9s
[SFT 56] Loss: 0.1782 | Test: N/A | Time: 25.9s
[SFT 57] Loss: 0.3247 | Test: N/A | Time: 3.0s
[SFT 58] Loss: 0.2719 | Test: N/A | Time: 2.6s
[SFT 59] Loss: 0.3262 | Test: N/A | Time: 37.2s
[SFT 60] Loss: 0.3060 | Test: 0.1461 | Time: 2.0s
[SFT 61] Loss: 0.1350 | Test: N/A | Time: 2.0s
[SFT 62] Loss: 0.1798 | Test: N/A | Time: 3.4s
[SFT 63] Loss: 0.2052 | Test: N/A | Time: 13.1s
[SFT 64] Loss: 0.2290 | Test: N/A | Time: 2.9s
[SFT 65] Loss: 0.2151 | Test: N/A | Time: 2.5s
[SFT 66] Loss: 0.2592 | Test: N/A | Time: 1.8s
[SFT 67] Loss: 0.2380 | Test: N/A | Time: 1.5s
[SFT 68] Loss: 0.2634 | Test: N/A | Time: 7.6s
[SFT 69] Loss: 0.2840 | Test: N/A | Time: 25.9s
[SFT 70] Loss: 0.2459 | Test: 0.1466 | Time: 2.0s
[SFT 71] Loss: 0.2175 | Test: N/A | Time: 2.0s
[SFT 72] Loss: 0.2801 | Test: N/A | Time: 1.7s
[SFT 73] Loss: 0.2118 | Test: N/A | Time: 1.6s
[SFT 74] Loss: 0.2317 | Test: N/A | Time: 2.0s
[SFT 75] Loss: 0.2686 | Test: N/A | Time: 1.7s
[SFT 76] Loss: 0.1551 | Test: N/A | Time: 1.7s
[SFT 77] Loss: 0.1563 | Test: N/A | Time: 11.2s
[SFT 78] Loss: 0.2685 | Test: N/A | Time: 25.7s
[SFT 79] Loss: 0.2555 | Test: N/A | Time: 2.0s
[SFT 80] Loss: 0.1970 | Test: 0.1482 | Time: 1.9s
[SFT 81] Loss: 0.2625 | Test: N/A | Time: 3.5s
[SFT 82] Loss: 0.1867 | Test: N/A | Time: 1.7s
[SFT 83] Loss: 0.1692 | Test: N/A | Time: 2.8s
[SFT 84] Loss: 0.1564 | Test: N/A | Time: 2.5s
[SFT 85] Loss: 0.3328 | Test: N/A | Time: 1.9s
[SFT 86] Loss: 0.2639 | Test: N/A | Time: 23.6s
[SFT 87] Loss: 0.1613 | Test: N/A | Time: 2.0s
[SFT 88] Loss: 0.2312 | Test: N/A | Time: 1.9s
[SFT 89] Loss: 0.2950 | Test: N/A | Time: 6.2s
[SFT 90] Loss: 0.2510 | Test: 0.1050 | Time: 2.3s
[SFT 91] Loss: 0.2559 | Test: N/A | Time: 40.9s
[SFT 92] Loss: 0.3120 | Test: N/A | Time: 2.4s
[SFT 93] Loss: 0.2267 | Test: N/A | Time: 1.6s
[SFT 94] Loss: 0.3272 | Test: N/A | Time: 2.2s
[SFT 95] Loss: 0.3016 | Test: N/A | Time: 1.9s
[SFT 96] Loss: 0.2956 | Test: N/A | Time: 1.8s
[SFT 97] Loss: 0.3144 | Test: N/A | Time: 2.7s
[SFT 98] Loss: 0.2225 | Test: N/A | Time: 38.8s
[SFT 99] Loss: 0.2622 | Test: 0.1475 | Time: 2.1s
SFT Complete.
Final checkpoint: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/sft_final_sampler
Best checkpoint (loss=0.1050): tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/sft_step_0090
State for RL: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/weights/sft_final
----------------------------------------------------------------------
Evaluating: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/sft_final_sampler
SFT: Any=82.0%, Exact=71.0%, F1=78.0%, Reward=0.836
======================================================================
PHASE 2: REINFORCEMENT LEARNING
======================================================================
Loading SFT state: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/weights/sft_final
Iterations: 30
Groups per batch: 64
Group size: 32
Total rollouts per iteration: 2048
Learning rate: 2.00e-05
KL threshold: 0.01
[RL 0] Reward: 0.727 (±0.347) | Acc: 99.7% | KL_v1: -0.0133 | KL_v2: 0.0168 | Active: 40/64 | Time: 282.6s
WARNING: KL_v2 0.0168 exceeds threshold 0.01
[RL 1] Reward: 0.721 (±0.336) | Acc: 100.0% | KL_v1: -0.0080 | KL_v2: 0.0212 | Active: 42/64 | Time: 338.3s
WARNING: KL_v2 0.0212 exceeds threshold 0.01
[RL 2] Reward: 0.759 (±0.309) | Acc: 100.0% | KL_v1: -0.0084 | KL_v2: 0.0220 | Active: 38/64 | Time: 366.4s
WARNING: KL_v2 0.0220 exceeds threshold 0.01
[RL 3] Reward: 0.834 (±0.276) | Acc: 100.0% | KL_v1: -0.0074 | KL_v2: 0.0191 | Active: 31/64 | Time: 429.3s
WARNING: KL_v2 0.0191 exceeds threshold 0.01
[RL 4] Reward: 0.793 (±0.269) | Acc: 100.0% | KL_v1: -0.0082 | KL_v2: 0.0271 | Active: 44/64 | Time: 237.0s
WARNING: KL_v2 0.0271 exceeds threshold 0.01
[RL 5] Reward: 0.832 (±0.265) | Acc: 100.0% | KL_v1: -0.0020 | KL_v2: 0.0223 | Active: 31/64 | Time: 305.1s
WARNING: KL_v2 0.0223 exceeds threshold 0.01
[RL 6] Reward: 0.816 (±0.268) | Acc: 100.0% | KL_v1: -0.0100 | KL_v2: 0.0200 | Active: 37/64 | Time: 483.4s
WARNING: KL_v2 0.0200 exceeds threshold 0.01
[RL 7] Reward: 0.839 (±0.242) | Acc: 100.0% | KL_v1: -0.0106 | KL_v2: 0.0133 | Active: 33/64 | Time: 242.8s
WARNING: KL_v2 0.0133 exceeds threshold 0.01
Retrying due to status code 502. text=
[RL 8] Reward: 0.862 (±0.235) | Acc: 100.0% | KL_v1: -0.0068 | KL_v2: 0.0174 | Active: 29/64 | Time: 382.3s
WARNING: KL_v2 0.0174 exceeds threshold 0.01
[RL 9] Reward: 0.824 (±0.285) | Acc: 99.9% | KL_v1: -0.0105 | KL_v2: 0.0138 | Active: 36/64 | Time: 378.2s
WARNING: KL_v2 0.0138 exceeds threshold 0.01
Retrying due to status code 502. text=
[RL 10] Reward: 0.862 (±0.230) | Acc: 100.0% | KL_v1: -0.0081 | KL_v2: 0.0118 | Active: 31/64 | Time: 200.5s
WARNING: KL_v2 0.0118 exceeds threshold 0.01
[RL 11] Reward: 0.881 (±0.215) | Acc: 100.0% | KL_v1: -0.0116 | KL_v2: 0.0124 | Active: 24/64 | Time: 211.0s
WARNING: KL_v2 0.0124 exceeds threshold 0.01
[RL 12] Reward: 0.872 (±0.255) | Acc: 100.0% | KL_v1: -0.0098 | KL_v2: 0.0116 | Active: 19/64 | Time: 266.4s
WARNING: KL_v2 0.0116 exceeds threshold 0.01
[RL 13] Reward: 0.890 (±0.211) | Acc: 100.0% | KL_v1: -0.0086 | KL_v2: 0.0112 | Active: 25/64 | Time: 344.2s
WARNING: KL_v2 0.0112 exceeds threshold 0.01
[RL 14] Reward: 0.881 (±0.218) | Acc: 100.0% | KL_v1: -0.0064 | KL_v2: 0.0109 | Active: 28/64 | Time: 358.3s
WARNING: KL_v2 0.0109 exceeds threshold 0.01
[RL 15] Reward: 0.893 (±0.210) | Acc: 100.0% | KL_v1: -0.0068 | KL_v2: 0.0119 | Active: 24/64 | Time: 394.3s
WARNING: KL_v2 0.0119 exceeds threshold 0.01
[RL 16] Reward: 0.860 (±0.232) | Acc: 100.0% | KL_v1: -0.0092 | KL_v2: 0.0098 | Active: 32/64 | Time: 320.4s
[RL 17] Reward: 0.885 (±0.197) | Acc: 100.0% | KL_v1: -0.0092 | KL_v2: 0.0087 | Active: 25/64 | Time: 654.1s
[RL 18] Reward: 0.802 (±0.280) | Acc: 100.0% | KL_v1: -0.0140 | KL_v2: 0.0096 | Active: 34/64 | Time: 409.3s
[RL 19] Reward: 0.854 (±0.213) | Acc: 100.0% | KL_v1: -0.0089 | KL_v2: 0.0094 | Active: 27/64 | Time: 427.3s
[RL 20] Reward: 0.877 (±0.228) | Acc: 100.0% | KL_v1: -0.0078 | KL_v2: 0.0110 | Active: 23/64 | Time: 182.2s
WARNING: KL_v2 0.0110 exceeds threshold 0.01
[RL 21] Reward: 0.878 (±0.221) | Acc: 100.0% | KL_v1: -0.0101 | KL_v2: 0.0094 | Active: 24/64 | Time: 317.5s
[RL 22] Reward: 0.914 (±0.196) | Acc: 100.0% | KL_v1: -0.0060 | KL_v2: 0.0145 | Active: 18/64 | Time: 350.9s
WARNING: KL_v2 0.0145 exceeds threshold 0.01
[RL 23] Reward: 0.856 (±0.244) | Acc: 100.0% | KL_v1: -0.0096 | KL_v2: 0.0080 | Active: 27/64 | Time: 398.3s
[RL 24] Reward: 0.849 (±0.235) | Acc: 100.0% | KL_v1: -0.0060 | KL_v2: 0.0118 | Active: 31/64 | Time: 292.1s
WARNING: KL_v2 0.0118 exceeds threshold 0.01
[RL 25] Reward: 0.834 (±0.260) | Acc: 100.0% | KL_v1: -0.0099 | KL_v2: 0.0101 | Active: 27/64 | Time: 261.0s
WARNING: KL_v2 0.0101 exceeds threshold 0.01
[RL 26] Reward: 0.868 (±0.228) | Acc: 100.0% | KL_v1: -0.0059 | KL_v2: 0.0110 | Active: 30/64 | Time: 255.0s
WARNING: KL_v2 0.0110 exceeds threshold 0.01
[RL 27] Reward: 0.867 (±0.222) | Acc: 100.0% | KL_v1: -0.0044 | KL_v2: 0.0106 | Active: 25/64 | Time: 447.1s
WARNING: KL_v2 0.0106 exceeds threshold 0.01
[RL 28] Reward: 0.929 (±0.144) | Acc: 100.0% | KL_v1: -0.0048 | KL_v2: 0.0134 | Active: 19/64 | Time: 420.3s
WARNING: KL_v2 0.0134 exceeds threshold 0.01
[RL 29] Reward: 0.848 (±0.239) | Acc: 99.9% | KL_v1: -0.0060 | KL_v2: 0.0109 | Active: 25/64 | Time: 297.6s
WARNING: KL_v2 0.0109 exceeds threshold 0.01
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /training/train_v2.py", line 1017, in <module>
asyncio.run(main())
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 650, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /training/train_v2.py", line 982, in main
rl_final = await run_rl(
^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /training/train_v2.py", line 870, in run_rl
final_result = await final_future.result_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/api_future.py", line 37, in result_async
return await asyncio.wrap_future(self._future)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 484, in _save_weights_for_sampler_async
result = await self._save_weights_for_sampler_impl(request_id, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 222, in __aexit__
await self.gen.athrow(typ, value, traceback)
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/telemetry.py", line 309, in acapture_exceptions
yield
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/telemetry.py", line 384, in _awrapper
return await cast(Callable[..., Awaitable[R]], func)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 222, in __aexit__
await self.gen.athrow(typ, value, traceback)
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 112, in _take_turn
yield
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 469, in _save_weights_for_sampler_impl
future = await self.holder.execute_with_retries(_send_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/internal_client_holder.py", line 306, in execute_with_retries
raise e
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/internal_client_holder.py", line 267, in execute_with_retries
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 464, in _send_request
return await client.weights.save_for_sampler(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/resources/weights.py", line 153, in save_for_sampler
return await self._post(
^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/_base_client.py", line 1232, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/_base_client.py", line 1033, in request
raise self._make_status_error_from_response(err.response) from None
tinker.ConflictError: Error code: 409 - {'detail': "Checkpoint 'rl_final' already exists for model 4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0 in sampler_weights. Please choose a different name to avoid overwriting."}
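
The 409 above reports that the checkpoint name 'rl_final' was already taken for this model in sampler_weights, and asks for a different name. One possible way to avoid the collision on a rerun is to derive the name from a timestamp, in the same style as the log directory name at the top of this log. This is a minimal sketch only: `unique_checkpoint_name` is a hypothetical helper, not part of the tinker library, and it assumes the service accepts letters, digits, and underscores in checkpoint names.

```python
from datetime import datetime, timezone
from typing import Optional


def unique_checkpoint_name(base: str, now: Optional[datetime] = None) -> str:
    """Return `base` with a UTC timestamp suffix, e.g. 'rl_final_20251124_200256'.

    Hypothetical helper (not a tinker API). Names only collide if two
    checkpoints are requested with the same base within the same second.
    """
    # Use the supplied time when given (handy for tests), else the current UTC time.
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.strftime("%Y%m%d_%H%M%S")
    return f"{base}_{stamp}"


if __name__ == "__main__":
    # Prints a name such as rl_final_<date>_<time>; the exact value depends on the clock.
    print(unique_checkpoint_name("rl_final"))
```

The resulting string would be passed wherever the training script currently passes the fixed name 'rl_final' (the call at train_v2.py line 870 in the traceback awaits that save). How the script passes the name is not shown in this log, so that wiring is left as an assumption.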