======================================================================
MEMORY ROUTING AGENT - TRAINING PIPELINE v2
======================================================================
Log directory: training/logs/run_20251124_200256
Model: meta-llama/Llama-3.1-8B
RL Groups: 64, Group Size: 32

Train: 1800, Test: 201

======================================================================
PHASE 1: SUPERVISED FINE-TUNING
======================================================================
Learning rate: 2.86e-04 (LoRA-adjusted)
Steps: 100, Batch size: 32
Gradient accumulation: 1
Effective batch size: 32
Early stopping patience: 5 evals

Total completion tokens: 11,541
(LoRA works well when completion tokens < LoRA params)
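The capacity heuristic above compares completion-token count against the LoRA adapter's parameter count. A minimal back-of-the-envelope sketch follows; the hidden size, layer count, rank, and number of adapted matrices are illustrative assumptions, not values read from this run's config:

```python
# Rough LoRA adapter parameter count for a Llama-3.1-8B-style model.
# All sizing constants below are assumptions for illustration only.
hidden_size = 4096             # assumed model width
num_layers = 32                # assumed transformer depth
rank = 32                      # assumed LoRA rank
matrices_per_layer = 4         # assume q/k/v/o projections are adapted

# Each adapted (d x d) weight gets two low-rank factors: A (d x r), B (r x d).
params_per_matrix = 2 * hidden_size * rank
lora_params = num_layers * matrices_per_layer * params_per_matrix

completion_tokens = 11_541     # from the log above
print(f"LoRA params: {lora_params:,}")
print(f"tokens < params: {completion_tokens < lora_params}")
```

With these (assumed) numbers the adapter has roughly 33.6M trainable parameters, comfortably above the 11.5k completion tokens, consistent with the log's note.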

[SFT   0] Loss: 5.4671 | Test: 3.6343 | Time: 32.6s
[SFT   1] Loss: 3.7487 | Test: N/A | Time: 2.0s
[SFT   2] Loss: 2.4716 | Test: N/A | Time: 1.9s
[SFT   3] Loss: 2.1727 | Test: N/A | Time: 1.8s
[SFT   4] Loss: 2.2810 | Test: N/A | Time: 2.1s
[SFT   5] Loss: 1.8691 | Test: N/A | Time: 1.8s
[SFT   6] Loss: 1.8894 | Test: N/A | Time: 1.8s
[SFT   7] Loss: 1.5066 | Test: N/A | Time: 2.5s
[SFT   8] Loss: 1.5398 | Test: N/A | Time: 39.8s
[SFT   9] Loss: 1.7029 | Test: N/A | Time: 23.8s
[SFT  10] Loss: 1.4991 | Test: 1.2472 | Time: 3.8s
[SFT  11] Loss: 1.2880 | Test: N/A | Time: 38.3s
[SFT  12] Loss: 1.1976 | Test: N/A | Time: 2.1s
[SFT  13] Loss: 1.1008 | Test: N/A | Time: 1.7s
[SFT  14] Loss: 1.0307 | Test: N/A | Time: 1.8s
[SFT  15] Loss: 0.9700 | Test: N/A | Time: 1.9s
[SFT  16] Loss: 0.9220 | Test: N/A | Time: 1.6s
[SFT  17] Loss: 0.6043 | Test: N/A | Time: 1.7s
[SFT  18] Loss: 0.4576 | Test: N/A | Time: 3.3s
[SFT  19] Loss: 0.3646 | Test: N/A | Time: 6.0s
[SFT  20] Loss: 0.3698 | Test: 0.3547 | Time: 2.9s
[SFT  21] Loss: 0.3075 | Test: N/A | Time: 2.1s
[SFT  22] Loss: 0.3561 | Test: N/A | Time: 1.9s
[SFT  23] Loss: 0.3464 | Test: N/A | Time: 1.8s
[SFT  24] Loss: 0.4513 | Test: N/A | Time: 35.8s
[SFT  25] Loss: 0.3381 | Test: N/A | Time: 2.0s
[SFT  26] Loss: 0.4228 | Test: N/A | Time: 1.9s
[SFT  27] Loss: 0.3424 | Test: N/A | Time: 2.1s
[SFT  28] Loss: 0.4407 | Test: N/A | Time: 2.0s
[SFT  29] Loss: 0.3198 | Test: N/A | Time: 1.7s
[SFT  30] Loss: 0.3410 | Test: 0.2509 | Time: 4.1s
[SFT  31] Loss: 0.3987 | Test: N/A | Time: 2.2s
[SFT  32] Loss: 0.2976 | Test: N/A | Time: 39.7s
[SFT  33] Loss: 0.3058 | Test: N/A | Time: 24.5s
[SFT  34] Loss: 0.3336 | Test: N/A | Time: 8.6s
[SFT  35] Loss: 0.2664 | Test: N/A | Time: 31.4s
[SFT  36] Loss: 0.3167 | Test: N/A | Time: 8.5s
[SFT  37] Loss: 0.1997 | Test: N/A | Time: 2.6s
[SFT  38] Loss: 0.3690 | Test: N/A | Time: 3.7s
[SFT  39] Loss: 0.2222 | Test: N/A | Time: 2.3s
[SFT  40] Loss: 0.2838 | Test: 0.2286 | Time: 13.0s
[SFT  41] Loss: 0.2845 | Test: N/A | Time: 31.8s
[SFT  42] Loss: 0.3012 | Test: N/A | Time: 2.4s
[SFT  43] Loss: 0.2602 | Test: N/A | Time: 32.2s
[SFT  44] Loss: 0.2745 | Test: N/A | Time: 3.1s
[SFT  45] Loss: 0.3184 | Test: N/A | Time: 3.1s
[SFT  46] Loss: 0.3594 | Test: N/A | Time: 1.8s
[SFT  47] Loss: 0.3876 | Test: N/A | Time: 57.8s
[SFT  48] Loss: 0.2056 | Test: N/A | Time: 2.0s
[SFT  49] Loss: 0.3571 | Test: N/A | Time: 1.8s
[SFT  50] Loss: 0.2431 | Test: 0.1731 | Time: 1.8s
[SFT  51] Loss: 0.2366 | Test: N/A | Time: 29.2s
[SFT  52] Loss: 0.2144 | Test: N/A | Time: 1.9s
[SFT  53] Loss: 0.3431 | Test: N/A | Time: 1.9s
[SFT  54] Loss: 0.1824 | Test: N/A | Time: 2.0s
[SFT  55] Loss: 0.2290 | Test: N/A | Time: 1.9s
[SFT  56] Loss: 0.1782 | Test: N/A | Time: 25.9s
[SFT  57] Loss: 0.3247 | Test: N/A | Time: 3.0s
[SFT  58] Loss: 0.2719 | Test: N/A | Time: 2.6s
[SFT  59] Loss: 0.3262 | Test: N/A | Time: 37.2s
[SFT  60] Loss: 0.3060 | Test: 0.1461 | Time: 2.0s
[SFT  61] Loss: 0.1350 | Test: N/A | Time: 2.0s
[SFT  62] Loss: 0.1798 | Test: N/A | Time: 3.4s
[SFT  63] Loss: 0.2052 | Test: N/A | Time: 13.1s
[SFT  64] Loss: 0.2290 | Test: N/A | Time: 2.9s
[SFT  65] Loss: 0.2151 | Test: N/A | Time: 2.5s
[SFT  66] Loss: 0.2592 | Test: N/A | Time: 1.8s
[SFT  67] Loss: 0.2380 | Test: N/A | Time: 1.5s
[SFT  68] Loss: 0.2634 | Test: N/A | Time: 7.6s
[SFT  69] Loss: 0.2840 | Test: N/A | Time: 25.9s
[SFT  70] Loss: 0.2459 | Test: 0.1466 | Time: 2.0s
[SFT  71] Loss: 0.2175 | Test: N/A | Time: 2.0s
[SFT  72] Loss: 0.2801 | Test: N/A | Time: 1.7s
[SFT  73] Loss: 0.2118 | Test: N/A | Time: 1.6s
[SFT  74] Loss: 0.2317 | Test: N/A | Time: 2.0s
[SFT  75] Loss: 0.2686 | Test: N/A | Time: 1.7s
[SFT  76] Loss: 0.1551 | Test: N/A | Time: 1.7s
[SFT  77] Loss: 0.1563 | Test: N/A | Time: 11.2s
[SFT  78] Loss: 0.2685 | Test: N/A | Time: 25.7s
[SFT  79] Loss: 0.2555 | Test: N/A | Time: 2.0s
[SFT  80] Loss: 0.1970 | Test: 0.1482 | Time: 1.9s
[SFT  81] Loss: 0.2625 | Test: N/A | Time: 3.5s
[SFT  82] Loss: 0.1867 | Test: N/A | Time: 1.7s
[SFT  83] Loss: 0.1692 | Test: N/A | Time: 2.8s
[SFT  84] Loss: 0.1564 | Test: N/A | Time: 2.5s
[SFT  85] Loss: 0.3328 | Test: N/A | Time: 1.9s
[SFT  86] Loss: 0.2639 | Test: N/A | Time: 23.6s
[SFT  87] Loss: 0.1613 | Test: N/A | Time: 2.0s
[SFT  88] Loss: 0.2312 | Test: N/A | Time: 1.9s
[SFT  89] Loss: 0.2950 | Test: N/A | Time: 6.2s
[SFT  90] Loss: 0.2510 | Test: 0.1050 | Time: 2.3s
[SFT  91] Loss: 0.2559 | Test: N/A | Time: 40.9s
[SFT  92] Loss: 0.3120 | Test: N/A | Time: 2.4s
[SFT  93] Loss: 0.2267 | Test: N/A | Time: 1.6s
[SFT  94] Loss: 0.3272 | Test: N/A | Time: 2.2s
[SFT  95] Loss: 0.3016 | Test: N/A | Time: 1.9s
[SFT  96] Loss: 0.2956 | Test: N/A | Time: 1.8s
[SFT  97] Loss: 0.3144 | Test: N/A | Time: 2.7s
[SFT  98] Loss: 0.2225 | Test: N/A | Time: 38.8s
[SFT  99] Loss: 0.2622 | Test: 0.1475 | Time: 2.1s

SFT Complete.
  Final checkpoint: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/sft_final_sampler
  Best checkpoint (loss=0.1050): tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/sft_step_0090
  State for RL: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/weights/sft_final

----------------------------------------------------------------------

Evaluating: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/sft_final_sampler
SFT: Any=82.0%, Exact=71.0%, F1=78.0%, Reward=0.836
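The Any/Exact/F1 columns read as set-overlap metrics between predicted and gold routing targets. A sketch of plausible definitions, assuming each example's prediction and label are sets of memory/route ids (the eval script's exact definitions are an assumption):

```python
def routing_metrics(pred: set, gold: set):
    """Per-example routing metrics under assumed set-overlap definitions:
    any-hit (at least one correct id), exact match, and set F1."""
    any_hit = bool(pred & gold)
    exact = pred == gold
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return any_hit, exact, f1

# e.g. one correct id out of two predicted / two gold -> hit, not exact, F1 = 0.5
print(routing_metrics({"a", "b"}, {"a", "c"}))
```

Corpus-level Any/Exact/F1 would then be averages of these per-example values over the 201 test examples.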

======================================================================
PHASE 2: REINFORCEMENT LEARNING
======================================================================
Loading SFT state: tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/weights/sft_final
Iterations: 30
Groups per batch: 64
Group size: 32
Total rollouts per iteration: 2048
Learning rate: 2.00e-05
KL threshold: 0.01
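The KL_v1 / KL_v2 pattern in the iterations below (v1 sometimes negative, v2 always non-negative) matches the standard k1 and k3 Monte Carlo KL estimators computed from per-token log-prob ratios. Whether this trainer logs exactly those estimators is an assumption; the sketch shows the two formulas:

```python
import math

def kl_estimates(logp_new, logp_ref):
    """Sample-based estimates of KL(new || ref) from per-token log-probs
    under the current (new) policy. k1 is unbiased but can be negative on
    a finite sample; k3 is non-negative by construction (r - 1 - log r >= 0)."""
    n = len(logp_new)
    # k1: log-ratio log(p_new / p_ref)
    k1 = sum(a - b for a, b in zip(logp_new, logp_ref)) / n
    # k3: (p_ref / p_new) - 1 - log(p_ref / p_new)
    k3 = sum(math.exp(b - a) - 1 + (a - b)
             for a, b in zip(logp_new, logp_ref)) / n
    return k1, k3
```

On a sample where the reference is slightly more likely than the new policy, k1 goes negative while k3 stays positive, which is exactly the shape of the KL_v1 / KL_v2 columns below.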

[RL   0] Reward: 0.727 (±0.347) | Acc: 99.7% | KL_v1: -0.0133 | KL_v2: 0.0168 | Active: 40/64 | Time: 282.6s
WARNING: KL_v2 0.0168 exceeds threshold 0.01
[RL   1] Reward: 0.721 (±0.336) | Acc: 100.0% | KL_v1: -0.0080 | KL_v2: 0.0212 | Active: 42/64 | Time: 338.3s
WARNING: KL_v2 0.0212 exceeds threshold 0.01
[RL   2] Reward: 0.759 (±0.309) | Acc: 100.0% | KL_v1: -0.0084 | KL_v2: 0.0220 | Active: 38/64 | Time: 366.4s
WARNING: KL_v2 0.0220 exceeds threshold 0.01
[RL   3] Reward: 0.834 (±0.276) | Acc: 100.0% | KL_v1: -0.0074 | KL_v2: 0.0191 | Active: 31/64 | Time: 429.3s
WARNING: KL_v2 0.0191 exceeds threshold 0.01
[RL   4] Reward: 0.793 (±0.269) | Acc: 100.0% | KL_v1: -0.0082 | KL_v2: 0.0271 | Active: 44/64 | Time: 237.0s
WARNING: KL_v2 0.0271 exceeds threshold 0.01
[RL   5] Reward: 0.832 (±0.265) | Acc: 100.0% | KL_v1: -0.0020 | KL_v2: 0.0223 | Active: 31/64 | Time: 305.1s
WARNING: KL_v2 0.0223 exceeds threshold 0.01
[RL   6] Reward: 0.816 (±0.268) | Acc: 100.0% | KL_v1: -0.0100 | KL_v2: 0.0200 | Active: 37/64 | Time: 483.4s
WARNING: KL_v2 0.0200 exceeds threshold 0.01
[RL   7] Reward: 0.839 (±0.242) | Acc: 100.0% | KL_v1: -0.0106 | KL_v2: 0.0133 | Active: 33/64 | Time: 242.8s
WARNING: KL_v2 0.0133 exceeds threshold 0.01
Retrying due to status code 502. text=
[RL   8] Reward: 0.862 (±0.235) | Acc: 100.0% | KL_v1: -0.0068 | KL_v2: 0.0174 | Active: 29/64 | Time: 382.3s
WARNING: KL_v2 0.0174 exceeds threshold 0.01
[RL   9] Reward: 0.824 (±0.285) | Acc: 99.9% | KL_v1: -0.0105 | KL_v2: 0.0138 | Active: 36/64 | Time: 378.2s
WARNING: KL_v2 0.0138 exceeds threshold 0.01
Retrying due to status code 502. text=
[RL  10] Reward: 0.862 (±0.230) | Acc: 100.0% | KL_v1: -0.0081 | KL_v2: 0.0118 | Active: 31/64 | Time: 200.5s
WARNING: KL_v2 0.0118 exceeds threshold 0.01
[RL  11] Reward: 0.881 (±0.215) | Acc: 100.0% | KL_v1: -0.0116 | KL_v2: 0.0124 | Active: 24/64 | Time: 211.0s
WARNING: KL_v2 0.0124 exceeds threshold 0.01
[RL  12] Reward: 0.872 (±0.255) | Acc: 100.0% | KL_v1: -0.0098 | KL_v2: 0.0116 | Active: 19/64 | Time: 266.4s
WARNING: KL_v2 0.0116 exceeds threshold 0.01
[RL  13] Reward: 0.890 (±0.211) | Acc: 100.0% | KL_v1: -0.0086 | KL_v2: 0.0112 | Active: 25/64 | Time: 344.2s
WARNING: KL_v2 0.0112 exceeds threshold 0.01
[RL  14] Reward: 0.881 (±0.218) | Acc: 100.0% | KL_v1: -0.0064 | KL_v2: 0.0109 | Active: 28/64 | Time: 358.3s
WARNING: KL_v2 0.0109 exceeds threshold 0.01
[RL  15] Reward: 0.893 (±0.210) | Acc: 100.0% | KL_v1: -0.0068 | KL_v2: 0.0119 | Active: 24/64 | Time: 394.3s
WARNING: KL_v2 0.0119 exceeds threshold 0.01
[RL  16] Reward: 0.860 (±0.232) | Acc: 100.0% | KL_v1: -0.0092 | KL_v2: 0.0098 | Active: 32/64 | Time: 320.4s
[RL  17] Reward: 0.885 (±0.197) | Acc: 100.0% | KL_v1: -0.0092 | KL_v2: 0.0087 | Active: 25/64 | Time: 654.1s
[RL  18] Reward: 0.802 (±0.280) | Acc: 100.0% | KL_v1: -0.0140 | KL_v2: 0.0096 | Active: 34/64 | Time: 409.3s
[RL  19] Reward: 0.854 (±0.213) | Acc: 100.0% | KL_v1: -0.0089 | KL_v2: 0.0094 | Active: 27/64 | Time: 427.3s
[RL  20] Reward: 0.877 (±0.228) | Acc: 100.0% | KL_v1: -0.0078 | KL_v2: 0.0110 | Active: 23/64 | Time: 182.2s
WARNING: KL_v2 0.0110 exceeds threshold 0.01
[RL  21] Reward: 0.878 (±0.221) | Acc: 100.0% | KL_v1: -0.0101 | KL_v2: 0.0094 | Active: 24/64 | Time: 317.5s
[RL  22] Reward: 0.914 (±0.196) | Acc: 100.0% | KL_v1: -0.0060 | KL_v2: 0.0145 | Active: 18/64 | Time: 350.9s
WARNING: KL_v2 0.0145 exceeds threshold 0.01
[RL  23] Reward: 0.856 (±0.244) | Acc: 100.0% | KL_v1: -0.0096 | KL_v2: 0.0080 | Active: 27/64 | Time: 398.3s
[RL  24] Reward: 0.849 (±0.235) | Acc: 100.0% | KL_v1: -0.0060 | KL_v2: 0.0118 | Active: 31/64 | Time: 292.1s
WARNING: KL_v2 0.0118 exceeds threshold 0.01
[RL  25] Reward: 0.834 (±0.260) | Acc: 100.0% | KL_v1: -0.0099 | KL_v2: 0.0101 | Active: 27/64 | Time: 261.0s
WARNING: KL_v2 0.0101 exceeds threshold 0.01
[RL  26] Reward: 0.868 (±0.228) | Acc: 100.0% | KL_v1: -0.0059 | KL_v2: 0.0110 | Active: 30/64 | Time: 255.0s
WARNING: KL_v2 0.0110 exceeds threshold 0.01
[RL  27] Reward: 0.867 (±0.222) | Acc: 100.0% | KL_v1: -0.0044 | KL_v2: 0.0106 | Active: 25/64 | Time: 447.1s
WARNING: KL_v2 0.0106 exceeds threshold 0.01
[RL  28] Reward: 0.929 (±0.144) | Acc: 100.0% | KL_v1: -0.0048 | KL_v2: 0.0134 | Active: 19/64 | Time: 420.3s
WARNING: KL_v2 0.0134 exceeds threshold 0.01
[RL  29] Reward: 0.848 (±0.239) | Acc: 99.9% | KL_v1: -0.0060 | KL_v2: 0.0109 | Active: 25/64 | Time: 297.6s
WARNING: KL_v2 0.0109 exceeds threshold 0.01
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /training/train_v2.py", line 1017, in <module>
    asyncio.run(main())
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 650, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /training/train_v2.py", line 982, in main
    rl_final = await run_rl(
               ^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /training/train_v2.py", line 870, in run_rl
    final_result = await final_future.result_async()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/api_future.py", line 37, in result_async
    return await asyncio.wrap_future(self._future)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 484, in _save_weights_for_sampler_async
    result = await self._save_weights_for_sampler_impl(request_id, name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/telemetry.py", line 309, in acapture_exceptions
    yield
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/telemetry.py", line 384, in _awrapper
    return await cast(Callable[..., Awaitable[R]], func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 112, in _take_turn
    yield
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 469, in _save_weights_for_sampler_impl
    future = await self.holder.execute_with_retries(_send_request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/internal_client_holder.py", line 306, in execute_with_retries
    raise e
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/internal_client_holder.py", line 267, in execute_with_retries
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/lib/public_interfaces/training_client.py", line 464, in _send_request
    return await client.weights.save_for_sampler(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/resources/weights.py", line 153, in save_for_sampler
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/_base_client.py", line 1232, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/muratcankoylan/tinker/memory-routing-project/docs/tinker_Prompt-Distillation /venv/lib/python3.11/site-packages/tinker/_base_client.py", line 1033, in request
    raise self._make_status_error_from_response(err.response) from None
tinker.ConflictError: Error code: 409 - {'detail': "Checkpoint 'rl_final' already exists for model 4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0 in sampler_weights. Please choose a different name to avoid overwriting."}
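Note on the failure mode: all 30 RL iterations completed; only the final save-for-sampler call failed, because a checkpoint named 'rl_final' already existed for this training run (plausibly left over from an earlier attempt against the same model id). One hedged workaround, a sketch only (the helper name is hypothetical, and the exact save API should be taken from the client library, not from this snippet), is to make the final checkpoint name unique before saving:

```python
import time

def unique_checkpoint_name(base: str) -> str:
    """Suffix a checkpoint name with a timestamp so a rerun against the
    same training run cannot collide with an existing entry (hypothetical
    helper; the 409 above suggests names must be unique per run)."""
    return f"{base}_{time.strftime('%Y%m%d_%H%M%S')}"

name = unique_checkpoint_name("rl_final")
print(name)  # e.g. rl_final_20251124_231501
```

An alternative under the same assumption would be to catch the conflict error and retry the save once with a fresh suffix, so a completed RL run is never lost to a naming collision.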