Commit fd8125f (verified), committed by LorMolf
1 Parent(s): 4b58638

Upload FirstAttack checkpoint envstep_70000.pth.tar

README.md ADDED
@@ -0,0 +1,22 @@
+ ---
+ library_name: pytorch
+ tags:
+ - efficientzero
+ - muzero
+ - board-game
+ - combinatorial-reasoning
+ - crpt
+ ---
+
+ # FirstAttack-CK
+
+ Latest exported EfficientZero checkpoint for `simplified__first_attack` from the simplified-five CRPT training runs.
+
+ - Checkpoint: `checkpoints/envstep_70000.pth.tar`
+ - Source path: `/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_70000.pth.tar`
+ - Source attempt directory: `/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543`
+ - W&B project: `crpt-simplified5-constant-lr-continuation`
+ - W&B run: `constant_lr_20260519__simplified__first_attack__a01`
+ - Uploaded at: `2026-05-19T15:04:12Z`
+
+ Companion metadata is stored under `metadata/`, including the resolved LightZero config when available.
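
The README points at companion metadata; the `metadata/checkpoint_index.yaml` file added in this commit records the latest evaluator scalars per checkpoint. The following is a minimal sketch of how such an index could be used to rank checkpoints by evaluator reward. It assumes the YAML has already been parsed into a plain dict (for example with a YAML parser of your choice). The function names `evaluator_reward`, `rank_checkpoints`, and `win_rate_from_mean` are hypothetical and not part of this repository, and the sample values are copied from the index entries in this diff. The win-rate helper additionally assumes every episode reward is exactly +1 or -1, which is consistent with the reported `reward_min`/`reward_max` but is not stated by the source.

```python
def evaluator_reward(entry):
    """Return the latest evaluator mean reward recorded for one checkpoint entry."""
    return entry["latest_scalars"]["evaluator_step/reward_mean"]["value"]


def rank_checkpoints(index):
    """Sort checkpoint names by evaluator reward (highest first).

    Ties are broken in favour of the later checkpoint (larger envstep).
    """
    ckpts = index["checkpoints"]
    return sorted(
        ckpts,
        key=lambda name: (-evaluator_reward(ckpts[name]), -ckpts[name]["envstep"]),
    )


def win_rate_from_mean(reward_mean):
    """Convert a mean reward to a win rate, ASSUMING rewards are only +1 or -1."""
    return (reward_mean + 1.0) / 2.0


# Sample values copied from the index in this commit (only the fields used here).
sample_index = {
    "checkpoints": {
        "ckpt_best.pth.tar": {
            "envstep": 5018,
            "latest_scalars": {"evaluator_step/reward_mean": {"value": 1.0}},
        },
        "envstep_50000.pth.tar": {
            "envstep": 50000,
            "latest_scalars": {"evaluator_step/reward_mean": {"value": 0.4}},
        },
        "envstep_55000.pth.tar": {
            "envstep": 55000,
            "latest_scalars": {"evaluator_step/reward_mean": {"value": 0.2}},
        },
    }
}

if __name__ == "__main__":
    for name in rank_checkpoints(sample_index):
        mean = evaluator_reward(sample_index["checkpoints"][name])
        print(name, mean, round(win_rate_from_mean(mean), 3))
```

Note that each evaluation in the index covers only 10 episodes, so a ranking like this is noisy and is best read as a rough guide rather than a definitive comparison.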
checkpoints/envstep_70000.pth.tar ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2f36645173aafec9555502a284e9be2f2e2c4df8703d2c8031a09735e36f932
+ size 108123787
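
The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size, while the actual checkpoint lives in LFS storage. As a hedged sketch, a downloaded file could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage comment is an assumption about where you saved the file.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,              # "sha256" for this pointer
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a ~100 MB checkpoint is never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e2f36645173aafec9555502a284e9be2f2e2c4df8703d2c8031a09735e36f932
size 108123787
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
    # Hypothetical local path; adjust to wherever the checkpoint was downloaded:
    # print(matches_pointer("checkpoints/envstep_70000.pth.tar", pointer))
```

Checking the size first is a cheap way to reject truncated downloads before paying for a full hash.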
metadata/checkpoint_index.yaml ADDED
@@ -0,0 +1,2186 @@
+ checkpoints:
+   ckpt_best.pth.tar:
+     checkpoint_name: ckpt_best.pth.tar
+     checkpoint_path: /workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/ckpt_best.pth.tar
+     saved_at: '2026-05-19T09:54:18.856840+00:00'
+     train_iter: 496
+     envstep: 5018
+     trigger: null
+     latest_scalars:
+       evaluator_iter/episode_count:
+         value: 10
+         step: 496
+         walltime: null
+       evaluator_step/episode_count:
+         value: 10
+         step: 5018
+         walltime: null
+       evaluator_iter/envstep_count:
+         value: 30
+         step: 496
+         walltime: null
+       evaluator_step/envstep_count:
+         value: 30
+         step: 5018
+         walltime: null
+       evaluator_iter/avg_envstep_per_episode:
+         value: 3.0
+         step: 496
+         walltime: null
+       evaluator_step/avg_envstep_per_episode:
+         value: 3.0
+         step: 5018
+         walltime: null
+       evaluator_iter/evaluate_time:
+         value: 2.766915771484375
+         step: 496
+         walltime: null
+       evaluator_step/evaluate_time:
+         value: 2.766915771484375
+         step: 5018
+         walltime: null
+       evaluator_iter/avg_envstep_per_sec:
+         value: 10.842397267447652
+         step: 496
+         walltime: null
+       evaluator_step/avg_envstep_per_sec:
+         value: 10.842397267447652
+         step: 5018
+         walltime: null
+       evaluator_iter/avg_time_per_episode:
+         value: 3.614132422482551
+         step: 496
+         walltime: null
+       evaluator_step/avg_time_per_episode:
+         value: 3.614132422482551
+         step: 5018
+         walltime: null
+       evaluator_iter/reward_mean:
+         value: 1.0
+         step: 496
+         walltime: null
+       evaluator_step/reward_mean:
+         value: 1.0
+         step: 5018
+         walltime: null
+       evaluator_iter/reward_std:
+         value: 0.0
+         step: 496
+         walltime: null
+       evaluator_step/reward_std:
+         value: 0.0
+         step: 5018
+         walltime: null
+       evaluator_iter/reward_max:
+         value: 1.0
+         step: 496
+         walltime: null
+       evaluator_step/reward_max:
+         value: 1.0
+         step: 5018
+         walltime: null
+       evaluator_iter/reward_min:
+         value: 1.0
+         step: 496
+         walltime: null
+       evaluator_step/reward_min:
+         value: 1.0
+         step: 5018
+         walltime: null
+       Buffer/Task_0/num_collected_episodes:
+         value: 1016
+         step: 496
+         walltime: null
+       Buffer/Task_0/num_game_segments:
+         value: 1016
+         step: 496
+         walltime: null
+       Buffer/Task_0/num_transitions:
+         value: 5018
+         step: 496
+         walltime: null
+       Buffer/Task_0/memory_usage_mb/game_segment_buffer:
+         value: 4.832954406738281
+         step: 496
+         walltime: null
+       Buffer/Task_0/memory_usage_mb/process:
+         value: 2043.26171875
+         step: 496
+         walltime: null
+       learner_iter/collect_mcts_temperature_avg:
+         value: 0.25
+         step: 400
+         walltime: null
+       learner_step/collect_mcts_temperature_avg:
+         value: 0.25
+         step: 4098
+         walltime: null
+       learner_iter/cur_lr_avg:
+         value: 0.0029999999999999996
+         step: 400
+         walltime: null
+       learner_step/cur_lr_avg:
+         value: 0.0029999999999999996
+         step: 4098
+         walltime: null
+       learner_iter/weighted_total_loss_avg:
+         value: 1.5632856217297642
+         step: 400
+         walltime: null
+       learner_step/weighted_total_loss_avg:
+         value: 1.5632856217297642
+         step: 4098
+         walltime: null
+       learner_iter/total_loss_avg:
+         value: 1.5632856217297642
+         step: 400
+         walltime: null
+       learner_step/total_loss_avg:
+         value: 1.5632856217297642
+         step: 4098
+         walltime: null
+       learner_iter/policy_loss_avg:
+         value: 4.736617175015536
+         step: 400
+         walltime: null
+       learner_step/policy_loss_avg:
+         value: 4.736617175015536
+         step: 4098
+         walltime: null
+       learner_iter/policy_entropy_avg:
+         value: 1.7712429798010623
+         step: 400
+         walltime: null
+       learner_step/policy_entropy_avg:
+         value: 1.7712429798010623
+         step: 4098
+         walltime: null
+       learner_iter/target_policy_entropy_avg:
+         value: 1.3918844136324795
+         step: 400
+         walltime: null
+       learner_step/target_policy_entropy_avg:
+         value: 1.3918844136324795
+         step: 4098
+         walltime: null
+       learner_iter/value_prefix_loss_avg:
+         value: 2.92797606641596
+         step: 400
+         walltime: null
+       learner_step/value_prefix_loss_avg:
+         value: 2.92797606641596
+         step: 4098
+         walltime: null
+       learner_iter/value_loss_avg:
+         value: 2.9029295444488525
+         step: 400
+         walltime: null
+       learner_step/value_loss_avg:
+         value: 2.9029295444488525
+         step: 4098
+         walltime: null
+       learner_iter/consistency_loss_avg:
+         value: -0.6827039935372092
+         step: 400
+         walltime: null
+       learner_step/consistency_loss_avg:
+         value: -0.6827039935372092
+         step: 4098
+         walltime: null
+       learner_iter/value_priority_avg:
+         value: 0.2535270005464554
+         step: 400
+         walltime: null
+       learner_step/value_priority_avg:
+         value: 0.2535270005464554
+         step: 4098
+         walltime: null
+       learner_iter/target_value_prefix_avg:
+         value: 0.6656013456257907
+         step: 400
+         walltime: null
+       learner_step/target_value_prefix_avg:
+         value: 0.6656013456257907
+         step: 4098
+         walltime: null
+       learner_iter/target_value_avg:
+         value: 0.02758049304512414
+         step: 400
+         walltime: null
+       learner_step/target_value_avg:
+         value: 0.02758049304512414
+         step: 4098
+         walltime: null
+       learner_iter/predicted_value_prefixs_avg:
+         value: 0.6718856367197904
+         step: 400
+         walltime: null
+       learner_step/predicted_value_prefixs_avg:
+         value: 0.6718856367197904
+         step: 4098
+         walltime: null
+       learner_iter/predicted_values_avg:
+         value: 0.01786879792978818
+         step: 400
+         walltime: null
+       learner_step/predicted_values_avg:
+         value: 0.01786879792978818
+         step: 4098
+         walltime: null
+       learner_iter/transformed_target_value_prefix_avg:
+         value: 0.2763666700233113
+         step: 400
+         walltime: null
+       learner_step/transformed_target_value_prefix_avg:
+         value: 0.2763666700233113
+         step: 4098
+         walltime: null
+       learner_iter/transformed_target_value_avg:
+         value: 0.011451793665235693
+         step: 400
+         walltime: null
+       learner_step/transformed_target_value_avg:
+         value: 0.011451793665235693
+         step: 4098
+         walltime: null
+       learner_iter/total_grad_norm_before_clip_avg:
+         value: 2.54057898033749
+         step: 400
+         walltime: null
+       learner_step/total_grad_norm_before_clip_avg:
+         value: 2.54057898033749
+         step: 4098
+         walltime: null
+       collector_iter/episode_count:
+         value: 200
+         step: 400
+         walltime: null
+       collector_step/episode_count:
+         value: 200
+         step: 4098
+         walltime: null
+       collector_iter/envstep_count:
+         value: 985
+         step: 400
+         walltime: null
+       collector_step/envstep_count:
+         value: 985
+         step: 4098
+         walltime: null
+       collector_iter/avg_envstep_per_episode:
+         value: 4.925
+         step: 400
+         walltime: null
+       collector_step/avg_envstep_per_episode:
+         value: 4.925
+         step: 4098
+         walltime: null
+       collector_iter/avg_envstep_per_sec:
+         value: 11.373479111602059
+         step: 400
+         walltime: null
+       collector_step/avg_envstep_per_sec:
+         value: 11.373479111602059
+         step: 4098
+         walltime: null
+       collector_iter/avg_episode_per_sec:
+         value: 2.3093358602237686
+         step: 400
+         walltime: null
+       collector_step/avg_episode_per_sec:
+         value: 2.3093358602237686
+         step: 4098
+         walltime: null
+       collector_iter/collect_time:
+         value: 86.60498606756167
+         step: 400
+         walltime: null
+       collector_step/collect_time:
+         value: 86.60498606756167
+         step: 4098
+         walltime: null
+       collector_iter/reward_mean:
+         value: 1.0
+         step: 400
+         walltime: null
+       collector_step/reward_mean:
+         value: 1.0
+         step: 4098
+         walltime: null
+       collector_iter/reward_std:
+         value: 0.0
+         step: 400
+         walltime: null
+       collector_step/reward_std:
+         value: 0.0
+         step: 4098
+         walltime: null
+       collector_iter/reward_max:
+         value: 1.0
+         step: 400
+         walltime: null
+       collector_step/reward_max:
+         value: 1.0
+         step: 4098
+         walltime: null
+       collector_iter/reward_min:
+         value: 1.0
+         step: 400
+         walltime: null
+       collector_step/reward_min:
+         value: 1.0
+         step: 4098
+         walltime: null
+       collector_iter/total_envstep_count:
+         value: 4098
+         step: 400
+         walltime: null
+       collector_step/total_envstep_count:
+         value: 4098
+         step: 4098
+         walltime: null
+       collector_iter/total_episode_count:
+         value: 828
+         step: 400
+         walltime: null
+       collector_step/total_episode_count:
+         value: 828
+         step: 4098
+         walltime: null
+       collector_iter/total_duration:
+         value: 9723.866266854113
+         step: 400
+         walltime: null
+       collector_step/total_duration:
+         value: 9723.866266854113
+         step: 4098
+         walltime: null
+       collector_iter/visit_entropy_mean:
+         value: 1.1180463468347202
+         step: 400
+         walltime: null
+       collector_step/visit_entropy_mean:
+         value: 1.1180463468347202
+         step: 4098
+         walltime: null
+   envstep_50000.pth.tar:
+     checkpoint_name: envstep_50000.pth.tar
+     checkpoint_path: /workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_50000.pth.tar
+     saved_at: '2026-05-19T13:10:59.906420+00:00'
+     train_iter: 5050
+     envstep: 50000
+     trigger: envstep
+     latest_scalars:
+       evaluator_iter/episode_count:
+         value: 10
+         step: 4544
+         walltime: null
+       evaluator_step/episode_count:
+         value: 10
+         step: 45002
+         walltime: null
+       evaluator_iter/envstep_count:
+         value: 27
+         step: 4544
+         walltime: null
+       evaluator_step/envstep_count:
+         value: 27
+         step: 45002
+         walltime: null
+       evaluator_iter/avg_envstep_per_episode:
+         value: 2.7
+         step: 4544
+         walltime: null
+       evaluator_step/avg_envstep_per_episode:
+         value: 2.7
+         step: 45002
+         walltime: null
+       evaluator_iter/evaluate_time:
+         value: 0.5984893798828125
+         step: 4544
+         walltime: null
+       evaluator_step/evaluate_time:
+         value: 0.5984893798828125
+         step: 45002
+         walltime: null
+       evaluator_iter/avg_envstep_per_sec:
+         value: 45.113582475409586
+         step: 4544
+         walltime: null
+       evaluator_step/avg_envstep_per_sec:
+         value: 45.113582475409586
+         step: 45002
+         walltime: null
+       evaluator_iter/avg_time_per_episode:
+         value: 16.7087342501517
+         step: 4544
+         walltime: null
+       evaluator_step/avg_time_per_episode:
+         value: 16.7087342501517
+         step: 45002
+         walltime: null
+       evaluator_iter/reward_mean:
+         value: 0.4
+         step: 4544
+         walltime: null
+       evaluator_step/reward_mean:
+         value: 0.4
+         step: 45002
+         walltime: null
+       evaluator_iter/reward_std:
+         value: 0.9165151389911679
+         step: 4544
+         walltime: null
+       evaluator_step/reward_std:
+         value: 0.9165151389911679
+         step: 45002
+         walltime: null
+       evaluator_iter/reward_max:
+         value: 1.0
+         step: 4544
+         walltime: null
+       evaluator_step/reward_max:
+         value: 1.0
+         step: 45002
+         walltime: null
+       evaluator_iter/reward_min:
+         value: -1.0
+         step: 4544
+         walltime: null
+       evaluator_step/reward_min:
+         value: -1.0
+         step: 45002
+         walltime: null
+       Buffer/Task_0/num_collected_episodes:
+         value: 10124
+         step: 5050
+         walltime: null
+       Buffer/Task_0/num_game_segments:
+         value: 10124
+         step: 5050
+         walltime: null
+       Buffer/Task_0/num_transitions:
+         value: 49994
+         step: 5050
+         walltime: null
+       Buffer/Task_0/memory_usage_mb/game_segment_buffer:
+         value: 48.12529754638672
+         step: 5050
+         walltime: null
+       Buffer/Task_0/memory_usage_mb/process:
+         value: 2092.62109375
+         step: 5050
+         walltime: null
+       learner_iter/collect_mcts_temperature_avg:
+         value: 0.25
+         step: 5000
+         walltime: null
+       learner_step/collect_mcts_temperature_avg:
+         value: 0.25
+         step: 49518
+         walltime: null
+       learner_iter/cur_lr_avg:
+         value: 0.0029999999999999996
+         step: 5000
+         walltime: null
+       learner_step/cur_lr_avg:
+         value: 0.0029999999999999996
+         step: 49518
+         walltime: null
+       learner_iter/weighted_total_loss_avg:
+         value: 1.7651414546099575
+         step: 5000
+         walltime: null
+       learner_step/weighted_total_loss_avg:
+         value: 1.7651414546099575
+         step: 49518
+         walltime: null
+       learner_iter/total_loss_avg:
+         value: 1.7651414546099575
+         step: 5000
+         walltime: null
+       learner_step/total_loss_avg:
+         value: 1.7651414546099575
+         step: 49518
+         walltime: null
+       learner_iter/policy_loss_avg:
+         value: 4.930113272233442
+         step: 5000
+         walltime: null
+       learner_step/policy_loss_avg:
+         value: 4.930113272233442
+         step: 49518
+         walltime: null
+       learner_iter/policy_entropy_avg:
+         value: 1.839398933179451
+         step: 5000
+         walltime: null
+       learner_step/policy_entropy_avg:
+         value: 1.839398933179451
+         step: 49518
+         walltime: null
+       learner_iter/target_policy_entropy_avg:
+         value: 1.4404258294539016
+         step: 5000
+         walltime: null
+       learner_step/target_policy_entropy_avg:
+         value: 1.4404258294539016
+         step: 49518
+         walltime: null
+       learner_iter/value_prefix_loss_avg:
+         value: 2.920419129458341
+         step: 5000
+         walltime: null
+       learner_step/value_prefix_loss_avg:
+         value: 2.920419129458341
+         step: 49518
+         walltime: null
+       learner_iter/value_loss_avg:
+         value: 2.864912444894964
+         step: 5000
+         walltime: null
+       learner_step/value_loss_avg:
+         value: 2.864912444894964
+         step: 49518
+         walltime: null
+       learner_iter/consistency_loss_avg:
+         value: -0.6801619052886962
+         step: 5000
+         walltime: null
+       learner_step/consistency_loss_avg:
+         value: -0.6801619052886962
+         step: 49518
+         walltime: null
+       learner_iter/value_priority_avg:
+         value: 0.2628427621993152
+         step: 5000
+         walltime: null
+       learner_step/value_priority_avg:
+         value: 0.2628427621993152
+         step: 49518
+         walltime: null
+       learner_iter/target_value_prefix_avg:
+         value: 0.6663115837357261
+         step: 5000
+         walltime: null
+       learner_step/target_value_prefix_avg:
+         value: 0.6663115837357261
+         step: 49518
+         walltime: null
+       learner_iter/target_value_avg:
+         value: 0.03586647862737829
+         step: 5000
+         walltime: null
+       learner_step/target_value_avg:
+         value: 0.03586647862737829
+         step: 49518
+         walltime: null
+       learner_iter/predicted_value_prefixs_avg:
+         value: 0.668508296663111
+         step: 5000
+         walltime: null
+       learner_step/predicted_value_prefixs_avg:
+         value: 0.668508296663111
+         step: 49518
+         walltime: null
+       learner_iter/predicted_values_avg:
+         value: 0.03528098355640064
+         step: 5000
+         walltime: null
+       learner_step/predicted_values_avg:
+         value: 0.03528098355640064
+         step: 49518
+         walltime: null
+       learner_iter/transformed_target_value_prefix_avg:
+         value: 0.2766615640033375
+         step: 5000
+         walltime: null
+       learner_step/transformed_target_value_prefix_avg:
+         value: 0.2766615640033375
+         step: 49518
+         walltime: null
+       learner_iter/transformed_target_value_avg:
+         value: 0.014892246489497747
+         step: 5000
+         walltime: null
+       learner_step/transformed_target_value_avg:
+         value: 0.014892246489497747
+         step: 49518
+         walltime: null
+       learner_iter/total_grad_norm_before_clip_avg:
+         value: 4.187030717730522
+         step: 5000
+         walltime: null
+       learner_step/total_grad_norm_before_clip_avg:
+         value: 4.187030717730522
+         step: 49518
+         walltime: null
+       collector_iter/episode_count:
+         value: 200
+         step: 5000
+         walltime: null
+       collector_step/episode_count:
+         value: 200
+         step: 49518
+         walltime: null
+       collector_iter/envstep_count:
+         value: 986
+         step: 5000
+         walltime: null
+       collector_step/envstep_count:
+         value: 986
+         step: 49518
+         walltime: null
+       collector_iter/avg_envstep_per_episode:
+         value: 4.93
+         step: 5000
+         walltime: null
+       collector_step/avg_envstep_per_episode:
+         value: 4.93
+         step: 49518
+         walltime: null
+       collector_iter/avg_envstep_per_sec:
+         value: 12.096509257006582
+         step: 5000
+         walltime: null
+       collector_step/avg_envstep_per_sec:
+         value: 12.096509257006582
+         step: 49518
+         walltime: null
+       collector_iter/avg_episode_per_sec:
+         value: 2.453652993307623
+         step: 5000
+         walltime: null
+       collector_step/avg_episode_per_sec:
+         value: 2.453652993307623
+         step: 49518
+         walltime: null
+       collector_iter/collect_time:
+         value: 81.51111854263954
+         step: 5000
+         walltime: null
+       collector_step/collect_time:
+         value: 81.51111854263954
+         step: 49518
+         walltime: null
+       collector_iter/reward_mean:
+         value: 1.0
+         step: 5000
+         walltime: null
+       collector_step/reward_mean:
+         value: 1.0
+         step: 49518
+         walltime: null
+       collector_iter/reward_std:
+         value: 0.0
+         step: 5000
+         walltime: null
+       collector_step/reward_std:
+         value: 0.0
+         step: 49518
+         walltime: null
+       collector_iter/reward_max:
+         value: 1.0
+         step: 5000
+         walltime: null
+       collector_step/reward_max:
+         value: 1.0
+         step: 49518
+         walltime: null
+       collector_iter/reward_min:
+         value: 1.0
+         step: 5000
+         walltime: null
+       collector_step/reward_min:
+         value: 1.0
+         step: 49518
+         walltime: null
+       collector_iter/total_envstep_count:
+         value: 49518
+         step: 5000
+         walltime: null
+       collector_step/total_envstep_count:
+         value: 49518
+         step: 49518
+         walltime: null
+       collector_iter/total_episode_count:
+         value: 10028
+         step: 5000
+         walltime: null
+       collector_step/total_episode_count:
+         value: 10028
+         step: 49518
+         walltime: null
+       collector_iter/total_duration:
+         value: 106945.62145317174
+         step: 5000
+         walltime: null
+       collector_step/total_duration:
+         value: 106945.62145317174
+         step: 49518
+         walltime: null
+       collector_iter/visit_entropy_mean:
+         value: 1.412281816542291
+         step: 5000
+         walltime: null
+       collector_step/visit_entropy_mean:
+         value: 1.412281816542291
+         step: 49518
+         walltime: null
+   envstep_55000.pth.tar:
+     checkpoint_name: envstep_55000.pth.tar
+     checkpoint_path: /workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_55000.pth.tar
+     saved_at: '2026-05-19T13:34:59.497817+00:00'
+     train_iter: 5558
+     envstep: 55000
+     trigger: envstep
+     latest_scalars:
+       evaluator_iter/episode_count:
+         value: 10
+         step: 5052
+         walltime: null
+       evaluator_step/episode_count:
+         value: 10
+         step: 50014
+         walltime: null
+       evaluator_iter/envstep_count:
+         value: 26
+         step: 5052
+         walltime: null
+       evaluator_step/envstep_count:
+         value: 26
+         step: 50014
+         walltime: null
+       evaluator_iter/avg_envstep_per_episode:
+         value: 2.6
+         step: 5052
+         walltime: null
+       evaluator_step/avg_envstep_per_episode:
+         value: 2.6
+         step: 50014
+         walltime: null
+       evaluator_iter/evaluate_time:
+         value: 1.2514569091796874
+         step: 5052
+         walltime: null
+       evaluator_step/evaluate_time:
+         value: 1.2514569091796874
+         step: 50014
+         walltime: null
+       evaluator_iter/avg_envstep_per_sec:
+         value: 20.775785254198354
+         step: 5052
+         walltime: null
+       evaluator_step/avg_envstep_per_sec:
+         value: 20.775785254198354
+         step: 50014
+         walltime: null
+       evaluator_iter/avg_time_per_episode:
+         value: 7.990686636230136
+         step: 5052
+         walltime: null
+       evaluator_step/avg_time_per_episode:
+         value: 7.990686636230136
+         step: 50014
+         walltime: null
+       evaluator_iter/reward_mean:
+         value: 0.2
+         step: 5052
+         walltime: null
+       evaluator_step/reward_mean:
+         value: 0.2
+         step: 50014
+         walltime: null
+       evaluator_iter/reward_std:
+         value: 0.9797958971132713
+         step: 5052
+         walltime: null
+       evaluator_step/reward_std:
+         value: 0.9797958971132713
+         step: 50014
+         walltime: null
+       evaluator_iter/reward_max:
+         value: 1.0
+         step: 5052
+         walltime: null
+       evaluator_step/reward_max:
+         value: 1.0
+         step: 50014
+         walltime: null
+       evaluator_iter/reward_min:
+         value: -1.0
+         step: 5052
+         walltime: null
+       evaluator_step/reward_min:
+         value: -1.0
+         step: 50014
+         walltime: null
+       Buffer/Task_0/num_collected_episodes:
+         value: 11140
+         step: 5558
+         walltime: null
+       Buffer/Task_0/num_game_segments:
+         value: 10128
+         step: 5558
+         walltime: null
+       Buffer/Task_0/num_transitions:
+         value: 49996
+         step: 5558
+         walltime: null
+       Buffer/Task_0/memory_usage_mb/game_segment_buffer:
+         value: 48.135398864746094
+         step: 5558
+         walltime: null
+       Buffer/Task_0/memory_usage_mb/process:
+         value: 2096.26953125
+         step: 5558
+         walltime: null
+       learner_iter/collect_mcts_temperature_avg:
+         value: 0.25
+         step: 5500
+         walltime: null
+       learner_step/collect_mcts_temperature_avg:
+         value: 0.25
+         step: 54442
+         walltime: null
+       learner_iter/cur_lr_avg:
+         value: 0.0029999999999999996
+         step: 5500
+         walltime: null
+       learner_step/cur_lr_avg:
+         value: 0.0029999999999999996
+         step: 54442
+         walltime: null
+       learner_iter/weighted_total_loss_avg:
+         value: 1.7596894177523525
+         step: 5500
+         walltime: null
+       learner_step/weighted_total_loss_avg:
+         value: 1.7596894177523525
+         step: 54442
+         walltime: null
+       learner_iter/total_loss_avg:
+         value: 1.7596894177523525
+         step: 5500
+         walltime: null
+       learner_step/total_loss_avg:
+         value: 1.7596894177523525
+         step: 54442
+         walltime: null
+       learner_iter/policy_loss_avg:
+         value: 4.986965179443359
+         step: 5500
+         walltime: null
+       learner_step/policy_loss_avg:
+         value: 4.986965179443359
+         step: 54442
+         walltime: null
+       learner_iter/policy_entropy_avg:
+         value: 1.894639072996197
+         step: 5500
+         walltime: null
+       learner_step/policy_entropy_avg:
+         value: 1.894639072996197
+         step: 54442
+         walltime: null
+       learner_iter/target_policy_entropy_avg:
+         value: 1.4588232184901384
+         step: 5500
+         walltime: null
+       learner_step/target_policy_entropy_avg:
+         value: 1.4588232184901384
+         step: 54442
+         walltime: null
+       learner_iter/value_prefix_loss_avg:
+         value: 2.9177387194199995
+         step: 5500
+         walltime: null
+       learner_step/value_prefix_loss_avg:
+         value: 2.9177387194199995
+         step: 54442
+         walltime: null
+       learner_iter/value_loss_avg:
+         value: 2.873926422812722
+         step: 5500
+         walltime: null
+       learner_step/value_loss_avg:
+         value: 2.873926422812722
+         step: 54442
+         walltime: null
+       learner_iter/consistency_loss_avg:
+         value: -0.6863496086814187
+         step: 5500
+         walltime: null
+       learner_step/consistency_loss_avg:
+         value: -0.6863496086814187
+         step: 54442
+         walltime: null
+       learner_iter/value_priority_avg:
+         value: 0.24085114083506845
+         step: 5500
+         walltime: null
+       learner_step/value_priority_avg:
+         value: 0.24085114083506845
+         step: 54442
+         walltime: null
+       learner_iter/target_value_prefix_avg:
+         value: 0.6663115837357261
+         step: 5500
+         walltime: null
+       learner_step/target_value_prefix_avg:
+         value: 0.6663115837357261
+         step: 54442
+         walltime: null
+       learner_iter/target_value_avg:
+         value: 0.03053977374326099
+         step: 5500
+         walltime: null
+       learner_step/target_value_avg:
+         value: 0.03053977374326099
+         step: 54442
+         walltime: null
+       learner_iter/predicted_value_prefixs_avg:
+         value: 0.6696722832593051
+         step: 5500
+         walltime: null
+       learner_step/predicted_value_prefixs_avg:
+         value: 0.6696722832593051
+         step: 54442
+         walltime: null
+       learner_iter/predicted_values_avg:
+         value: 0.034540297759866175
+         step: 5500
+         walltime: null
+       learner_step/predicted_values_avg:
+         value: 0.034540297759866175
+         step: 54442
+         walltime: null
+       learner_iter/transformed_target_value_prefix_avg:
+         value: 0.2766615694219416
+         step: 5500
+         walltime: null
+       learner_step/transformed_target_value_prefix_avg:
+         value: 0.2766615694219416
+         step: 54442
+         walltime: null
+       learner_iter/transformed_target_value_avg:
+         value: 0.01268052618781274
+         step: 5500
+         walltime: null
+       learner_step/transformed_target_value_avg:
+         value: 0.01268052618781274
+         step: 54442
+         walltime: null
+       learner_iter/total_grad_norm_before_clip_avg:
+         value: 1.5357053916562686
+         step: 5500
+         walltime: null
+       learner_step/total_grad_norm_before_clip_avg:
+         value: 1.5357053916562686
+         step: 54442
+         walltime: null
+       collector_iter/episode_count:
+         value: 200
+         step: 5500
+         walltime: null
+       collector_step/episode_count:
+         value: 200
+         step: 54442
+         walltime: null
+       collector_iter/envstep_count:
+         value: 983
+         step: 5500
+         walltime: null
+       collector_step/envstep_count:
+         value: 983
+         step: 54442
+         walltime: null
+       collector_iter/avg_envstep_per_episode:
+         value: 4.915
+         step: 5500
+         walltime: null
+       collector_step/avg_envstep_per_episode:
+         value: 4.915
+         step: 54442
+         walltime: null
+       collector_iter/avg_envstep_per_sec:
+         value: 12.58463828403227
+         step: 5500
+         walltime: null
+       collector_step/avg_envstep_per_sec:
+         value: 12.58463828403227
+         step: 54442
+         walltime: null
+       collector_iter/avg_episode_per_sec:
+         value: 2.560455398582354
+         step: 5500
+         walltime: null
+       collector_step/avg_episode_per_sec:
+         value: 2.560455398582354
+         step: 54442
+         walltime: null
+       collector_iter/collect_time:
+         value: 78.11110481000134
+         step: 5500
+         walltime: null
+       collector_step/collect_time:
+         value: 78.11110481000134
+         step: 54442
+         walltime: null
+       collector_iter/reward_mean:
+         value: 1.0
+         step: 5500
+         walltime: null
+       collector_step/reward_mean:
+         value: 1.0
+         step: 54442
+         walltime: null
+       collector_iter/reward_std:
+         value: 0.0
+         step: 5500
+         walltime: null
+       collector_step/reward_std:
+         value: 0.0
+         step: 54442
+         walltime: null
+       collector_iter/reward_max:
+         value: 1.0
+         step: 5500
+         walltime: null
+       collector_step/reward_max:
+         value: 1.0
+         step: 54442
+         walltime: null
+       collector_iter/reward_min:
+         value: 1.0
+         step: 5500
+         walltime: null
+       collector_step/reward_min:
1059
+ value: 1.0
1060
+ step: 54442
1061
+ walltime: null
1062
+ collector_iter/total_envstep_count:
1063
+ value: 54442
1064
+ step: 5500
1065
+ walltime: null
1066
+ collector_step/total_envstep_count:
1067
+ value: 54442
1068
+ step: 54442
1069
+ walltime: null
1070
+ collector_iter/total_episode_count:
1071
+ value: 11028
1072
+ step: 5500
1073
+ walltime: null
1074
+ collector_step/total_episode_count:
1075
+ value: 11028
1076
+ step: 54442
1077
+ walltime: null
1078
+ collector_iter/total_duration:
1079
+ value: 116580.37703669704
1080
+ step: 5500
1081
+ walltime: null
1082
+ collector_step/total_duration:
1083
+ value: 116580.37703669704
1084
+ step: 54442
1085
+ walltime: null
1086
+ collector_iter/visit_entropy_mean:
1087
+ value: 1.18402243386069
1088
+ step: 5500
1089
+ walltime: null
1090
+ collector_step/visit_entropy_mean:
1091
+ value: 1.18402243386069
1092
+ step: 54442
1093
+ walltime: null
1094
+ envstep_60000.pth.tar:
1095
+ checkpoint_name: envstep_60000.pth.tar
1096
+ checkpoint_path: /workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_60000.pth.tar
1097
+ saved_at: '2026-05-19T13:59:49.254427+00:00'
1098
+ train_iter: 6066
1099
+ envstep: 60000
1100
+ trigger: envstep
1101
+ latest_scalars:
1102
+ evaluator_iter/episode_count:
1103
+ value: 10
1104
+ step: 5560
1105
+ walltime: null
1106
+ evaluator_step/episode_count:
1107
+ value: 10
1108
+ step: 55013
1109
+ walltime: null
1110
+ evaluator_iter/envstep_count:
1111
+ value: 24
1112
+ step: 5560
1113
+ walltime: null
1114
+ evaluator_step/envstep_count:
1115
+ value: 24
1116
+ step: 55013
1117
+ walltime: null
1118
+ evaluator_iter/avg_envstep_per_episode:
1119
+ value: 2.4
1120
+ step: 5560
1121
+ walltime: null
1122
+ evaluator_step/avg_envstep_per_episode:
1123
+ value: 2.4
1124
+ step: 55013
1125
+ walltime: null
1126
+ evaluator_iter/evaluate_time:
1127
+ value: 1.1975067138671875
1128
+ step: 5560
1129
+ walltime: null
1130
+ evaluator_step/evaluate_time:
1131
+ value: 1.1975067138671875
1132
+ step: 55013
1133
+ walltime: null
1134
+ evaluator_iter/avg_envstep_per_sec:
1135
+ value: 20.041641288586362
1136
+ step: 5560
1137
+ walltime: null
1138
+ evaluator_step/avg_envstep_per_sec:
1139
+ value: 20.041641288586362
1140
+ step: 55013
1141
+ walltime: null
1142
+ evaluator_iter/avg_time_per_episode:
1143
+ value: 8.350683870244318
1144
+ step: 5560
1145
+ walltime: null
1146
+ evaluator_step/avg_time_per_episode:
1147
+ value: 8.350683870244318
1148
+ step: 55013
1149
+ walltime: null
1150
+ evaluator_iter/reward_mean:
1151
+ value: -0.2
1152
+ step: 5560
1153
+ walltime: null
1154
+ evaluator_step/reward_mean:
1155
+ value: -0.2
1156
+ step: 55013
1157
+ walltime: null
1158
+ evaluator_iter/reward_std:
1159
+ value: 0.9797958971132713
1160
+ step: 5560
1161
+ walltime: null
1162
+ evaluator_step/reward_std:
1163
+ value: 0.9797958971132713
1164
+ step: 55013
1165
+ walltime: null
1166
+ evaluator_iter/reward_max:
1167
+ value: 1.0
1168
+ step: 5560
1169
+ walltime: null
1170
+ evaluator_step/reward_max:
1171
+ value: 1.0
1172
+ step: 55013
1173
+ walltime: null
1174
+ evaluator_iter/reward_min:
1175
+ value: -1.0
1176
+ step: 5560
1177
+ walltime: null
1178
+ evaluator_step/reward_min:
1179
+ value: -1.0
1180
+ step: 55013
1181
+ walltime: null
1182
+ Buffer/Task_0/num_collected_episodes:
1183
+ value: 12156
1184
+ step: 6066
1185
+ walltime: null
1186
+ Buffer/Task_0/num_game_segments:
1187
+ value: 10128
1188
+ step: 6066
1189
+ walltime: null
1190
+ Buffer/Task_0/num_transitions:
1191
+ value: 49998
1192
+ step: 6066
1193
+ walltime: null
1194
+ Buffer/Task_0/memory_usage_mb/game_segment_buffer:
1195
+ value: 48.13629913330078
1196
+ step: 6066
1197
+ walltime: null
1198
+ Buffer/Task_0/memory_usage_mb/process:
1199
+ value: 2095.19140625
1200
+ step: 6066
1201
+ walltime: null
1202
+ learner_iter/collect_mcts_temperature_avg:
1203
+ value: 0.25
1204
+ step: 6000
1205
+ walltime: null
1206
+ learner_step/collect_mcts_temperature_avg:
1207
+ value: 0.25
1208
+ step: 59369
1209
+ walltime: null
1210
+ learner_iter/cur_lr_avg:
1211
+ value: 0.0029999999999999996
1212
+ step: 6000
1213
+ walltime: null
1214
+ learner_step/cur_lr_avg:
1215
+ value: 0.0029999999999999996
1216
+ step: 59369
1217
+ walltime: null
1218
+ learner_iter/weighted_total_loss_avg:
1219
+ value: 1.879230412569913
1220
+ step: 6000
1221
+ walltime: null
1222
+ learner_step/weighted_total_loss_avg:
1223
+ value: 1.879230412569913
1224
+ step: 59369
1225
+ walltime: null
1226
+ learner_iter/total_loss_avg:
1227
+ value: 1.879230412569913
1228
+ step: 6000
1229
+ walltime: null
1230
+ learner_step/total_loss_avg:
1231
+ value: 1.879230412569913
1232
+ step: 59369
1233
+ walltime: null
1234
+ learner_iter/policy_loss_avg:
1235
+ value: 4.972260865298185
1236
+ step: 6000
1237
+ walltime: null
1238
+ learner_step/policy_loss_avg:
1239
+ value: 4.972260865298185
1240
+ step: 59369
1241
+ walltime: null
1242
+ learner_iter/policy_entropy_avg:
1243
+ value: 1.8664040565490723
1244
+ step: 6000
1245
+ walltime: null
1246
+ learner_step/policy_entropy_avg:
1247
+ value: 1.8664040565490723
1248
+ step: 59369
1249
+ walltime: null
1250
+ learner_iter/target_policy_entropy_avg:
1251
+ value: 1.4537174051458186
1252
+ step: 6000
1253
+ walltime: null
1254
+ learner_step/target_policy_entropy_avg:
1255
+ value: 1.4537174051458186
1256
+ step: 59369
1257
+ walltime: null
1258
+ learner_iter/value_prefix_loss_avg:
1259
+ value: 2.927896196191961
1260
+ step: 6000
1261
+ walltime: null
1262
+ learner_step/value_prefix_loss_avg:
1263
+ value: 2.927896196191961
1264
+ step: 59369
1265
+ walltime: null
1266
+ learner_iter/value_loss_avg:
1267
+ value: 2.866024515845559
1268
+ step: 6000
1269
+ walltime: null
1270
+ learner_step/value_loss_avg:
1271
+ value: 2.866024515845559
1272
+ step: 59369
1273
+ walltime: null
1274
+ learner_iter/consistency_loss_avg:
1275
+ value: -0.6737432783300227
1276
+ step: 6000
1277
+ walltime: null
1278
+ learner_step/consistency_loss_avg:
1279
+ value: -0.6737432783300227
1280
+ step: 59369
1281
+ walltime: null
1282
+ learner_iter/value_priority_avg:
1283
+ value: 0.25058447366411035
1284
+ step: 6000
1285
+ walltime: null
1286
+ learner_step/value_priority_avg:
1287
+ value: 0.25058447366411035
1288
+ step: 59369
1289
+ walltime: null
1290
+ learner_iter/target_value_prefix_avg:
1291
+ value: 0.6677320220253684
1292
+ step: 6000
1293
+ walltime: null
1294
+ learner_step/target_value_prefix_avg:
1295
+ value: 0.6677320220253684
1296
+ step: 59369
1297
+ walltime: null
1298
+ learner_iter/target_value_avg:
1299
+ value: 0.032315341755747795
1300
+ step: 6000
1301
+ walltime: null
1302
+ learner_step/target_value_avg:
1303
+ value: 0.032315341755747795
1304
+ step: 59369
1305
+ walltime: null
1306
+ learner_iter/predicted_value_prefixs_avg:
1307
+ value: 0.6754332509907809
1308
+ step: 6000
1309
+ walltime: null
1310
+ learner_step/predicted_value_prefixs_avg:
1311
+ value: 0.6754332509907809
1312
+ step: 59369
1313
+ walltime: null
1314
+ learner_iter/predicted_values_avg:
1315
+ value: 0.028052480171688578
1316
+ step: 6000
1317
+ walltime: null
1318
+ learner_step/predicted_values_avg:
1319
+ value: 0.028052480171688578
1320
+ step: 59369
1321
+ walltime: null
1322
+ learner_iter/transformed_target_value_prefix_avg:
1323
+ value: 0.2772513573819941
1324
+ step: 6000
1325
+ walltime: null
1326
+ learner_step/transformed_target_value_prefix_avg:
1327
+ value: 0.2772513573819941
1328
+ step: 59369
1329
+ walltime: null
1330
+ learner_iter/transformed_target_value_avg:
1331
+ value: 0.013417766090821136
1332
+ step: 6000
1333
+ walltime: null
1334
+ learner_step/transformed_target_value_avg:
1335
+ value: 0.013417766090821136
1336
+ step: 59369
1337
+ walltime: null
1338
+ learner_iter/total_grad_norm_before_clip_avg:
1339
+ value: 2.765905033458363
1340
+ step: 6000
1341
+ walltime: null
1342
+ learner_step/total_grad_norm_before_clip_avg:
1343
+ value: 2.765905033458363
1344
+ step: 59369
1345
+ walltime: null
1346
+ collector_iter/episode_count:
1347
+ value: 200
1348
+ step: 6000
1349
+ walltime: null
1350
+ collector_step/episode_count:
1351
+ value: 200
1352
+ step: 59369
1353
+ walltime: null
1354
+ collector_iter/envstep_count:
1355
+ value: 986
1356
+ step: 6000
1357
+ walltime: null
1358
+ collector_step/envstep_count:
1359
+ value: 986
1360
+ step: 59369
1361
+ walltime: null
1362
+ collector_iter/avg_envstep_per_episode:
1363
+ value: 4.93
1364
+ step: 6000
1365
+ walltime: null
1366
+ collector_step/avg_envstep_per_episode:
1367
+ value: 4.93
1368
+ step: 59369
1369
+ walltime: null
1370
+ collector_iter/avg_envstep_per_sec:
1371
+ value: 12.761187161979313
1372
+ step: 6000
1373
+ walltime: null
1374
+ collector_step/avg_envstep_per_sec:
1375
+ value: 12.761187161979313
1376
+ step: 59369
1377
+ walltime: null
1378
+ collector_iter/avg_episode_per_sec:
1379
+ value: 2.5884760977645667
1380
+ step: 6000
1381
+ walltime: null
1382
+ collector_step/avg_episode_per_sec:
1383
+ value: 2.5884760977645667
1384
+ step: 59369
1385
+ walltime: null
1386
+ collector_iter/collect_time:
1387
+ value: 77.26553865910601
1388
+ step: 6000
1389
+ walltime: null
1390
+ collector_step/collect_time:
1391
+ value: 77.26553865910601
1392
+ step: 59369
1393
+ walltime: null
1394
+ collector_iter/reward_mean:
1395
+ value: 1.0
1396
+ step: 6000
1397
+ walltime: null
1398
+ collector_step/reward_mean:
1399
+ value: 1.0
1400
+ step: 59369
1401
+ walltime: null
1402
+ collector_iter/reward_std:
1403
+ value: 0.0
1404
+ step: 6000
1405
+ walltime: null
1406
+ collector_step/reward_std:
1407
+ value: 0.0
1408
+ step: 59369
1409
+ walltime: null
1410
+ collector_iter/reward_max:
1411
+ value: 1.0
1412
+ step: 6000
1413
+ walltime: null
1414
+ collector_step/reward_max:
1415
+ value: 1.0
1416
+ step: 59369
1417
+ walltime: null
1418
+ collector_iter/reward_min:
1419
+ value: 1.0
1420
+ step: 6000
1421
+ walltime: null
1422
+ collector_step/reward_min:
1423
+ value: 1.0
1424
+ step: 59369
1425
+ walltime: null
1426
+ collector_iter/total_envstep_count:
1427
+ value: 59369
1428
+ step: 6000
1429
+ walltime: null
1430
+ collector_step/total_envstep_count:
1431
+ value: 59369
1432
+ step: 59369
1433
+ walltime: null
1434
+ collector_iter/total_episode_count:
1435
+ value: 12028
1436
+ step: 6000
1437
+ walltime: null
1438
+ collector_step/total_episode_count:
1439
+ value: 12028
1440
+ step: 59369
1441
+ walltime: null
1442
+ collector_iter/total_duration:
1443
+ value: 126609.39995060852
1444
+ step: 6000
1445
+ walltime: null
1446
+ collector_step/total_duration:
1447
+ value: 126609.39995060852
1448
+ step: 59369
1449
+ walltime: null
1450
+ collector_iter/visit_entropy_mean:
1451
+ value: 1.5180198430404364
1452
+ step: 6000
1453
+ walltime: null
1454
+ collector_step/visit_entropy_mean:
1455
+ value: 1.5180198430404364
1456
+ step: 59369
1457
+ walltime: null
1458
+ envstep_65000.pth.tar:
1459
+ checkpoint_name: envstep_65000.pth.tar
1460
+ checkpoint_path: /workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_65000.pth.tar
1461
+ saved_at: '2026-05-19T14:24:08.608692+00:00'
1462
+ train_iter: 6574
1463
+ envstep: 65000
1464
+ trigger: envstep
1465
+ latest_scalars:
1466
+ evaluator_iter/episode_count:
1467
+ value: 10
1468
+ step: 6068
1469
+ walltime: null
1470
+ evaluator_step/episode_count:
1471
+ value: 10
1472
+ step: 60016
1473
+ walltime: null
1474
+ evaluator_iter/envstep_count:
1475
+ value: 30
1476
+ step: 6068
1477
+ walltime: null
1478
+ evaluator_step/envstep_count:
1479
+ value: 30
1480
+ step: 60016
1481
+ walltime: null
1482
+ evaluator_iter/avg_envstep_per_episode:
1483
+ value: 3.0
1484
+ step: 6068
1485
+ walltime: null
1486
+ evaluator_step/avg_envstep_per_episode:
1487
+ value: 3.0
1488
+ step: 60016
1489
+ walltime: null
1490
+ evaluator_iter/evaluate_time:
1491
+ value: 1.117225341796875
1492
+ step: 6068
1493
+ walltime: null
1494
+ evaluator_step/evaluate_time:
1495
+ value: 1.117225341796875
1496
+ step: 60016
1497
+ walltime: null
1498
+ evaluator_iter/avg_envstep_per_sec:
1499
+ value: 26.852237304024886
1500
+ step: 6068
1501
+ walltime: null
1502
+ evaluator_step/avg_envstep_per_sec:
1503
+ value: 26.852237304024886
1504
+ step: 60016
1505
+ walltime: null
1506
+ evaluator_iter/avg_time_per_episode:
1507
+ value: 8.950745768008296
1508
+ step: 6068
1509
+ walltime: null
1510
+ evaluator_step/avg_time_per_episode:
1511
+ value: 8.950745768008296
1512
+ step: 60016
1513
+ walltime: null
1514
+ evaluator_iter/reward_mean:
1515
+ value: 1.0
1516
+ step: 6068
1517
+ walltime: null
1518
+ evaluator_step/reward_mean:
1519
+ value: 1.0
1520
+ step: 60016
1521
+ walltime: null
1522
+ evaluator_iter/reward_std:
1523
+ value: 0.0
1524
+ step: 6068
1525
+ walltime: null
1526
+ evaluator_step/reward_std:
1527
+ value: 0.0
1528
+ step: 60016
1529
+ walltime: null
1530
+ evaluator_iter/reward_max:
1531
+ value: 1.0
1532
+ step: 6068
1533
+ walltime: null
1534
+ evaluator_step/reward_max:
1535
+ value: 1.0
1536
+ step: 60016
1537
+ walltime: null
1538
+ evaluator_iter/reward_min:
1539
+ value: 1.0
1540
+ step: 6068
1541
+ walltime: null
1542
+ evaluator_step/reward_min:
1543
+ value: 1.0
1544
+ step: 60016
1545
+ walltime: null
1546
+ Buffer/Task_0/num_collected_episodes:
1547
+ value: 13172
1548
+ step: 6574
1549
+ walltime: null
1550
+ Buffer/Task_0/num_game_segments:
1551
+ value: 10130
1552
+ step: 6574
1553
+ walltime: null
1554
+ Buffer/Task_0/num_transitions:
1555
+ value: 49999
1556
+ step: 6574
1557
+ walltime: null
1558
+ Buffer/Task_0/memory_usage_mb/game_segment_buffer:
1559
+ value: 48.14131164550781
1560
+ step: 6574
1561
+ walltime: null
1562
+ Buffer/Task_0/memory_usage_mb/process:
1563
+ value: 2096.0234375
1564
+ step: 6574
1565
+ walltime: null
1566
+ learner_iter/collect_mcts_temperature_avg:
1567
+ value: 0.25
1568
+ step: 6500
1569
+ walltime: null
1570
+ learner_step/collect_mcts_temperature_avg:
1571
+ value: 0.25
1572
+ step: 64290
1573
+ walltime: null
1574
+ learner_iter/cur_lr_avg:
1575
+ value: 0.0029999999999999996
1576
+ step: 6500
1577
+ walltime: null
1578
+ learner_step/cur_lr_avg:
1579
+ value: 0.0029999999999999996
1580
+ step: 64290
1581
+ walltime: null
1582
+ learner_iter/weighted_total_loss_avg:
1583
+ value: 1.8613728393207898
1584
+ step: 6500
1585
+ walltime: null
1586
+ learner_step/weighted_total_loss_avg:
1587
+ value: 1.8613728393207898
1588
+ step: 64290
1589
+ walltime: null
1590
+ learner_iter/total_loss_avg:
1591
+ value: 1.8613728393207898
1592
+ step: 6500
1593
+ walltime: null
1594
+ learner_step/total_loss_avg:
1595
+ value: 1.8613728393207898
1596
+ step: 64290
1597
+ walltime: null
1598
+ learner_iter/policy_loss_avg:
1599
+ value: 4.957675196907737
1600
+ step: 6500
1601
+ walltime: null
1602
+ learner_step/policy_loss_avg:
1603
+ value: 4.957675196907737
1604
+ step: 64290
1605
+ walltime: null
1606
+ learner_iter/policy_entropy_avg:
1607
+ value: 1.8809896237922439
1608
+ step: 6500
1609
+ walltime: null
1610
+ learner_step/policy_entropy_avg:
1611
+ value: 1.8809896237922439
1612
+ step: 64290
1613
+ walltime: null
1614
+ learner_iter/target_policy_entropy_avg:
1615
+ value: 1.4509589744336677
1616
+ step: 6500
1617
+ walltime: null
1618
+ learner_step/target_policy_entropy_avg:
1619
+ value: 1.4509589744336677
1620
+ step: 64290
1621
+ walltime: null
1622
+ learner_iter/value_prefix_loss_avg:
1623
+ value: 2.9388047565113413
1624
+ step: 6500
1625
+ walltime: null
1626
+ learner_step/value_prefix_loss_avg:
1627
+ value: 2.9388047565113413
1628
+ step: 64290
1629
+ walltime: null
1630
+ learner_iter/value_loss_avg:
1631
+ value: 2.862761367451061
1632
+ step: 6500
1633
+ walltime: null
1634
+ learner_step/value_loss_avg:
1635
+ value: 2.862761367451061
1636
+ step: 64290
1637
+ walltime: null
1638
+ learner_iter/consistency_loss_avg:
1639
+ value: -0.6750797401775013
1640
+ step: 6500
1641
+ walltime: null
1642
+ learner_step/consistency_loss_avg:
1643
+ value: -0.6750797401775013
1644
+ step: 64290
1645
+ walltime: null
1646
+ learner_iter/value_priority_avg:
1647
+ value: 0.3312423215671019
1648
+ step: 6500
1649
+ walltime: null
1650
+ learner_step/value_priority_avg:
1651
+ value: 0.3312423215671019
1652
+ step: 64290
1653
+ walltime: null
1654
+ learner_iter/target_value_prefix_avg:
1655
+ value: 0.6698627146807584
1656
+ step: 6500
1657
+ walltime: null
1658
+ learner_step/target_value_prefix_avg:
1659
+ value: 0.6698627146807584
1660
+ step: 64290
1661
+ walltime: null
1662
+ learner_iter/target_value_avg:
1663
+ value: 0.030066288838332348
1664
+ step: 6500
1665
+ walltime: null
1666
+ learner_step/target_value_avg:
1667
+ value: 0.030066288838332348
1668
+ step: 64290
1669
+ walltime: null
1670
+ learner_iter/predicted_value_prefixs_avg:
1671
+ value: 0.6871472651308234
1672
+ step: 6500
1673
+ walltime: null
1674
+ learner_step/predicted_value_prefixs_avg:
1675
+ value: 0.6871472651308234
1676
+ step: 64290
1677
+ walltime: null
1678
+ learner_iter/predicted_values_avg:
1679
+ value: 0.0084923032713546
1680
+ step: 6500
1681
+ walltime: null
1682
+ learner_step/predicted_values_avg:
1683
+ value: 0.0084923032713546
1684
+ step: 64290
1685
+ walltime: null
1686
+ learner_iter/transformed_target_value_prefix_avg:
1687
+ value: 0.2781360393220728
1688
+ step: 6500
1689
+ walltime: null
1690
+ learner_step/transformed_target_value_prefix_avg:
1691
+ value: 0.2781360393220728
1692
+ step: 64290
1693
+ walltime: null
1694
+ learner_iter/transformed_target_value_avg:
1695
+ value: 0.012483929410915483
1696
+ step: 6500
1697
+ walltime: null
1698
+ learner_step/transformed_target_value_avg:
1699
+ value: 0.012483929410915483
1700
+ step: 64290
1701
+ walltime: null
1702
+ learner_iter/total_grad_norm_before_clip_avg:
1703
+ value: 4.061127765612169
1704
+ step: 6500
1705
+ walltime: null
1706
+ learner_step/total_grad_norm_before_clip_avg:
1707
+ value: 4.061127765612169
1708
+ step: 64290
1709
+ walltime: null
1710
+ collector_iter/episode_count:
1711
+ value: 200
1712
+ step: 6500
1713
+ walltime: null
1714
+ collector_step/episode_count:
1715
+ value: 200
1716
+ step: 64290
1717
+ walltime: null
1718
+ collector_iter/envstep_count:
1719
+ value: 990
1720
+ step: 6500
1721
+ walltime: null
1722
+ collector_step/envstep_count:
1723
+ value: 990
1724
+ step: 64290
1725
+ walltime: null
1726
+ collector_iter/avg_envstep_per_episode:
1727
+ value: 4.95
1728
+ step: 6500
1729
+ walltime: null
1730
+ collector_step/avg_envstep_per_episode:
1731
+ value: 4.95
1732
+ step: 64290
1733
+ walltime: null
1734
+ collector_iter/avg_envstep_per_sec:
1735
+ value: 12.667328153967981
1736
+ step: 6500
1737
+ walltime: null
1738
+ collector_step/avg_envstep_per_sec:
1739
+ value: 12.667328153967981
1740
+ step: 64290
1741
+ walltime: null
1742
+ collector_iter/avg_episode_per_sec:
1743
+ value: 2.559056192720804
1744
+ step: 6500
1745
+ walltime: null
1746
+ collector_step/avg_episode_per_sec:
1747
+ value: 2.559056192720804
1748
+ step: 64290
1749
+ walltime: null
1750
+ collector_iter/collect_time:
1751
+ value: 78.15381333512602
1752
+ step: 6500
1753
+ walltime: null
1754
+ collector_step/collect_time:
1755
+ value: 78.15381333512602
1756
+ step: 64290
1757
+ walltime: null
1758
+ collector_iter/reward_mean:
1759
+ value: 1.0
1760
+ step: 6500
1761
+ walltime: null
1762
+ collector_step/reward_mean:
1763
+ value: 1.0
1764
+ step: 64290
1765
+ walltime: null
1766
+ collector_iter/reward_std:
1767
+ value: 0.0
1768
+ step: 6500
1769
+ walltime: null
1770
+ collector_step/reward_std:
1771
+ value: 0.0
1772
+ step: 64290
1773
+ walltime: null
1774
+ collector_iter/reward_max:
1775
+ value: 1.0
1776
+ step: 6500
1777
+ walltime: null
1778
+ collector_step/reward_max:
1779
+ value: 1.0
1780
+ step: 64290
1781
+ walltime: null
1782
+ collector_iter/reward_min:
1783
+ value: 1.0
1784
+ step: 6500
1785
+ walltime: null
1786
+ collector_step/reward_min:
1787
+ value: 1.0
1788
+ step: 64290
1789
+ walltime: null
1790
+ collector_iter/total_envstep_count:
1791
+ value: 64290
1792
+ step: 6500
1793
+ walltime: null
1794
+ collector_step/total_envstep_count:
1795
+ value: 64290
1796
+ step: 64290
1797
+ walltime: null
1798
+ collector_iter/total_episode_count:
1799
+ value: 13028
1800
+ step: 6500
1801
+ walltime: null
1802
+ collector_step/total_episode_count:
1803
+ value: 13028
1804
+ step: 64290
1805
+ walltime: null
1806
+ collector_iter/total_duration:
1807
+ value: 136729.9372411066
1808
+ step: 6500
1809
+ walltime: null
1810
+ collector_step/total_duration:
1811
+ value: 136729.9372411066
1812
+ step: 64290
1813
+ walltime: null
1814
+ collector_iter/visit_entropy_mean:
1815
+ value: 1.6745987191528302
1816
+ step: 6500
1817
+ walltime: null
1818
+ collector_step/visit_entropy_mean:
1819
+ value: 1.6745987191528302
1820
+ step: 64290
1821
+ walltime: null
1822
+ envstep_70000.pth.tar:
1823
+ checkpoint_name: envstep_70000.pth.tar
1824
+ checkpoint_path: /workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_70000.pth.tar
1825
+ saved_at: '2026-05-19T14:48:45.882215+00:00'
1826
+ train_iter: 7080
1827
+ envstep: 70000
1828
+ trigger: envstep
1829
+ latest_scalars:
1830
+ evaluator_iter/episode_count:
1831
+ value: 10
1832
+ step: 6576
1833
+ walltime: null
1834
+ evaluator_step/episode_count:
1835
+ value: 10
1836
+ step: 65017
1837
+ walltime: null
1838
+ evaluator_iter/envstep_count:
1839
+ value: 29
1840
+ step: 6576
1841
+ walltime: null
1842
+ evaluator_step/envstep_count:
1843
+ value: 29
1844
+ step: 65017
1845
+ walltime: null
1846
+ evaluator_iter/avg_envstep_per_episode:
1847
+ value: 2.9
1848
+ step: 6576
1849
+ walltime: null
1850
+ evaluator_step/avg_envstep_per_episode:
1851
+ value: 2.9
1852
+ step: 65017
1853
+ walltime: null
1854
+ evaluator_iter/evaluate_time:
1855
+ value: 1.42638671875
1856
+ step: 6576
1857
+ walltime: null
1858
+ evaluator_step/evaluate_time:
1859
+ value: 1.42638671875
1860
+ step: 65017
1861
+ walltime: null
1862
+ evaluator_iter/avg_envstep_per_sec:
1863
+ value: 20.331092275882845
1864
+ step: 6576
1865
+ walltime: null
1866
+ evaluator_step/avg_envstep_per_sec:
1867
+ value: 20.331092275882845
1868
+ step: 65017
1869
+ walltime: null
1870
+ evaluator_iter/avg_time_per_episode:
1871
+ value: 7.01072147444236
1872
+ step: 6576
1873
+ walltime: null
1874
+ evaluator_step/avg_time_per_episode:
1875
+ value: 7.01072147444236
1876
+ step: 65017
1877
+ walltime: null
1878
+ evaluator_iter/reward_mean:
1879
+ value: 0.8
1880
+ step: 6576
1881
+ walltime: null
1882
+ evaluator_step/reward_mean:
1883
+ value: 0.8
1884
+ step: 65017
1885
+ walltime: null
1886
+ evaluator_iter/reward_std:
1887
+ value: 0.6000000000000001
1888
+ step: 6576
1889
+ walltime: null
1890
+ evaluator_step/reward_std:
1891
+ value: 0.6000000000000001
1892
+ step: 65017
1893
+ walltime: null
1894
+ evaluator_iter/reward_max:
1895
+ value: 1.0
1896
+ step: 6576
1897
+ walltime: null
1898
+ evaluator_step/reward_max:
1899
+ value: 1.0
1900
+ step: 65017
1901
+ walltime: null
1902
+ evaluator_iter/reward_min:
1903
+ value: -1.0
1904
+ step: 6576
1905
+ walltime: null
1906
+ evaluator_step/reward_min:
1907
+ value: -1.0
1908
+ step: 65017
1909
+ walltime: null
1910
+ Buffer/Task_0/num_collected_episodes:
1911
+ value: 14184
1912
+ step: 7080
1913
+ walltime: null
1914
+ Buffer/Task_0/num_game_segments:
1915
+ value: 10133
1916
+ step: 7080
1917
+ walltime: null
1918
+ Buffer/Task_0/num_transitions:
1919
+ value: 50000
1920
+ step: 7080
1921
+ walltime: null
1922
+ Buffer/Task_0/memory_usage_mb/game_segment_buffer:
1923
+ value: 48.14851379394531
1924
+ step: 7080
1925
+ walltime: null
1926
+ Buffer/Task_0/memory_usage_mb/process:
1927
+ value: 2092.95703125
1928
+ step: 7080
1929
+ walltime: null
1930
+ learner_iter/collect_mcts_temperature_avg:
1931
+ value: 0.25
1932
+ step: 7000
1933
+ walltime: null
1934
+ learner_step/collect_mcts_temperature_avg:
1935
+ value: 0.25
1936
+ step: 69218
1937
+ walltime: null
1938
+ learner_iter/cur_lr_avg:
1939
+ value: 0.0029999999999999996
1940
+ step: 7000
1941
+ walltime: null
1942
+ learner_step/cur_lr_avg:
1943
+ value: 0.0029999999999999996
1944
+ step: 69218
1945
+ walltime: null
1946
+ learner_iter/weighted_total_loss_avg:
1947
+ value: 2.013226541605863
1948
+ step: 7000
1949
+ walltime: null
1950
+ learner_step/weighted_total_loss_avg:
1951
+ value: 2.013226541605863
1952
+ step: 69218
1953
+ walltime: null
1954
+ learner_iter/total_loss_avg:
1955
+ value: 2.013226541605863
1956
+ step: 7000
1957
+ walltime: null
1958
+ learner_step/total_loss_avg:
1959
+ value: 2.013226541605863
1960
+ step: 69218
1961
+ walltime: null
1962
+ learner_iter/policy_loss_avg:
1963
+ value: 5.11634449525313
1964
+ step: 7000
1965
+ walltime: null
1966
+ learner_step/policy_loss_avg:
1967
+ value: 5.11634449525313
1968
+ step: 69218
1969
+ walltime: null
1970
+ learner_iter/policy_entropy_avg:
1971
+ value: 1.8890340400464607
1972
+ step: 7000
1973
+ walltime: null
1974
+ learner_step/policy_entropy_avg:
1975
+ value: 1.8890340400464607
1976
+ step: 69218
1977
+ walltime: null
1978
+ learner_iter/target_policy_entropy_avg:
1979
+ value: 1.4739436380790942
1980
+ step: 7000
1981
+ walltime: null
1982
+ learner_step/target_policy_entropy_avg:
1983
+ value: 1.4739436380790942
1984
+ step: 69218
1985
+ walltime: null
1986
+ learner_iter/value_prefix_loss_avg:
1987
+ value: 2.902969251979481
1988
+ step: 7000
1989
+ walltime: null
1990
+ learner_step/value_prefix_loss_avg:
1991
+ value: 2.902969251979481
1992
+ step: 69218
1993
+ walltime: null
1994
+ learner_iter/value_loss_avg:
1995
+ value: 2.918758609078147
1996
+ step: 7000
1997
+ walltime: null
1998
+ learner_step/value_loss_avg:
1999
+ value: 2.918758609078147
2000
+ step: 69218
2001
+ walltime: null
2002
+ learner_iter/consistency_loss_avg:
2003
+ value: -0.6735776684500954
2004
+ step: 7000
2005
+ walltime: null
2006
+ learner_step/consistency_loss_avg:
2007
+ value: -0.6735776684500954
2008
+ step: 69218
2009
+ walltime: null
2010
+ learner_iter/value_priority_avg:
2011
+ value: 0.2875572307543321
2012
+ step: 7000
2013
+ walltime: null
2014
+ learner_step/value_priority_avg:
2015
+ value: 0.2875572307543321
2016
+ step: 69218
2017
+ walltime: null
2018
+ learner_iter/target_value_prefix_avg:
2019
+ value: 0.6632339249957692
2020
+ step: 7000
2021
+ walltime: null
2022
+ learner_step/target_value_prefix_avg:
2023
+ value: 0.6632339249957692
2024
+ step: 69218
2025
+ walltime: null
2026
+ learner_iter/target_value_avg:
2027
+ value: 0.029947917569767345
2028
+ step: 7000
2029
+ walltime: null
2030
+ learner_step/target_value_avg:
2031
+ value: 0.029947917569767345
2032
+ step: 69218
2033
+ walltime: null
2034
+ learner_iter/predicted_value_prefixs_avg:
2035
+ value: 0.6678378582000732
2036
+ step: 7000
2037
+ walltime: null
2038
+ learner_step/predicted_value_prefixs_avg:
2039
+ value: 0.6678378582000732
2040
+ step: 69218
2041
+ walltime: null
2042
+ learner_iter/predicted_values_avg:
2043
+ value: 0.02264841838570481
2044
+ step: 7000
2045
+ walltime: null
2046
+ learner_step/predicted_values_avg:
2047
+ value: 0.02264841838570481
2048
+ step: 69218
2049
+ walltime: null
2050
+ learner_iter/transformed_target_value_prefix_avg:
2051
+ value: 0.2753836783495816
2052
+ step: 7000
2053
+ walltime: null
2054
+ learner_step/transformed_target_value_prefix_avg:
2055
+ value: 0.2753836783495816
2056
+ step: 69218
2057
+ walltime: null
2058
+ learner_iter/transformed_target_value_avg:
2059
+ value: 0.012434780174358324
2060
+ step: 7000
2061
+ walltime: null
2062
+ learner_step/transformed_target_value_avg:
2063
+ value: 0.012434780174358324
+ step: 69218
+ walltime: null
+ learner_iter/total_grad_norm_before_clip_avg:
+ value: 3.255574109879407
+ step: 7000
+ walltime: null
+ learner_step/total_grad_norm_before_clip_avg:
+ value: 3.255574109879407
+ step: 69218
+ walltime: null
+ collector_iter/episode_count:
+ value: 200
+ step: 7000
+ walltime: null
+ collector_step/episode_count:
+ value: 200
+ step: 69218
+ walltime: null
+ collector_iter/envstep_count:
+ value: 990
+ step: 7000
+ walltime: null
+ collector_step/envstep_count:
+ value: 990
+ step: 69218
+ walltime: null
+ collector_iter/avg_envstep_per_episode:
+ value: 4.95
+ step: 7000
+ walltime: null
+ collector_step/avg_envstep_per_episode:
+ value: 4.95
+ step: 69218
+ walltime: null
+ collector_iter/avg_envstep_per_sec:
+ value: 11.998207467099792
+ step: 7000
+ walltime: null
+ collector_step/avg_envstep_per_sec:
+ value: 11.998207467099792
+ step: 69218
+ walltime: null
+ collector_iter/avg_episode_per_sec:
+ value: 2.423880296383796
+ step: 7000
+ walltime: null
+ collector_step/avg_episode_per_sec:
+ value: 2.423880296383796
+ step: 69218
+ walltime: null
+ collector_iter/collect_time:
+ value: 82.51232550484501
+ step: 7000
+ walltime: null
+ collector_step/collect_time:
+ value: 82.51232550484501
+ step: 69218
+ walltime: null
+ collector_iter/reward_mean:
+ value: 1.0
+ step: 7000
+ walltime: null
+ collector_step/reward_mean:
+ value: 1.0
+ step: 69218
+ walltime: null
+ collector_iter/reward_std:
+ value: 0.0
+ step: 7000
+ walltime: null
+ collector_step/reward_std:
+ value: 0.0
+ step: 69218
+ walltime: null
+ collector_iter/reward_max:
+ value: 1.0
+ step: 7000
+ walltime: null
+ collector_step/reward_max:
+ value: 1.0
+ step: 69218
+ walltime: null
+ collector_iter/reward_min:
+ value: 1.0
+ step: 7000
+ walltime: null
+ collector_step/reward_min:
+ value: 1.0
+ step: 69218
+ walltime: null
+ collector_iter/total_envstep_count:
+ value: 69218
+ step: 7000
+ walltime: null
+ collector_step/total_envstep_count:
+ value: 69218
+ step: 69218
+ walltime: null
+ collector_iter/total_episode_count:
+ value: 14028
+ step: 7000
+ walltime: null
+ collector_step/total_episode_count:
+ value: 14028
+ step: 69218
+ walltime: null
+ collector_iter/total_duration:
+ value: 146936.3182298701
+ step: 7000
+ walltime: null
+ collector_step/total_duration:
+ value: 146936.3182298701
+ step: 69218
+ walltime: null
+ collector_iter/visit_entropy_mean:
+ value: 1.6860720122130517
+ step: 7000
+ walltime: null
+ collector_step/visit_entropy_mean:
+ value: 1.6860720122130517
+ step: 69218
+ walltime: null
+ metadata_version: 1
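The collector entries above are internally consistent: the per-episode and per-second averages follow from `envstep_count`, `episode_count`, and `collect_time`. A minimal Python sketch of that arithmetic, with the values copied by hand from the index (the variable names are illustrative and not part of LightZero):

```python
import math

# Values copied from the collector entries of checkpoint_index.yaml.
envstep_count = 990
episode_count = 200
collect_time = 82.51232550484501  # seconds spent collecting

# Derived averages, recomputed from the raw counts.
avg_envstep_per_episode = envstep_count / episode_count
avg_envstep_per_sec = envstep_count / collect_time
avg_episode_per_sec = episode_count / collect_time

# Compare against the values logged in the index.
logged = {
    "avg_envstep_per_episode": 4.95,
    "avg_envstep_per_sec": 11.998207467099792,
    "avg_episode_per_sec": 2.423880296383796,
}
recomputed = {
    "avg_envstep_per_episode": avg_envstep_per_episode,
    "avg_envstep_per_sec": avg_envstep_per_sec,
    "avg_episode_per_sec": avg_episode_per_sec,
}
all_consistent = all(
    math.isclose(recomputed[name], logged[name], rel_tol=1e-6) for name in logged
)
print(all_consistent)
```

A check like this can flag an index whose derived fields were written from a different collection window than the raw counts.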
metadata/total_config.py ADDED
@@ -0,0 +1,257 @@
+ exp_config = {
+     'env': {
+         'manager': {
+             'episode_num': float("inf"),
+             'max_retry': 1,
+             'step_timeout': None,
+             'auto_reset': True,
+             'reset_timeout': None,
+             'retry_type': 'reset',
+             'retry_waiting_time': 0.1,
+             'shared_memory': False,
+             'copy_on_get': True,
+             'context': 'fork',
+             'wait_num': float("inf"),
+             'step_wait_timeout': None,
+             'connect_timeout': 60,
+             'reset_inplace': False,
+             'cfg_type': 'SyncSubprocessEnvManagerDict',
+             'type': 'subprocess'
+         },
+         'stop_value':
+         10000000000,
+         'n_evaluator_episode':
+         10,
+         'env_id':
+         'simplified__first_attack',
+         'battle_mode':
+         'self_play_mode',
+         'battle_mode_in_simulation_env':
+         'self_play_mode',
+         'bot_action_type':
+         'rule',
+         'agent_vs_human':
+         False,
+         'prob_random_agent':
+         0,
+         'prob_expert_agent':
+         0,
+         'prob_random_action_in_bot':
+         0.0,
+         'channel_last':
+         False,
+         'scale':
+         True,
+         'render_mode':
+         None,
+         'replay_path':
+         None,
+         'alphazero_mcts_ctree':
+         False,
+         'cfg_type':
+         'SimplifiedFirstAttackEnvDict',
+         'type':
+         'simplified__first_attack',
+         'import_names': [
+             'custom_games_simplified.simplified__first_attack.envs.first_attack_env'
+         ],
+         'collector_env_num':
+         4,
+         'evaluator_env_num':
+         10
+     },
+     'policy': {
+         'model': {
+             'model_type': 'conv',
+             'continuous_action_space': False,
+             'observation_shape': (3, 6, 6),
+             'self_supervised_learning_loss': True,
+             'categorical_distribution': True,
+             'image_channel': 3,
+             'frame_stack_num': 1,
+             'num_res_blocks': 1,
+             'num_channels': 32,
+             'reward_support_range': (-300.0, 301.0, 1.0),
+             'value_support_range': (-300.0, 301.0, 1.0),
+             'bias': True,
+             'discrete_action_encoding_type': 'one_hot',
+             'res_connection_in_dynamics': True,
+             'norm_type': 'BN',
+             'analysis_sim_norm': False,
+             'analysis_dormant_ratio': False,
+             'harmony_balance': False,
+             'lstm_hidden_size': 512,
+             'action_space_size': 36
+         },
+         'learn': {
+             'learner': {
+                 'train_iterations': 1000000000,
+                 'dataloader': {
+                     'num_workers': 0
+                 },
+                 'log_policy': True,
+                 'hook': {
+                     'load_ckpt_before_run': '',
+                     'log_show_after_iter': 100,
+                     'save_ckpt_after_iter': 10000,
+                     'save_ckpt_after_run': True
+                 },
+                 'cfg_type': 'BaseLearnerDict'
+             },
+             'resume_training': False
+         },
+         'collect': {
+             'collector': {
+                 'deepcopy_obs': False,
+                 'transform_obs': False,
+                 'collect_print_freq': 100,
+                 'cfg_type': 'SampleSerialCollectorDict',
+                 'type': 'sample'
+             }
+         },
+         'eval': {
+             'evaluator': {
+                 'eval_freq': 1000,
+                 'render': {
+                     'render_freq': -1,
+                     'mode': 'train_iter'
+                 },
+                 'figure_path': None,
+                 'cfg_type': 'InteractionSerialEvaluatorDict',
+                 'stop_value': 10000000000,
+                 'n_episode': 10
+             }
+         },
+         'other': {
+             'replay_buffer': {
+                 'type': 'advanced',
+                 'replay_buffer_size': 4096,
+                 'max_use': float("inf"),
+                 'max_staleness': float("inf"),
+                 'alpha': 0.6,
+                 'beta': 0.4,
+                 'anneal_step': 100000,
+                 'enable_track_used_data': False,
+                 'deepcopy': False,
+                 'thruput_controller': {
+                     'push_sample_rate_limit': {
+                         'max': float("inf"),
+                         'min': 0
+                     },
+                     'window_seconds': 30,
+                     'sample_min_limit_ratio': 1
+                 },
+                 'monitor': {
+                     'sampled_data_attr': {
+                         'average_range': 5,
+                         'print_freq': 200
+                     },
+                     'periodic_thruput': {
+                         'seconds': 60
+                     }
+                 },
+                 'cfg_type': 'AdvancedReplayBufferDict'
+             },
+             'commander': {
+                 'cfg_type': 'BaseSerialCommanderDict'
+             }
+         },
+         'on_policy': False,
+         'cuda': True,
+         'multi_gpu': False,
+         'bp_update_sync': True,
+         'traj_len_inf': False,
+         'use_wandb': True,
+         'use_rnd_model': False,
+         'sampled_algo': False,
+         'gumbel_algo': False,
+         'mcts_ctree': True,
+         'collector_env_num': 4,
+         'evaluator_env_num': 10,
+         'env_type': 'board_games',
+         'action_type': 'varied_action_space',
+         'battle_mode': 'self_play_mode',
+         'monitor_extra_statistics': True,
+         'game_segment_length': 12,
+         'eval_offline': False,
+         'calculate_dormant_ratio': False,
+         'analysis_sim_norm': False,
+         'analysis_dormant_ratio': False,
+         'transform2string': False,
+         'gray_scale': False,
+         'use_augmentation': False,
+         'augmentation': ['shift', 'intensity'],
+         'ignore_done': False,
+         'update_per_collect': 2,
+         'replay_ratio': 0.25,
+         'batch_size': 128,
+         'optim_type': 'Adam',
+         'learning_rate': 0.003,
+         'target_update_freq': 100,
+         'target_update_freq_for_intrinsic_reward': 1000,
+         'weight_decay': 0.0001,
+         'momentum': 0.9,
+         'grad_clip_value': 0.5,
+         'n_episode': 4,
+         'num_segments': 8,
+         'num_simulations': 20,
+         'discount_factor': 1,
+         'td_steps': 12,
+         'num_unroll_steps': 5,
+         'reward_loss_weight': 1,
+         'value_loss_weight': 0.25,
+         'policy_loss_weight': 1,
+         'policy_entropy_weight': 0,
+         'ssl_loss_weight': 2,
+         'piecewise_decay_lr_scheduler': False,
+         'threshold_training_steps_for_final_lr': 50000,
+         'manual_temperature_decay': False,
+         'threshold_training_steps_for_final_temperature': 100000,
+         'fixed_temperature_value': 0.25,
+         'use_ture_chance_label_in_chance_encoder': False,
+         'reanalyze_noise': True,
+         'reuse_search': False,
+         'collect_with_pure_policy': False,
+         'use_priority': False,
+         'priority_prob_alpha': 0.6,
+         'priority_prob_beta': 0.4,
+         'root_dirichlet_alpha': 0.3,
+         'root_noise_weight': 0.25,
+         'random_collect_episode_num': 0,
+         'eps': {
+             'eps_greedy_exploration_in_collect': False,
+             'type': 'linear',
+             'start': 1.0,
+             'end': 0.05,
+             'decay': 100000
+         },
+         'cfg_type': 'EfficientZeroPolicyDict',
+         'lstm_horizon_len': 5,
+         'type': 'efficientzero',
+         'import_names': ['lzero.policy.efficientzero'],
+         'model_path':
+         '/workspace/combinatorial_reasoning_post_training/models/simplified5_immediate_300k_continuation_20260518/round-01/simplified__first_attack/attempt-01_260518_231816/ckpt/envstep_130000.pth.tar',
+         'reanalyze_ratio': 0.0,
+         'eval_freq': 100001,
+         'replay_buffer_size': 50000,
+         'best_ckpt_strategy': 'raw',
+         'best_ckpt_ema_alpha': 0.3,
+         'best_ckpt_min_episodes': 20,
+         'battle_mode_in_simulation_env': 'self_play_mode',
+         'eval_opponent_type': 'env_bot',
+         'previous_best_checkpoint': {
+             'path': None,
+             'selector': 'best',
+             'update_policy': 'on_new_best',
+             'num_simulations': None,
+             'n_evaluator_episode': None,
+             'evaluator_env_num': None,
+             'promotion_threshold': 0.0,
+             'fallback_to_env_bot': False
+         },
+         'device': 'cuda'
+     },
+     'exp_name':
+     '/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543',
+     'seed': 0
+ }
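Two quantities in the `model` section can be cross-checked by hand. The sketch below is illustrative only and makes two labeled assumptions: that each discrete action corresponds to one cell of the 6x6 board, and that a support range tuple is read as a half-open `(start, stop, step)` range. Neither is confirmed by this config; check both against the environment and the LightZero version in use. The helper `support_size` is hypothetical.

```python
# Values copied from the 'model' section of total_config.py.
observation_shape = (3, 6, 6)              # (channels, height, width)
action_space_size = 36
value_support_range = (-300.0, 301.0, 1.0)


def support_size(support_range):
    """Number of categorical bins.

    Assumption: the tuple is a half-open (start, stop, step) range,
    so (-300.0, 301.0, 1.0) covers -300, -299, ..., 300.
    """
    start, stop, step = support_range
    return int(round((stop - start) / step))


# Assumption: one action per board cell.
_, height, width = observation_shape
board_cells = height * width

print(board_cells == action_space_size)
print(support_size(value_support_range))
```

Under these assumptions the categorical value and reward heads would each have 601 outputs, and the policy head 36.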
metadata/upload_manifest.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "checkpoint_mtime_utc": "2026-05-19T14:48:43Z",
+   "checkpoint_size_bytes": 108123787,
+   "display_name": "FirstAttack",
+   "game_id": "simplified__first_attack",
+   "path_in_repo": "checkpoints/envstep_70000.pth.tar",
+   "repo_id": "LorMolf/FirstAttack-CK",
+   "source_attempt_dir": "/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543",
+   "source_checkpoint": "/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/envstep_70000.pth.tar",
+   "source_checkpoint_index": "/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/ckpt/checkpoint_index.yaml",
+   "source_total_config": "/workspace/combinatorial_reasoning_post_training/models/simplified5_constant_lr_fixed_20260519/round-01/simplified__first_attack/attempt-01_260519_093543/total_config.py",
+   "uploaded_at_utc": "2026-05-19T15:04:12Z",
+   "wandb_project": "crpt-simplified5-constant-lr-continuation",
+   "wandb_run": "constant_lr_20260519__simplified__first_attack__a01"
+ }
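The manifest carries enough information to sanity-check a downloaded checkpoint. The following Python sketch uses only the standard library; the helper names `parse_utc`, `upload_delay_seconds`, and `size_matches` are hypothetical, and the manifest is inlined with only the fields the sketch needs rather than read from `metadata/upload_manifest.json`.

```python
import json
from datetime import datetime, timezone

# Subset of metadata/upload_manifest.json, inlined for the example.
manifest = json.loads("""
{
  "checkpoint_mtime_utc": "2026-05-19T14:48:43Z",
  "checkpoint_size_bytes": 108123787,
  "path_in_repo": "checkpoints/envstep_70000.pth.tar",
  "uploaded_at_utc": "2026-05-19T15:04:12Z"
}
""")


def parse_utc(stamp):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp as timezone-aware UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def upload_delay_seconds(m):
    """Seconds between the checkpoint's last write and its upload."""
    delta = parse_utc(m["uploaded_at_utc"]) - parse_utc(m["checkpoint_mtime_utc"])
    return int(delta.total_seconds())


def size_matches(m, actual_size_bytes):
    """True if a file of the given size agrees with the manifest."""
    return m["checkpoint_size_bytes"] == actual_size_bytes


print(upload_delay_seconds(manifest))
print(size_matches(manifest, 108123787))
```

In practice `actual_size_bytes` would come from `os.path.getsize` on the downloaded file; the value used here is the `size` recorded in the Git LFS pointer for `checkpoints/envstep_70000.pth.tar`.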