jakegrigsby committed on
Commit
99d104c
·
verified ·
1 Parent(s): d2329d5

Update README.md

Files changed (1)
  1. README.md +397 -1
README.md CHANGED
@@ -22,4 +22,400 @@ Metamon training checkpoints for policies that play Pokémon Showdown at a human
22
 
23
  **Code:** [GitHub Repository](https://github.com/UT-Austin-RPL/metamon/tree/main)
24
 
25
- For more details on models and usage, please refer to the [project's GitHub repository](https://github.com/UT-Austin-RPL/metamon/tree/main).
22
 
23
  **Code:** [GitHub Repository](https://github.com/UT-Austin-RPL/metamon/tree/main)
24
 
25
+
26
+ ## Pretrained Models
27
+
28
+ Checkpoints for **29 models** are hosted at [`jakegrigsby/metamon`](https://huggingface.co/jakegrigsby/metamon/tree/main). Models are registered by name in `metamon/rl/pretrained.py` and split into two eras:
29
+
30
+ - **RLC Paper** (`SmallIL`, `SmallRL`, `MediumRL`, `LargeRL`, `SyntheticRLV0`–`V2`): Original models trained on Gen 1-4 OU only.
31
+ - **PokéAgent Challenge** (`Abra`, `Kadabra`–`Kadabra4`, `Alakazam`, `Minikazam`, `Superkazam`, `Kakuna`): Newer models that include Gen 9 OU and a substantially improved self-play dataset. **`Kakuna` is the strongest public policy.**
32
+
33
+ Checkpoints are large (~0.5–2 GB each) and are downloaded on first use to `$METAMON_CACHE_DIR/pretrained_models`.
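As a rough illustration of where downloaded weights end up, the sketch below resolves the cache location described above and reports how much disk it uses. Only the `METAMON_CACHE_DIR` variable and the `pretrained_models` subdirectory come from this README; the helper names `pretrained_cache_dir` and `cached_size_gb` are hypothetical, and metamon may resolve the path differently internally.

```python
import os
from pathlib import Path


def pretrained_cache_dir() -> Path:
    """Return $METAMON_CACHE_DIR/pretrained_models (hypothetical helper, not part of metamon)."""
    root = os.environ.get("METAMON_CACHE_DIR")
    if not root:
        # Raising here is a choice of this sketch, not documented metamon behavior.
        raise RuntimeError("Set METAMON_CACHE_DIR before loading pretrained models")
    return Path(root).expanduser() / "pretrained_models"


def cached_size_gb(directory: Path) -> float:
    """Sum the size of all files under `directory`, in gigabytes (1 GB = 1e9 bytes)."""
    if not directory.exists():
        return 0.0
    total = sum(f.stat().st_size for f in directory.rglob("*") if f.is_file())
    return total / 1e9
```

Calling `cached_size_gb(pretrained_cache_dir())` after a few downloads gives a quick check of how much space the checkpoints occupy.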
34
+
35
+ ### Loading a Model
36
+
37
+ ```python
+ from metamon.rl.pretrained import get_pretrained_model, get_pretrained_model_names
+
+ # List all available models
+ print(get_pretrained_model_names())
+
+ # Load model config (no download yet)
+ model = get_pretrained_model("Kakuna")
+
+ # Downloads weights from HF and returns an amago.Experiment
+ experiment = model.initialize_agent(
+     checkpoint=34,  # omit to use the default checkpoint
+     action_temperature=1.0,  # higher = more stochastic
+ )
+
+ policy = experiment.policy  # the underlying nn.Module
+ ```
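The `action_temperature` comment ("higher = more stochastic") can be illustrated with a temperature-scaled softmax. This is a generic sketch that assumes the temperature divides the action logits before sampling; it is not taken from metamon or amago, and the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw action scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three actions
sharp = softmax_with_temperature(logits, temperature=0.5)  # mass concentrates on the best action
flat = softmax_with_temperature(logits, temperature=2.0)  # mass spreads across actions
```

Under this assumption, a lower temperature pushes the policy toward its top-scoring action, while a higher one spreads probability over the alternatives.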
54
+
55
+ ### Evaluations
56
+
57
+ To run an agent in battles, use `metamon.rl.evaluate`. See the full [Evaluation README](metamon/rl/evaluate/README.md) for all eval types. Quick examples:
58
+
59
+ ```bash
+ # vs. heuristic baselines
+ python -m metamon.rl.evaluate \
+     --eval_type heuristic --agent Kakuna \
+     --gens 1 --formats ou --total_battles 100
+
+ # head-to-head vs. another pretrained model
+ python -m metamon.rl.evaluate \
+     --eval_type challenge --agent Kakuna --opponent_agent Superkazam \
+     --formats gen1ou --total_battles 200
+
+ # parameter sweep (checkpoints, temperatures, team sets, ...)
+ python -m metamon.rl.evaluate \
+     --eval_type sweep --config metamon/rl/evaluate/sweep/example_config.yaml
+ ```
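To script several runs, the heuristic command above can be assembled programmatically. A minimal sketch: the flags are copied from the first example, while `heuristic_eval_cmd` is a hypothetical helper and passing one generation per invocation is an assumption about how you might want to split the work.

```python
import shlex
import sys


def heuristic_eval_cmd(agent: str, gen: int, battles: int = 100) -> list[str]:
    """Build the argv for one heuristic evaluation (hypothetical helper)."""
    return [
        sys.executable, "-m", "metamon.rl.evaluate",
        "--eval_type", "heuristic",
        "--agent", agent,
        "--gens", str(gen),
        "--formats", "ou",
        "--total_battles", str(battles),
    ]


# Print the commands for Gen 1-4 OU instead of running them;
# pass each list to subprocess.run(cmd, check=True) to launch the battles.
commands = [heuristic_eval_cmd("Kakuna", gen) for gen in (1, 2, 3, 4)]
for cmd in commands:
    print(shlex.join(cmd))
```

Building the argument list rather than a shell string avoids quoting problems if an agent name ever contains special characters.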
74
+
75
+ ### Finetuning
76
+
77
+ To finetune a public model on your own data or a new objective:
78
+
79
+ ```bash
+ python -m metamon.rl.finetune_from_hf \
+     --finetune_from_model Kakuna \
+     --run_name MyKakuna \
+     --save_dir ~/my_ckpts/ \
+     --formats gen1ou --epochs 10 --steps_per_epoch 10000
+ ```
86
+
87
+ To evaluate a finetuned model, register it as a `LocalFinetunedModel` — see [`examples/evaluate_custom_models.py`](examples/evaluate_custom_models.py) for a complete example.
88
+
89
+ For more details on models and usage, please refer to the [project's GitHub repository](https://github.com/UT-Austin-RPL/metamon/tree/main).
90
+
91
+
92
+
93
+ ### Internal Leaderboard
94
+
95
+ *![Gold](https://img.shields.io/badge/Gold-DAA520?style=flat) = PokéAgent Challenge policy, ![Pink](https://img.shields.io/badge/Pink-E91E63?style=flat) = Paper policy.*
96
+
97
+ > [!TIP]
98
+ > *These GXE values are a measure of performance **relative** to the listed models and **have no connection to ratings on the public ladder**.*
99
+
100
+ <table>
101
+ <tr><th colspan="10" align="center"><strong>Early Gen OU Local GXE</strong></th></tr>
102
+ <tr>
103
+ <th align="center">Model</th>
104
+ <th colspan="4" align="center">Competitive TeamSet</th>
105
+ <th colspan="4" align="center">Modern Replays TeamSet</th>
106
+ <th align="center">Avg Rank</th>
107
+ </tr>
108
+ <tr>
109
+ <th align="center"></th>
110
+ <th align="center">G1</th>
111
+ <th align="center">G2</th>
112
+ <th align="center">G3</th>
113
+ <th align="center">G4</th>
114
+ <th align="center">G1</th>
115
+ <th align="center">G2</th>
116
+ <th align="center">G3</th>
117
+ <th align="center">G4</th>
118
+ <th align="center"></th>
119
+ </tr>
120
+ <tr>
121
+ <td align="center"><img src="https://img.shields.io/badge/Kakuna-DAA520?style=flat" alt="Kakuna"></td>
122
+ <td align="center"><strong>75%</strong></td>
123
+ <td align="center"><strong>66%</strong></td>
124
+ <td align="center"><strong>63%</strong></td>
125
+ <td align="center"><strong>60%</strong></td>
126
+ <td align="center"><strong>68%</strong></td>
127
+ <td align="center"><strong>71%</strong></td>
128
+ <td align="center"><strong>67%</strong></td>
129
+ <td align="center"><strong>69%</strong></td>
130
+ <td align="center">1.0</td>
131
+ </tr>
132
+ <tr>
133
+ <td align="center"><img src="https://img.shields.io/badge/Superkazam-DAA520?style=flat" alt="Superkazam"></td>
134
+ <td align="center">67%</td>
135
+ <td align="center">63%</td>
136
+ <td align="center">59%</td>
137
+ <td align="center"><ins>58%</ins></td>
138
+ <td align="center">64%</td>
139
+ <td align="center">61%</td>
140
+ <td align="center">62%</td>
141
+ <td align="center">61%</td>
142
+ <td align="center">3.0</td>
143
+ </tr>
144
+ <tr>
145
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra4-DAA520?style=flat" alt="Kadabra4"></td>
146
+ <td align="center">66%</td>
147
+ <td align="center">60%</td>
148
+ <td align="center">58%</td>
149
+ <td align="center"><ins>58%</ins></td>
150
+ <td align="center"><ins>68%</ins></td>
151
+ <td align="center">60%</td>
152
+ <td align="center"><ins>66%</ins></td>
153
+ <td align="center">63%</td>
154
+ <td align="center">3.5</td>
155
+ </tr>
156
+ <tr>
157
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra3-DAA520?style=flat" alt="Kadabra3"></td>
158
+ <td align="center">68%</td>
159
+ <td align="center">61%</td>
160
+ <td align="center">57%</td>
161
+ <td align="center">57%</td>
162
+ <td align="center"><ins>67%</ins></td>
163
+ <td align="center">60%</td>
164
+ <td align="center">60%</td>
165
+ <td align="center">60%</td>
166
+ <td align="center">4.0</td>
167
+ </tr>
168
+ <tr>
169
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra2-DAA520?style=flat" alt="Kadabra2"></td>
170
+ <td align="center">67%</td>
171
+ <td align="center">60%</td>
172
+ <td align="center">58%</td>
173
+ <td align="center">57%</td>
174
+ <td align="center">64%</td>
175
+ <td align="center">62%</td>
176
+ <td align="center">59%</td>
177
+ <td align="center">60%</td>
178
+ <td align="center">4.4</td>
179
+ </tr>
180
+ <tr>
181
+ <td align="center"><img src="https://img.shields.io/badge/Alakazam-DAA520?style=flat" alt="Alakazam"></td>
182
+ <td align="center">66%</td>
183
+ <td align="center">59%</td>
184
+ <td align="center">56%</td>
185
+ <td align="center">57%</td>
186
+ <td align="center">64%</td>
187
+ <td align="center">58%</td>
188
+ <td align="center">61%</td>
189
+ <td align="center">58%</td>
190
+ <td align="center">5.5</td>
191
+ </tr>
192
+ <tr>
193
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV2-E91E63?style=flat" alt="SynRLV2"></td>
194
+ <td align="center">50%</td>
195
+ <td align="center">59%</td>
196
+ <td align="center">55%</td>
197
+ <td align="center">55%</td>
198
+ <td align="center">54%</td>
199
+ <td align="center">61%</td>
200
+ <td align="center">55%</td>
201
+ <td align="center">56%</td>
202
+ <td align="center">6.9</td>
203
+ </tr>
204
+ <tr>
205
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra-DAA520?style=flat" alt="Kadabra"></td>
206
+ <td align="center">56%</td>
207
+ <td align="center">50%</td>
208
+ <td align="center">47%</td>
209
+ <td align="center">47%</td>
210
+ <td align="center">55%</td>
211
+ <td align="center">53%</td>
212
+ <td align="center">50%</td>
213
+ <td align="center">54%</td>
214
+ <td align="center">7.9</td>
215
+ </tr>
216
+ <tr>
217
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV1%2B%2B-E91E63?style=flat" alt="SynRLV1++"></td>
218
+ <td align="center">43%</td>
219
+ <td align="center">47%</td>
220
+ <td align="center">41%</td>
221
+ <td align="center">45%</td>
222
+ <td align="center">47%</td>
223
+ <td align="center">49%</td>
224
+ <td align="center">48%</td>
225
+ <td align="center">48%</td>
226
+ <td align="center">10.0</td>
227
+ </tr>
228
+ <tr>
229
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV1-E91E63?style=flat" alt="SynRLV1"></td>
230
+ <td align="center">43%</td>
231
+ <td align="center">39%</td>
232
+ <td align="center">42%</td>
233
+ <td align="center">46%</td>
234
+ <td align="center">46%</td>
235
+ <td align="center">45%</td>
236
+ <td align="center">44%</td>
237
+ <td align="center">49%</td>
238
+ <td align="center">10.2</td>
239
+ </tr>
240
+ <tr>
241
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV0-E91E63?style=flat" alt="SynRLV0"></td>
242
+ <td align="center">41%</td>
243
+ <td align="center">38%</td>
244
+ <td align="center">48%</td>
245
+ <td align="center">40%</td>
246
+ <td align="center">45%</td>
247
+ <td align="center">41%</td>
248
+ <td align="center">49%</td>
249
+ <td align="center">45%</td>
250
+ <td align="center">11.1</td>
251
+ </tr>
252
+ <tr>
253
+ <td align="center"><img src="https://img.shields.io/badge/Abra-DAA520?style=flat" alt="Abra"></td>
254
+ <td align="center">39%</td>
255
+ <td align="center">44%</td>
256
+ <td align="center">44%</td>
257
+ <td align="center">45%</td>
258
+ <td align="center">40%</td>
259
+ <td align="center">45%</td>
260
+ <td align="center">48%</td>
261
+ <td align="center">48%</td>
262
+ <td align="center">11.2</td>
263
+ </tr>
264
+ <tr>
265
+ <td align="center"><img src="https://img.shields.io/badge/SmallRLGen9Beta-DAA520?style=flat" alt="SmallRLGen9Beta"></td>
266
+ <td align="center">–</td>
267
+ <td align="center">–</td>
268
+ <td align="center">–</td>
269
+ <td align="center">–</td>
270
+ <td align="center">44%</td>
271
+ <td align="center">42%</td>
272
+ <td align="center">45%</td>
273
+ <td align="center">48%</td>
274
+ <td align="center">12.0</td>
275
+ </tr>
276
+ <tr>
277
+ <td align="center"><img src="https://img.shields.io/badge/LargeRL-E91E63?style=flat" alt="LargeRL"></td>
278
+ <td align="center">25%</td>
279
+ <td align="center">35%</td>
280
+ <td align="center">39%</td>
281
+ <td align="center">39%</td>
282
+ <td align="center">30%</td>
283
+ <td align="center">39%</td>
284
+ <td align="center">41%</td>
285
+ <td align="center">44%</td>
286
+ <td align="center">13.9</td>
287
+ </tr>
288
+ <tr>
289
+ <td align="center"><img src="https://img.shields.io/badge/Minikazam-DAA520?style=flat" alt="Minikazam"></td>
290
+ <td align="center">39%</td>
291
+ <td align="center">34%</td>
292
+ <td align="center">34%</td>
293
+ <td align="center">34%</td>
294
+ <td align="center">41%</td>
295
+ <td align="center">36%</td>
296
+ <td align="center">36%</td>
297
+ <td align="center">39%</td>
298
+ <td align="center">14.6</td>
299
+ </tr>
300
+ <tr>
301
+ <td align="center"><img src="https://img.shields.io/badge/SmallILFA-E91E63?style=flat" alt="SmallILFA"></td>
302
+ <td align="center">24%</td>
303
+ <td align="center">36%</td>
304
+ <td align="center">39%</td>
305
+ <td align="center">35%</td>
306
+ <td align="center">28%</td>
307
+ <td align="center">35%</td>
308
+ <td align="center">38%</td>
309
+ <td align="center">41%</td>
310
+ <td align="center">14.8</td>
311
+ </tr>
312
+ </table>
313
+
314
+ > [!TIP]
315
+ > Paper policies are (predictably) weak in Gen9OU because they were never trained to play the format and use observation spaces that assume Team Preview is not available.
316
+
317
+ <table>
318
+ <tr><th colspan="4" align="center"><strong>Gen9OU Local GXE</strong></th></tr>
319
+ <tr>
320
+ <th align="center">Model</th>
321
+ <th align="center">Competitive TeamSet</th>
322
+ <th align="center">Modern Replays TeamSet</th>
323
+ <th align="center">Avg Rank</th>
324
+ </tr>
325
+ <tr>
326
+ <td align="center"><img src="https://img.shields.io/badge/Kakuna-DAA520?style=flat" alt="Kakuna"></td>
327
+ <td align="center"><strong>76%</strong></td>
328
+ <td align="center"><strong>74%</strong></td>
329
+ <td align="center">1.0</td>
330
+ </tr>
331
+ <tr>
332
+ <td align="center"><img src="https://img.shields.io/badge/Superkazam-DAA520?style=flat" alt="Superkazam"></td>
333
+ <td align="center"><ins>75%</ins></td>
334
+ <td align="center"><ins>73%</ins></td>
335
+ <td align="center">2.5</td>
336
+ </tr>
337
+ <tr>
338
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra4-DAA520?style=flat" alt="Kadabra4"></td>
339
+ <td align="center"><ins>75%</ins></td>
340
+ <td align="center"><ins>73%</ins></td>
341
+ <td align="center">2.5</td>
342
+ </tr>
343
+ <tr>
344
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra3-DAA520?style=flat" alt="Kadabra3"></td>
345
+ <td align="center">73%</td>
346
+ <td align="center">71%</td>
347
+ <td align="center">4.5</td>
348
+ </tr>
349
+ <tr>
350
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra2-DAA520?style=flat" alt="Kadabra2"></td>
351
+ <td align="center">73%</td>
352
+ <td align="center">69%</td>
353
+ <td align="center">5.0</td>
354
+ </tr>
355
+ <tr>
356
+ <td align="center"><img src="https://img.shields.io/badge/Alakazam-DAA520?style=flat" alt="Alakazam"></td>
357
+ <td align="center">73%</td>
358
+ <td align="center">71%</td>
359
+ <td align="center">5.5</td>
360
+ </tr>
361
+ <tr>
362
+ <td align="center"><img src="https://img.shields.io/badge/Abra-DAA520?style=flat" alt="Abra"></td>
363
+ <td align="center">61%</td>
364
+ <td align="center">57%</td>
365
+ <td align="center">7.0</td>
366
+ </tr>
367
+ <tr>
368
+ <td align="center"><img src="https://img.shields.io/badge/SmallRLGen9Beta-DAA520?style=flat" alt="SmallRLGen9Beta"></td>
369
+ <td align="center">56%</td>
370
+ <td align="center">57%</td>
371
+ <td align="center">8.5</td>
372
+ </tr>
373
+ <tr>
374
+ <td align="center"><img src="https://img.shields.io/badge/Kadabra-DAA520?style=flat" alt="Kadabra"></td>
375
+ <td align="center">58%</td>
376
+ <td align="center">55%</td>
377
+ <td align="center">8.5</td>
378
+ </tr>
379
+ <tr>
380
+ <td align="center"><img src="https://img.shields.io/badge/Minikazam-DAA520?style=flat" alt="Minikazam"></td>
381
+ <td align="center">50%</td>
382
+ <td align="center">50%</td>
383
+ <td align="center">10.0</td>
384
+ </tr>
385
+ <tr>
386
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV0-E91E63?style=flat" alt="SynRLV0"></td>
387
+ <td align="center">32%</td>
388
+ <td align="center">36%</td>
389
+ <td align="center">11.5</td>
390
+ </tr>
391
+ <tr>
392
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV2-E91E63?style=flat" alt="SynRLV2"></td>
393
+ <td align="center">32%</td>
394
+ <td align="center">38%</td>
395
+ <td align="center">11.5</td>
396
+ </tr>
397
+ <tr>
398
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV1%2B%2B-E91E63?style=flat" alt="SynRLV1++"></td>
399
+ <td align="center">32%</td>
400
+ <td align="center">33%</td>
401
+ <td align="center">13.5</td>
402
+ </tr>
403
+ <tr>
404
+ <td align="center"><img src="https://img.shields.io/badge/LargeRL-E91E63?style=flat" alt="LargeRL"></td>
405
+ <td align="center">29%</td>
406
+ <td align="center">34%</td>
407
+ <td align="center">14.0</td>
408
+ </tr>
409
+ <tr>
410
+ <td align="center"><img src="https://img.shields.io/badge/SynRLV1-E91E63?style=flat" alt="SynRLV1"></td>
411
+ <td align="center">31%</td>
412
+ <td align="center">32%</td>
413
+ <td align="center">14.5</td>
414
+ </tr>
415
+ <tr>
416
+ <td align="center"><img src="https://img.shields.io/badge/SmallILFA-E91E63?style=flat" alt="SmallILFA"></td>
417
+ <td align="center">23%</td>
418
+ <td align="center">27%</td>
419
+ <td align="center">16.0</td>
420
+ </tr>
421
+ </table>
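The tables do not state how Avg Rank is derived. One plausible reading is the mean of a model's per-column rank, with tied models sharing the mean of the positions they span. The sketch below assumes that definition and applies it to the top three rows of the Gen9OU table. For these rows the result happens to agree with the listed column, but the rounded percentages shown here need not reproduce every row.

```python
def ranks_with_ties(scores):
    """Rank scores in descending order; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions are 1-indexed
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks


def average_rank(table):
    """table maps model name -> list of GXE values, one per column."""
    names = list(table)
    n_cols = len(next(iter(table.values())))
    totals = {name: 0.0 for name in names}
    for col in range(n_cols):
        column_ranks = ranks_with_ties([table[name][col] for name in names])
        for name, rank in zip(names, column_ranks):
            totals[name] += rank
    return {name: totals[name] / n_cols for name in names}


# [Competitive TeamSet, Modern Replays TeamSet] GXE (%) from the Gen9OU table
gen9_top3 = {"Kakuna": [76, 74], "Superkazam": [75, 73], "Kadabra4": [75, 73]}
avg = average_rank(gen9_top3)
```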