jakegrigsby committed on
Commit 9521cb6 · verified · 1 Parent(s): 99d104c

Update README.md

Files changed (1)
  1. README.md +1 -395
README.md CHANGED
@@ -23,399 +23,5 @@ Metamon training checkpoints for policies that play Pokémon Showdown at a human
23
  **Code:** [GitHub Repository](https://github.com/UT-Austin-RPL/metamon/tree/main)
24
 
25
 
26
- ## Pretrained Models
27
 
28
- Every checkpoint of **29 models** is hosted at [`jakegrigsby/metamon`](https://huggingface.co/jakegrigsby/metamon/tree/main). Models are registered by name in `metamon/rl/pretrained.py` and split into two eras:
29
-
30
- - **RLC Paper** (`SmallIL`, `SmallRL`, `MediumRL`, `LargeRL`, `SyntheticRLV0`–`V2`): Original models trained on Gen 1-4 OU only.
31
- - **PokéAgent Challenge** (`Abra`, `Kadabra`–`Kadabra4`, `Alakazam`, `Minikazam`, `Superkazam`, `Kakuna`): Newer models that include Gen 9 OU and a substantially improved self-play dataset. **`Kakuna` is the strongest public policy.**
32
-
33
- Checkpoints are large (~0.5–2GB each) and download on first use to `$METAMON_CACHE_DIR/pretrained_models`.
34
-
35
- ### Loading a Model
36
-
37
- ```python
38
- from metamon.rl.pretrained import get_pretrained_model, get_pretrained_model_names
39
-
40
- # List all available models
41
- print(get_pretrained_model_names())
42
-
43
- # Load model config (no download yet)
44
- model = get_pretrained_model("Kakuna")
45
-
46
- # Downloads weights from HF and returns an amago.Experiment
47
- experiment = model.initialize_agent(
48
-     checkpoint=34,  # omit to use the default checkpoint
48
-     action_temperature=1.0,  # higher = more stochastic
50
- )
51
-
52
- policy = experiment.policy # the underlying nn.Module
53
- ```
54
-
55
- ### Evaluations
56
-
57
- For running the agent in battles, use `metamon.rl.evaluate`. See the full [Evaluation README](metamon/rl/evaluate/README.md) for all eval types. Quick examples:
58
-
59
- ```bash
60
- # vs. heuristic baselines
61
- python -m metamon.rl.evaluate \
62
-     --eval_type heuristic --agent Kakuna \
63
-     --gens 1 --formats ou --total_battles 100
64
-
65
- # head-to-head vs. another pretrained model
66
- python -m metamon.rl.evaluate \
67
-     --eval_type challenge --agent Kakuna --opponent_agent Superkazam \
68
-     --formats gen1ou --total_battles 200
69
-
70
- # parameter sweep (checkpoints, temperatures, team sets, ...)
71
- python -m metamon.rl.evaluate \
72
-     --eval_type sweep --config metamon/rl/evaluate/sweep/example_config.yaml
73
- ```
74
-
75
- ### Finetuning
76
-
77
- To finetune a public model on your own data or a new objective:
78
-
79
- ```bash
80
- python -m metamon.rl.finetune_from_hf \
81
-     --finetune_from_model Kakuna \
82
-     --run_name MyKakuna \
83
-     --save_dir ~/my_ckpts/ \
84
-     --formats gen1ou --epochs 10 --steps_per_epoch 10000
85
- ```
86
-
87
- To evaluate a finetuned model, register it as a `LocalFinetunedModel` — see [`examples/evaluate_custom_models.py`](examples/evaluate_custom_models.py) for a complete example.
88
-
89
- For more details on models and usage, please refer to the [project's GitHub repository](https://github.com/UT-Austin-RPL/metamon/tree/main).
90
-
91
-
92
-
93
- ### Internal Leaderboard
94
-
95
- *![Gold](https://img.shields.io/badge/Gold-DAA520?style=flat) = PokéAgent Challenge policy, ![Pink](https://img.shields.io/badge/Pink-E91E63?style=flat) = Paper policy.*
96
-
97
- > [!TIP]
98
- > *These GXE values are a measure of performance **relative** to the listed models and **have no connection to ratings on the public ladder**.*
99
-
100
- <table>
101
- <tr><th colspan="10" align="center"><strong>Early Gen OU Local GXE</strong></th></tr>
102
- <tr>
103
- <th align="center">Model</th>
104
- <th colspan="4" align="center">Competitive TeamSet</th>
105
- <th colspan="4" align="center">Modern Replays TeamSet</th>
106
- <th align="center">Avg Rank</th>
107
- </tr>
108
- <tr>
109
- <th align="center"></th>
110
- <th align="center">G1</th>
111
- <th align="center">G2</th>
112
- <th align="center">G3</th>
113
- <th align="center">G4</th>
114
- <th align="center">G1</th>
115
- <th align="center">G2</th>
116
- <th align="center">G3</th>
117
- <th align="center">G4</th>
118
- <th align="center"></th>
119
- </tr>
120
- <tr>
121
- <td align="center"><img src="https://img.shields.io/badge/Kakuna-DAA520?style=flat" alt="Kakuna"></td>
122
- <td align="center"><strong>75%</strong></td>
123
- <td align="center"><strong>66%</strong></td>
124
- <td align="center"><strong>63%</strong></td>
125
- <td align="center"><strong>60%</strong></td>
126
- <td align="center"><strong>68%</strong></td>
127
- <td align="center"><strong>71%</strong></td>
128
- <td align="center"><strong>67%</strong></td>
129
- <td align="center"><strong>69%</strong></td>
130
- <td align="center">1.0</td>
131
- </tr>
132
- <tr>
133
- <td align="center"><img src="https://img.shields.io/badge/Superkazam-DAA520?style=flat" alt="Superkazam"></td>
134
- <td align="center">67%</td>
135
- <td align="center">63%</td>
136
- <td align="center">59%</td>
137
- <td align="center"><ins>58%</ins></td>
138
- <td align="center">64%</td>
139
- <td align="center">61%</td>
140
- <td align="center">62%</td>
141
- <td align="center">61%</td>
142
- <td align="center">3.0</td>
143
- </tr>
144
- <tr>
145
- <td align="center"><img src="https://img.shields.io/badge/Kadabra4-DAA520?style=flat" alt="Kadabra4"></td>
146
- <td align="center">66%</td>
147
- <td align="center">60%</td>
148
- <td align="center">58%</td>
149
- <td align="center"><ins>58%</ins></td>
150
- <td align="center"><ins>68%</ins></td>
151
- <td align="center">60%</td>
152
- <td align="center"><ins>66%</ins></td>
153
- <td align="center">63%</td>
154
- <td align="center">3.5</td>
155
- </tr>
156
- <tr>
157
- <td align="center"><img src="https://img.shields.io/badge/Kadabra3-DAA520?style=flat" alt="Kadabra3"></td>
158
- <td align="center">68%</td>
159
- <td align="center">61%</td>
160
- <td align="center">57%</td>
161
- <td align="center">57%</td>
162
- <td align="center"><ins>67%</ins></td>
163
- <td align="center">60%</td>
164
- <td align="center">60%</td>
165
- <td align="center">60%</td>
166
- <td align="center">4.0</td>
167
- </tr>
168
- <tr>
169
- <td align="center"><img src="https://img.shields.io/badge/Kadabra2-DAA520?style=flat" alt="Kadabra2"></td>
170
- <td align="center">67%</td>
171
- <td align="center">60%</td>
172
- <td align="center">58%</td>
173
- <td align="center">57%</td>
174
- <td align="center">64%</td>
175
- <td align="center">62%</td>
176
- <td align="center">59%</td>
177
- <td align="center">60%</td>
178
- <td align="center">4.4</td>
179
- </tr>
180
- <tr>
181
- <td align="center"><img src="https://img.shields.io/badge/Alakazam-DAA520?style=flat" alt="Alakazam"></td>
182
- <td align="center">66%</td>
183
- <td align="center">59%</td>
184
- <td align="center">56%</td>
185
- <td align="center">57%</td>
186
- <td align="center">64%</td>
187
- <td align="center">58%</td>
188
- <td align="center">61%</td>
189
- <td align="center">58%</td>
190
- <td align="center">5.5</td>
191
- </tr>
192
- <tr>
193
- <td align="center"><img src="https://img.shields.io/badge/SynRLV2-E91E63?style=flat" alt="SynRLV2"></td>
194
- <td align="center">50%</td>
195
- <td align="center">59%</td>
196
- <td align="center">55%</td>
197
- <td align="center">55%</td>
198
- <td align="center">54%</td>
199
- <td align="center">61%</td>
200
- <td align="center">55%</td>
201
- <td align="center">56%</td>
202
- <td align="center">6.9</td>
203
- </tr>
204
- <tr>
205
- <td align="center"><img src="https://img.shields.io/badge/Kadabra-DAA520?style=flat" alt="Kadabra"></td>
206
- <td align="center">56%</td>
207
- <td align="center">50%</td>
208
- <td align="center">47%</td>
209
- <td align="center">47%</td>
210
- <td align="center">55%</td>
211
- <td align="center">53%</td>
212
- <td align="center">50%</td>
213
- <td align="center">54%</td>
214
- <td align="center">7.9</td>
215
- </tr>
216
- <tr>
217
- <td align="center"><img src="https://img.shields.io/badge/SynRLV1%2B%2B-E91E63?style=flat" alt="SynRLV1++"></td>
218
- <td align="center">43%</td>
219
- <td align="center">47%</td>
220
- <td align="center">41%</td>
221
- <td align="center">45%</td>
222
- <td align="center">47%</td>
223
- <td align="center">49%</td>
224
- <td align="center">48%</td>
225
- <td align="center">48%</td>
226
- <td align="center">10.0</td>
227
- </tr>
228
- <tr>
229
- <td align="center"><img src="https://img.shields.io/badge/SynRLV1-E91E63?style=flat" alt="SynRLV1"></td>
230
- <td align="center">43%</td>
231
- <td align="center">39%</td>
232
- <td align="center">42%</td>
233
- <td align="center">46%</td>
234
- <td align="center">46%</td>
235
- <td align="center">45%</td>
236
- <td align="center">44%</td>
237
- <td align="center">49%</td>
238
- <td align="center">10.2</td>
239
- </tr>
240
- <tr>
241
- <td align="center"><img src="https://img.shields.io/badge/SynRLV0-E91E63?style=flat" alt="SynRLV0"></td>
242
- <td align="center">41%</td>
243
- <td align="center">38%</td>
244
- <td align="center">48%</td>
245
- <td align="center">40%</td>
246
- <td align="center">45%</td>
247
- <td align="center">41%</td>
248
- <td align="center">49%</td>
249
- <td align="center">45%</td>
250
- <td align="center">11.1</td>
251
- </tr>
252
- <tr>
253
- <td align="center"><img src="https://img.shields.io/badge/Abra-DAA520?style=flat" alt="Abra"></td>
254
- <td align="center">39%</td>
255
- <td align="center">44%</td>
256
- <td align="center">44%</td>
257
- <td align="center">45%</td>
258
- <td align="center">40%</td>
259
- <td align="center">45%</td>
260
- <td align="center">48%</td>
261
- <td align="center">48%</td>
262
- <td align="center">11.2</td>
263
- </tr>
264
- <tr>
265
- <td align="center"><img src="https://img.shields.io/badge/SmallRLGen9Beta-DAA520?style=flat" alt="SmallRLGen9Beta"></td>
266
- <td align="center">–</td>
267
- <td align="center">–</td>
268
- <td align="center">–</td>
269
- <td align="center">–</td>
270
- <td align="center">44%</td>
271
- <td align="center">42%</td>
272
- <td align="center">45%</td>
273
- <td align="center">48%</td>
274
- <td align="center">12.0</td>
275
- </tr>
276
- <tr>
277
- <td align="center"><img src="https://img.shields.io/badge/LargeRL-E91E63?style=flat" alt="LargeRL"></td>
278
- <td align="center">25%</td>
279
- <td align="center">35%</td>
280
- <td align="center">39%</td>
281
- <td align="center">39%</td>
282
- <td align="center">30%</td>
283
- <td align="center">39%</td>
284
- <td align="center">41%</td>
285
- <td align="center">44%</td>
286
- <td align="center">13.9</td>
287
- </tr>
288
- <tr>
289
- <td align="center"><img src="https://img.shields.io/badge/Minikazam-DAA520?style=flat" alt="Minikazam"></td>
290
- <td align="center">39%</td>
291
- <td align="center">34%</td>
292
- <td align="center">34%</td>
293
- <td align="center">34%</td>
294
- <td align="center">41%</td>
295
- <td align="center">36%</td>
296
- <td align="center">36%</td>
297
- <td align="center">39%</td>
298
- <td align="center">14.6</td>
299
- </tr>
300
- <tr>
301
- <td align="center"><img src="https://img.shields.io/badge/SmallILFA-E91E63?style=flat" alt="SmallILFA"></td>
302
- <td align="center">24%</td>
303
- <td align="center">36%</td>
304
- <td align="center">39%</td>
305
- <td align="center">35%</td>
306
- <td align="center">28%</td>
307
- <td align="center">35%</td>
308
- <td align="center">38%</td>
309
- <td align="center">41%</td>
310
- <td align="center">14.8</td>
311
- </tr>
312
- </table>
313
-
314
- > [!TIP]
315
- > Paper policies are (predictably) weak in Gen9OU because they were never trained to play the format and use observation spaces that assume Team Preview is not available.
316
-
317
- <table>
318
- <tr><th colspan="4" align="center"><strong>Gen9OU Local GXE</strong></th></tr>
319
- <tr>
320
- <th align="center">Model</th>
321
- <th align="center">Competitive TeamSet</th>
322
- <th align="center">Modern Replays TeamSet</th>
323
- <th align="center">Avg Rank</th>
324
- </tr>
325
- <tr>
326
- <td align="center"><img src="https://img.shields.io/badge/Kakuna-DAA520?style=flat" alt="Kakuna"></td>
327
- <td align="center"><strong>76%</strong></td>
328
- <td align="center"><strong>74%</strong></td>
329
- <td align="center">1.0</td>
330
- </tr>
331
- <tr>
332
- <td align="center"><img src="https://img.shields.io/badge/Superkazam-DAA520?style=flat" alt="Superkazam"></td>
333
- <td align="center"><ins>75%</ins></td>
334
- <td align="center"><ins>73%</ins></td>
335
- <td align="center">2.5</td>
336
- </tr>
337
- <tr>
338
- <td align="center"><img src="https://img.shields.io/badge/Kadabra4-DAA520?style=flat" alt="Kadabra4"></td>
339
- <td align="center"><ins>75%</ins></td>
340
- <td align="center"><ins>73%</ins></td>
341
- <td align="center">2.5</td>
342
- </tr>
343
- <tr>
344
- <td align="center"><img src="https://img.shields.io/badge/Kadabra3-DAA520?style=flat" alt="Kadabra3"></td>
345
- <td align="center">73%</td>
346
- <td align="center">71%</td>
347
- <td align="center">4.5</td>
348
- </tr>
349
- <tr>
350
- <td align="center"><img src="https://img.shields.io/badge/Kadabra2-DAA520?style=flat" alt="Kadabra2"></td>
351
- <td align="center">73%</td>
352
- <td align="center">69%</td>
353
- <td align="center">5.0</td>
354
- </tr>
355
- <tr>
356
- <td align="center"><img src="https://img.shields.io/badge/Alakazam-DAA520?style=flat" alt="Alakazam"></td>
357
- <td align="center">73%</td>
358
- <td align="center">71%</td>
359
- <td align="center">5.5</td>
360
- </tr>
361
- <tr>
362
- <td align="center"><img src="https://img.shields.io/badge/Abra-DAA520?style=flat" alt="Abra"></td>
363
- <td align="center">61%</td>
364
- <td align="center">57%</td>
365
- <td align="center">7.0</td>
366
- </tr>
367
- <tr>
368
- <td align="center"><img src="https://img.shields.io/badge/SmallRLGen9Beta-DAA520?style=flat" alt="SmallRLGen9Beta"></td>
369
- <td align="center">56%</td>
370
- <td align="center">57%</td>
371
- <td align="center">8.5</td>
372
- </tr>
373
- <tr>
374
- <td align="center"><img src="https://img.shields.io/badge/Kadabra-DAA520?style=flat" alt="Kadabra"></td>
375
- <td align="center">58%</td>
376
- <td align="center">55%</td>
377
- <td align="center">8.5</td>
378
- </tr>
379
- <tr>
380
- <td align="center"><img src="https://img.shields.io/badge/Minikazam-DAA520?style=flat" alt="Minikazam"></td>
381
- <td align="center">50%</td>
382
- <td align="center">50%</td>
383
- <td align="center">10.0</td>
384
- </tr>
385
- <tr>
386
- <td align="center"><img src="https://img.shields.io/badge/SynRLV0-E91E63?style=flat" alt="SynRLV0"></td>
387
- <td align="center">32%</td>
388
- <td align="center">36%</td>
389
- <td align="center">11.5</td>
390
- </tr>
391
- <tr>
392
- <td align="center"><img src="https://img.shields.io/badge/SynRLV2-E91E63?style=flat" alt="SynRLV2"></td>
393
- <td align="center">32%</td>
394
- <td align="center">38%</td>
395
- <td align="center">11.5</td>
396
- </tr>
397
- <tr>
398
- <td align="center"><img src="https://img.shields.io/badge/SynRLV1%2B%2B-E91E63?style=flat" alt="SynRLV1++"></td>
399
- <td align="center">32%</td>
400
- <td align="center">33%</td>
401
- <td align="center">13.5</td>
402
- </tr>
403
- <tr>
404
- <td align="center"><img src="https://img.shields.io/badge/LargeRL-E91E63?style=flat" alt="LargeRL"></td>
405
- <td align="center">29%</td>
406
- <td align="center">34%</td>
407
- <td align="center">14.0</td>
408
- </tr>
409
- <tr>
410
- <td align="center"><img src="https://img.shields.io/badge/SynRLV1-E91E63?style=flat" alt="SynRLV1"></td>
411
- <td align="center">31%</td>
412
- <td align="center">32%</td>
413
- <td align="center">14.5</td>
414
- </tr>
415
- <tr>
416
- <td align="center"><img src="https://img.shields.io/badge/SmallILFA-E91E63?style=flat" alt="SmallILFA"></td>
417
- <td align="center">23%</td>
418
- <td align="center">27%</td>
419
- <td align="center">16.0</td>
420
- </tr>
421
- </table>
 
23
  **Code:** [GitHub Repository](https://github.com/UT-Austin-RPL/metamon/tree/main)
24
 
25
 
 
26
 
27
+ The metamon README provides detailed instructions for loading, evaluating, and finetuning pretrained policies.