---
license: mit
---

# Models used in 'Verification of the Implicit World Model in a Generative Model via Adversarial Sequences' (ICLR 2026)

This repo contains **48 chess-playing GPT-2 and LLaMA models**, as well as 24 board state probes that were used in the experiments of the paper.

## Contents

Each of the two model architecture folders (GPT-2 and LLaMA) contains 6 subfolders, one for each of the 6 datasets used in our experiments. 
Each of these subfolders contains 4 checkpoint files, one for each of the four training methods we used:
- Next-token prediction (NT) → `next_token.ckpt`
- Matching the probability distribution (PD) of valid single token continuations → `prob_dist.ckpt` 
- NT with a jointly trained board state probe (NT+JP) → `next_token_joint_probe.ckpt`
- PD with a jointly trained board state probe (PD+JP) → `prob_dist_joint_probe.ckpt`

Models trained without a joint probe have their linear board state probes in the `probes` folder. 
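
As a rough illustration of the layout described above, the following sketch enumerates the checkpoint paths one would expect to find. The architecture and dataset folder names (`gpt2`, `llama`, `dataset_1` ... `dataset_6`) are placeholders assumed for this example only; check the repository tree for the actual names. Only the four checkpoint file names are taken from this README.

```python
from itertools import product

# Hypothetical folder names, used purely for illustration.
# The real architecture and dataset folder names may differ.
ARCHITECTURES = ["gpt2", "llama"]
DATASETS = [f"dataset_{i}" for i in range(1, 7)]

# Checkpoint file names as listed in this README, keyed by training method.
CHECKPOINTS = {
    "NT": "next_token.ckpt",
    "PD": "prob_dist.ckpt",
    "NT+JP": "next_token_joint_probe.ckpt",
    "PD+JP": "prob_dist_joint_probe.ckpt",
}


def expected_checkpoints():
    """Return the relative paths implied by the architecture/dataset/method layout."""
    return [
        f"{arch}/{dataset}/{filename}"
        for arch, dataset, filename in product(
            ARCHITECTURES, DATASETS, CHECKPOINTS.values()
        )
    ]


paths = expected_checkpoints()
print(len(paths))  # 48 (2 architectures x 6 datasets x 4 training methods)

# Models trained without a joint probe are the ones paired with a probe
# in the `probes` folder.
without_joint_probe = [p for p in paths if "joint_probe" not in p]
print(len(without_joint_probe))  # 24
```

The two counts match the 48 models and 24 board state probes stated at the top of this README.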

## Links

Paper links:  
arXiv: [https://arxiv.org/abs/2602.05903](https://arxiv.org/abs/2602.05903)  
HuggingFace: [https://huggingface.co/papers/2602.05903](https://huggingface.co/papers/2602.05903)

All corresponding code and links to further resources are available at [https://github.com/szegedai/world-model-verification](https://github.com/szegedai/world-model-verification)

## Citation

If you use our code, models, or datasets, please cite the following:
```
@inproceedings{
  balogh2026verification,
  title={Verification of the Implicit World Model in a Generative Model via Adversarial Sequences},
  author={Andr{\'a}s Balogh and M{\'a}rk Jelasity},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=BLOIB8CwBI}
}
```