---
license: mit
---

# Models used in 'Verification of the Implicit World Model in a Generative Model via Adversarial Sequences' (ICLR 2026)

This repo contains **48 chess-playing GPT-2 and LLaMA models**, as well as 24 board state probes that were used in the experiments of the paper.

## Contents

Each model architecture folder contains 6 subfolders, one for each of the 6 datasets used in our experiments.
Each of these 6 subfolders contains 4 checkpoint files, one for each of the four training methods we used:
- Next-token prediction (NT) → `next_token.ckpt`
- Matching the probability distribution (PD) of valid single-token continuations → `prob_dist.ckpt`
- NT with a jointly trained board state probe (NT+JP) → `next_token_joint_probe.ckpt`
- PD with a jointly trained board state probe (PD+JP) → `prob_dist_joint_probe.ckpt`

The linear board state probes for the models trained without a joint probe (NT and PD) are in the `probes` folder.
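
The layout described above can be sketched as a small path helper. This is only an illustration: the checkpoint file names come from the list above, while `arch_dir` and `dataset_dir` are hypothetical placeholders, since the actual folder names have to be read from the repository's file listing.

```python
from pathlib import PurePosixPath

# Checkpoint file names as listed above, keyed by training-method abbreviation.
CHECKPOINT_FILES = {
    "NT": "next_token.ckpt",
    "PD": "prob_dist.ckpt",
    "NT+JP": "next_token_joint_probe.ckpt",
    "PD+JP": "prob_dist_joint_probe.ckpt",
}


def checkpoint_path(arch_dir: str, dataset_dir: str, method: str) -> str:
    """Build the repo-relative path <arch_dir>/<dataset_dir>/<checkpoint file>.

    arch_dir and dataset_dir are hypothetical placeholders; look up the real
    folder names in the repository before using the result.
    """
    if method not in CHECKPOINT_FILES:
        raise ValueError(f"unknown training method: {method!r}")
    return str(PurePosixPath(arch_dir) / dataset_dir / CHECKPOINT_FILES[method])


def has_separate_probe(method: str) -> bool:
    """Return True for methods whose probe is stored separately (no joint probe)."""
    if method not in CHECKPOINT_FILES:
        raise ValueError(f"unknown training method: {method!r}")
    return not method.endswith("+JP")


# Placeholder folder names, used only to show the shape of the path.
print(checkpoint_path("<architecture>", "<dataset>", "PD+JP"))
# prints: <architecture>/<dataset>/prob_dist_joint_probe.ckpt
```

The resulting string could, for instance, be passed as the `filename` argument of `huggingface_hub.hf_hub_download` together with this repository's id. How the checkpoint is loaded afterwards depends on the code in the GitHub repository and is not covered by this sketch.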

## Links

Paper links:
- arXiv: [https://arxiv.org/abs/2602.05903](https://arxiv.org/abs/2602.05903)
- HuggingFace: [https://huggingface.co/papers/2602.05903](https://huggingface.co/papers/2602.05903)

All corresponding code and links to further resources are available at [https://github.com/szegedai/world-model-verification](https://github.com/szegedai/world-model-verification).

## Citation

If you use our code, models, or datasets, please cite the following:
```
@inproceedings{
balogh2026verification,
title={Verification of the Implicit World Model in a Generative Model via Adversarial Sequences},
author={Andr{\'a}s Balogh and M{\'a}rk Jelasity},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=BLOIB8CwBI}
}
```