elefantai/open-p2p


Improve model card: Add pipeline tag, links, and structure

by nielsr HF Staff - opened Jan 10

base: refs/heads/main ← from: refs/pr/1


README.md +20 -7


@@ -1,23 +1,35 @@
 ---
-license: mit
-language:
-- en
 datasets:
 - elefantai/p2p-full-data
 - elefantai/p2p-toy-examples
 ---
 ![Open Pixel2Play Banner](banner.png)
 Open Pixel2Play (P2P) is an open foundation model trained to play video games in real time. The model takes visual input (images) and text instructions and outputs keyboard and mouse actions, enabling direct interaction with real game environments.
-P2P is trained on 8,000+ hours of human-annotated gameplay videos. The full dataset is available at https://huggingface.co/datasets/elefantai/p2p-full-data
 Our smallest model (150M parameters) can be trained in ~70 hours, and the largest model (1.2B parameters) can be trained in ~140 hours on 8× H100 GPUs.
-Please checkout our [website](https://elefant-ai.github.io/open-p2p/) to watch our model play against real human player on Roblox games,
-and checkout our [github](https://github.com/elefant-ai/open-p2p) for training/inference details. Our [arxiv paper](https://arxiv.org/abs/2601.04575) is also available.
-If you use our models, please kindly consider citing our paper:
 ```bibtex
 @misc{yue2026scaling,
       title={Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing},
@@ -27,3 +39,4 @@ If you use our models, please kindly consider citing our paper:
       archivePrefix={arXiv},
       primaryClass={cs.LG}
 }

 ---
 datasets:
 - elefantai/p2p-full-data
 - elefantai/p2p-toy-examples
+language:
+- en
+license: mit
+pipeline_tag: image-text-to-text
 ---
+# Open Pixel2Play (P2P)
 ![Open Pixel2Play Banner](banner.png)
 Open Pixel2Play (P2P) is an open foundation model trained to play video games in real time. The model takes visual input (images) and text instructions and outputs keyboard and mouse actions, enabling direct interaction with real game environments.
+- **Paper:** [Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing](https://huggingface.co/papers/2601.04575)
+- **Project Page:** [elefant-ai.github.io/open-p2p](https://elefant-ai.github.io/open-p2p/)
+- **Repository:** [github.com/elefant-ai/open-p2p](https://github.com/elefant-ai/open-p2p)
+P2P is trained on 8,000+ hours of human-annotated gameplay videos. The full dataset is available at [elefantai/p2p-full-data](https://huggingface.co/datasets/elefantai/p2p-full-data).
 Our smallest model (150M parameters) can be trained in ~70 hours, and the largest model (1.2B parameters) can be trained in ~140 hours on 8× H100 GPUs.
+## Usage
+For detailed instructions on installation, training, offline inference, and real-time game inference (including integration with Recap on Windows), please refer to the [official GitHub repository](https://github.com/elefant-ai/open-p2p).
+## Citation
+If you use our models or data in your research, please kindly consider citing our paper:
 ```bibtex
 @misc{yue2026scaling,
       title={Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
 }
+```
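
The metadata change in this PR (reordered keys plus the new `pipeline_tag`) can be sanity-checked with a small script. The following is a minimal sketch, not part of the PR: `parse_front_matter` is a hypothetical helper that handles only the flat subset of YAML used in this README (`key: value` pairs and `key:` followed by `- item` lines), and `NEW_README_HEAD` is copied from the added side of the diff above.

```python
# Hypothetical helper, not part of the PR: read the simple YAML front matter
# of the updated README using only the standard library.
NEW_README_HEAD = """\
---
datasets:
- elefantai/p2p-full-data
- elefantai/p2p-toy-examples
language:
- en
license: mit
pipeline_tag: image-text-to-text
---
# Open Pixel2Play (P2P)
"""


def parse_front_matter(text):
    """Return the front matter as a dict of strings and flat lists.

    Supports only `key: value` scalars and `key:` followed by `- item`
    lines; anything more complex needs a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None  # key whose list items are being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter ends the metadata block
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_key is not None:
                meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_key = None
        else:
            meta[key] = []
            current_key = key
    return meta


meta = parse_front_matter(NEW_README_HEAD)
print(meta["pipeline_tag"])
print(meta["datasets"])
```

For anything beyond this flat layout, a full YAML parser is the safer choice; the sketch only confirms that the four metadata keys in the new header are present and well-formed.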