Improve model card: Add pipeline tag, links, and structure

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +20 -7
README.md CHANGED
@@ -1,23 +1,35 @@
  ---
- license: mit
- language:
- - en
  datasets:
  - elefantai/p2p-full-data
  - elefantai/p2p-toy-examples
+ language:
+ - en
+ license: mit
+ pipeline_tag: image-text-to-text
  ---
+
+ # Open Pixel2Play (P2P)
+
  ![Open Pixel2Play Banner](banner.png)

  Open Pixel2Play (P2P) is an open foundation model trained to play video games in real time. The model takes visual input (images) and text instructions and outputs keyboard and mouse actions, enabling direct interaction with real game environments.

- P2P is trained on 8,000+ hours of human-annotated gameplay videos. The full dataset is available at https://huggingface.co/datasets/elefantai/p2p-full-data
+ - **Paper:** [Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing](https://huggingface.co/papers/2601.04575)
+ - **Project Page:** [elefant-ai.github.io/open-p2p](https://elefant-ai.github.io/open-p2p/)
+ - **Repository:** [github.com/elefant-ai/open-p2p](https://github.com/elefant-ai/open-p2p)
+
+ P2P is trained on 8,000+ hours of human-annotated gameplay videos. The full dataset is available at [elefantai/p2p-full-data](https://huggingface.co/datasets/elefantai/p2p-full-data).

  Our smallest model (150M parameters) can be trained in ~70 hours, and the largest model (1.2B parameters) can be trained in ~140 hours on 8× H100 GPUs.

- Please checkout our [website](https://elefant-ai.github.io/open-p2p/) to watch our model play against real human player on Roblox games,
- and checkout our [github](https://github.com/elefant-ai/open-p2p) for training/inference details. Our [arxiv paper](https://arxiv.org/abs/2601.04575) is also available.
+ ## Usage
+
+ For detailed instructions on installation, training, offline inference, and real-time game inference (including integration with Recap on Windows), please refer to the [official GitHub repository](https://github.com/elefant-ai/open-p2p).
+
+ ## Citation
+
+ If you use our models or data in your research, please kindly consider citing our paper:

- If you use our models, please kindly consider citing our paper:
  ```bibtex
  @misc{yue2026scaling,
  title={Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing},
@@ -27,3 +39,4 @@ If you use our models, please kindly consider citing our paper:
  archivePrefix={arXiv},
  primaryClass={cs.LG}
  }
+ ```
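
As a reviewer-style aside on the training-time sentence in the README: the wall-clock figures can be converted to approximate GPU-hours. This is only a rough sketch. It assumes that both the ~70 hour and the ~140 hour figures refer to the same 8× H100 setup, which the sentence suggests but does not state outright. The names `NUM_GPUS`, `MODELS`, and `gpu_hours` are illustrative and do not come from the repository.

```python
# Rough GPU-hour estimate from the wall-clock figures quoted in the README.
# ASSUMPTION: both training times were measured on the same 8x H100 setup.

NUM_GPUS = 8  # "8x H100 GPUs" as quoted in the README

# name -> (parameter count, approximate wall-clock training hours), as quoted
MODELS = {
    "smallest": (150e6, 70),
    "largest": (1.2e9, 140),
}


def gpu_hours(wall_clock_hours: float, num_gpus: int = NUM_GPUS) -> float:
    """Convert wall-clock training time into total GPU-hours."""
    return wall_clock_hours * num_gpus


if __name__ == "__main__":
    for name, (params, hours) in MODELS.items():
        print(f"{name}: {params / 1e6:.0f}M params, ~{gpu_hours(hours):.0f} GPU-hours")
```

Under that assumption, the smallest model costs roughly 560 GPU-hours and the largest roughly 1,120 GPU-hours, so the quoted 8× increase in parameters corresponds to about a 2× increase in training compute time.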