# PSI-0.5 Usage Guide

PSI-0.5 is a promptable physical world model. It takes a prompt string written
in a compact `inputs->outputs` notation, such as `rgb0->rgb1`,
`rgb0,f01->f01,rgb1`, or `rgb0,c01->rgb1`, and predicts the visual variables
listed after the arrow from the ones supplied before it.
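
Each side of the arrow is a comma-separated list of variable names. As a rough illustration only, the hypothetical helper `parse_prompt` below splits a prompt string into its given and requested variables. It is not part of the PSI-0.5 API, and the grammar the model actually accepts may be richer than this sketch assumes.

```python
def parse_prompt(prompt: str) -> tuple[list[str], list[str]]:
    """Split a prompt like 'rgb0,f01->f01,rgb1' into (inputs, outputs).

    Hypothetical helper for illustration; PSI-0.5 parses prompts internally.
    """
    if prompt.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {prompt!r}")
    lhs, rhs = prompt.split("->")
    # Strip whitespace and drop empty entries on each side of the arrow.
    inputs = [name.strip() for name in lhs.split(",") if name.strip()]
    outputs = [name.strip() for name in rhs.split(",") if name.strip()]
    return inputs, outputs


print(parse_prompt("rgb0,f01->f01,rgb1"))
# (['rgb0', 'f01'], ['f01', 'rgb1'])
```

A variable can appear on both sides: in the sparse flow example, `f01` is supplied as a sparse prompt and returned as a dense prediction.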

## Install

```bash
conda create -n psi-demos python=3.10 -y
conda activate psi-demos
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install transformers huggingface-hub einops h5py tiktoken numpy pillow opencv-python gradio matplotlib scipy
```

The PyTorch command above installs the CUDA 12.6 wheel used on the ccn2 A40
nodes. On other machines, first install the PyTorch build recommended for your
driver and platform, then run the remaining `pip install` command.
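
Before loading the model, it can help to confirm that the install step worked. The hypothetical script below checks that each package from the `pip install` line can be imported. The names `REQUIRED` and `missing_packages` are ours, as is the pip-name-to-module mapping (for example, `opencv-python` imports as `cv2` and `pillow` as `PIL`); none of this is part of the release.

```python
import importlib.util

# pip distribution name -> import name for the packages in the install step.
# This mapping is an assumption of this sketch; several pip names differ
# from the module names they install.
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "huggingface-hub": "huggingface_hub",
    "einops": "einops",
    "h5py": "h5py",
    "tiktoken": "tiktoken",
    "numpy": "numpy",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "gradio": "gradio",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

On a GPU machine, `torch.cuda.is_available()` should also return `True` once the CUDA build of PyTorch is installed.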

## Load With Transformers

```python
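from PIL import Image
from transformers import AutoModel

# trust_remote_code=True lets Transformers run the model code shipped in the repository.
predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)

# Predict the next frame (rgb1) from a single input frame (rgb0).
rgb1 = predictor.generate(
    "rgb0->rgb1",
    rgb0=Image.open("scene.png").convert("RGB"),
    seed=1110,  # fixed seed for reproducible sampling
    temp=1.0,
    top_k=1000,
    top_p=1.0,
)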
rgb1.save("scene_next.png")
```
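
The `temp`, `top_k`, and `top_p` arguments share their names with the usual sampling controls. Assuming they behave like standard temperature, top-k, and nucleus (top-p) sampling over the model's token distribution (we have not confirmed PSI-0.5's internals), the sketch below shows what each one does to a single sampling step. `sample_token` is a hypothetical, generic illustration, not the model's code.

```python
import math
import random


def sample_token(logits, temp=1.0, top_k=None, top_p=1.0, rng=random):
    """Sample one index from raw logits with temperature, top-k and top-p.

    Generic illustration of the standard technique, not PSI-0.5's code.
    """
    if temp <= 0:
        raise ValueError("temp must be positive")
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank candidates from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely candidates.
    if top_k is not None:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the survivors; random.choices renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

In this sketch, `top_p=1.0` (as in the example above) disables nucleus truncation, so only `top_k` and `temp` reshape the distribution. Passing `rng=random.Random(1110)` makes draws reproducible, which is presumably the role `seed` plays in `generate`.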

## Sparse Flow Prompt

```python
from PIL import Image
from transformers import AutoModel

predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)
rgb0 = Image.open("block_slide_rgb0.png").convert("RGB")
# Build a sparse flow prompt from one (start, end) point pair, sized to rgb0.
f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

# Outputs are returned in the order listed after "->": dense flow, then rgb1.
dense_flow, rgb1 = predictor.generate(
    "rgb0,f01->f01,rgb1",
    rgb0=rgb0,
    f01=f01,
    seed=1110,
    num_seq_patches=256,
)
```
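
`sparse_flow_prompt` takes a list of point pairs plus the image size. We read each pair as `((x0, y0), (x1, y1))`, a pixel in `rgb0` and where it should move in the next frame, with `rgb0.size` giving `(width, height)` as PIL reports it. That reading is an assumption drawn from the examples, not documented behavior. Under it, the hypothetical helper `flow_displacements` below validates the pairs and reports each displacement before you build the prompt.

```python
def flow_displacements(pairs, size):
    """Check point pairs against the image size and return (dx, dy) per pair.

    Hypothetical helper: assumes pairs are ((x0, y0), (x1, y1)) in pixels and
    size is (width, height), as returned by PIL's Image.size.
    """
    width, height = size
    displacements = []
    for (x0, y0), (x1, y1) in pairs:
        for x, y in ((x0, y0), (x1, y1)):
            if not (0 <= x < width and 0 <= y < height):
                raise ValueError(f"point ({x}, {y}) lies outside a {width}x{height} image")
        displacements.append((x1 - x0, y1 - y0))
    return displacements


# The block-slide pair on an assumed 512x512 image: 98 px to the right.
print(flow_displacements([((70, 221), (168, 221))], (512, 512)))
# [(98, 0)]
```

Requiring the end point to stay inside the frame is a choice of this sketch; relax it if you want to push an object out of view.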

## Depth, Flow, And RGB

```python
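import numpy as np
from PIL import Image

# Reuses `predictor` loaded in the sections above.
rgb0 = Image.open("billiards_rgb0.png").convert("RGB")
# Depth map for rgb0 (in meters, per the file name), as float32.
depth0 = np.load("billiards_d0_meters.npy").astype(np.float32)
f01 = predictor.sparse_flow_prompt([((392, 171), (238, 94))], rgb0.size)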

dense_flow, depth1, rgb1 = predictor.generate(
    "rgb0,d0,f01->f01,d1,rgb1",
    rgb0=rgb0,
    d0=depth0,
    f01=f01,
    seed=1110,
    num_seq_patches=256,
)
```
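
The depth input is loaded as a `float32` array, and the file name suggests metric depth in meters. The hypothetical check `check_depth` below encodes what we assume the model expects: a 2-D map with one value per `rgb0` pixel (so its shape is `(height, width)`, while PIL's `size` is `(width, height)`), holding finite, positive distances. All of these requirements are assumptions of this sketch, not documented constraints.

```python
import numpy as np


def check_depth(depth: np.ndarray, size: tuple[int, int]) -> np.ndarray:
    """Validate a metric depth map against an RGB image size (width, height).

    Hypothetical helper; the shape and value requirements are assumptions.
    """
    width, height = size
    depth = np.asarray(depth, dtype=np.float32)
    # NumPy arrays are indexed (rows, cols) = (height, width).
    if depth.shape != (height, width):
        raise ValueError(f"depth shape {depth.shape} does not match image {(height, width)}")
    if not np.all(np.isfinite(depth)):
        raise ValueError("depth contains NaN or inf values")
    if np.any(depth <= 0):
        raise ValueError("depth should hold positive distances in meters")
    return depth
```

Typical use would be `depth0 = check_depth(np.load("billiards_d0_meters.npy"), rgb0.size)`. Sensor depth often marks missing pixels with 0, so loosen the positivity check if your data does.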

## Camera-Conditioned Novel View Synthesis

```python
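from PIL import Image

# Reuses `predictor` loaded in the sections above.
# Camera parameters for the c01 prompt: field of view, rotation, and translation.
camera = {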
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}

rgb1 = predictor.generate(
    "rgb0,c01->rgb1",
    rgb0=Image.open("coffee_mug_000.png").convert("RGB"),
    c01=camera,
    seed=1110,
)
```
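
The `fov_x` and `fov_y` values look like full field-of-view angles in degrees (60.0 is a typical value). If so, each fixes a pinhole focal length in pixels through f = (W / 2) / tan(fov / 2), where W is the image extent along that axis. The hypothetical helper `focal_length_px` below computes it. The degree unit is an assumption, and the sketch leaves `euler_angles` and `translation` alone because this guide does not state their axis or unit conventions.

```python
import math


def focal_length_px(fov_deg: float, extent_px: int) -> float:
    """Pinhole focal length in pixels for a field of view spanning extent_px.

    Assumes fov_deg is the full field-of-view angle in degrees.
    """
    return (extent_px / 2) / math.tan(math.radians(fov_deg) / 2)


width, height = 512, 512  # hypothetical image size; use rgb0.size in practice
fx = focal_length_px(60.0, width)
fy = focal_length_px(60.0, height)
print(round(fx, 1), round(fy, 1))
# 443.4 443.4
```

Equal `fov_x` and `fov_y` give equal focal lengths only for a square image; for non-square images the two values should be consistent with the aspect ratio.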

## Advanced Paths

All runtime files needed by Transformers remote code live at the repository
root. The release manifest lists the default checkpoint and tokenizer assets for
reproducibility.

PSI-0.5 is a modestly sized model that has not yet undergone any post-training,
and some of its rollouts diverge. We recommend unrestricted sampling (no top-k or
top-p truncation) for flow prediction and `top_p=0.9` with `top_k=1000` for RGB
rendering. Careful prompting can significantly improve generations, and simple
harnesses, such as those in the provided Gradio app, can steer the model much
more effectively. We believe this direction can scale to even more comprehensive
models of the world while keeping the same highly controllable API.
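
Because some rollouts diverge, one simple harness is to draw several samples and keep the best. The sketch below is a hypothetical best-of-N loop over seeds, not the harness in the Gradio app. `best_of_seeds` and `RGB_SAMPLING` are our names; `generate_fn` wraps a call such as `predictor.generate(...)`, and `score_fn` is a quality measure you supply (for example, agreement with a known next frame). The sampling defaults mirror the RGB recommendation above.

```python
from typing import Any, Callable, Iterable

# Sampling settings recommended above for RGB rendering.
RGB_SAMPLING = {"top_p": 0.9, "top_k": 1000}


def best_of_seeds(
    generate_fn: Callable[..., Any],
    score_fn: Callable[[Any], float],
    seeds: Iterable[int],
    **sampling: Any,
) -> tuple[int, Any]:
    """Run generate_fn once per seed and return (best_seed, best_output).

    Hypothetical harness: higher score_fn values are treated as better.
    """
    best_seed, best_output, best_score = None, None, float("-inf")
    for seed in seeds:
        output = generate_fn(seed=seed, **sampling)
        score = score_fn(output)
        # The first candidate always wins, so all-(-inf) scores still return one.
        if best_seed is None or score > best_score:
            best_seed, best_output, best_score = seed, output, score
    if best_seed is None:
        raise ValueError("seeds must not be empty")
    return best_seed, best_output
```

For example, pass `lambda **kw: predictor.generate("rgb0->rgb1", rgb0=rgb0, **kw)` as `generate_fn`, `range(1110, 1114)` as `seeds`, and `**RGB_SAMPLING` as the sampling settings.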