qualiaadmin committed · verified
Commit b7a8373 · 1 Parent(s): 8529a55

Upload folder using huggingface_hub
README.md ADDED
---
language:
- en
library_name: lerobot
pipeline_tag: robotics
tags:
- vision-language-action
- imitation-learning
- lerobot
inference: false
license: gemma
---

# π0 fast (PI0Fast) (LeRobot)

**PI0Fast** is a Vision-Language-Action (VLA) policy that predicts continuous robot actions via **autoregressive next-token prediction** over **FAST action tokens**.

**Original authors / paper:** [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://arxiv.org/abs/2501.09747)
**Implementation:** This LeRobot implementation follows the original reference code for compatibility.
**Reference implementation:** https://github.com/Physical-Intelligence/openpi

## Model description

- **Inputs:** images (multi-view), proprioceptive state, and an optional language instruction
- **Outputs:** continuous actions (decoded from the predicted FAST tokens)
- **Training objective:** next-token cross-entropy
- **Action representation:** FAST tokens
- **Intended use:** fine-tune on your own task data

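The FAST representation compresses each action chunk with a discrete cosine transform (DCT) along the time axis and quantizes the coefficients before byte-pair encoding. The sketch below illustrates only the DCT-plus-quantization idea; the `scale` factor and helper names are illustrative assumptions, not the released `physical-intelligence/fast` tokenizer.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis (rows = frequencies), so M @ M.T == I.
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    M[0] /= np.sqrt(2)
    return M

def tokenize_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    # actions: (horizon, action_dim), normalized to roughly [-1, 1].
    coeffs = dct_matrix(actions.shape[0]) @ actions   # DCT along the time axis
    return np.round(coeffs * scale).astype(np.int64)  # quantize to integer symbols

def detokenize_chunk(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    # Inverse transform: the basis is orthonormal, so the transpose inverts it.
    return dct_matrix(tokens.shape[0]).T @ (tokens / scale)
```

Round-tripping a smooth chunk through these helpers reconstructs it up to the quantization step, which is why smooth trajectories compress into few non-zero tokens.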
## Quick start (inference on a real batch)

### Installation

```bash
pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
```

For full installation details (including optional video dependencies such as ffmpeg for torchcodec), see the official documentation: https://huggingface.co/docs/lerobot/installation

### Load model + dataset, run `select_action`

```python
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.policies.factory import make_pre_post_processors

# Swap this import per policy
from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy

# Load a policy
model_id = "lerobot/pi0fast-libero"  # <- swap checkpoint
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

policy = PI0FastPolicy.from_pretrained(model_id).to(device).eval()

preprocess, postprocess = make_pre_post_processors(
    policy.config,
    model_id,
    preprocessor_overrides={"device_processor": {"device": str(device)}},
)

# Load a LeRobotDataset
dataset = LeRobotDataset("lerobot/libero")

# Pick an episode
episode_index = 0

# Each episode corresponds to a contiguous range of frame indices
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]

# Get a single frame from that episode (e.g. the first frame)
frame_index = from_idx
frame = dict(dataset[frame_index])

batch = preprocess(frame)
with torch.inference_mode():
    pred_action = policy.select_action(batch)

# Post-process the predicted action, e.g. unnormalize and detokenize it
pred_action = postprocess(pred_action)
```
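Internally, `select_action` typically decodes a whole chunk of future actions per model call (`chunk_size`) and serves them one control step at a time for `n_action_steps` steps before re-planning. A minimal queue sketch of that pattern (`plan_chunk` and `ChunkedController` are hypothetical stand-ins, not LeRobot API):

```python
from collections import deque
from typing import Callable, Deque, Sequence

class ChunkedController:
    """Serve one action per control step from a periodically re-planned chunk."""

    def __init__(self, plan_chunk: Callable[[object], Sequence[float]], n_action_steps: int):
        self.plan_chunk = plan_chunk          # stand-in for the policy's chunk decoder
        self.n_action_steps = n_action_steps  # how many steps to execute before re-planning
        self.queue: Deque[float] = deque()

    def select_action(self, observation: object) -> float:
        if not self.queue:
            chunk = self.plan_chunk(observation)              # decode a full action chunk
            self.queue.extend(chunk[: self.n_action_steps])   # keep only the first n steps
        return self.queue.popleft()
```

With `chunk_size == n_action_steps` (50/50 in this checkpoint's config), the policy re-plans exactly once per chunk.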

## Training step (loss + backward)

If you’re training / fine-tuning, you typically call `forward(...)` to get a loss and then backpropagate:

```python
policy.train()
batch = dict(dataset[0])
batch = preprocess(batch)

loss, outputs = policy.forward(batch)
loss.backward()
```

> Notes:
>
> - Some policies expose `policy(**batch)` or return a dict; keep this snippet aligned with the policy API.
> - Use the trainer script (`lerobot-train`) for full training loops.

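Expanding that step into a minimal optimization loop: the sketch below substitutes a toy module (`ToyPolicy` is hypothetical) for the real policy, while the optimizer settings mirror this checkpoint's `config.json` (AdamW, lr 2.5e-5, betas (0.9, 0.95), weight decay 0.01, gradient-clip norm 1.0).

```python
import torch
from torch import nn

torch.manual_seed(0)

class ToyPolicy(nn.Module):
    """Stand-in for the policy: forward returns (loss, outputs) like the snippet above."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(8, 8)

    def forward(self, batch):
        pred = self.head(batch["state"])
        return nn.functional.mse_loss(pred, batch["action"]), {"pred": pred}

policy = ToyPolicy()
# Optimizer hyperparameters mirror this repo's config.json
optim = torch.optim.AdamW(policy.parameters(), lr=2.5e-5, betas=(0.9, 0.95),
                          eps=1e-8, weight_decay=0.01)

batch = {"state": torch.randn(4, 8), "action": torch.randn(4, 8)}
losses = []
for _ in range(200):
    loss, _ = policy(batch)
    optim.zero_grad()
    loss.backward()
    # Clip gradients to the norm used by this config
    torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)
    optim.step()
    losses.append(loss.item())
```

In a real run, the batch would come from `preprocess(dict(dataset[i]))` and the loop would also step an LR scheduler.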

## How to train / fine-tune

```bash
lerobot-train \
--dataset.repo_id=HuggingFaceVLA/libero \
--output_dir=./outputs/[RUN_NAME] \
--job_name=[RUN_NAME] \
--policy.repo_id=[THIS_REPO_OR_CHECKPOINT] \
--policy.path=lerobot/[BASE_CHECKPOINT] \
--policy.dtype=bfloat16 \
--policy.device=cuda \
--steps=100000 \
--batch_size=4
```

Add policy-specific flags as needed:

- `--policy.chunk_size=...`
- `--policy.n_action_steps=...`
- `--policy.max_action_tokens=...`
- `--policy.gradient_checkpointing=true`

---

## Evaluate in Simulation (LIBERO)

You can evaluate the model in the LIBERO simulation environment:

```bash
lerobot-eval \
--policy.path=lerobot/[CHECKPOINT_ID] \
--env.type=libero \
--env.task=libero_object \
--eval.batch_size=1 \
--eval.n_episodes=20
```
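With only 20 episodes, the measured success rate is noisy, so it helps to report a confidence interval rather than a point estimate. A sketch of a 95% Wilson score interval (this helper is my own, not part of `lerobot-eval`):

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a success rate over n evaluation episodes."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half
```

For example, 15/20 successes gives an interval of roughly 0.53 to 0.89, a useful reminder of how wide the uncertainty is at this episode count.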

---

## Real-World Inference & Evaluation

You can use the [`lerobot-record`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) script with a policy checkpoint as input to run inference and evaluate your policy.

For instance, run this command to run inference and record 10 evaluation episodes:

```bash
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--display_data=false \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--policy.path=${HF_USER}/my_policy
# Optional: add these flags to teleoperate in between episodes:
# --teleop.type=so100_leader \
# --teleop.port=/dev/ttyACM0 \
# --teleop.id=my_awesome_leader_arm
```
config.json ADDED
{
  "type": "pi0_fast",
  "input_features": {
    "observation.images.base_0_rgb": { "type": "VISUAL", "shape": [3, 224, 224] },
    "observation.images.left_wrist_0_rgb": { "type": "VISUAL", "shape": [3, 224, 224] },
    "observation.images.right_wrist_0_rgb": { "type": "VISUAL", "shape": [3, 224, 224] },
    "observation.state": { "type": "STATE", "shape": [32] }
  },
  "output_features": {
    "action": { "type": "ACTION", "shape": [32] }
  },
  "device": "cuda",
  "use_amp": false,
  "push_to_hub": true,
  "repo_id": null,
  "private": null,
  "tags": null,
  "license": null,
  "paligemma_variant": "gemma_2b",
  "dtype": "bfloat16",
  "chunk_size": 50,
  "n_action_steps": 50,
  "max_action_dim": 32,
  "max_state_dim": 32,
  "max_action_tokens": 256,
  "text_tokenizer_name": "google/paligemma-3b-pt-224",
  "action_tokenizer_name": "physical-intelligence/fast",
  "temperature": 0.0,
  "max_decoding_steps": 256,
  "fast_skip_tokens": 128,
  "image_resolution": [224, 224],
  "gradient_checkpointing": false,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "optimizer_lr": 2.5e-05,
  "optimizer_betas": [0.9, 0.95],
  "optimizer_eps": 1e-08,
  "optimizer_weight_decay": 0.01,
  "optimizer_grad_clip_norm": 1.0,
  "scheduler_warmup_steps": 1000,
  "scheduler_decay_steps": 30000,
  "scheduler_decay_lr": 2.5e-06,
  "tokenizer_max_length": 200
}
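The `scheduler_*` fields describe a warmup from 0 to `optimizer_lr` over 1000 steps, then a decay to `scheduler_decay_lr` by step 30000. The exact decay shape isn't stated in the config; the sketch below assumes linear warmup with cosine decay:

```python
import math

def lr_at(step: int, peak_lr: float = 2.5e-5, warmup: int = 1000,
          decay_steps: int = 30000, final_lr: float = 2.5e-6) -> float:
    if step < warmup:
        return peak_lr * step / warmup  # linear warmup from 0 to peak_lr
    t = min((step - warmup) / (decay_steps - warmup), 1.0)
    # Cosine decay from peak_lr down to final_lr (shape is an assumption)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * t))
```

After `decay_steps`, the rate stays flat at `scheduler_decay_lr`.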
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9236d6c90c60603fa1ef6ad2b3e91489cbe70be941ff0f0af9d0656c6519606d
size 11693449488
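The Git LFS pointer above records the weight file's SHA-256 (`oid`) and byte size, so you can verify a downloaded `model.safetensors` locally. A chunked-hashing sketch (the helper name is mine):

```python
import hashlib

def file_sha256(path: str, chunk_bytes: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large weight files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_bytes), b""):
            h.update(block)
    return h.hexdigest()
```

Compare the result against the `oid sha256:...` value (and the file size against `size`) before trusting a mirror or resumed download.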
policy_postprocessor.json ADDED
{
  "name": "policy_postprocessor",
  "steps": [
    {
      "registry_name": "unnormalizer_processor",
      "config": {
        "eps": 1e-08,
        "features": {},
        "norm_map": { "VISUAL": "IDENTITY", "STATE": "MEAN_STD", "ACTION": "MEAN_STD" }
      }
    },
    {
      "registry_name": "device_processor",
      "config": { "device": "cpu", "float_dtype": null }
    }
  ]
}
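The `unnormalizer_processor` maps the policy's normalized outputs back to real units; for `MEAN_STD` features that means inverting mean/std normalization. A sketch assuming the conventional `(x - mean) / (std + eps)` form (the helper names are illustrative):

```python
import numpy as np

EPS = 1e-08  # matches the "eps" field in the config above

def normalize(x, mean, std):
    # Forward step (preprocessor side): center and scale per feature dimension
    return (x - mean) / (std + EPS)

def unnormalize(x_norm, mean, std):
    # Inverse step (postprocessor side): undo the scaling exactly
    return x_norm * (std + EPS) + mean
```

The shared `eps` keeps the two directions exact inverses even when some `std` entry is zero.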
policy_preprocessor.json ADDED
{
  "name": "policy_preprocessor",
  "steps": [
    {
      "registry_name": "rename_observations_processor",
      "config": { "rename_map": {} }
    },
    {
      "registry_name": "to_batch_processor",
      "config": {}
    },
    {
      "registry_name": "normalizer_processor",
      "config": {
        "eps": 1e-08,
        "features": {},
        "norm_map": { "VISUAL": "IDENTITY", "STATE": "MEAN_STD", "ACTION": "MEAN_STD" }
      }
    },
    {
      "registry_name": "pi0_fast_prepare_state_tokenizer_processor_step",
      "config": {}
    },
    {
      "registry_name": "tokenizer_processor",
      "config": {
        "max_length": 200,
        "task_key": "task",
        "padding_side": "right",
        "padding": "max_length",
        "truncation": true,
        "tokenizer_name": "google/paligemma-3b-pt-224"
      }
    },
    {
      "registry_name": "action_tokenizer_processor",
      "config": {
        "trust_remote_code": true,
        "max_action_tokens": 256,
        "action_tokenizer_name": "physical-intelligence/fast"
      }
    },
    {
      "registry_name": "device_processor",
      "config": { "device": "cpu", "float_dtype": null }
    }
  ]
}
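The preprocessor applies its `steps` in order, each one built from a `registry_name` plus a `config` dict. A minimal sketch of that registry-and-compose pattern with two toy steps (the step behaviors here are illustrative, not LeRobot's actual processors):

```python
from typing import Callable, Dict, List

REGISTRY: Dict[str, Callable] = {}

def register(name: str):
    """Decorator that records a step factory under its registry name."""
    def deco(factory):
        REGISTRY[name] = factory
        return factory
    return deco

@register("rename_observations_processor")
def make_rename(rename_map: dict):
    def step(batch: dict) -> dict:
        return {rename_map.get(k, k): v for k, v in batch.items()}
    return step

@register("to_batch_processor")
def make_to_batch():
    def step(batch: dict) -> dict:
        return {k: [v] for k, v in batch.items()}  # toy: wrap values in a batch dim
    return step

def build_pipeline(steps: List[dict]) -> Callable[[dict], dict]:
    """Instantiate each step from its config, then apply them in order."""
    fns = [REGISTRY[s["registry_name"]](**s.get("config", {})) for s in steps]
    def run(batch: dict) -> dict:
        for fn in fns:
            batch = fn(batch)
        return batch
    return run
```

The real pipeline works the same way but with the seven steps listed in the JSON above, ending with a device transfer.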