Upload folder using huggingface_hub

Files changed (5)

README.md +166 -0
config.json +81 -0
model.safetensors +3 -0
policy_postprocessor.json +24 -0
policy_preprocessor.json +57 -0

README.md ADDED

	@@ -0,0 +1,166 @@

+---
+language:
+- en
+library_name: lerobot
+pipeline_tag: robotics
+tags:
+- vision-language-action
+- imitation-learning
+- lerobot
+inference: false
+license: gemma
+---
+# π0 fast (PI0Fast) (LeRobot)
+**PI0Fast** is a Vision-Language-Action (VLA) policy that predicts continuous robot actions via **autoregressive next-token prediction** over **FAST action tokens**.
+**Original authors / paper:** [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://arxiv.org/abs/2501.09747)
+**Implementation:** This LeRobot implementation follows the original reference code for compatibility.
+**Reference implementation:** <https://github.com/Physical-Intelligence/openpi>
+## Model description
+- **Inputs:** images (multi-view), proprioceptive state, optional language instruction
+- **Outputs:** continuous actions (decoded from the predicted FAST tokens)
+- **Training objective:** next-token cross-entropy
+- **Action representation:** FAST tokens
+- **Intended use:** Fine-tune on your own task.
+## Quick start (inference on a real batch)
+### Installation
+```bash
+pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
+```
+For full installation details (including optional video dependencies such as ffmpeg for torchcodec), see the official documentation: https://huggingface.co/docs/lerobot/installation
+### Load model + dataset, run `select_action`
+```python
+import torch
+from lerobot.datasets.lerobot_dataset import LeRobotDataset
+from lerobot.policies.factory import make_pre_post_processors
+# Swap this import per-policy
+from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy
+# load a policy
+model_id = "lerobot/pi0fast-libero"  # <- swap checkpoint
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+policy = PI0FastPolicy.from_pretrained(model_id).to(device).eval()
+preprocess, postprocess = make_pre_post_processors(
+    policy.config,
+    model_id,
+    preprocessor_overrides={"device_processor": {"device": str(device)}},
+)
+# load a lerobotdataset
+dataset = LeRobotDataset("lerobot/libero")
+# pick an episode
+episode_index = 0
+# each episode corresponds to a contiguous range of frame indices
+from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
+to_idx   = dataset.meta.episodes["dataset_to_index"][episode_index]
+# get a single frame from that episode (e.g. the first frame)
+frame_index = from_idx
+frame = dict(dataset[frame_index])
+batch = preprocess(frame)
+with torch.inference_mode():
+    pred_action = policy.select_action(batch)
+    # Run the policy's postprocessor on the prediction,
+    # for instance to unnormalize the action.
+    pred_action = postprocess(pred_action)
+```
+## Training step (loss + backward)
+If you’re training or fine-tuning, you typically call `forward(...)` to get a loss and then backpropagate:
+```python
+policy.train()
+batch = dict(dataset[0])
+batch = preprocess(batch)
+loss, outputs = policy.forward(batch)
+loss.backward()
+```
+> Notes:
+>
+> - Some policies expose `policy(**batch)` or return a dict; keep this snippet aligned with the policy API.
+> - Use your trainer script (`lerobot-train`) for full training loops.
+## How to train / fine-tune
+```bash
+lerobot-train \
+  --dataset.repo_id=HuggingFaceVLA/libero \
+  --output_dir=./outputs/[RUN_NAME] \
+  --job_name=[RUN_NAME] \
+  --policy.repo_id=[THIS_REPO_OR_CHECKPOINT] \
+  --policy.path=lerobot/[BASE_CHECKPOINT] \
+  --policy.dtype=bfloat16 \
+  --policy.device=cuda \
+  --steps=100000 \
+  --batch_size=4
+```
+Add policy-specific flags as needed:
+- `--policy.chunk_size=...`
+- `--policy.n_action_steps=...`
+- `--policy.max_action_tokens=...`
+- `--policy.gradient_checkpointing=true`
+---
+## Evaluate in Simulation (LIBERO)
+You can evaluate the model in the LIBERO simulation environment.
+```bash
+lerobot-eval \
+  --policy.path=lerobot/[CHECKPOINT_ID] \
+  --env.type=libero \
+  --env.task=libero_object \
+  --eval.batch_size=1 \
+  --eval.n_episodes=20
+```
+---
+## Real-World Inference & Evaluation
+You can use the [**`lerobot-record`**](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) script with a policy checkpoint as input to run inference and evaluate your policy.
+For instance, run this command to run inference and record evaluation episodes:
+```bash
+lerobot-record \
+  --robot.type=so100_follower \
+  --robot.port=/dev/ttyACM1 \
+  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
+  --robot.id=my_awesome_follower_arm \
+  --display_data=false \
+  --dataset.repo_id=${HF_USER}/eval_so100 \
+  --dataset.single_task="Put lego brick into the transparent box" \
+  --policy.path=${HF_USER}/my_policy
+# Optional: to teleoperate between episodes, also pass these flags to the command above:
+#   --teleop.type=so100_leader
+#   --teleop.port=/dev/ttyACM0
+#   --teleop.id=my_awesome_leader_arm
+```

config.json ADDED

	@@ -0,0 +1,81 @@

+{
+    "type": "pi0_fast",
+    "input_features": {
+        "observation.images.base_0_rgb": {
+            "type": "VISUAL",
+            "shape": [
+                3,
+                224,
+                224
+            ]
+        },
+        "observation.images.left_wrist_0_rgb": {
+            "type": "VISUAL",
+            "shape": [
+                3,
+                224,
+                224
+            ]
+        },
+        "observation.images.right_wrist_0_rgb": {
+            "type": "VISUAL",
+            "shape": [
+                3,
+                224,
+                224
+            ]
+        },
+        "observation.state": {
+            "type": "STATE",
+            "shape": [
+                32
+            ]
+        }
+    },
+    "output_features": {
+        "action": {
+            "type": "ACTION",
+            "shape": [
+                32
+            ]
+        }
+    },
+    "device": "cuda",
+    "use_amp": false,
+    "push_to_hub": true,
+    "repo_id": null,
+    "private": null,
+    "tags": null,
+    "license": null,
+    "paligemma_variant": "gemma_2b",
+    "dtype": "bfloat16",
+    "chunk_size": 50,
+    "n_action_steps": 50,
+    "max_action_dim": 32,
+    "max_state_dim": 32,
+    "max_action_tokens": 256,
+    "text_tokenizer_name": "google/paligemma-3b-pt-224",
+    "action_tokenizer_name": "physical-intelligence/fast",
+    "temperature": 0.0,
+    "max_decoding_steps": 256,
+    "fast_skip_tokens": 128,
+    "image_resolution": [
+        224,
+        224
+    ],
+    "gradient_checkpointing": false,
+    "compile_model": false,
+    "compile_mode": "max-autotune",
+    "optimizer_lr": 2.5e-05,
+    "optimizer_betas": [
+        0.9,
+        0.95
+    ],
+    "optimizer_eps": 1e-08,
+    "optimizer_weight_decay": 0.01,
+    "optimizer_grad_clip_norm": 1.0,
+    "scheduler_warmup_steps": 1000,
+    "scheduler_decay_steps": 30000,
+    "scheduler_decay_lr": 2.5e-06,
+    "tokenizer_max_length": 200
+}
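
The optimizer and scheduler fields in `config.json` suggest a warmup phase followed by a decay from `optimizer_lr` to `scheduler_decay_lr`. The sketch below shows one plausible reading of those four values: linear warmup, then cosine decay. The cosine shape, the exact step boundaries, and the function name `learning_rate` are assumptions made for illustration. The scheduler LeRobot actually builds from this config may differ.

```python
import math

# Values copied from config.json.
PEAK_LR = 2.5e-05      # optimizer_lr
FINAL_LR = 2.5e-06     # scheduler_decay_lr
WARMUP_STEPS = 1000    # scheduler_warmup_steps
DECAY_STEPS = 30000    # scheduler_decay_steps

def learning_rate(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then cosine decay to FINAL_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = min(step - WARMUP_STEPS, DECAY_STEPS - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine

for step in (0, 500, 1000, 15000, 30000, 100000):
    print(step, f"{learning_rate(step):.3e}")
```

Under this reading the learning rate stays at `scheduler_decay_lr` for any step past `scheduler_decay_steps`. That matters when the run is longer than the decay horizon.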

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9236d6c90c60603fa1ef6ad2b3e91489cbe70be941ff0f0af9d0656c6519606d
+size 11693449488
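
The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves. The pointer records the spec version, the SHA-256 digest of the real file, and its size in bytes. The short sketch below parses the pointer text shown above and converts the size to GiB. The helper name `parse_lfs_pointer` is hypothetical.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9236d6c90c60603fa1ef6ad2b3e91489cbe70be941ff0f0af9d0656c6519606d
size 11693449488
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = parse_lfs_pointer(POINTER_TEXT)
hash_algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])
size_gib = size_bytes / 2**30

print(f"{hash_algo} digest {digest[:12]}..., {size_gib:.2f} GiB")
```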

policy_postprocessor.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "name": "policy_postprocessor",
+  "steps": [
+    {
+      "registry_name": "unnormalizer_processor",
+      "config": {
+        "eps": 1e-08,
+        "features": {},
+        "norm_map": {
+          "VISUAL": "IDENTITY",
+          "STATE": "MEAN_STD",
+          "ACTION": "MEAN_STD"
+        }
+      }
+    },
+    {
+      "registry_name": "device_processor",
+      "config": {
+        "device": "cpu",
+        "float_dtype": null
+      }
+    }
+  ]
+}
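
The postprocessor maps `ACTION` features with `MEAN_STD`, which means predicted actions are scaled back from normalized units to the robot's units. The sketch below shows the usual mean/std round trip on plain Python lists. The statistics are hypothetical, and where `eps` enters the formula is an assumption. The LeRobot `unnormalizer_processor` may apply it differently.

```python
EPS = 1e-08  # "eps" from policy_postprocessor.json

def normalize(values, mean, std, eps=EPS):
    """MEAN_STD normalization: subtract the mean, divide by the standard deviation."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]

def unnormalize(values, mean, std, eps=EPS):
    """Inverse of normalize(): scale by the standard deviation, add the mean."""
    return [v * (s + eps) + m for v, m, s in zip(values, mean, std)]

# Hypothetical per-dimension statistics for a 3-dimensional action.
mean = [0.1, -0.2, 0.0]
std = [0.5, 0.25, 1.0]
action = [0.6, -0.45, 2.0]

normalized = normalize(action, mean, std)
restored = unnormalize(normalized, mean, std)
print(normalized, restored)
```

The `features` field is empty in the JSON above, so this file alone does not carry the dataset statistics the round trip needs.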

policy_preprocessor.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "name": "policy_preprocessor",
+  "steps": [
+    {
+      "registry_name": "rename_observations_processor",
+      "config": {
+        "rename_map": {}
+      }
+    },
+    {
+      "registry_name": "to_batch_processor",
+      "config": {}
+    },
+    {
+      "registry_name": "normalizer_processor",
+      "config": {
+        "eps": 1e-08,
+        "features": {},
+        "norm_map": {
+          "VISUAL": "IDENTITY",
+          "STATE": "MEAN_STD",
+          "ACTION": "MEAN_STD"
+        }
+      }
+    },
+    {
+      "registry_name": "pi0_fast_prepare_state_tokenizer_processor_step",
+      "config": {}
+    },
+    {
+      "registry_name": "tokenizer_processor",
+      "config": {
+        "max_length": 200,
+        "task_key": "task",
+        "padding_side": "right",
+        "padding": "max_length",
+        "truncation": true,
+        "tokenizer_name": "google/paligemma-3b-pt-224"
+      }
+    },
+    {
+      "registry_name": "action_tokenizer_processor",
+      "config": {
+        "trust_remote_code": true,
+        "max_action_tokens": 256,
+        "action_tokenizer_name": "physical-intelligence/fast"
+      }
+    },
+    {
+      "registry_name": "device_processor",
+      "config": {
+        "device": "cpu",
+        "float_dtype": null
+      }
+    }
+  ]
+}
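
The `tokenizer_processor` step pads or truncates every instruction to a fixed `max_length` of 200 tokens, padding on the right. The sketch below mimics the effect of those settings on a list of token ids. The function name `pad_or_truncate` and the padding id `0` are hypothetical. The real padding id comes from the `google/paligemma-3b-pt-224` tokenizer.

```python
MAX_LENGTH = 200  # "max_length" from the tokenizer_processor config
PAD_ID = 0        # hypothetical padding id; the real tokenizer defines its own

def pad_or_truncate(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])   # "truncation": true
    mask = [1] * len(ids)                # 1 marks a real token
    padding = max_length - len(ids)
    # "padding_side": "right" appends the padding after the real tokens.
    return ids + [pad_id] * padding, mask + [0] * padding

short_ids, short_mask = pad_or_truncate([5, 6, 7])
long_ids, long_mask = pad_or_truncate(range(1, 301))

print(len(short_ids), sum(short_mask))  # padded up to max_length
print(len(long_ids), sum(long_mask))    # truncated down to max_length
```

A fixed length keeps batch shapes constant, at the cost of padding short instructions.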