---
library_name: physicalai
license: apache-2.0
model_name: ACT
pipeline_tag: robotics
tags:
- act
- executorch
- onnx
- openvino
- physical-ai-studio
- physicalai
- robotics
- torch
- vision-language-action
---

<center>
  <a href="https://github.com/open-edge-platform/physical-ai-studio">
    <img src="https://github.com/open-edge-platform/physical-ai-studio/raw/main/docs/assets/physical_ai_studio.png" alt="Physical AI Studio - VLA model fine-tuning for robotics" />
  </a>
</center>

# Action Chunking Transformer (ACT)

[Action Chunking with Transformers (ACT)](https://huggingface.co/papers/2304.13705) is an imitation-learning policy
that predicts short chunks of future actions from robot state and camera observations. This export predicts 100 steps
of 6 action values per inference call (see the I/O Specification), which the robot executes as a sequence of
real-world movements.

This model was trained and exported with Physical AI Studio for local or Hugging Face-hosted robot inference.

## Model Details

- **Policy:** act
- **Runtime library:** `physicalai`
- **Generated by:** Physical AI Studio

## Intended Use

Use this model for robot imitation-learning inference in setups matching the training dataset, robot embodiment,
camera viewpoints, and task instructions. Validate behavior in simulation or a safe test cell before running on hardware.

## Dataset

This model was trained on the Physical AI Studio dataset named **Dice cleanup**.

## Model Package

Load the model from the root directory when possible. The root `manifest.json` is the package entry point, and
backend-specific manifests live under `exports/<backend>/manifest.json`.

| Backend | Artifact | Intended Use |
| --- | --- | --- |
| torch | `exports/torch/act.pt` | Canonical checkpoint and Python inference |
| executorch | `exports/executorch/act.pte` | Edge and mobile runtime experiments |
| onnx | `exports/onnx/act.onnx` | Runtime portability |
| openvino | `exports/openvino/act.xml` | CPU, Intel GPU, and NPU inference |
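
The package layout described above can be inspected with the standard library alone. The following is a minimal sketch, not part of `physicalai`: the helper names `load_root_manifest` and `list_backend_manifests` are hypothetical, and because the manifest schema is not documented in this card, the code only parses the JSON and makes no assumption about its fields.

```python
import json
from pathlib import Path


def load_root_manifest(model_root):
    """Parse the package entry point; raises FileNotFoundError if it is missing."""
    with (Path(model_root) / "manifest.json").open(encoding="utf-8") as fh:
        return json.load(fh)


def list_backend_manifests(model_root):
    """Return {backend_name: parsed manifest} for each exports/<backend>/manifest.json."""
    manifests = {}
    for manifest_path in sorted(Path(model_root).glob("exports/*/manifest.json")):
        # The directory name under exports/ is the backend name (torch, onnx, ...).
        backend = manifest_path.parent.name
        with manifest_path.open(encoding="utf-8") as fh:
            manifests[backend] = json.load(fh)
    return manifests
```

Calling `sorted(list_backend_manifests("path/to/model"))` is a quick way to confirm which backends a downloaded package actually contains before choosing a runtime.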

## Training Environment

Environment: **So101**

```yaml
name: So101
robots:
  - name: SO101 Follower
    type: SO101_Follower
    calibration:
      elbow_flex:
        id: 3
        drive_mode: 0
        homing_offset: 1149
        range_min: 851
        range_max: 3074
      gripper:
        id: 6
        drive_mode: 0
        homing_offset: 1088
        range_min: 1938
        range_max: 3416
      shoulder_lift:
        id: 2
        drive_mode: 0
        homing_offset: 263
        range_min: 821
        range_max: 3195
      shoulder_pan:
        id: 1
        drive_mode: 0
        homing_offset: 135
        range_min: 732
        range_max: 3454
      wrist_flex:
        id: 4
        drive_mode: 0
        homing_offset: -1606
        range_min: 860
        range_max: 3188
      wrist_roll:
        id: 5
        drive_mode: 0
        homing_offset: 612
        range_min: 124
        range_max: 3956
cameras:
  - name: Gripper
    driver: usb_camera
    hardware_name: 'Innomaker-U20CAM-1080p-S1: Inno'
    width: 640
    height: 480
    fps: 30
  - name: Overview
    driver: usb_camera
    hardware_name: 'Innomaker-U20CAM-1080p-S1: Inno'
    width: 640
    height: 480
    fps: 30
```
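
To illustrate how the calibration ranges above might be used, here is a minimal sketch that clamps a raw joint reading to its calibrated range and rescales it linearly. It rests on two assumptions that this card does not confirm: that `range_min` and `range_max` are raw encoder readings, and that the target interval is [-100, 100]. The helper `normalize_position` is hypothetical and is not part of `physicalai`.

```python
# (range_min, range_max) per joint, copied from the training environment above.
CALIBRATION = {
    "shoulder_pan": (732, 3454),
    "shoulder_lift": (821, 3195),
    "elbow_flex": (851, 3074),
    "wrist_flex": (860, 3188),
    "wrist_roll": (124, 3956),
    "gripper": (1938, 3416),
}


def normalize_position(joint, raw):
    """Clamp a raw reading to the joint's range and map it linearly onto [-100, 100].

    The [-100, 100] target interval is an assumption made for illustration.
    """
    range_min, range_max = CALIBRATION[joint]
    clamped = min(max(raw, range_min), range_max)
    return (clamped - range_min) / (range_max - range_min) * 200.0 - 100.0
```

Readings outside the calibrated range saturate at the ends of the interval, which makes an out-of-range joint easy to spot when comparing a new setup against the training calibration.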

## I/O Specification

### Shared By `executorch`, `onnx`, `openvino`, `torch`

#### Inputs

| Name | Type | Shape | Dtype |
| --- | --- | --- | --- |
| state | STATE | [6] | float32 |
| images.gripper | VISUAL | [3, 480, 640] | float32 |
| images.overview | VISUAL | [3, 480, 640] | float32 |

#### Outputs

| Name | Type | Shape | Dtype |
| --- | --- | --- | --- |
| action | ACTION | [100, 6] | float32 |
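
A pre-flight check against the input table above can catch shape and dtype mismatches before they reach the runtime. The sketch below is an illustration, not a `physicalai` API: `check_observation` is a hypothetical helper, and it assumes each input is an array-like object exposing `.shape` and `.dtype` (as NumPy arrays do), optionally with one leading batch axis.

```python
# Expected per-sample shapes and dtype, copied from the input table above.
INPUT_SPEC = {
    "state": (6,),
    "images.gripper": (3, 480, 640),
    "images.overview": (3, 480, 640),
}
EXPECTED_DTYPE = "float32"


def check_observation(observation):
    """Return a list of problems; an empty list means the observation matches the spec."""
    problems = []
    for name, sample_shape in INPUT_SPEC.items():
        if name not in observation:
            problems.append(f"missing input: {name}")
            continue
        array = observation[name]
        shape = tuple(array.shape)
        # Accept the bare per-sample shape, or the same shape behind one leading batch axis.
        if shape != sample_shape and shape[1:] != sample_shape:
            problems.append(f"{name}: shape {shape} does not match {sample_shape}")
        if str(array.dtype) != EXPECTED_DTYPE:
            problems.append(f"{name}: dtype {array.dtype} is not {EXPECTED_DTYPE}")
    return problems
```

Returning a list rather than raising on the first mismatch lets a setup script report every misconfigured camera or state vector in one pass.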

## Running Inference

### Installation

```bash
uv pip install physicalai numpy
```

The following smoke test verifies that the package loads and accepts arrays with the declared shapes, each with a
leading batch dimension of 1. Replace the dummy values with observations from your robot runtime before using the
model for control.

```python
import numpy as np
from physicalai.inference import InferenceModel

MODEL_PATH = "path/to/model"
model = InferenceModel.load(MODEL_PATH, device="CPU")

# Random placeholder inputs: declared shapes plus a leading batch dimension of 1.
observation = {
    "state": np.random.rand(1, 6).astype(np.float32),
    "images.gripper": np.random.rand(1, 3, 480, 640).astype(np.float32),
    "images.overview": np.random.rand(1, 3, 480, 640).astype(np.float32),
}

# Predict one chunk of future actions for this observation.
chunk = model.predict_action_chunk(observation)
```

Set `MODEL_PATH` to this local model directory or to the Hugging Face repository id after upload.

### Running A Robot Control Loop

For a blocking control loop similar to PhysicalAI's `examples/runtime/sync_inference.py`, start from the training robot
and camera names exported above. Local device handles are placeholders because ports, camera paths, and stream URLs are
not included in published model metadata.

```bash
python examples/runtime/sync_inference.py \
  --robot so101 \
  --port /dev/ttyACM0 \
  --calibration ./calibration.json \
  --model path/to/model \
  --camera gripper:uvc:/dev/video0 \
  --camera overview:uvc:/dev/video1 \
  --task "Move the dice into the cup" \
  --device CPU
```
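
For readers writing their own loop, the sketch below shows one way to play back a predicted action chunk at a fixed rate. It is not taken from `sync_inference.py`: `execute_chunk` and the `send_action` callback are hypothetical, and the 30 Hz default is an assumption chosen to match the 30 fps cameras in the training environment.

```python
import time


def execute_chunk(chunk, send_action, control_hz=30.0, max_steps=None, sleep=time.sleep):
    """Send the rows of one action chunk at a fixed rate; return the number of steps sent.

    chunk: sequence of per-step actions (for this model, 100 rows of 6 values).
    send_action: hypothetical callback that forwards one action to the robot.
    max_steps: optionally execute only the first steps before re-planning.
    """
    period = 1.0 / control_hz
    steps = chunk if max_steps is None else chunk[:max_steps]
    sent = 0
    next_deadline = time.monotonic()
    for action in steps:
        send_action(action)
        sent += 1
        next_deadline += period
        # Sleep only for the time remaining in this control period.
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            sleep(remaining)
    return sent
```

If the chunk returned by the model carries a leading batch dimension, pass `chunk[0]`. Setting `max_steps` below the chunk length makes the loop request a fresh prediction more often, at the cost of more inference calls.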

## Training / Reproducing Training

Import this model in Physical AI Studio and start a new training job using it as the base model. Studio will preserve
the training lineage through the parent model relationship.

To reproduce behavior on your own hardware, match the exported I/O specification, robot type, camera viewpoints,
control frequency, and calibration values from `environment.json` as closely as possible.

## Evaluation

No task-specific evaluation metrics were exported with this generated card. Add validation results, success rates, and
hardware test conditions before publishing externally.

## Limitations And Safety

Robot policies can behave unpredictably outside their training distribution. Validate camera viewpoints, lighting,
object placement, calibration values, robot embodiment, and task wording before autonomous operation. Use hardware
limits, emergency stops, supervision, and staged validation.
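
As one possible software-side safeguard during staged validation, the sketch below limits how far each commanded value may move between consecutive steps. `limit_step` is a hypothetical helper, and `max_delta` is a placeholder whose value and units depend on how your runtime represents actions. It complements hardware limits and emergency stops and does not replace them.

```python
def limit_step(previous, target, max_delta):
    """Move each value from `previous` toward `target` by at most `max_delta`."""
    limited = []
    for prev, tgt in zip(previous, target):
        # Clip the requested change to the interval [-max_delta, max_delta].
        delta = max(-max_delta, min(max_delta, tgt - prev))
        limited.append(prev + delta)
    return limited
```

Applying this to every action before it is sent keeps a single bad prediction from producing a large jump in one control step.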