	---
	library_name: physicalai
	license: apache-2.0
	model_name: ACT
	pipeline_tag: robotics
	tags:
	- act
	- executorch
	- onnx
	- openvino
	- physical-ai-studio
	- physicalai
	- robotics
	- torch
	- vision-language-action
	---

	<center>
	<a href="https://github.com/open-edge-platform/physical-ai-studio">
	<img src="https://github.com/open-edge-platform/physical-ai-studio/raw/main/docs/assets/physical_ai_studio.png" alt="Physical AI Studio - VLA model fine-tuning for robotics" />
	</a>
	</center>

	# Action Chunking Transformer (ACT)

	[Action Chunking with Transformers (ACT)](https://huggingface.co/papers/2304.13705) is an imitation-learning policy
	that predicts short action chunks from robot state and visual observations. The robot can execute those chunks as a
	sequence of real-world movements.

	This model was trained and exported with Physical AI Studio for local or Hugging Face-hosted robot inference.

	## Model Details

	- Policy: act
	- Runtime library: `physicalai`
	- Generated by: Physical AI Studio

	## Intended Use

	Use this model for robot imitation-learning inference in setups matching the training dataset, robot embodiment,
	camera viewpoints, and task instructions. Validate behavior in simulation or a safe test cell before running on hardware.

	## Dataset

	This model was trained on the Physical AI Studio dataset named "Dice cleanup".

	## Model Package

	Load the model from the root directory when possible. The root `manifest.json` is the package entry point, and
	backend-specific manifests live under `exports/<backend>/manifest.json`.

	| Backend | Artifact | Intended Use |
	| --- | --- | --- |
	| torch | `exports/torch/act.pt` | Canonical checkpoint and Python inference |
	| executorch | `exports/executorch/act.pte` | Edge and mobile runtime experiments |
	| onnx | `exports/onnx/act.onnx` | Runtime portability |
	| openvino | `exports/openvino/act.xml` | CPU, Intel GPU, and NPU inference |
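
To check which backends a downloaded package actually contains, one can walk the layout described above. The sketch below assumes only that layout (backend manifests at `exports/<backend>/manifest.json`). The helper name `list_backend_manifests` is hypothetical, and no manifest schema is assumed: each file is parsed as generic JSON.

```python
import json
from pathlib import Path


def list_backend_manifests(package_root):
    """Return {backend_name: parsed manifest} for each exports/<backend>/manifest.json."""
    root = Path(package_root)
    manifests = {}
    for manifest_path in sorted(root.glob("exports/*/manifest.json")):
        # The backend name is the directory that holds the manifest.
        backend = manifest_path.parent.name
        with manifest_path.open(encoding="utf-8") as handle:
            manifests[backend] = json.load(handle)
    return manifests
```

Calling `list_backend_manifests("path/to/model")` on this package should return one entry per backend listed in the table, provided every backend directory ships its manifest.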

	## Training Environment

	Environment: So101

	```yaml
	name: So101
	robots:
	- name: SO101 Follower
	  type: SO101_Follower
	  calibration:
	    elbow_flex:
	      id: 3
	      drive_mode: 0
	      homing_offset: 1149
	      range_min: 851
	      range_max: 3074
	    gripper:
	      id: 6
	      drive_mode: 0
	      homing_offset: 1088
	      range_min: 1938
	      range_max: 3416
	    shoulder_lift:
	      id: 2
	      drive_mode: 0
	      homing_offset: 263
	      range_min: 821
	      range_max: 3195
	    shoulder_pan:
	      id: 1
	      drive_mode: 0
	      homing_offset: 135
	      range_min: 732
	      range_max: 3454
	    wrist_flex:
	      id: 4
	      drive_mode: 0
	      homing_offset: -1606
	      range_min: 860
	      range_max: 3188
	    wrist_roll:
	      id: 5
	      drive_mode: 0
	      homing_offset: 612
	      range_min: 124
	      range_max: 3956
	cameras:
	- name: Gripper
	  driver: usb_camera
	  hardware_name: 'Innomaker-U20CAM-1080p-S1: Inno'
	  width: 640
	  height: 480
	  fps: 30
	- name: Overview
	  driver: usb_camera
	  hardware_name: 'Innomaker-U20CAM-1080p-S1: Inno'
	  width: 640
	  height: 480
	  fps: 30
	```
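
The calibration block gives each joint a raw range (`range_min`, `range_max`). As an illustration of how such a range can be used when comparing your own hardware against the training setup, the sketch below maps a raw reading to a fraction of the calibrated range. The helper `normalize_position` is hypothetical. It ignores `homing_offset` and `drive_mode`, and it is not necessarily how `physicalai` or the robot driver applies calibration.

```python
# Ranges (range_min, range_max) copied from the calibration block above.
CALIBRATION_RANGES = {
    "shoulder_pan": (732, 3454),
    "shoulder_lift": (821, 3195),
    "elbow_flex": (851, 3074),
    "wrist_flex": (860, 3188),
    "wrist_roll": (124, 3956),
    "gripper": (1938, 3416),
}


def normalize_position(joint, raw):
    """Map a raw reading to [0.0, 1.0] within the joint's calibrated range.

    Readings outside the range are clamped to the nearest bound.
    """
    low, high = CALIBRATION_RANGES[joint]
    clamped = min(max(raw, low), high)
    return (clamped - low) / (high - low)
```

A reading that falls well outside a joint's range on your robot is a hint that its calibration differs from the one used during training.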

	## I/O Specification

	### Shared By `executorch`, `onnx`, `openvino`, `torch`

	#### Inputs

	| Name | Type | Shape | Dtype |
	| --- | --- | --- | --- |
	| state | STATE | [6] | float32 |
	| images.gripper | VISUAL | [3, 480, 640] | float32 |
	| images.overview | VISUAL | [3, 480, 640] | float32 |

	#### Outputs

	| Name | Type | Shape | Dtype |
	| --- | --- | --- | --- |
	| action | ACTION | [100, 6] | float32 |
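
Before feeding real observations to the model, it can help to check them against the input table. The sketch below is one possible check, not part of `physicalai`: the helper `check_observation` is hypothetical, and it accepts either the per-sample shape from the table or the same shape with a leading batch axis of 1 (an assumption based on the smoke test later in this card). It works on any object that exposes `.shape` and `.dtype`, such as a NumPy array.

```python
# Per-sample shapes and dtypes copied from the input table above.
INPUT_SPEC = {
    "state": ((6,), "float32"),
    "images.gripper": ((3, 480, 640), "float32"),
    "images.overview": ((3, 480, 640), "float32"),
}


def check_observation(observation):
    """Return a list of problems; an empty list means the observation matches the spec."""
    problems = []
    for name, (shape, dtype) in INPUT_SPEC.items():
        if name not in observation:
            problems.append(f"missing input: {name}")
            continue
        value = observation[name]
        actual_shape = tuple(value.shape)
        # Accept the per-sample shape or the same shape with a batch axis of 1.
        if actual_shape not in (shape, (1, *shape)):
            problems.append(f"{name}: shape {actual_shape}, expected {shape}")
        if str(value.dtype) != dtype:
            problems.append(f"{name}: dtype {value.dtype}, expected {dtype}")
    return problems
```

A common failure this catches is a camera frame passed as `uint8` in height-width-channel order instead of `float32` in channel-first order.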

	## Running Inference

	### Installation

	```bash
	uv pip install physicalai numpy
	```

	The following smoke test verifies that the package loads and accepts tensors with the declared shapes. Replace the dummy
	values with observations from your robot runtime before using the model for control.

	```python
	import numpy as np
	from physicalai.inference import InferenceModel

	# Local model directory or Hugging Face repository id.
	MODEL_PATH = "path/to/model"
	model = InferenceModel.load(MODEL_PATH, device="CPU")

	# Dummy observation: random values with a leading batch dimension of 1.
	observation = {
	    "state": np.random.rand(1, 6).astype(np.float32),
	    "images.gripper": np.random.rand(1, 3, 480, 640).astype(np.float32),
	    "images.overview": np.random.rand(1, 3, 480, 640).astype(np.float32),
	}

	# Predict one chunk of future actions (see the output table above).
	chunk = model.predict_action_chunk(observation)
	```

	Set `MODEL_PATH` to this local model directory or to the Hugging Face repository id after upload.

	### Running A Robot Control Loop

	For a blocking control loop similar to PhysicalAI's `examples/runtime/sync_inference.py`, start from the training robot
	and camera names exported above. Local device handles are placeholders because ports, camera paths, and stream URLs are
	not included in published model metadata.

	```bash
	python examples/runtime/sync_inference.py \
	  --robot so101 \
	  --port /dev/ttyACM0 \
	  --calibration ./calibration.json \
	  --model path/to/model \
	  --camera gripper:uvc:/dev/video0 \
	  --camera overview:uvc:/dev/video1 \
	  --task "Move the dice into the cup" \
	  --device CPU
	```
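
For readers who want to see the shape of such a loop, the sketch below shows one way a blocking controller might consume action chunks: predict a chunk, send its first steps at a fixed rate, then observe again. It is not the implementation of `sync_inference.py`. The callables `get_observation`, `predict_chunk`, and `send_action` are hypothetical placeholders for your robot runtime, and the 30 Hz default is an assumption (it matches the camera frame rate, but the control frequency is not stated in this card).

```python
import time


def run_chunked_control(get_observation, predict_chunk, send_action,
                        num_chunks, steps_per_chunk=100, control_hz=30.0,
                        sleep=time.sleep):
    """Blocking loop: predict a chunk, execute its first steps, then re-observe.

    All three callables are hypothetical placeholders:
      get_observation() -> observation dict
      predict_chunk(observation) -> sequence of actions (e.g. 100 rows of 6 values)
      send_action(action) -> None
    Returns the total number of actions sent.
    """
    period = 1.0 / control_hz
    sent = 0
    for _ in range(num_chunks):
        chunk = predict_chunk(get_observation())
        for action in list(chunk)[:steps_per_chunk]:
            send_action(action)
            sent += 1
            sleep(period)  # simple pacing; ignores time spent in send_action
    return sent
```

Executing fewer than 100 steps per chunk (a smaller `steps_per_chunk`) makes the policy re-observe more often at the cost of more inference calls. If the chunk returned by the model carries a leading batch axis, index it away before iterating.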

	## Training / Reproducing Training

	To continue training, import this model into Physical AI Studio and start a new training job with it as the base
	model. Studio preserves the training lineage through the parent model relationship.

	To reproduce behavior on your own hardware, match the exported I/O specification, robot type, camera viewpoints,
	control frequency, and calibration values from `environment.json` as closely as possible.

	## Evaluation

	No task-specific evaluation metrics were exported with this generated card. Add validation results, success rates, and
	hardware test conditions before publishing externally.

	## Limitations And Safety

	Robot policies can behave unpredictably outside their training distribution. Validate camera viewpoints, lighting,
	object placement, calibration values, robot embodiment, and task wording before autonomous operation. Use hardware
	limits, emergency stops, supervision, and staged validation.
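
As one example of a software-side guard that can complement (not replace) hardware limits and emergency stops, the sketch below caps how far each joint may move per control step. The helper `limit_step` and the `max_delta` parameter are hypothetical; a sensible value depends on the action units and control rate of your setup, neither of which is specified in this card.

```python
def limit_step(previous, target, max_delta):
    """Move each joint from `previous` toward `target` by at most `max_delta`."""
    if len(previous) != len(target):
        raise ValueError("previous and target must have the same length")
    limited = []
    for prev, goal in zip(previous, target):
        # Clamp the requested change to the range [-max_delta, +max_delta].
        delta = max(-max_delta, min(max_delta, goal - prev))
        limited.append(prev + delta)
    return limited
```

Applying this to each predicted action before sending it keeps a single bad prediction from commanding a large jump.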