rskill-pi05-so101-pickplace-nf4

This package wraps hf://HollyTan/pi05_so101_pick_place-v2.2basev2.4_abs_nofreeze_8b as a pre-quantized (nf4) OpenRAL rSkill. It does not copy model weights — the manifest points at the mirrored nf4 pack on the Hub.

What this skill does

A π0.5 (PaliGemma + action-expert) vision-language-action policy finetuned for pick-and-place on the SO-101 follower arm. Given two RGB views (overhead + wrist) and the 6-D joint state, it emits absolute 6-DoF joint-position action chunks of length 50.

Field	Value
Actions	pick, place, pick_and_place
Objects	diverse tabletop objects
Scenes	tabletop
Embodiment	so101_follower

How it works

π0.5 runs a frozen-ish PaliGemma 3 B backbone plus a small action expert that decodes a flow-matching action chunk conditioned on the image tokens, the language instruction, and the proprioceptive state. This rSkill ships the post-quantization nf4 state dict and a quantization_metadata.json sentinel; the pi05 adapter detects it, meta-initialises the graph, overlays the packed weights, and skips the on-line bf16→nf4 conversion, so the 4.14 B graph fits in ~4 GiB of VRAM.

Observation → action contract

dir	key	shape	notes
in	`observation.images.top`	`(1, 3, H, W)` float32 [0,1]	overhead view ← scene `oak_top`
in	`observation.images.wrist`	`(1, 3, H, W)` float32 [0,1]	gripper-mounted view ← scene `wrist`
in	`observation.images.front`	`(1, 3, H, W)` float32 [0,1]	optional; zero-padded via image mask when absent
in	`observation.state`	`(1, 6)` float32	SO-101 joint positions (rad)
out	action chunk	`(50, 6)` float32	absolute joint-position targets

How it was trained / provenance

Field	Value
Source repo	`HollyTan/pi05_so101_pick_place-v2.2basev2.4_abs_nofreeze_8b`
Base model	`lerobot/pi05_base` (Physical Intelligence π0.5)
Paper	arxiv:2410.24164 — π0.5: a VLA with Open-World Generalization
License	apache-2.0
Parameters	~4.14 B (PaliGemma backbone + action expert)
Training data	`HollyTan/so101_pick-place-v2.2-100eps` — 100 SO-101 teleop episodes

The nf4 pack is produced from the upstream checkpoint with tools/quantize_rskill.py (see the header of rskill.yaml for the exact command).

Supported robots / embodiments

Embodiment	Cameras	Status	Notes
`so101_follower`	overhead + wrist	⚡ experimental	6-DoF; validated to load + step on the `so101_box` sim scene

SO-100 shares the 6-DoF follower IO contract but is not claimed — it's an unverified transfer and the canonical SO-100 detection profile does not advertise int4 GPU support. Add so100_follower to embodiment_tags once an SO-100 host validates the nf4 load.

Sensors required

Feature key	Modality	Min size	Dtype
`observation.images.camera1`	RGB	224 × 224	`float32`
`observation.images.camera2`	RGB	224 × 224	`float32`
`observation.state`	proprioception	(6,)	`float32`

Manifest summary

Field	Value
`name`	`OpenRAL/rskill-pi05-so101-pickplace-nf4`
`model_family`	`pi05`
`role`	`s1`
`license`	`apache-2.0`
`embodiment_tags`	`so101_follower`
`runtime` / `quantization.dtype`	`pytorch` / `int4` (nf4)
`weights_uri`	`hf://OpenRAL/rskill-pi05-so101-pickplace-nf4`

License

Apache-2.0 — both the upstream code and the HollyTan checkpoint weights are Apache-2.0, so commercial use is permitted (unlike the permissive-research pi05-so100 base wrapper).

Downloads last month: 9

Safetensors

Model size

3B params

Tensor type

F32

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for OpenRAL/rskill-pi05-so101-pickplace-nf4

π_0: A Vision-Language-Action Flow Model for General Robot Control

Paper • 2410.24164 • Published Oct 31, 2024 • 31