rskill-act-libero
Action Chunking Transformer (Zhao et al., 2023)
fine-tuned on HuggingFaceVLA/libero, packaged as an OpenRAL rSkill
for the LIBERO Franka-Panda embodiment.
| Field | Value |
|---|---|
| Weights | hf://Deepkar/libero-test-act |
| Architecture | ResNet-18 backbone · 4 encoder + 1 decoder layers · latent VAE · chunk_size=100 |
| Inputs | image 256×256, image2 256×256, state (8-D) |
| Action | 7-D (6-D delta-EEF + 1-D gripper) |
| Robot | franka_panda (LIBERO embodiment tag) |
| Dataset | HuggingFaceVLA/libero (Apache-2.0) |
| License | Apache-2.0 |
Run
CC=/usr/bin/gcc uv sync --group libero # first time only
ral sim run --config examples/sim/act_libero_spatial.yaml \
--rskill rskill://rskills/act-libero
The shipped sim YAML pins libero_spatial/0 for a 200-step
single-episode rollout. Sweep tasks with --task libero_spatial/<n>.
A spot-check on libero_spatial/2 reaches is_success=True in
~91 steps (reward 1.0) on a single seed.
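The task sweep mentioned above can be scripted. The following is a minimal sketch that only builds the command lines and does not execute anything. It assumes that `--task` can be combined with the shipped `--config` exactly as in the Run command, and that the suite has ten tasks (indices 0 to 9); `sweep_commands` and the default `num_tasks=10` are illustrative names and values, not part of the `ral` CLI.

```python
import shlex
from typing import List

# Paths and URIs copied from the Run section above.
CONFIG = "examples/sim/act_libero_spatial.yaml"
RSKILL = "rskill://rskills/act-libero"


def sweep_commands(num_tasks: int = 10, suite: str = "libero_spatial") -> List[List[str]]:
    """Build one `ral sim run` argv per task index; nothing is executed here.

    num_tasks=10 is an assumption about the suite size, adjust as needed.
    """
    return [
        ["ral", "sim", "run",
         "--config", CONFIG,
         "--rskill", RSKILL,
         "--task", f"{suite}/{n}"]
        for n in range(num_tasks)
    ]


if __name__ == "__main__":
    # Print shell-ready command lines, one per task.
    for argv in sweep_commands():
        print(shlex.join(argv))
```

To actually run the sweep, each argv list could be passed to `subprocess.run(argv, check=True)`.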
Camera & state contract
LIBERO emits images={"camera1": agentview, "camera2": eye_in_hand}
while this checkpoint's input features are
observation.images.image / observation.images.image2. The
manifest's image_preprocessing block rewrites the batch keys at step
time:
image_preprocessing:
flip_180: true # HuggingFaceVLA/libero is captured rotated 180°
aliases:
camera1: image
camera2: image2
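To make the effect of that block concrete, here is a rough sketch of the key rewrite and 180° rotation in NumPy. It is not OpenRAL's implementation: `preprocess_images` is a hypothetical helper, frames are assumed to be H×W×C arrays, and the `observation.images.` prefix is taken from the feature names quoted above.

```python
import numpy as np

# Mirrors the `aliases` mapping from the manifest snippet above.
ALIASES = {"camera1": "image", "camera2": "image2"}


def preprocess_images(images, aliases=ALIASES, flip_180=True,
                      prefix="observation.images."):
    """Rename camera keys to the checkpoint's feature names and
    optionally rotate each H x W x C frame by 180 degrees."""
    batch = {}
    for src_key, frame in images.items():
        if src_key not in aliases:
            raise KeyError(f"no alias configured for camera key {src_key!r}")
        if flip_180:
            # Reversing rows and columns is equivalent to a 180-degree rotation.
            frame = frame[::-1, ::-1]
        # Copy into contiguous memory so downstream consumers do not see
        # a negatively strided view.
        batch[prefix + aliases[src_key]] = np.ascontiguousarray(frame)
    return batch


if __name__ == "__main__":
    obs = {
        "camera1": np.zeros((256, 256, 3), dtype=np.uint8),  # agentview
        "camera2": np.zeros((256, 256, 3), dtype=np.uint8),  # eye_in_hand
    }
    print(sorted(preprocess_images(obs)))
```

Raising on an unknown camera key is a design choice of this sketch; it surfaces a misconfigured alias map early instead of silently dropping a view.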
The state_contract.dim: 8 declaration fixes the proprioceptive state
width at 8. The upstream training set is HuggingFaceVLA/libero, the same
dataset the smolvla / pi05 / xvla LIBERO checkpoints in this repo were
fine-tuned on, so the state layout (3-D position + 3-D axis-angle
orientation + 2-D gripper) matches OpenRAL's LIBERO backend end-to-end,
with no quaternion-vs-axis-angle mismatch.
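As an illustration of the 8-D layout described above (position, axis-angle orientation, gripper), the sketch below assembles a state vector and rejects any other width. Both helpers are hypothetical and not OpenRAL APIs. The quaternion conversion is included only to show what a backend that reports quaternions would have to do first, and its (x, y, z, w) ordering is an assumption.

```python
import numpy as np

STATE_DIM = 8  # matches state_contract.dim in the manifest


def quat_to_axis_angle(quat_xyzw):
    """Convert a quaternion (assumed x, y, z, w order) to a 3-D axis-angle vector."""
    q = np.asarray(quat_xyzw, dtype=np.float64)
    q = q / np.linalg.norm(q)
    w = np.clip(q[3], -1.0, 1.0)
    sin_half = np.sqrt(1.0 - w * w)
    if sin_half < 1e-8:
        # Rotation angle is (numerically) zero: the axis is undefined.
        return np.zeros(3)
    angle = 2.0 * np.arccos(w)
    return q[:3] / sin_half * angle


def build_state(pos, axis_angle, grip):
    """Concatenate pos (3) + axis-angle (3) + gripper (2) into one 8-D state."""
    state = np.concatenate([pos, axis_angle, grip]).astype(np.float32)
    if state.shape != (STATE_DIM,):
        raise ValueError(f"expected state of shape ({STATE_DIM},), got {state.shape}")
    return state


if __name__ == "__main__":
    # 90-degree rotation about z, expressed as a quaternion.
    quat = [0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)]
    state = build_state([0.5, 0.0, 0.3], quat_to_axis_angle(quat), [0.04, -0.04])
    print(state.shape)
```

Passing a raw 4-D quaternion in place of the axis-angle vector would produce a 9-D state and fail the width check, which is the kind of mismatch the contract is meant to rule out.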
Benchmarks
None measured yet. Populate eval/ with ral benchmark run JSON
fixtures before publishing a headline number.