name: 3d-diffuser-actor-rlbench
description: >-
S1 Vision-Language-Action policy. Capabilities: generalist, open, close, pick,
place. 3D Diffuser Actor (Ke et al., 2024) — a diffusion policy over
end-effector keyposes fusing multi-view RGB-D into a 3D scene representation,
on the RLBench PerAct 18-task benchmark. Shares the out-of-process
CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0062). MIT code
+ checkpoints. The PerAct checkpoint is loaded verbatim; ships three
live-verified starter tasks. Discovery view of an OpenRAL rSkill — NOT
directly runnable by an agent harness; it runs via rSkill.from_pretrained +
the robot HAL.
metadata:
openral_rskill: true
schema_version: 0.1
rskill_id: OpenRAL/rskill-3d-diffuser-actor-rlbench
manifest: ./rskill.yaml
role: s1
kind: vla
model_family: diffuser_actor
embodiment_tags:
- franka_panda
actions:
- generalist
- open
- close
- pick
- place
scenes:
- tabletop
sensors_required:
- rgb
action_dim: 8
runtime: pytorch
min_vram_gb:
bf16: 2
fp32: 2
chunk_size: 1
latency_budget:
per_chunk_ms: 3000
license_code: Apache-2.0
license_weights: mit
weights_uri: hf://katefgroup/3d_diffuser_actor
source_repo: hf://katefgroup/3d_diffuser_actor
paper_url: https://arxiv.org/abs/2402.10885
3d-diffuser-actor-rlbench — rSkill discovery view
Generated view, not a hand-written skill. This
SKILL.mdis a discovery-only mirror ofrskill.yaml, produced bytools/generate_rskill_skillmd.py. It lets tools that read the standard agent-skill format find and reason about this OpenRAL rSkill. Therskill.yamlmanifest is the single source of truth (CLAUDE.md §1.3). Do not edit by hand — edit the manifest and regenerate.
What it is
An OpenRAL Vision-Language-Action policy (role: s1, kind: vla). 3D Diffuser Actor (Ke et al., 2024) — a diffusion policy over end-effector keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0062). MIT code + checkpoints. The PerAct checkpoint is loaded verbatim; ships three live-verified starter tasks.
Capabilities
- Verbs: generalist · open · close · pick · place
- Scenes: tabletop
- Embodiments: franka_panda
Why this is discovery-only
An agent skill is natural-language instructions loaded into an LLM's context. An rSkill is an executable artifact: it carries a typed capability/embodiment contract, model weights, a runtime, and a license/provenance gate — none of which fit in freeform markdown. So an agent can use this view to select the right skill, but cannot execute it by loading this file. Execution always goes through the OpenRAL loader and the robot HAL.
License
- Code: Apache-2.0.
- Weights:
mit— permissive / commercial-use OK
How to actually run it (not via an agent harness)
from openral_rskill import rSkill
skill = rSkill.from_pretrained("OpenRAL/rskill-3d-diffuser-actor-rlbench")
# the loader validates embodiment / sensors / runtime / quantization against the target
# RobotDescription and enforces the weight-license gate before any weights load.
See rskill.yaml for the authoritative, validated manifest.