| --- |
| tags: |
| - OpenRAL |
| - rskill |
| - diffuser-actor |
| - 3d-diffuser-actor |
| - rlbench |
| - coppeliasim |
| - peract |
| - manipulation |
| - franka |
| license: mit |
| language: |
| - en |
| --- |
| |
| <!-- |
| rSkill README: 3D Diffuser Actor (RLBench PerAct setup). |
| Discovery + provenance card; mirrors rskill.yaml. ADR-0062. |
| --> |
|
|
| # rskill-3d-diffuser-actor-rlbench |
|
|
| 3D Diffuser Actor is a diffusion policy over end-effector **keyposes** for RLBench, |
| running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0062). |
|
|
| ## What this skill does |
|
|
| Predicts the next end-effector keypose (position + orientation + gripper) from |
| multi-view RGB-D, conditioned on a language instruction. Used to benchmark |
| 3D/keyframe manipulation on the RLBench **PerAct 18-task** suite. Ships the three |
| live-verified starter tasks: `open_drawer`, `meat_off_grill`, `close_jar`. |
|
|
| | Field | Value | |
| |---|---| |
| | Actions | open, close, pick, place (generalist keyframe policy) | |
| | Objects | drawer, grill/meat, jar (PerAct task objects) | |
| | Scenes | tabletop (RLBench / CoppeliaSim) | |
| | Embodiment | franka_panda | |
| |
| ## How it works |
| |
| 3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud |
| scene token field, attends over it with a relative-position transformer, and runs a |
| DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose |
| trajectory. Each predicted keypose is executed in RLBench by its sampling-based |
| motion planner (`EndEffectorPoseViaPlanning`), then the policy re-observes and |
| predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an |
| out-of-process **py3.10 sidecar** (ZMQ + msgpack); the openral adapter |
| (`openral_sim.policies.rlbench_3dda`) forks it transparently. |
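
The predict, execute, re-observe cycle described above can be sketched as a plain loop. This is a minimal illustration only: `StubPolicy`, `StubEnv`, and `run_episode` are hypothetical names invented for this sketch and are not part of the openral adapter, the sidecar protocol, or RLBench. The default budget of 25 keyposes mirrors the cap listed under Evaluation.

```python
from typing import Tuple

# [x, y, z, qx, qy, qz, qw, gripper_open], matching the action contract.
Keypose = Tuple[float, ...]


class StubPolicy:
    """Hypothetical stand-in for the policy: always returns one fixed keypose."""

    def predict(self, observation: dict, instruction: str) -> Keypose:
        # The real policy would run its diffusion head on the observation here.
        return (0.3, 0.0, 0.9, 0.0, 0.0, 0.0, 1.0, 1.0)


class StubEnv:
    """Hypothetical stand-in for the simulator: reports success after N keyposes."""

    def __init__(self, keyposes_to_success: int) -> None:
        self.keyposes_to_success = keyposes_to_success
        self.executed = 0

    def reset(self) -> dict:
        self.executed = 0
        return {"step": 0}

    def execute(self, keypose: Keypose) -> Tuple[dict, bool]:
        # The real backend would plan and execute a motion to the keypose.
        self.executed += 1
        done = self.executed >= self.keyposes_to_success
        return {"step": self.executed}, done


def run_episode(policy, env, instruction: str, max_keyposes: int = 25):
    """Predict one keypose, execute it, re-observe; stop on success or budget."""
    obs = env.reset()
    for used in range(1, max_keyposes + 1):
        keypose = policy.predict(obs, instruction)
        obs, success = env.execute(keypose)
        if success:
            return True, used
    return False, max_keyposes


solved, used = run_episode(StubPolicy(), StubEnv(3), "open the drawer")
print(solved, used)
```

An episode that never reports success simply exhausts the keypose budget and is counted as a failure in this sketch.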
| |
| ### Observation → action contract |
| |
| | dir | key | shape | notes | |
| |---|---|---|---| |
| | in | `observation.images.{left_shoulder,right_shoulder,wrist,front}` | `(H, W, 3) uint8` | RLBench PerAct cameras, 256×256 | |
| | in | `observation.point_clouds.{…}` | `(H, W, 3) float32` | per-camera world-frame point clouds | |
| | in | `observation.gripper_pose` | `(7,) float32` | `[x y z qx qy qz qw]` | |
| | out | keyframe action | `(8,) float32` | `[x y z qx qy qz qw gripper_open]` (world frame) | |
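
To make the 8-value output layout concrete, the following sketch unpacks a keyframe action into position, quaternion, and gripper state. The helper `parse_keyframe_action`, the quaternion re-normalization, and the 0.5 threshold on `gripper_open` are assumptions made for illustration; the adapter may handle these details differently.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class KeyframeAction(NamedTuple):
    position: Tuple[float, float, float]            # (x, y, z), world frame
    quaternion: Tuple[float, float, float, float]   # (qx, qy, qz, qw)
    gripper_open: bool


def parse_keyframe_action(
    action: Sequence[float], open_threshold: float = 0.5
) -> KeyframeAction:
    """Split [x y z qx qy qz qw gripper_open] into named parts.

    Hypothetical helper: normalization and thresholding are assumptions.
    """
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, grip = (float(v) for v in action)
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm < 1e-8:
        raise ValueError("quaternion has near-zero norm")
    # Re-normalize so downstream consumers receive a unit quaternion.
    quat = (qx / norm, qy / norm, qz / norm, qw / norm)
    return KeyframeAction((x, y, z), quat, grip >= open_threshold)


parsed = parse_keyframe_action([0.3, 0.0, 0.9, 0.0, 0.0, 0.0, 2.0, 0.9])
print(parsed)
```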
|
|
| ## Upstream model / training |
|
|
| Weights are the authors' published RLBench PerAct multi-task checkpoint |
| (`diffuser_actor_peract.pth`); loaded verbatim, not retrained. Trained by the |
| authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose |
| supervision). |
|
|
| | Field | Value | |
| |---|---| |
| | Source repo | [`nickgkan/3d_diffuser_actor`](https://github.com/nickgkan/3d_diffuser_actor) | |
| | Weights | [`katefgroup/3d_diffuser_actor`](https://huggingface.co/katefgroup/3d_diffuser_actor), file `diffuser_actor_peract.pth` (168 MB) | |
| | Paper | [arXiv:2402.10885](https://arxiv.org/abs/2402.10885), *3D Diffuser Actor: Policy Diffusion with 3D Scene Representations* | |
| | License | MIT (code + checkpoints); commercially permissive | |
| | Parameters | ~55 M | |
| | Training data | RLBench PerAct 18-task demonstrations | |
|
|
| ## Supported robots |
|
|
| | Robot | Scene | Status | Notes | |
| |---|---|---|---| |
| | franka_panda | RLBench (CoppeliaSim) | ✅ validated | open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19) | |
| |
| ## Sensors required |
| |
| | key | modality | resolution | dtype | |
| |---|---|---|---| |
| | `observation.images.left_shoulder` | RGB | 256 × 256 | `uint8` | |
| | `observation.images.right_shoulder` | RGB | 256 × 256 | `uint8` | |
| | `observation.images.wrist` | RGB | 256 × 256 | `uint8` | |
| | `observation.images.front` | RGB | 256 × 256 | `uint8` | |
|
|
| ## Manifest summary |
|
|
| | Field | Value | |
| |---|---| |
| | `name` | `OpenRAL/rskill-3d-diffuser-actor-rlbench` | |
| | `version` | `0.1.0` | |
| | `license` | `mit` | |
| | `role` | `s1` | |
| | `model_family` | `diffuser_actor` | |
| | `embodiment_tags` | `franka_panda` | |
| | `runtime` | `pytorch` | |
| | `weights_uri` | `hf://katefgroup/3d_diffuser_actor` | |
| | `action_contract.dim` | `8` | |
| | `latency_budget.per_chunk_ms` | `3000.0` | |
|
|
| ## Reproduction |
|
|
| ```bash |
| # One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint |
| # in the py3.10 sidecar venv (see docs/adr/0062-rlbench-benchmark-backend.md). |
| openral benchmark scene \ |
| --config scenes/benchmark/rlbench_open_drawer.yaml \ |
| --rskill rskills/3d-diffuser-actor-rlbench |
| ``` |
|
|
| Inference VRAM peaks at about 0.43 GB, so the skill runs comfortably on an 8 GB GPU. |
| CoppeliaSim is proprietary (free EDU license) and is **never** vendored; it is an |
| externally provisioned dependency (CLAUDE.md §1.9 / ADR-0062). |
|
|
| ## Evaluation |
|
|
| [`eval/rlbench.json`](eval/rlbench.json) is the **full official protocol** |
| result (`reproduced_locally: true`), produced by the canonical |
| `openral benchmark run` (ADR-0009 PR D) on an 8 GB Ada host (2026-06-20), with |
| **25 episodes per task**, seeds 0–24, max 25 macro-keyposes: |
|
|
| | Task | Success rate | |
| |---|---| |
| | `open_drawer` | 22/25 = **0.88** | |
| | `meat_off_grill` | 24/25 = **0.96** | |
| | `close_jar` | 19/25 = **0.76** | |
| | **Average** | **0.867** | |
|
|
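| Mean step latency is about 946 ms. The 0.867 average covers only these three |
| tasks, so it is broadly in line with, but not directly comparable to, the 3D |
| Diffuser Actor paper's ~0.81 RLBench PerAct average. Reproduce with: |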
|
|
| ```bash |
| openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench |
| ``` |
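
As a quick sanity check, the average in the table above can be re-derived from the per-task success counts. The snippet below is only a worked-arithmetic sketch; the `results` dictionary is typed in by hand from the table and is not read from `eval/rlbench.json`.

```python
from fractions import Fraction

# (successes, episodes) per task, copied by hand from the table above.
results = {
    "open_drawer": (22, 25),
    "meat_off_grill": (24, 25),
    "close_jar": (19, 25),
}

per_task = {task: Fraction(s, n) for task, (s, n) in results.items()}

# Mean of per-task rates, and the rate pooled over all 75 episodes.
macro = sum(per_task.values()) / len(per_task)
pooled = Fraction(
    sum(s for s, _ in results.values()),
    sum(n for _, n in results.values()),
)

for task, rate in per_task.items():
    print(f"{task}: {float(rate):.2f}")
print(f"average: {float(macro):.3f}")
```

Because every task uses the same number of episodes, the mean of the per-task rates and the pooled rate (65/75) coincide.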
|
|
| > **Note on variance.** RLBench's sampling-based `EndEffectorPoseViaPlanning` |
| > mover is non-deterministic, so per-task rates vary run-to-run; 3 of the 75 |
| > episodes hit a planner path-failure and are counted as failed episodes (the |
| > sidecar handles them gracefully rather than aborting the run; see ADR-0062). |
| > Per-task paper baselines (Ke et al., 2402.10885, Table 1) are intentionally |
| > not transcribed into the artifact to avoid mis-citation. |
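
Beyond planner non-determinism, 25 episodes per task is a small sample, so each rate carries noticeable statistical uncertainty of its own. One common way to gauge it is a Wilson score interval, sketched below. This is an illustrative aid and not part of the benchmark protocol; `wilson_interval` is a hypothetical helper, and the 95% level (`z = 1.96`) is an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if episodes <= 0 or not 0 <= successes <= episodes:
        raise ValueError("need 0 <= successes <= episodes and episodes > 0")
    p = successes / episodes
    denom = 1.0 + z * z / episodes
    center = (p + z * z / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z * z / (4 * episodes**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


for task, successes in [("open_drawer", 22), ("meat_off_grill", 24), ("close_jar", 19)]:
    low, high = wilson_interval(successes, 25)
    print(f"{task}: {successes}/25, interval [{low:.2f}, {high:.2f}]")
```

The intervals are wide at this sample size, which is one more reason to treat small run-to-run differences between tasks with caution.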
|
|
| ## License |
|
|
| OpenRAL wrapper files in this repository follow the project Apache-2.0 license. |
| The wrapped upstream 3D Diffuser Actor code and released |
| `diffuser_actor_peract.pth` checkpoint are MIT-licensed; the manifest therefore |
| uses `license: mit` for the consumer-visible weight/runtime posture. |
|
|
| ## See also |
|
|
| - `scenes/benchmark/rlbench_open_drawer.yaml` |
| - `scenes/benchmark/rlbench_meat_off_grill.yaml` |
| - `scenes/benchmark/rlbench_close_jar.yaml` |
| - `benchmarks/rlbench.yaml` |
| - `docs/adr/0062-rlbench-benchmark-backend.md` |
|
|