---
tags:
- OpenRAL
- rskill
- diffuser-actor
- 3d-diffuser-actor
- rlbench
- coppeliasim
- peract
- manipulation
- franka
license: mit
language:
- en
---
<!--
rSkill README: 3D Diffuser Actor (RLBench PerAct setup).
Discovery + provenance card; mirrors rskill.yaml. ADR-0062.
-->
# rskill-3d-diffuser-actor-rlbench
3D Diffuser Actor is a diffusion policy over end-effector **keyposes** for RLBench,
running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0062).
## What this skill does
Predicts the next end-effector keypose (position + orientation + gripper) from
multi-view RGB-D, conditioned on a language instruction. Used to benchmark
3D/keyframe manipulation on the RLBench **PerAct 18-task** suite. Ships the three
live-verified starter tasks: `open_drawer`, `meat_off_grill`, `close_jar`.
| Field | Value |
|---|---|
| Actions | open, close, pick, place (generalist keyframe policy) |
| Objects | drawer, grill/meat, jar (PerAct task objects) |
| Scenes | tabletop (RLBench / CoppeliaSim) |
| Embodiment | franka_panda |
## How it works
3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud
scene token field, attends over it with a relative-position transformer, and runs a
DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose
trajectory. Each predicted keypose is executed in RLBench by its sampling-based
motion planner (`EndEffectorPoseViaPlanning`), then the policy re-observes and
predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an
out-of-process **py3.10 sidecar** (ZMQ + msgpack); the openral adapter
(`openral_sim.policies.rlbench_3dda`) forks it transparently.
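
The predict, plan, re-observe cycle described above can be sketched as a short control loop. This is a minimal illustration, not the adapter's code: `run_episode`, `observe`, `predict_keypose`, and `execute` are hypothetical names, the fake environment is invented for the example, and the default budget of 25 keyposes is borrowed from the evaluation protocol in this card.

```python
from typing import Callable, List, Tuple

Keypose = List[float]  # [x, y, z, qx, qy, qz, qw, gripper_open]


def run_episode(
    observe: Callable[[], dict],
    predict_keypose: Callable[[dict, str], Keypose],
    execute: Callable[[Keypose], Tuple[bool, bool]],
    instruction: str,
    max_keyposes: int = 25,
) -> bool:
    """Predict one keypose, let the planner reach it, re-observe, repeat.

    `execute` returns (task_success, planner_failed). All three callables
    are hypothetical stand-ins, not the real adapter or RLBench API.
    """
    for _ in range(max_keyposes):
        obs = observe()
        keypose = predict_keypose(obs, instruction)
        success, planner_failed = execute(keypose)
        if planner_failed:
            return False  # a path-planning failure counts as a failed episode
        if success:
            return True
    return False  # keypose budget exhausted without success


# Toy usage: a fake environment that succeeds once the third keypose is executed.
steps = {"n": 0}


def fake_observe() -> dict:
    return {"step": steps["n"]}


def fake_policy(obs: dict, instruction: str) -> Keypose:
    return [0.3, 0.0, 0.9 + 0.01 * obs["step"], 0.0, 0.0, 0.0, 1.0, 1.0]


def fake_execute(keypose: Keypose) -> Tuple[bool, bool]:
    steps["n"] += 1
    return steps["n"] >= 3, False


result = run_episode(fake_observe, fake_policy, fake_execute, "open the drawer")
print(result, steps["n"])  # True 3
```

In the real skill the policy and simulator live in the sidecar process, so each of these calls would cross the ZMQ boundary rather than run in-process.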
### Observation → action contract
| Direction | Key | Shape / dtype | Notes |
|---|---|---|---|
| in | `observation.images.{left_shoulder,right_shoulder,wrist,front}` | `(H, W, 3) uint8` | RLBench PerAct cameras, 256×256 |
| in | `observation.point_clouds.{…}` | `(H, W, 3) float32` | per-camera world-frame point clouds |
| in | `observation.gripper_pose` | `(7,) float32` | `[x y z qx qy qz qw]` |
| out | keyframe action | `(8,) float32` | `[x y z qx qy qz qw gripper_open]` (world frame) |
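
To make the output layout concrete, the sketch below splits an 8-value keyframe action into position, quaternion, and gripper state. It is an illustration only: `KeyframeAction` and `parse_action` are hypothetical names, and both the quaternion renormalization and the 0.5 gripper threshold are assumptions rather than documented behavior of the skill.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class KeyframeAction(NamedTuple):
    position: Tuple[float, float, float]            # (x, y, z), world frame
    quaternion: Tuple[float, float, float, float]   # (qx, qy, qz, qw), unit norm
    gripper_open: bool


def parse_action(action: Sequence[float], threshold: float = 0.5) -> KeyframeAction:
    """Split [x y z qx qy qz qw gripper_open] into named parts.

    The threshold and the renormalization step are assumptions made for
    this example.
    """
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, grip = (float(v) for v in action)
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm == 0.0:
        raise ValueError("quaternion has zero norm")
    quat = (qx / norm, qy / norm, qz / norm, qw / norm)
    return KeyframeAction((x, y, z), quat, grip > threshold)


parsed = parse_action([0.3, -0.1, 0.9, 0.0, 0.0, 0.0, 2.0, 0.97])
print(parsed.position)      # (0.3, -0.1, 0.9)
print(parsed.quaternion)    # (0.0, 0.0, 0.0, 1.0)
print(parsed.gripper_open)  # True
```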
## Upstream model / training
Weights are the authors' published RLBench PerAct multi-task checkpoint
(`diffuser_actor_peract.pth`); loaded verbatim, not retrained. Trained by the
authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose
supervision).
| Field | Value |
|---|---|
| Source repo | [`nickgkan/3d_diffuser_actor`](https://github.com/nickgkan/3d_diffuser_actor) |
| Weights | [`katefgroup/3d_diffuser_actor`](https://huggingface.co/katefgroup/3d_diffuser_actor), file `diffuser_actor_peract.pth` (168 MB) |
| Paper | [arXiv:2402.10885](https://arxiv.org/abs/2402.10885), *3D Diffuser Actor: Policy Diffusion with 3D Scene Representations* |
| License | MIT (code + checkpoints); commercially permissive |
| Parameters | ~55 M |
| Training data | RLBench PerAct 18-task demonstrations |
## Supported robots
| Robot | Scene | Status | Notes |
|---|---|---|---|
| franka_panda | RLBench (CoppeliaSim) | ✅ validated | `open_drawer` 4/4, `meat_off_grill` 3/3, `close_jar` solved (8 GB Ada host, 2026-06-19) |
## Sensors required
| key | modality | resolution | dtype |
|---|---|---|---|
| `observation.images.left_shoulder` | RGB | 256 × 256 | `uint8` |
| `observation.images.right_shoulder` | RGB | 256 × 256 | `uint8` |
| `observation.images.wrist` | RGB | 256 × 256 | `uint8` |
| `observation.images.front` | RGB | 256 × 256 | `uint8` |
## Manifest summary
| Field | Value |
|---|---|
| `name` | `OpenRAL/rskill-3d-diffuser-actor-rlbench` |
| `version` | `0.1.0` |
| `license` | `mit` |
| `role` | `s1` |
| `model_family` | `diffuser_actor` |
| `embodiment_tags` | `franka_panda` |
| `runtime` | `pytorch` |
| `weights_uri` | `hf://katefgroup/3d_diffuser_actor` |
| `action_contract.dim` | `8` |
| `latency_budget.per_chunk_ms` | `3000.0` |
## Reproduction
```bash
# One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
# in the py3.10 sidecar venv (see docs/adr/0062-rlbench-benchmark-backend.md).
openral benchmark scene \
--config scenes/benchmark/rlbench_open_drawer.yaml \
--rskill rskills/3d-diffuser-actor-rlbench
```
Inference VRAM peaks at ~0.43 GB, so the skill runs comfortably on an 8 GB GPU.
CoppeliaSim is proprietary (free EDU license) and is **never** vendored; it is an
externally provisioned dependency (CLAUDE.md §1.9 / ADR-0062).
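
To run all three starter tasks rather than just `open_drawer`, the same command can be generated per scene file. The sketch below only builds and prints the command lines; it uses the `--config` and `--rskill` flags shown above and the scene paths listed under "See also", and the helper name `scene_command` is hypothetical.

```python
import shlex

# Starter tasks shipped with this skill; each has a scene config under
# scenes/benchmark/.
SCENES = ["open_drawer", "meat_off_grill", "close_jar"]
RSKILL = "rskills/3d-diffuser-actor-rlbench"


def scene_command(task: str) -> list:
    """Build the argv for one `openral benchmark scene` invocation."""
    return [
        "openral", "benchmark", "scene",
        "--config", f"scenes/benchmark/rlbench_{task}.yaml",
        "--rskill", RSKILL,
    ]


commands = [scene_command(task) for task in SCENES]
for cmd in commands:
    print(shlex.join(cmd))
```

Once the sidecar environment is provisioned, each list could be passed to `subprocess.run(cmd, check=True)` to execute the scenes in sequence.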
## Evaluation
[`eval/rlbench.json`](eval/rlbench.json) is the **full official protocol**
result (`reproduced_locally: true`), produced by the canonical
`openral benchmark run` (ADR-0009 PR D) on an 8 GB Ada host (2026-06-20), with
**25 episodes per task**, seeds 0–24, max 25 macro-keyposes:
| Task | Success rate |
|---|---|
| `open_drawer` | 22/25 = **0.88** |
| `meat_off_grill` | 24/25 = **0.96** |
| `close_jar` | 19/25 = **0.76** |
| **Average** | **0.867** |
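Mean step latency is ~946 ms. The 0.867 average over these three tasks is in line
with the ~0.81 RLBench PerAct average reported in the 3D Diffuser Actor paper,
although that figure covers the full 18-task suite. Reproduce with: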
```bash
openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench
```
> **Note on variance.** RLBench's sampling-based `EndEffectorPoseViaPlanning`
> mover is non-deterministic, so per-task rates vary run-to-run; 3 of the 75
> episodes hit a planner path-failure and are counted as failed episodes (the
> sidecar handles them gracefully rather than aborting the run; see ADR-0062).
> Per-task paper baselines (Ke et al., 2402.10885, Table 1) are intentionally
> not transcribed into the artifact to avoid mis-citation.
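
The figures in the results table can be re-derived from the per-task counts. The short check below recomputes the per-task rates, the average, and the share of episodes lost to planner path-failures; the variable names are illustrative and the counts are copied from this section.

```python
# (successes, episodes) per task, as reported in the table above.
counts = {
    "open_drawer": (22, 25),
    "meat_off_grill": (24, 25),
    "close_jar": (19, 25),
}

rates = {task: s / n for task, (s, n) in counts.items()}
macro_average = sum(rates.values()) / len(rates)

total_success = sum(s for s, _ in counts.values())
total_episodes = sum(n for _, n in counts.values())
micro_average = total_success / total_episodes

planner_failures = 3  # episodes that hit a planner path-failure
planner_failure_share = planner_failures / total_episodes

print(rates)                          # {'open_drawer': 0.88, 'meat_off_grill': 0.96, 'close_jar': 0.76}
print(round(macro_average, 3))        # 0.867
print(total_success, total_episodes)  # 65 75
print(planner_failure_share)          # 0.04
```

Because every task uses the same number of episodes, the per-task (macro) average and the pooled (micro) average coincide.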
## License
OpenRAL wrapper files in this repository follow the project Apache-2.0 license.
The wrapped upstream 3D Diffuser Actor code and released
`diffuser_actor_peract.pth` checkpoint are MIT-licensed; the manifest therefore
uses `license: mit` for the consumer-visible weight/runtime posture.
## See also
- `scenes/benchmark/rlbench_open_drawer.yaml`
- `scenes/benchmark/rlbench_meat_off_grill.yaml`
- `scenes/benchmark/rlbench_close_jar.yaml`
- `benchmarks/rlbench.yaml`
- `docs/adr/0062-rlbench-benchmark-backend.md`