---
tags:
  - OpenRAL
  - rskill
  - diffuser-actor
  - 3d-diffuser-actor
  - rlbench
  - coppeliasim
  - peract
  - manipulation
  - franka
license: mit
language:
  - en
---

<!--
  rSkill README: 3D Diffuser Actor (RLBench PerAct setup).
  Discovery + provenance card; mirrors rskill.yaml. ADR-0062.
-->

# rskill-3d-diffuser-actor-rlbench

3D Diffuser Actor is a diffusion policy over end-effector **keyposes** for RLBench,
running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0062).

## What this skill does

Predicts the next end-effector keypose (position + orientation + gripper) from
multi-view RGB-D, conditioned on a language instruction. It is used to benchmark
3D/keyframe manipulation on the RLBench **PerAct 18-task** suite, and ships with
three live-verified starter tasks: `open_drawer`, `meat_off_grill`, `close_jar`.

| Field | Value |
|---|---|
| Actions | open, close, pick, place (generalist keyframe policy) |
| Objects | drawer, grill/meat, jar (PerAct task objects) |
| Scenes  | tabletop (RLBench / CoppeliaSim) |
| Embodiment | franka_panda |

## How it works

3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud
scene token field, attends over it with a relative-position transformer, and runs a
DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose
trajectory. Each predicted keypose is executed in RLBench by its sampling-based
motion planner (`EndEffectorPoseViaPlanning`), then the policy re-observes and
predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an
out-of-process **py3.10 sidecar** (ZMQ + msgpack); the openral adapter
(`openral_sim.policies.rlbench_3dda`) forks it transparently.
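
The predict, plan, re-observe cycle described above can be sketched as a short control loop. This is a minimal illustration only: `run_episode`, `StepResult`, and the `reset` / `predict_keypose` / `execute_keypose` callables are hypothetical names, not part of the openral adapter or the RLBench API. The cap of 25 keyposes mirrors the evaluation protocol in this card, and treating a planner failure as a failed episode follows the variance note there.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical 8-value keypose: [x, y, z, qx, qy, qz, qw, gripper_open].
Keypose = Tuple[float, ...]


@dataclass
class StepResult:
    """Outcome of moving to one keypose (illustrative container)."""
    observation: dict
    success: bool
    planner_failed: bool = False


def run_episode(
    reset: Callable[[], dict],
    predict_keypose: Callable[[dict, str], Keypose],
    execute_keypose: Callable[[Keypose], StepResult],
    instruction: str,
    max_keyposes: int = 25,
) -> Tuple[bool, int]:
    """Predict a keypose, let the planner reach it, re-observe, repeat.

    Returns (success, number_of_keyposes_attempted).
    """
    obs = reset()
    for step in range(1, max_keyposes + 1):
        keypose = predict_keypose(obs, instruction)
        result = execute_keypose(keypose)
        if result.planner_failed:
            # Counted as a failed episode instead of aborting the whole run.
            return False, step
        if result.success:
            return True, step
        obs = result.observation  # re-observe before the next prediction
    return False, max_keyposes


# Toy stand-ins so the loop can run without a simulator.
_state = {"t": 0}


def _reset() -> dict:
    _state["t"] = 0
    return {"t": 0}


def _predict(obs: dict, instruction: str) -> Keypose:
    return (0.3, 0.0, 0.2 + 0.1 * obs["t"], 0.0, 0.0, 0.0, 1.0, 1.0)


def _execute(keypose: Keypose) -> StepResult:
    _state["t"] += 1
    return StepResult({"t": _state["t"]}, success=_state["t"] >= 3)


print(run_episode(_reset, _predict, _execute, "open the top drawer"))  # (True, 3)
```

Passing the policy and the environment in as plain callables keeps the loop independent of where they run, which matches the out-of-process sidecar arrangement: the caller only sees observations going in and keyposes coming out.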

### Observation → action contract

| dir | key | shape | notes |
|---|---|---|---|
| in | `observation.images.{left_shoulder,right_shoulder,wrist,front}` | `(H, W, 3) uint8` | RLBench PerAct cameras, 256×256 |
| in | `observation.point_clouds.{…}` | `(H, W, 3) float32` | per-camera world-frame point clouds |
| in | `observation.gripper_pose` | `(7,) float32` | `[x y z qx qy qz qw]` |
| out | keyframe action | `(8,) float32` | `[x y z qx qy qz qw gripper_open]` (world frame) |
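
As an illustration of the output row above, the 8-value keyframe action can be unpacked into position, orientation, and gripper state. This is a hedged sketch: `KeyframeAction` and `parse_keyframe_action` are hypothetical helpers, and both the quaternion renormalisation and the 0.5 gripper threshold are assumptions, not documented behaviour of the adapter.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class KeyframeAction(NamedTuple):
    position: Tuple[float, float, float]                 # world-frame x, y, z
    quaternion_xyzw: Tuple[float, float, float, float]   # unit quaternion
    gripper_open: bool


def parse_keyframe_action(
    action: Sequence[float], open_threshold: float = 0.5
) -> KeyframeAction:
    """Split [x y z qx qy qz qw gripper_open] into named parts."""
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, grip = (float(v) for v in action)

    # Assumption: renormalise so downstream code always sees a unit quaternion.
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm < 1e-8:
        raise ValueError("quaternion has near-zero norm")
    quat = (qx / norm, qy / norm, qz / norm, qw / norm)

    # Assumption: values at or above the threshold mean "open".
    return KeyframeAction((x, y, z), quat, grip >= open_threshold)


parsed = parse_keyframe_action([0.3, -0.1, 0.9, 0.0, 0.0, 0.0, 2.0, 0.9])
print(parsed.position, parsed.quaternion_xyzw, parsed.gripper_open)
```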

## Upstream model / training

Weights are the authors' published RLBench PerAct multi-task checkpoint
(`diffuser_actor_peract.pth`); loaded verbatim, not retrained. Trained by the
authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose
supervision).

| Field | Value |
|---|---|
| Source repo | [`nickgkan/3d_diffuser_actor`](https://github.com/nickgkan/3d_diffuser_actor) |
| Weights | [`katefgroup/3d_diffuser_actor`](https://huggingface.co/katefgroup/3d_diffuser_actor), file `diffuser_actor_peract.pth` (168 MB) |
| Paper | [arXiv:2402.10885](https://arxiv.org/abs/2402.10885): *3D Diffuser Actor: Policy Diffusion with 3D Scene Representations* |
| License | mit (code + checkpoints); commercially permissive |
| Parameters | ~55 M |
| Training data | RLBench PerAct 18-task demonstrations |

## Supported robots

| Robot | Scene | Status | Notes |
|---|---|---|---|
| franka_panda | RLBench (CoppeliaSim) | ✓ validated | open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19) |

## Sensors required

| key | modality | resolution | dtype |
|---|---|---|---|
| `observation.images.left_shoulder` | RGB | 256 × 256 | `uint8` |
| `observation.images.right_shoulder` | RGB | 256 × 256 | `uint8` |
| `observation.images.wrist` | RGB | 256 × 256 | `uint8` |
| `observation.images.front` | RGB | 256 × 256 | `uint8` |

## Manifest summary

| Field | Value |
|---|---|
| `name` | `OpenRAL/rskill-3d-diffuser-actor-rlbench` |
| `version` | `0.1.0` |
| `license` | `mit` |
| `role` | `s1` |
| `model_family` | `diffuser_actor` |
| `embodiment_tags` | `franka_panda` |
| `runtime` | `pytorch` |
| `weights_uri` | `hf://katefgroup/3d_diffuser_actor` |
| `action_contract.dim` | `8` |
| `latency_budget.per_chunk_ms` | `3000.0` |

## Reproduction

```bash
# One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
# in the py3.10 sidecar venv (see docs/adr/0062-rlbench-benchmark-backend.md).
openral benchmark scene \
  --config scenes/benchmark/rlbench_open_drawer.yaml \
  --rskill rskills/3d-diffuser-actor-rlbench
```

Inference VRAM peaks ~0.43 GB; runs comfortably on an 8 GB GPU. CoppeliaSim is
proprietary (free EDU license) and is **never** vendored; it is an
externally-provisioned dependency (CLAUDE.md §1.9 / ADR-0062).

## Evaluation

[`eval/rlbench.json`](eval/rlbench.json) is the **full official protocol**
result (`reproduced_locally: true`), produced by the canonical
`openral benchmark run` (ADR-0009 PR D) on an 8 GB Ada host (2026-06-20), with
**25 episodes per task**, seeds 0–24, max 25 macro-keyposes:

| Task | Success rate |
|---|---|
| `open_drawer` | 22/25 = **0.88** |
| `meat_off_grill` | 24/25 = **0.96** |
| `close_jar` | 19/25 = **0.76** |
| **Average** | **0.867** |

(~946 ms mean step latency; in line with the 3D Diffuser Actor paper's ~0.81
RLBench PerAct average.) Reproduce with:

```bash
openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench
```

> **Note on variance.** RLBench's sampling-based `EndEffectorPoseViaPlanning`
> mover is non-deterministic, so per-task rates vary run-to-run; 3 of the 75
> episodes hit a planner path-failure and are counted as failed episodes (the
> sidecar handles them gracefully rather than aborting the run; see ADR-0062).
> Per-task paper baselines (Ke et al., 2402.10885, Table 1) are intentionally
> not transcribed into the artifact to avoid mis-citation.
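
The arithmetic behind the table, and a rough sense of how much 25 episodes per task can vary, can be reproduced in a few lines. The success counts are taken from the table above; the Wilson score interval is an addition for illustration and is not part of the benchmark protocol or of `eval/rlbench.json`.

```python
import math

# (successes, episodes) per task, copied from the evaluation table.
results = {
    "open_drawer": (22, 25),
    "meat_off_grill": (24, 25),
    "close_jar": (19, 25),
}


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial success rate."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


rates = {task: s / n for task, (s, n) in results.items()}
average = sum(rates.values()) / len(rates)

for task, (s, n) in results.items():
    lo, hi = wilson_interval(s, n)
    print(f"{task}: {rates[task]:.2f} (95% interval ~{lo:.2f} to {hi:.2f})")
print(f"average: {average:.3f}")  # average: 0.867
```

With only 25 episodes per task the intervals are wide, which is consistent with the run-to-run variation described in the note above.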

## License

OpenRAL wrapper files in this repository follow the project Apache-2.0 license.
The wrapped upstream 3D Diffuser Actor code and released
`diffuser_actor_peract.pth` checkpoint are MIT-licensed; the manifest therefore
uses `license: mit` for the consumer-visible weight/runtime posture.

## See also

- `scenes/benchmark/rlbench_open_drawer.yaml`
- `scenes/benchmark/rlbench_meat_off_grill.yaml`
- `scenes/benchmark/rlbench_close_jar.yaml`
- `benchmarks/rlbench.yaml`
- `docs/adr/0062-rlbench-benchmark-backend.md`