AdrianLlopart committed
Commit b0873f1 · verified · 1 Parent(s): 4bbd48d

chore: publish rSkill OpenRAL/rskill-3d-diffuser-actor-rlbench v0.1.0

Files changed (4)
  1. README.md +137 -0
  2. SKILL.md +70 -0
  3. eval/rlbench.json +72 -0
  4. rskill.yaml +108 -0
README.md ADDED
@@ -0,0 +1,137 @@
1
+ <!--
2
+ rSkill README: 3D Diffuser Actor (RLBench PerAct setup).
3
+ Discovery + provenance card; mirrors rskill.yaml. ADR-0061.
4
+ -->
5
+
6
+ # rskill-3d-diffuser-actor-rlbench
7
+
8
+ 3D Diffuser Actor: a diffusion policy over end-effector **keyposes** for RLBench,
9
+ running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0061).
10
+
11
+ ## What this skill does
12
+
13
+ Predicts the next end-effector keypose (position + orientation + gripper) from
14
+ multi-view RGB-D, conditioned on a language instruction. Used to benchmark
15
+ 3D/keyframe manipulation on the RLBench **PerAct 18-task** suite. Ships the three
16
+ live-verified starter tasks: `open_drawer`, `meat_off_grill`, `close_jar`.
17
+
18
+ | Field | Value |
19
+ |---|---|
20
+ | Actions | open, close, pick, place (generalist keyframe policy) |
21
+ | Objects | drawer, grill/meat, jar (PerAct task objects) |
22
+ | Scenes | tabletop (RLBench / CoppeliaSim) |
23
+ | Embodiment | franka_panda |
24
+
25
+ ## How it works
26
+
27
+ 3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud
28
+ scene token field, attends over it with a relative-position transformer, and runs a
29
+ DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose
30
+ trajectory. Each predicted keypose is executed in RLBench by its sampling-based
31
+ motion planner (`EndEffectorPoseViaPlanning`), then the policy re-observes and
32
+ predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an
33
+ out-of-process **py3.10 sidecar** (ZMQ + msgpack); the openral adapter
34
+ (`openral_sim.policies.rlbench_3dda`) forks it transparently.
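
The predict, plan, re-observe cycle described above can be sketched as a small closed loop. This is a minimal illustration only, not the adapter's actual code: `run_keypose_episode`, `StepResult`, `toy_predict`, and `ToyScene` are hypothetical names, the policy and scene are toy stand-ins, and the 25-keypose cap is taken from the evaluation protocol quoted in this commit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Keypose = Tuple[float, ...]  # 8 values: x y z qx qy qz qw gripper_open


@dataclass
class StepResult:
    observation: dict  # what the policy sees after the planner has moved the arm
    success: bool      # task-success flag reported by the scene


def run_keypose_episode(
    predict: Callable[[dict, str], Keypose],
    execute: Callable[[Keypose], StepResult],
    first_observation: dict,
    instruction: str,
    max_keyposes: int = 25,
) -> Tuple[bool, List[Keypose]]:
    """Predict one keypose, let the scene plan and execute it, re-observe, repeat."""
    observation = first_observation
    executed: List[Keypose] = []
    for _ in range(max_keyposes):
        keypose = predict(observation, instruction)
        if len(keypose) != 8:
            raise ValueError(f"expected an 8-D keypose, got {len(keypose)} values")
        result = execute(keypose)
        executed.append(keypose)
        if result.success:
            return True, executed
        observation = result.observation
    return False, executed


# Toy stand-ins (hypothetical): the "task" succeeds after three keyposes.
def toy_predict(observation: dict, instruction: str) -> Keypose:
    step = observation["step"]
    return (0.1 * step, 0.0, 0.5, 0.0, 0.0, 0.0, 1.0, 1.0)


class ToyScene:
    def __init__(self) -> None:
        self.step = 0

    def execute(self, keypose: Keypose) -> StepResult:
        self.step += 1
        return StepResult({"step": self.step}, self.step >= 3)


scene = ToyScene()
success, keyposes = run_keypose_episode(
    toy_predict, scene.execute, {"step": 0}, "open the drawer"
)
print(success, len(keyposes))  # True 3
```

In the real setup the `execute` role would be played by the sidecar's motion planner, and `predict` by the diffusion policy; both are replaced by toys here so the loop can run on its own.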
35
+
36
+ ### Observation → action contract
37
+
38
+ | dir | key | shape | notes |
39
+ |---|---|---|---|
40
+ | in | `observation.images.{left_shoulder,right_shoulder,wrist,front}` | `(H, W, 3) uint8` | RLBench PerAct cameras, 256×256 |
41
+ | in | `observation.point_clouds.{…}` | `(H, W, 3) float32` | per-camera world-frame point clouds |
42
+ | in | `observation.gripper_pose` | `(7,) float32` | `[x y z qx qy qz qw]` |
43
+ | out | keyframe action | `(8,) float32` | `[x y z qx qy qz qw gripper_open]` (world frame) |
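
As a rough illustration of the output side of this contract, the sketch below unpacks an 8-D keyframe action and checks it before use. The helper name `unpack_keyframe_action`, the quaternion tolerance, and the 0.5 threshold for `gripper_open` are assumptions made for this example; they are not taken from the manifest or the upstream code.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class KeyframeAction(NamedTuple):
    position: Tuple[float, float, float]                 # x, y, z
    quaternion_xyzw: Tuple[float, float, float, float]   # qx, qy, qz, qw
    gripper_open: bool


def unpack_keyframe_action(
    action: Sequence[float], quat_tol: float = 1e-3
) -> KeyframeAction:
    """Split [x y z qx qy qz qw gripper_open] and reject malformed actions."""
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, grip = (float(v) for v in action)
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if abs(norm - 1.0) > quat_tol:
        raise ValueError(f"quaternion is not unit length (norm={norm:.4f})")
    # Assumption: values above 0.5 mean "open"; the real convention may differ.
    return KeyframeAction((x, y, z), (qx, qy, qz, qw), grip > 0.5)


action = unpack_keyframe_action([0.3, -0.1, 0.9, 0.0, 0.0, 0.0, 1.0, 1.0])
print(action.position, action.gripper_open)  # (0.3, -0.1, 0.9) True
```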
44
+
45
+ ## Upstream model / training
46
+
47
+ Weights are the authors' published RLBench PerAct multi-task checkpoint
48
+ (`diffuser_actor_peract.pth`); loaded verbatim, not retrained. Trained by the
49
+ authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose
50
+ supervision).
51
+
52
+ | Field | Value |
53
+ |---|---|
54
+ | Source repo | [`nickgkan/3d_diffuser_actor`](https://github.com/nickgkan/3d_diffuser_actor) |
55
+ | Weights | [`katefgroup/3d_diffuser_actor`](https://huggingface.co/katefgroup/3d_diffuser_actor), file `diffuser_actor_peract.pth` (168 MB) |
56
+ | Paper | [arXiv:2402.10885](https://arxiv.org/abs/2402.10885): *3D Diffuser Actor: Policy Diffusion with 3D Scene Representations* |
57
+ | License | MIT (code + checkpoints), commercially permissive |
58
+ | Parameters | ~55 M |
59
+ | Training data | RLBench PerAct 18-task demonstrations |
60
+
61
+ ## Supported robots
62
+
63
+ | Robot | Scene | Status | Notes |
64
+ |---|---|---|---|
65
+ | franka_panda | RLBench (CoppeliaSim) | ✓ validated | open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19) |
66
+
67
+ ## Sensors required
68
+
69
+ | key | modality | resolution | dtype |
70
+ |---|---|---|---|
71
+ | `observation.images.left_shoulder` | RGB | 256 × 256 | `uint8` |
72
+ | `observation.images.right_shoulder` | RGB | 256 × 256 | `uint8` |
73
+ | `observation.images.wrist` | RGB | 256 × 256 | `uint8` |
74
+ | `observation.images.front` | RGB | 256 × 256 | `uint8` |
75
+
76
+ ## Manifest summary
77
+
78
+ | Field | Value |
79
+ |---|---|
80
+ | `name` | `OpenRAL/rskill-3d-diffuser-actor-rlbench` |
81
+ | `version` | `0.1.0` |
82
+ | `license` | `mit` |
83
+ | `role` | `s1` |
84
+ | `model_family` | `diffuser_actor` |
85
+ | `embodiment_tags` | `franka_panda` |
86
+ | `runtime` | `pytorch` |
87
+ | `weights_uri` | `hf://katefgroup/3d_diffuser_actor` |
88
+ | `action_contract.dim` | `8` |
89
+ | `latency_budget.per_chunk_ms` | `3000.0` |
90
+
91
+ ## Reproduction
92
+
93
+ ```bash
94
+ # One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
95
+ # in the py3.10 sidecar venv (see docs/adr/0061-rlbench-benchmark-backend.md).
96
+ openral benchmark scene \
97
+ --config scenes/benchmark/rlbench_open_drawer.yaml \
98
+ --rskill rskills/3d-diffuser-actor-rlbench
99
+ ```
100
+
101
+ Inference VRAM peaks ~0.43 GB; runs comfortably on an 8 GB GPU. CoppeliaSim is
102
+ proprietary (free EDU license) and is **never** vendored; it is an
103
+ externally-provisioned dependency (CLAUDE.md §1.9 / ADR-0061).
104
+
105
+ ## Evaluation
106
+
107
+ [`eval/rlbench.json`](eval/rlbench.json) ships the **live single-episode
108
+ verification** that qualifies this starter PR (`reproduced_locally: true`):
109
+ `open_drawer`, `meat_off_grill`, and `close_jar` each succeed (success_rate
110
+ `1.0`, 3 / 5 / 6 macro-keyposes, ~1.0 s/keypose) on an 8 GB Ada host
111
+ (2026-06-19, seed 0). This is **not** the full official protocol: RLBench /
112
+ PerAct / 3DDA evaluate **25 episodes per task** (seed 0, max 25 keyposes). To
113
+ produce the full artifact and overwrite the `results` block, run the suite
114
+ against the provisioned CoppeliaSim sidecar:
115
+
116
+ ```bash
117
+ openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench
118
+ ```
119
+
120
+ (`openral benchmark run` is the canonical `RSkillEvalResult` producer; see ADR-0009
121
+ PR D.) Per-task paper baselines are reported in Ke et al. (2402.10885, Table 1)
122
+ and are intentionally not transcribed into the artifact to avoid mis-citation.
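
To make the numbers in `eval/rlbench.json` easier to sanity-check, here is a small sketch that recomputes the aggregates from the per-task entries and compares the measured latency against the manifest's `per_chunk_ms` budget. The per-task values are copied from the artifact in this commit; the `summarize` helper and the episode-weighted averaging are assumptions for illustration (with one episode per task they coincide with a plain mean over tasks).

```python
from statistics import mean

# Per-task entries copied from the `results` block of eval/rlbench.json.
results = {
    "rlbench/open_drawer": {
        "success_rate": 1.0, "n_episodes": 1, "keyposes": 3,
        "mean_keypose_latency_ms": 1006.0,
    },
    "rlbench/meat_off_grill": {
        "success_rate": 1.0, "n_episodes": 1, "keyposes": 5,
        "mean_keypose_latency_ms": 974.0,
    },
    "rlbench/close_jar": {
        "success_rate": 1.0, "n_episodes": 1, "keyposes": 6,
        "mean_keypose_latency_ms": 964.0,
    },
}
PER_CHUNK_BUDGET_MS = 3000.0  # latency_budget.per_chunk_ms from the manifest


def summarize(task_results: dict, budget_ms: float) -> dict:
    """Recompute suite-level aggregates from per-task entries."""
    n_total = sum(r["n_episodes"] for r in task_results.values())
    # Weight each task's success rate by its episode count.
    successes = sum(r["success_rate"] * r["n_episodes"] for r in task_results.values())
    latencies = [r["mean_keypose_latency_ms"] for r in task_results.values()]
    return {
        "n_tasks": len(task_results),
        "n_episodes_total": n_total,
        "avg_success_rate": successes / n_total,
        "mean_keypose_latency_ms": mean(latencies),
        "within_latency_budget": max(latencies) <= budget_ms,
    }


summary = summarize(results, PER_CHUNK_BUDGET_MS)
print(summary)
```

With a single episode per task, a success rate of `1.0` says only that each task was solved once; the 25-episode protocol is what yields a meaningful rate.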
123
+
124
+ ## License
125
+
126
+ OpenRAL wrapper files in this repository follow the project Apache-2.0 license.
127
+ The wrapped upstream 3D Diffuser Actor code and released
128
+ `diffuser_actor_peract.pth` checkpoint are MIT-licensed; the manifest therefore
129
+ uses `license: mit` for the consumer-visible weight/runtime posture.
130
+
131
+ ## See also
132
+
133
+ - `scenes/benchmark/rlbench_open_drawer.yaml`
134
+ - `scenes/benchmark/rlbench_meat_off_grill.yaml`
135
+ - `scenes/benchmark/rlbench_close_jar.yaml`
136
+ - `benchmarks/rlbench.yaml`
137
+ - `docs/adr/0061-rlbench-benchmark-backend.md`
SKILL.md ADDED
@@ -0,0 +1,70 @@
1
+ ---
2
+ name: 3d-diffuser-actor-rlbench
3
+ description: >-
4
+ S1 Vision-Language-Action policy. Capabilities: generalist, open, close, pick, place. 3D Diffuser Actor (Ke et al., 2024): a diffusion policy over end-effector keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0061). MIT code + checkpoints. The PerAct checkpoint is loaded verbatim; ships three live-verified starter tasks. Discovery view of an OpenRAL rSkill, NOT directly runnable by an agent harness; it runs via rSkill.from_pretrained + the robot HAL.
5
+ metadata:
6
+ openral_rskill: true # generated discovery view of an rSkill
7
+ schema_version: 0.1
8
+ rskill_id: OpenRAL/rskill-3d-diffuser-actor-rlbench
9
+ manifest: ./rskill.yaml
10
+ role: s1
11
+ kind: vla
12
+ model_family: diffuser_actor
13
+ embodiment_tags: [franka_panda]
14
+ actions: [generalist, open, close, pick, place]
15
+ scenes: [tabletop]
16
+ sensors_required: [rgb]
17
+ action_dim: 8
18
+ runtime: pytorch
19
+ min_vram_gb: {bf16: 2.0, fp32: 2.0}
20
+ chunk_size: 1
21
+ latency_budget: {per_chunk_ms: 3000.0}
22
+ license_code: Apache-2.0
23
+ license_weights: mit
24
+ weights_uri: hf://katefgroup/3d_diffuser_actor
25
+ source_repo: hf://katefgroup/3d_diffuser_actor
26
+ paper_url: https://arxiv.org/abs/2402.10885
27
+ ---
28
+
29
+ # 3d-diffuser-actor-rlbench: rSkill discovery view
30
+
31
+ > **Generated view, not a hand-written skill.** This `SKILL.md` is a discovery-only
32
+ > mirror of [`rskill.yaml`](./rskill.yaml), produced by `tools/generate_rskill_skillmd.py`.
33
+ > It lets tools that read the standard agent-skill format find and reason about this
34
+ > OpenRAL rSkill. The `rskill.yaml` manifest is the single source of truth
35
+ > (CLAUDE.md §1.3). Do not edit by hand; edit the manifest and regenerate.
36
+
37
+ ## What it is
38
+
39
+ An OpenRAL **Vision-Language-Action policy** (`role: s1`, `kind: vla`). 3D Diffuser Actor (Ke et al., 2024): a diffusion policy over end-effector keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0061). MIT code + checkpoints. The PerAct checkpoint is loaded verbatim; ships three live-verified starter tasks.
40
+
41
+ ## Capabilities
42
+
43
+ - **Verbs:** generalist · open · close · pick · place
44
+ - **Scenes:** tabletop
45
+ - **Embodiments:** franka_panda
46
+
47
+ ## Why this is discovery-only
48
+
49
+ An agent skill is natural-language instructions loaded into an LLM's context. An rSkill
50
+ is an executable artifact: it carries a typed capability/embodiment contract, model weights,
51
+ a runtime, and a license/provenance gate, none of which fit in freeform markdown. So an
52
+ agent can use this view to *select* the right skill, but cannot *execute* it by loading
53
+ this file. Execution always goes through the OpenRAL loader and the robot HAL.
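
The "select, then hand off to the loader" split can be illustrated with a small filter over front-matter metadata. This is only a sketch of how a discovery tool might use the fields; `select_skills` is a hypothetical helper, not part of OpenRAL, and the metadata dict simply copies values from the front matter of this file.

```python
from typing import Iterable, List, Mapping

# Values copied from the SKILL.md front matter of this rSkill.
SKILL_METADATA = {
    "rskill_id": "OpenRAL/rskill-3d-diffuser-actor-rlbench",
    "embodiment_tags": ["franka_panda"],
    "actions": ["generalist", "open", "close", "pick", "place"],
    "scenes": ["tabletop"],
    "sensors_required": ["rgb"],
    "min_vram_gb": {"bf16": 2.0, "fp32": 2.0},
}


def select_skills(
    catalog: Iterable[Mapping],
    verb: str,
    embodiment: str,
    scene: str,
    vram_gb: float,
    precision: str = "fp32",
) -> List[str]:
    """Return the ids of skills whose declared metadata matches the request."""
    selected = []
    for meta in catalog:
        if verb not in meta["actions"]:
            continue
        if embodiment not in meta["embodiment_tags"]:
            continue
        if scene not in meta["scenes"]:
            continue
        # Unknown precision is treated as "does not fit".
        if meta["min_vram_gb"].get(precision, float("inf")) > vram_gb:
            continue
        selected.append(meta["rskill_id"])
    return selected


print(select_skills([SKILL_METADATA], "open", "franka_panda", "tabletop", 8.0))
# ['OpenRAL/rskill-3d-diffuser-actor-rlbench']
print(select_skills([SKILL_METADATA], "pour", "franka_panda", "tabletop", 8.0))
# []
```

Selection of this kind only narrows the candidates; the embodiment, sensor, and license checks that actually gate execution are performed by the loader, as the section above states.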
54
+
55
+ ## License
56
+
57
+ - **Code:** Apache-2.0.
58
+ - **Weights:** `mit` (permissive; commercial use OK).
59
+
60
+ ## How to actually run it (not via an agent harness)
61
+
62
+ ```python
63
+ from openral_rskill import rSkill
64
+
65
+ skill = rSkill.from_pretrained("OpenRAL/rskill-3d-diffuser-actor-rlbench")
66
+ # the loader validates embodiment / sensors / runtime / quantization against the target
67
+ # RobotDescription and enforces the weight-license gate before any weights load.
68
+ ```
69
+
70
+ See [`rskill.yaml`](./rskill.yaml) for the authoritative, validated manifest.
eval/rlbench.json ADDED
@@ -0,0 +1,72 @@
1
+ {
2
+ "_comment": "Live single-episode verification of 3D Diffuser Actor (katefgroup/3d_diffuser_actor, MIT) on three RLBench PerAct tasks, reproduced locally on an 8 GB Ada GPU host (2026-06-19) via the CoppeliaSim/PyRep + 3DDA py3.10 sidecars (ADR-0061). This is the starter-PR proof, NOT the full official protocol: the canonical RLBench/PerAct/3DDA protocol is 25 evaluation episodes per task (seed 0, max 25 macro-keyposes); run the full suite to overwrite these blocks (see source.reproduction_planned). Per-task paper baselines are reported in Ke et al. 2402.10885 Table 1 and are intentionally NOT transcribed here to avoid mis-citation.",
3
+ "schema_version": "0.1",
4
+ "source": {
5
+ "paper": "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations (Ke et al., 2024)",
6
+ "arxiv": "https://arxiv.org/abs/2402.10885",
7
+ "model_variant": "3D Diffuser Actor (PerAct multi-task checkpoint, diffuser_actor_peract.pth)",
8
+ "evaluated_by": "OpenRAL: openral benchmark scene",
9
+ "reproduced_locally": true,
10
+ "reproduction_planned": "Full official protocol (25 episodes/task, seed 0, max 25 keyposes) deferred to a dedicated benchmark session: run `openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench` against the provisioned CoppeliaSim sidecar and overwrite the results block.",
11
+ "reproduction_cli": {
12
+ "description": "ADR-0009 PR D: `openral benchmark run` / `openral benchmark scene` is the canonical producer of RSkillEvalResult JSONs. Requires the externally-provisioned CoppeliaSim 4.1.0 + PyRep + RLBench@peract + 3D Diffuser Actor py3.10 sidecar venv (ADR-0061).",
13
+ "single_scene_example": "openral benchmark scene --config scenes/benchmark/rlbench_open_drawer.yaml --rskill rskills/3d-diffuser-actor-rlbench --n-episodes 1",
14
+ "all_suites": "openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench",
15
+ "suite_max_steps": 25,
16
+ "notes": [
17
+ "CoppeliaSim is proprietary / free-EDU and is NEVER vendored; provision it yourself per ADR-0061.",
18
+ "The 3D Diffuser Actor checkpoint and code are MIT-licensed, so no install-time license guard is needed.",
19
+ "Inference VRAM peak ~0.43 GB; the policy + RLBench scene share one py3.10 ZMQ sidecar.",
20
+ "results below are reproduced_locally=true at n_episodes=1 per task (live verification); flip to the full 25-episode protocol via the all_suites command above."
21
+ ]
22
+ },
23
+ "table": null,
24
+ "status": "reproduced"
25
+ },
26
+ "benchmark": {
27
+ "name": "RLBench",
28
+ "dataset": null,
29
+ "protocol": "Live verification: 1 episode per task, seed=0, success_key=is_success, max 25 macro-keyposes/episode (each planned + executed by RLBench EndEffectorPoseViaPlanning). Official PerAct/3DDA protocol is 25 episodes/task.",
30
+ "robot": "franka_panda",
31
+ "simulator": "CoppeliaSim 4.1.0 / PyRep (RLBench@peract fork)"
32
+ },
33
+ "eval_config": {
34
+ "n_episodes_per_task": 1,
35
+ "seeds": [0],
36
+ "success_key": "is_success",
37
+ "max_steps": 25,
38
+ "vla_id": "diffuser_actor",
39
+ "weights_uri": "hf://katefgroup/3d_diffuser_actor",
40
+ "denoising_steps": 100,
41
+ "cameras": ["left_shoulder", "right_shoulder", "wrist", "front"],
42
+ "observation_size": [256, 256],
43
+ "action_dim": 8,
44
+ "inference_vram_gb_peak": 0.43
45
+ },
46
+ "results": {
47
+ "rlbench/open_drawer": {
48
+ "success_rate": 1.0,
49
+ "n_episodes": 1,
50
+ "keyposes": 3,
51
+ "mean_keypose_latency_ms": 1006.0
52
+ },
53
+ "rlbench/meat_off_grill": {
54
+ "success_rate": 1.0,
55
+ "n_episodes": 1,
56
+ "keyposes": 5,
57
+ "mean_keypose_latency_ms": 974.0
58
+ },
59
+ "rlbench/close_jar": {
60
+ "success_rate": 1.0,
61
+ "n_episodes": 1,
62
+ "keyposes": 6,
63
+ "mean_keypose_latency_ms": 964.0
64
+ },
65
+ "avg_success_rate": 1.0,
66
+ "n_tasks": 3,
67
+ "n_episodes_per_task": 1,
68
+ "n_episodes_total": 3
69
+ },
70
+ "baselines": {},
71
+ "trace_id": null
72
+ }
rskill.yaml ADDED
@@ -0,0 +1,108 @@
1
+ # rSkill manifest: openral packaging format V1 (CLAUDE.md §6.4)
2
+ # Wraps: katefgroup/3d_diffuser_actor (diffuser_actor_peract.pth)
3
+ # Paper: Ke et al., 2024, "3D Diffuser Actor: Policy Diffusion with 3D Scene
4
+ # Representations" (arXiv:2402.10885). RLBench PerAct 18-task setup.
5
+ #
6
+ # LICENSE: MIT (code + released checkpoints), commercially permissive. No
7
+ # license guard needed (unlike RVT/RVT-2, which are NVIDIA non-commercial).
8
+ #
9
+ # RUNTIME: auto-managed out-of-process sidecar (ZMQ + msgpack), ADR-0061. The
10
+ # policy AND the CoppeliaSim/PyRep RLBench scene run in their own externally-
11
+ # provisioned py3.10 venv (CoppeliaSim is proprietary, free-EDU, NEVER vendored).
12
+ # The openral adapter (openral_sim.policies.rlbench_3dda) forks
13
+ # tools/rlbench_3dda_sidecar.py on first use; user workflow is one command:
14
+ #
15
+ # openral benchmark scene --config scenes/benchmark/rlbench_open_drawer.yaml \
16
+ # --rskill rskills/3d-diffuser-actor-rlbench
17
+ #
18
+ # Verified live on an 8 GB Ada GPU host (2026-06-19): open_drawer 4/4,
19
+ # meat_off_grill 3/3, close_jar solved. Inference VRAM peak ~0.43 GB.
20
+
21
+ # ── Identity ────────────────────────────────────────────────────────────────
22
+ schema_version: "0.1"
23
+ name: "OpenRAL/rskill-3d-diffuser-actor-rlbench"
24
+ # ADR-0060: the benchmark tasks this checkpoint is validated for (gate). The
25
+ # released PerAct checkpoint covers all 18 PerAct tasks; we ship + declare the
26
+ # three live-verified starter tasks here (the rest are a follow-up).
27
+ evaluated_tasks:
28
+ - "rlbench/open_drawer"
29
+ - "rlbench/meat_off_grill"
30
+ - "rlbench/close_jar"
31
+ version: "0.1.0"
32
+ license: "mit"
33
+ role: "s1"
34
+ kind: "vla"
35
+
36
+ # ── Policy identity ─────────────────────────────────────────────────────────
37
+ model_family: "diffuser_actor"
38
+
39
+ # ── Compatibility contract ──────────────────────────────────────────────────
40
+ embodiment_tags:
41
+ - "franka_panda"
42
+
43
+ # RLBench renders four fixed cameras (the PerAct set: left_shoulder /
44
+ # right_shoulder / wrist / front) and the policy fuses their RGB-D point clouds
45
+ # into a 3D scene representation. Those four are supplied by the SCENE backend
46
+ # (openral_sim.backends.rlbench), NOT by the robot's real sensor list, so the
47
+ # robot-capability gate uses a coarse modality-count requirement ("an RGB-vision
48
+ # embodiment") rather than keyed camera1..4 the franka_panda manifest doesn't
49
+ # declare. The per-camera 3D fusion happens inside the policy sidecar.
50
+ sensors_required:
51
+ - modality: "rgb"
52
+ count: 1
53
+ min_width: 128
54
+ min_height: 128
55
+
56
+ # The policy emits next-keyframe end-effector poses; RLBench executes each via
57
+ # its sampling-based motion planner (EndEffectorPoseViaPlanning). Absolute EE
58
+ # pose targets, not deltas.
59
+ actuators_required:
60
+ - kind: "cartesian_pose"
61
+ control_mode_semantics:
62
+ mode: "absolute"
63
+ reference_frame: "panda_link0"
64
+
65
+ # ── Runtime / weights ───────────────────────────────────────────────────────
66
+ runtime: "pytorch"
67
+ min_vram_gb:
68
+ bf16: 2.0
69
+ fp32: 2.0
70
+ weights_uri: "hf://katefgroup/3d_diffuser_actor"
71
+
72
+ # ── Execution semantics ─────────────────────────────────────────────────────
73
+ # One macro-keypose per step (the scene's mover plans + executes it). 100 DDPM
74
+ # denoising steps per keypose; ~1.0 s/keypose on an 8 GB Ada GPU.
75
+ chunk_size: 1
76
+ latency_budget:
77
+ per_chunk_ms: 3000.0
78
+
79
+ # ── IO contract ─────────────────────────────────────────────────────────────
80
+ # 8-D keyframe action: [x y z qx qy qz qw gripper_open] (world frame). The scene
81
+ # sidecar appends the peract fork's ignore_collisions channel + plans the motion.
82
+ action_contract:
83
+ dim: 8
84
+ slots:
85
+ - {range: [0, 6], control_mode: "cartesian_pose", ee: "panda_hand", frame: "panda_link0"}
86
+ - {range: [7, 7], control_mode: "gripper_position", ee: "panda_gripper"}
87
+
88
+ # ── Provenance ──────────────────────────────────────────────────────────────
89
+ paper_url: "https://arxiv.org/abs/2402.10885"
90
+ source_repo: "hf://katefgroup/3d_diffuser_actor"
91
+
92
+ description: >
93
+ 3D Diffuser Actor (Ke et al., 2024): a diffusion policy over end-effector
94
+ keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench
95
+ PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar
96
+ with the rlbench scene backend (ADR-0061). MIT code + checkpoints. The PerAct
97
+ checkpoint is loaded verbatim; ships three live-verified starter tasks.
98
+
99
+ # ADR-0022: action vocabulary surfaced to the reasoner LLM tool palette.
100
+ actions:
101
+ - "generalist"
102
+ - "open"
103
+ - "close"
104
+ - "pick"
105
+ - "place"
106
+ objects: []
107
+ scenes:
108
+ - "tabletop"