chore: publish rSkill OpenRAL/rskill-3d-diffuser-actor-rlbench v0.1.0
- README.md +137 -0
- SKILL.md +70 -0
- eval/rlbench.json +72 -0
- rskill.yaml +108 -0
README.md
ADDED
@@ -0,0 +1,137 @@
<!--
rSkill README: 3D Diffuser Actor (RLBench PerAct setup).
Discovery + provenance card; mirrors rskill.yaml. ADR-0061.
-->

# rskill-3d-diffuser-actor-rlbench

3D Diffuser Actor: a diffusion policy over end-effector **keyposes** for RLBench,
running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0061).

## What this skill does

Predicts the next end-effector keypose (position + orientation + gripper) from
multi-view RGB-D, conditioned on a language instruction. Used to benchmark
3D/keyframe manipulation on the RLBench **PerAct 18-task** suite. Ships the three
live-verified starter tasks: `open_drawer`, `meat_off_grill`, `close_jar`.

| Field | Value |
|---|---|
| Actions | open, close, pick, place (generalist keyframe policy) |
| Objects | drawer, grill/meat, jar (PerAct task objects) |
| Scenes | tabletop (RLBench / CoppeliaSim) |
| Embodiment | franka_panda |

## How it works

3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud
scene token field, attends over it with a relative-position transformer, and runs a
DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose
trajectory. Each predicted keypose is executed in RLBench by its sampling-based
motion planner (`EndEffectorPoseViaPlanning`), then the policy re-observes and
predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an
out-of-process **py3.10 sidecar** (ZMQ + msgpack); the openral adapter
(`openral_sim.policies.rlbench_3dda`) forks it transparently.
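
The predict, execute, re-observe cycle described above can be sketched as a plain Python loop. This is only an illustration under stated assumptions: `run_episode`, `toy_policy`, and `ToyScene` are hypothetical names, not part of openral or RLBench, and the toy scene simply reports success after the third keypose. The 25-keypose cap mirrors the episode limit quoted in the Evaluation section.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# [x, y, z, qx, qy, qz, qw, gripper_open], as in the action contract
Keypose = Tuple[float, ...]


@dataclass
class StepResult:
    observation: dict  # what the policy sees after the motion finishes
    success: bool      # whether the task's success condition now holds


def run_episode(
    predict_keypose: Callable[[dict, str], Keypose],
    execute_keypose: Callable[[Keypose], StepResult],
    first_observation: dict,
    instruction: str,
    max_keyposes: int = 25,
) -> Tuple[bool, int]:
    """Predict one keypose, let the scene execute it, re-observe, repeat."""
    observation = first_observation
    for step in range(1, max_keyposes + 1):
        keypose = predict_keypose(observation, instruction)
        result = execute_keypose(keypose)  # stands in for planner-based execution
        if result.success:
            return True, step
        observation = result.observation
    return False, max_keyposes


def toy_policy(observation: dict, instruction: str) -> Keypose:
    # Hypothetical stand-in for the diffusion policy: identity rotation, open gripper.
    return (0.0, 0.0, 0.1 * observation["t"], 0.0, 0.0, 0.0, 1.0, 1.0)


class ToyScene:
    # Hypothetical stand-in for the simulator: succeeds on the third keypose.
    def __init__(self) -> None:
        self.t = 0

    def execute(self, keypose: Keypose) -> StepResult:
        self.t += 1
        return StepResult({"t": self.t}, success=self.t >= 3)


scene = ToyScene()
solved, n_keyposes = run_episode(toy_policy, scene.execute, {"t": 0}, "open the drawer")
```

In the real system both callables would cross the sidecar boundary; the sketch keeps them in-process to show only the control flow.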

### Observation → action contract

| dir | key | shape | notes |
|---|---|---|---|
| in | `observation.images.{left_shoulder,right_shoulder,wrist,front}` | `(H, W, 3) uint8` | RLBench PerAct cameras, 256×256 |
| in | `observation.point_clouds.{…}` | `(H, W, 3) float32` | per-camera world-frame point clouds |
| in | `observation.gripper_pose` | `(7,) float32` | `[x y z qx qy qz qw]` |
| out | keyframe action | `(8,) float32` | `[x y z qx qy qz qw gripper_open]` (world frame) |
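
As a rough illustration of the output row of the contract, the 8-D keyframe action can be split into position, quaternion, and gripper components and sanity-checked before it is handed to a planner. The helper name `split_keyframe_action` and the unit-norm tolerance are assumptions made for this sketch; they are not part of the openral API.

```python
import math
from typing import Sequence, Tuple


def split_keyframe_action(
    action: Sequence[float], quat_tol: float = 1e-3
) -> Tuple[Tuple[float, ...], Tuple[float, ...], float]:
    """Split [x y z qx qy qz qw gripper_open] and check basic invariants."""
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    position = tuple(action[0:3])    # x, y, z
    quaternion = tuple(action[3:7])  # qx, qy, qz, qw
    gripper_open = action[7]
    # A rotation quaternion should have unit norm (tolerance is an assumption).
    norm = math.sqrt(sum(q * q for q in quaternion))
    if abs(norm - 1.0) > quat_tol:
        raise ValueError(f"quaternion norm {norm:.4f} is not close to 1")
    return position, quaternion, gripper_open


position, quaternion, gripper_open = split_keyframe_action(
    [0.3, 0.0, 0.9, 0.0, 0.0, 0.0, 1.0, 1.0]
)
```

Rejecting a malformed action early is cheaper than letting a motion planner fail on it, which is the only design point this sketch is meant to make.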

## Upstream model / training

Weights are the authors' published RLBench PerAct multi-task checkpoint
(`diffuser_actor_peract.pth`); loaded verbatim, not retrained. Trained by the
authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose
supervision).

| Field | Value |
|---|---|
| Source repo | [`nickgkan/3d_diffuser_actor`](https://github.com/nickgkan/3d_diffuser_actor) |
| Weights | [`katefgroup/3d_diffuser_actor`](https://huggingface.co/katefgroup/3d_diffuser_actor), `diffuser_actor_peract.pth` (168 MB) |
| Paper | [arxiv:2402.10885](https://arxiv.org/abs/2402.10885), *3D Diffuser Actor: Policy Diffusion with 3D Scene Representations* |
| License | MIT (code + checkpoints), commercially permissive |
| Parameters | ~55 M |
| Training data | RLBench PerAct 18-task demonstrations |

## Supported robots

| Robot | Scene | Status | Notes |
|---|---|---|---|
| franka_panda | RLBench (CoppeliaSim) | ✅ validated | open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19) |

## Sensors required

| key | modality | resolution | dtype |
|---|---|---|---|
| `observation.images.left_shoulder` | RGB | 256 × 256 | `uint8` |
| `observation.images.right_shoulder` | RGB | 256 × 256 | `uint8` |
| `observation.images.wrist` | RGB | 256 × 256 | `uint8` |
| `observation.images.front` | RGB | 256 × 256 | `uint8` |

## Manifest summary

| Field | Value |
|---|---|
| `name` | `OpenRAL/rskill-3d-diffuser-actor-rlbench` |
| `version` | `0.1.0` |
| `license` | `mit` |
| `role` | `s1` |
| `model_family` | `diffuser_actor` |
| `embodiment_tags` | `franka_panda` |
| `runtime` | `pytorch` |
| `weights_uri` | `hf://katefgroup/3d_diffuser_actor` |
| `action_contract.dim` | `8` |
| `latency_budget.per_chunk_ms` | `3000.0` |

## Reproduction

```bash
# One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
# in the py3.10 sidecar venv (see docs/adr/0061-rlbench-benchmark-backend.md).
openral benchmark scene \
  --config scenes/benchmark/rlbench_open_drawer.yaml \
  --rskill rskills/3d-diffuser-actor-rlbench
```

Inference VRAM peaks at ~0.43 GB; the skill runs comfortably on an 8 GB GPU.
CoppeliaSim is proprietary (free EDU license) and is **never** vendored; it is an
externally provisioned dependency (CLAUDE.md §1.9 / ADR-0061).

## Evaluation

[`eval/rlbench.json`](eval/rlbench.json) ships the **live single-episode
verification** that qualifies this starter PR (`reproduced_locally: true`):
`open_drawer`, `meat_off_grill`, and `close_jar` each succeed (success_rate
`1.0`, 3 / 5 / 6 macro-keyposes, ~1.0 s/keypose) on an 8 GB Ada host
(2026-06-19, seed 0). This is **not** the full official protocol: RLBench /
PerAct / 3DDA evaluate **25 episodes per task** (seed 0, max 25 keyposes). To
produce the full artifact and overwrite the `results` block, run the suite
against the provisioned CoppeliaSim sidecar:

```bash
openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench
```

(`openral benchmark run` is the canonical `RSkillEvalResult` producer, per ADR-0009
PR D.) Per-task paper baselines are reported in Ke et al. (2402.10885, Table 1)
and are intentionally not transcribed into the artifact to avoid mis-citation.
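
The summary fields in the `results` block (`avg_success_rate`, `n_tasks`, `n_episodes_total`) can be recomputed from the per-task entries. The sketch below is one plausible way to do that, not the `openral benchmark run` implementation: the `summarize` helper is hypothetical, and weighting the average by episode count is an assumption (with one episode per task it coincides with a plain per-task mean). The per-task numbers are the ones shipped in `eval/rlbench.json`.

```python
import json

# Per-task entries as shipped in eval/rlbench.json (live single-episode verification).
RESULTS_JSON = """
{
  "rlbench/open_drawer":    {"success_rate": 1.0, "n_episodes": 1},
  "rlbench/meat_off_grill": {"success_rate": 1.0, "n_episodes": 1},
  "rlbench/close_jar":      {"success_rate": 1.0, "n_episodes": 1}
}
"""


def summarize(per_task: dict) -> dict:
    """Recompute the aggregate fields from per-task success rates."""
    n_total = sum(task["n_episodes"] for task in per_task.values())
    # Episode-weighted average (assumption; equals the task mean when counts match).
    successes = sum(task["success_rate"] * task["n_episodes"] for task in per_task.values())
    return {
        "avg_success_rate": successes / n_total,
        "n_tasks": len(per_task),
        "n_episodes_total": n_total,
    }


summary = summarize(json.loads(RESULTS_JSON))
```

With a single episode per task, a success rate of 1.0 only shows that the pipeline runs end to end; the 25-episode protocol is what makes the number comparable to published results.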

## License

OpenRAL wrapper files in this repository follow the project Apache-2.0 license.
The wrapped upstream 3D Diffuser Actor code and released
`diffuser_actor_peract.pth` checkpoint are MIT-licensed; the manifest therefore
uses `license: mit` for the consumer-visible weight/runtime posture.

## See also

- `scenes/benchmark/rlbench_open_drawer.yaml`
- `scenes/benchmark/rlbench_meat_off_grill.yaml`
- `scenes/benchmark/rlbench_close_jar.yaml`
- `benchmarks/rlbench.yaml`
- `docs/adr/0061-rlbench-benchmark-backend.md`
SKILL.md
ADDED
@@ -0,0 +1,70 @@
---
name: 3d-diffuser-actor-rlbench
description: >-
  S1 Vision-Language-Action policy. Capabilities: generalist, open, close, pick, place. 3D Diffuser Actor (Ke et al., 2024): a diffusion policy over end-effector keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0061). MIT code + checkpoints. The PerAct checkpoint is loaded verbatim; ships three live-verified starter tasks. Discovery view of an OpenRAL rSkill, NOT directly runnable by an agent harness; it runs via rSkill.from_pretrained + the robot HAL.
metadata:
  openral_rskill: true # generated discovery view of an rSkill
  schema_version: 0.1
  rskill_id: OpenRAL/rskill-3d-diffuser-actor-rlbench
  manifest: ./rskill.yaml
  role: s1
  kind: vla
  model_family: diffuser_actor
  embodiment_tags: [franka_panda]
  actions: [generalist, open, close, pick, place]
  scenes: [tabletop]
  sensors_required: [rgb]
  action_dim: 8
  runtime: pytorch
  min_vram_gb: {bf16: 2.0, fp32: 2.0}
  chunk_size: 1
  latency_budget: {per_chunk_ms: 3000.0}
  license_code: Apache-2.0
  license_weights: mit
  weights_uri: hf://katefgroup/3d_diffuser_actor
  source_repo: hf://katefgroup/3d_diffuser_actor
  paper_url: https://arxiv.org/abs/2402.10885
---

# 3d-diffuser-actor-rlbench: rSkill discovery view

> **Generated view, not a hand-written skill.** This `SKILL.md` is a discovery-only
> mirror of [`rskill.yaml`](./rskill.yaml), produced by `tools/generate_rskill_skillmd.py`.
> It lets tools that read the standard agent-skill format find and reason about this
> OpenRAL rSkill. The `rskill.yaml` manifest is the single source of truth
> (CLAUDE.md §1.3). Do not edit by hand; edit the manifest and regenerate.

## What it is

An OpenRAL **Vision-Language-Action policy** (`role: s1`, `kind: vla`). 3D Diffuser Actor (Ke et al., 2024): a diffusion policy over end-effector keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0061). MIT code + checkpoints. The PerAct checkpoint is loaded verbatim; ships three live-verified starter tasks.

## Capabilities

- **Verbs:** generalist · open · close · pick · place
- **Scenes:** tabletop
- **Embodiments:** franka_panda

## Why this is discovery-only

An agent skill is natural-language instructions loaded into an LLM's context. An rSkill
is an executable artifact: it carries a typed capability/embodiment contract, model weights,
a runtime, and a license/provenance gate, none of which fit in freeform markdown. So an
agent can use this view to *select* the right skill, but cannot *execute* it by loading
this file. Execution always goes through the OpenRAL loader and the robot HAL.
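
To make the "select, not execute" distinction concrete, here is a minimal sketch of how a tool might filter discovery views by their front-matter metadata. The `select_skills` helper, the in-memory `CATALOG`, and the second catalog entry are hypothetical and exist only for illustration; the first entry copies the `embodiment_tags`, `actions`, and `scenes` fields from this file.

```python
from typing import Dict, List

# Discovery metadata as it might look after parsing SKILL.md front matter.
CATALOG: List[Dict] = [
    {
        "rskill_id": "OpenRAL/rskill-3d-diffuser-actor-rlbench",
        "embodiment_tags": ["franka_panda"],
        "actions": ["generalist", "open", "close", "pick", "place"],
        "scenes": ["tabletop"],
    },
    {
        # Hypothetical second entry, included only to show filtering.
        "rskill_id": "example/other-skill",
        "embodiment_tags": ["some_other_robot"],
        "actions": ["push"],
        "scenes": ["tabletop"],
    },
]


def select_skills(catalog: List[Dict], embodiment: str, action: str, scene: str) -> List[str]:
    """Return ids of skills whose declared metadata covers the request."""
    return [
        entry["rskill_id"]
        for entry in catalog
        if embodiment in entry["embodiment_tags"]
        and action in entry["actions"]
        and scene in entry["scenes"]
    ]


matches = select_skills(CATALOG, "franka_panda", "open", "tabletop")
```

Selection of this kind only narrows the candidates; loading weights and checking the full contract still happens in the loader.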

## License

- **Code:** Apache-2.0.
- **Weights:** `mit` (permissive / commercial-use OK)

## How to actually run it (not via an agent harness)

```python
from openral_rskill import rSkill

skill = rSkill.from_pretrained("OpenRAL/rskill-3d-diffuser-actor-rlbench")
# the loader validates embodiment / sensors / runtime / quantization against the target
# RobotDescription and enforces the weight-license gate before any weights load.
```

See [`rskill.yaml`](./rskill.yaml) for the authoritative, validated manifest.
eval/rlbench.json
ADDED
@@ -0,0 +1,72 @@
{
  "_comment": "Live single-episode verification of 3D Diffuser Actor (katefgroup/3d_diffuser_actor, MIT) on three RLBench PerAct tasks, reproduced locally on an 8 GB Ada GPU host (2026-06-19) via the CoppeliaSim/PyRep + 3DDA py3.10 sidecars (ADR-0061). This is the starter-PR proof, NOT the full official protocol: the canonical RLBench/PerAct/3DDA protocol is 25 evaluation episodes per task (seed 0, max 25 macro-keyposes); run the full suite to overwrite these blocks (see source.reproduction_planned). Per-task paper baselines are reported in Ke et al. 2402.10885 Table 1 and are intentionally NOT transcribed here to avoid mis-citation.",
  "schema_version": "0.1",
  "source": {
    "paper": "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations (Ke et al., 2024)",
    "arxiv": "https://arxiv.org/abs/2402.10885",
    "model_variant": "3D Diffuser Actor (PerAct multi-task checkpoint, diffuser_actor_peract.pth)",
    "evaluated_by": "OpenRAL: openral benchmark scene",
    "reproduced_locally": true,
    "reproduction_planned": "Full official protocol (25 episodes/task, seed 0, max 25 keyposes) deferred to a dedicated benchmark session; run `openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench` against the provisioned CoppeliaSim sidecar and overwrite the results block.",
    "reproduction_cli": {
      "description": "ADR-0009 PR D: `openral benchmark run` / `openral benchmark scene` is the canonical producer of RSkillEvalResult JSONs. Requires the externally-provisioned CoppeliaSim 4.1.0 + PyRep + RLBench@peract + 3D Diffuser Actor py3.10 sidecar venv (ADR-0061).",
      "single_scene_example": "openral benchmark scene --config scenes/benchmark/rlbench_open_drawer.yaml --rskill rskills/3d-diffuser-actor-rlbench --n-episodes 1",
      "all_suites": "openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench",
      "suite_max_steps": 25,
      "notes": [
        "CoppeliaSim is proprietary / free-EDU and is NEVER vendored; provision it yourself per ADR-0061.",
        "The 3D Diffuser Actor checkpoint and code are MIT-licensed; no install-time license guard.",
        "Inference VRAM peak ~0.43 GB; the policy + RLBench scene share one py3.10 ZMQ sidecar.",
        "results below are reproduced_locally=true at n_episodes=1 per task (live verification); flip to the full 25-episode protocol via the all_suites command above."
      ]
    },
    "table": null,
    "status": "reproduced"
  },
  "benchmark": {
    "name": "RLBench",
    "dataset": null,
    "protocol": "Live verification: 1 episode per task, seed=0, success_key=is_success, max 25 macro-keyposes/episode (each planned + executed by RLBench EndEffectorPoseViaPlanning). Official PerAct/3DDA protocol is 25 episodes/task.",
    "robot": "franka_panda",
    "simulator": "CoppeliaSim 4.1.0 / PyRep (RLBench@peract fork)"
  },
  "eval_config": {
    "n_episodes_per_task": 1,
    "seeds": [0],
    "success_key": "is_success",
    "max_steps": 25,
    "vla_id": "diffuser_actor",
    "weights_uri": "hf://katefgroup/3d_diffuser_actor",
    "denoising_steps": 100,
    "cameras": ["left_shoulder", "right_shoulder", "wrist", "front"],
    "observation_size": [256, 256],
    "action_dim": 8,
    "inference_vram_gb_peak": 0.43
  },
  "results": {
    "rlbench/open_drawer": {
      "success_rate": 1.0,
      "n_episodes": 1,
      "keyposes": 3,
      "mean_keypose_latency_ms": 1006.0
    },
    "rlbench/meat_off_grill": {
      "success_rate": 1.0,
      "n_episodes": 1,
      "keyposes": 5,
      "mean_keypose_latency_ms": 974.0
    },
    "rlbench/close_jar": {
      "success_rate": 1.0,
      "n_episodes": 1,
      "keyposes": 6,
      "mean_keypose_latency_ms": 964.0
    },
    "avg_success_rate": 1.0,
    "n_tasks": 3,
    "n_episodes_per_task": 1,
    "n_episodes_total": 3
  },
  "baselines": {},
  "trace_id": null
}
rskill.yaml
ADDED
@@ -0,0 +1,108 @@
# rSkill manifest: openral packaging format V1 (CLAUDE.md §6.4)
# Wraps: katefgroup/3d_diffuser_actor (diffuser_actor_peract.pth)
# Paper: Ke et al., 2024, "3D Diffuser Actor: Policy Diffusion with 3D Scene
#        Representations" (arXiv:2402.10885). RLBench PerAct 18-task setup.
#
# LICENSE: MIT (code + released checkpoints), commercially permissive. No
# license guard needed (unlike RVT/RVT-2, which are NVIDIA non-commercial).
#
# RUNTIME: auto-managed out-of-process sidecar (ZMQ + msgpack), ADR-0061. The
# policy AND the CoppeliaSim/PyRep RLBench scene run in their own externally-
# provisioned py3.10 venv (CoppeliaSim is proprietary, free-EDU, NEVER vendored).
# The openral adapter (openral_sim.policies.rlbench_3dda) forks
# tools/rlbench_3dda_sidecar.py on first use; user workflow is one command:
#
#   openral benchmark scene --config scenes/benchmark/rlbench_open_drawer.yaml \
#       --rskill rskills/3d-diffuser-actor-rlbench
#
# Verified live on an 8 GB Ada GPU host (2026-06-19): open_drawer 4/4,
# meat_off_grill 3/3, close_jar solved. Inference VRAM peak ~0.43 GB.

# ── Identity ───────────────────────────────────────────────────────────────
schema_version: "0.1"
name: "OpenRAL/rskill-3d-diffuser-actor-rlbench"
# ADR-0060: the benchmark tasks this checkpoint is validated for (gate). The
# released PerAct checkpoint covers all 18 PerAct tasks; we ship + declare the
# three live-verified starter tasks here (the rest are a follow-up).
evaluated_tasks:
  - "rlbench/open_drawer"
  - "rlbench/meat_off_grill"
  - "rlbench/close_jar"
version: "0.1.0"
license: "mit"
role: "s1"
kind: "vla"

# ── Policy identity ────────────────────────────────────────────────────────
model_family: "diffuser_actor"

# ── Compatibility contract ─────────────────────────────────────────────────
embodiment_tags:
  - "franka_panda"

# RLBench renders four fixed cameras (the PerAct set: left_shoulder /
# right_shoulder / wrist / front) and the policy fuses their RGB-D point clouds
# into a 3D scene representation. Those four are supplied by the SCENE backend
# (openral_sim.backends.rlbench), NOT by the robot's real sensor list, so the
# robot-capability gate uses a coarse modality-count requirement ("an RGB-vision
# embodiment") rather than keyed camera1..4 the franka_panda manifest doesn't
# declare. The per-camera 3D fusion happens inside the policy sidecar.
sensors_required:
  - modality: "rgb"
    count: 1
    min_width: 128
    min_height: 128

# The policy emits next-keyframe end-effector poses; RLBench executes each via
# its sampling-based motion planner (EndEffectorPoseViaPlanning). Absolute EE
# pose targets, not deltas.
actuators_required:
  - kind: "cartesian_pose"
control_mode_semantics:
  mode: "absolute"
  reference_frame: "panda_link0"

# ── Runtime / weights ──────────────────────────────────────────────────────
runtime: "pytorch"
min_vram_gb:
  bf16: 2.0
  fp32: 2.0
weights_uri: "hf://katefgroup/3d_diffuser_actor"

# ── Execution semantics ────────────────────────────────────────────────────
# One macro-keypose per step (the scene's mover plans + executes it). 100 DDPM
# denoising steps per keypose; ~1.0 s/keypose on an 8 GB Ada GPU.
chunk_size: 1
latency_budget:
  per_chunk_ms: 3000.0

# ── IO contract ────────────────────────────────────────────────────────────
# 8-D keyframe action: [x y z qx qy qz qw gripper_open] (world frame). The scene
# sidecar appends the peract fork's ignore_collisions channel + plans the motion.
action_contract:
  dim: 8
  slots:
    - {range: [0, 6], control_mode: "cartesian_pose", ee: "panda_hand", frame: "panda_link0"}
    - {range: [7, 7], control_mode: "gripper_position", ee: "panda_gripper"}

# ── Provenance ─────────────────────────────────────────────────────────────
paper_url: "https://arxiv.org/abs/2402.10885"
source_repo: "hf://katefgroup/3d_diffuser_actor"

description: >
  3D Diffuser Actor (Ke et al., 2024): a diffusion policy over end-effector
  keyposes fusing multi-view RGB-D into a 3D scene representation, on the RLBench
  PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar
  with the rlbench scene backend (ADR-0061). MIT code + checkpoints. The PerAct
  checkpoint is loaded verbatim; ships three live-verified starter tasks.

# ADR-0022: action vocabulary surfaced to the reasoner LLM tool palette.
actions:
  - "generalist"
  - "open"
  - "close"
  - "pick"
  - "place"
objects: []
scenes:
  - "tabletop"