---
name: 3d-diffuser-actor-rlbench
description: >-
  S1 Vision-Language-Action policy. Capabilities: generalist, open, close, pick, place. 3D Diffuser Actor (Ke et al., 2024): a diffusion policy that predicts end-effector keyposes from a 3D scene representation fused from multi-view RGB-D, packaged for the RLBench PerAct 18-task benchmark. Shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0062). Upstream code and checkpoints are MIT-licensed. The PerAct checkpoint is loaded verbatim, and three starter tasks ship live-verified. Discovery view of an OpenRAL rSkill, NOT directly runnable by an agent harness; it runs via rSkill.from_pretrained and the robot HAL.
metadata:
  openral_rskill: true # generated discovery view of an rSkill
  schema_version: 0.1
  rskill_id: OpenRAL/rskill-3d-diffuser-actor-rlbench
  manifest: ./rskill.yaml
  role: s1
  kind: vla
  model_family: diffuser_actor
  embodiment_tags: [franka_panda]
  actions: [generalist, open, close, pick, place]
  scenes: [tabletop]
  sensors_required: [rgb]
  action_dim: 8
  runtime: pytorch
  min_vram_gb: {bf16: 2.0, fp32: 2.0}
  chunk_size: 1
  latency_budget: {per_chunk_ms: 3000.0}
  license_code: Apache-2.0
  license_weights: mit
  weights_uri: hf://katefgroup/3d_diffuser_actor
  source_repo: hf://katefgroup/3d_diffuser_actor
  paper_url: https://arxiv.org/abs/2402.10885
---
# 3d-diffuser-actor-rlbench — rSkill discovery view
> **Generated view, not a hand-written skill.** This `SKILL.md` is a discovery-only
> mirror of [`rskill.yaml`](./rskill.yaml), produced by `tools/generate_rskill_skillmd.py`.
> It lets tools that read the standard agent-skill format find and reason about this
> OpenRAL rSkill. The `rskill.yaml` manifest is the single source of truth
> (CLAUDE.md §1.3). Do not edit by hand — edit the manifest and regenerate.
## What it is
An OpenRAL **Vision-Language-Action policy** (`role: s1`, `kind: vla`). 3D Diffuser Actor (Ke et al., 2024) is a diffusion policy that predicts end-effector keyposes from a 3D scene representation fused from multi-view RGB-D; this rSkill packages it for the RLBench PerAct 18-task benchmark. It shares the out-of-process CoppeliaSim/PyRep sidecar with the rlbench scene backend (ADR-0062). Upstream code and checkpoints are MIT-licensed. The PerAct checkpoint is loaded verbatim, and three starter tasks ship live-verified.
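
The frontmatter declares `action_dim: 8` and `chunk_size: 1`, i.e. one 8-D keypose per policy call. The sketch below shows how a consumer might unpack that action. It assumes the RLBench end-effector convention of position (3) + quaternion (4, x/y/z/w order) + gripper (1). `Keypose`, `decode_keypose` and the 0.5 gripper threshold are hypothetical and not part of the OpenRAL API, so check `rskill.yaml` for the actual action layout.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Keypose:
    position: Tuple[float, float, float]  # x, y, z (assumed metres, world frame)
    quaternion: Tuple[float, float, float, float]  # assumed x, y, z, w ordering
    gripper_open: bool


def decode_keypose(action: Sequence[float], open_threshold: float = 0.5) -> Keypose:
    """Split one 8-D action into position, unit quaternion and gripper state."""
    if len(action) != 8:
        raise ValueError(f"expected action_dim 8, got {len(action)}")
    x, y, z, qx, qy, qz, qw, grip = (float(v) for v in action)
    norm = (qx * qx + qy * qy + qz * qz + qw * qw) ** 0.5
    if norm == 0.0:
        raise ValueError("quaternion has zero norm")
    # Re-normalise so a downstream planner always receives a unit quaternion.
    quat = (qx / norm, qy / norm, qz / norm, qw / norm)
    # Assumed convention: gripper values above the threshold mean "open".
    return Keypose((x, y, z), quat, grip > open_threshold)


kp = decode_keypose([0.3, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0])
print(kp.quaternion, kp.gripper_open)  # (0.0, 0.0, 0.0, 1.0) True
```

Re-normalising in the decoder keeps a slightly off-unit network output from reaching the motion planner as an invalid rotation.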
## Capabilities
- **Verbs:** generalist · open · close · pick · place
- **Scenes:** tabletop
- **Embodiments:** franka_panda
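
Selecting a skill only requires the three lists above. The following sketch shows how a discovery tool might filter candidates, assuming exact string matching on verb, scene and embodiment tag. `SkillCard` and `matches` are hypothetical names, and the values are copied from this view's frontmatter.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SkillCard:
    """Hypothetical in-memory mirror of the discovery metadata."""
    rskill_id: str
    actions: FrozenSet[str]
    scenes: FrozenSet[str]
    embodiment_tags: FrozenSet[str]


# Values copied from this view's frontmatter.
CARD = SkillCard(
    rskill_id="OpenRAL/rskill-3d-diffuser-actor-rlbench",
    actions=frozenset({"generalist", "open", "close", "pick", "place"}),
    scenes=frozenset({"tabletop"}),
    embodiment_tags=frozenset({"franka_panda"}),
)


def matches(card: SkillCard, verb: str, scene: str, embodiment: str) -> bool:
    """Exact-match filter: the skill must list the verb, the scene and the embodiment."""
    return (
        verb in card.actions
        and scene in card.scenes
        and embodiment in card.embodiment_tags
    )


print(matches(CARD, "pick", "tabletop", "franka_panda"))  # True
print(matches(CARD, "pick", "tabletop", "ur5e"))  # False
```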
## Why this is discovery-only
An agent skill is a set of natural-language instructions loaded into an LLM's context. An rSkill
is an executable artifact: it carries a typed capability/embodiment contract, model weights,
a runtime, and a license/provenance gate — none of which fit in freeform markdown. So an
agent can use this view to *select* the right skill, but cannot *execute* it by loading
this file. Execution always goes through the OpenRAL loader and the robot HAL.
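
A harness that reads standard agent skills could use the `openral_rskill: true` marker in the frontmatter to keep this file out of its execute path. Below is a minimal stdlib sketch. `is_rskill_view`, `route` and the two route labels are hypothetical. A real tool should also parse the YAML properly rather than pattern-match, and this version assumes `\n` line endings.

```python
import re

# Frontmatter is the block between the first two "---" lines at the top of the file.
_FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# The marker may be nested under `metadata:`, so allow leading spaces or tabs.
_MARKER = re.compile(r"^[ \t]*openral_rskill:[ \t]*true\b", re.MULTILINE)


def is_rskill_view(skill_md: str) -> bool:
    """True if the file's frontmatter carries the generated-rSkill marker."""
    match = _FRONTMATTER.match(skill_md)
    return match is not None and _MARKER.search(match.group(1)) is not None


def route(skill_md: str) -> str:
    # Plain agent skills go into the LLM context; rSkill views are used only
    # for selection, and execution is handed to the OpenRAL loader.
    return "openral_loader" if is_rskill_view(skill_md) else "llm_context"


view = "---\nname: demo\nmetadata:\n  openral_rskill: true # generated\n---\n# body\n"
print(route(view))  # openral_loader
print(route("---\nname: plain-skill\n---\nDo X.\n"))  # llm_context
```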
## License
- **Code:** Apache-2.0.
- **Weights:** MIT (`mit`), permissive; commercial use OK.
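
The manifest records the weight license as the SPDX-style string `mit`. The sketch below implements a pre-download license gate using a simple allow-list of permissive identifiers. `COMMERCIAL_OK`, `LicenseGateError` and `check_weight_license` are hypothetical, and they do not describe the OpenRAL loader's actual policy.

```python
# Hypothetical allow-list of weight licenses cleared for commercial use.
COMMERCIAL_OK = frozenset({"mit", "apache-2.0", "bsd-3-clause"})


class LicenseGateError(RuntimeError):
    """Raised before any weights are downloaded."""


def check_weight_license(license_weights: str, commercial_use: bool) -> None:
    # Compare case-insensitively, since SPDX identifiers are matched that way.
    spdx = license_weights.strip().lower()
    if commercial_use and spdx not in COMMERCIAL_OK:
        raise LicenseGateError(
            f"weights license {license_weights!r} is not cleared for commercial use"
        )


check_weight_license("mit", commercial_use=True)  # passes silently
try:
    check_weight_license("CC-BY-NC-4.0", commercial_use=True)
except LicenseGateError as err:
    print(err)
```

Running the gate before any download means a disallowed checkpoint never touches disk.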
## How to actually run it (not via an agent harness)
```python
from openral_rskill import rSkill
skill = rSkill.from_pretrained("OpenRAL/rskill-3d-diffuser-actor-rlbench")
# the loader validates embodiment / sensors / runtime / quantization against the target
# RobotDescription and enforces the weight-license gate before any weights load.
```
See [`rskill.yaml`](./rskill.yaml) for the authoritative, validated manifest.
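
The comment in the loader snippet above lists what gets validated against the target robot. The following sketch covers the embodiment, sensor and VRAM part of that check, using values copied from this view's frontmatter. `RobotSpec` is a hypothetical stand-in for the `RobotDescription` mentioned there, not OpenRAL's class, and the real loader also checks runtime, quantization and license in ways this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Requirements copied from this view's frontmatter.
EMBODIMENT_TAGS: FrozenSet[str] = frozenset({"franka_panda"})
SENSORS_REQUIRED: FrozenSet[str] = frozenset({"rgb"})
MIN_VRAM_GB: Dict[str, float] = {"bf16": 2.0, "fp32": 2.0}


@dataclass(frozen=True)
class RobotSpec:
    """Hypothetical stand-in for the loader's RobotDescription."""
    embodiment: str
    sensors: FrozenSet[str]
    vram_gb: float


def compatibility_errors(robot: RobotSpec, dtype: str = "fp32") -> List[str]:
    """Collect every mismatch rather than stopping at the first one."""
    errors: List[str] = []
    if robot.embodiment not in EMBODIMENT_TAGS:
        errors.append(f"embodiment {robot.embodiment!r} not in {sorted(EMBODIMENT_TAGS)}")
    missing = SENSORS_REQUIRED - robot.sensors
    if missing:
        errors.append(f"missing sensors: {sorted(missing)}")
    need = MIN_VRAM_GB.get(dtype)
    if need is None:
        errors.append(f"no VRAM requirement listed for dtype {dtype!r}")
    elif robot.vram_gb < need:
        errors.append(f"needs {need} GB VRAM for {dtype}, robot has {robot.vram_gb} GB")
    return errors


panda = RobotSpec("franka_panda", frozenset({"rgb", "depth"}), vram_gb=8.0)
print(compatibility_errors(panda))  # []
other = RobotSpec("ur5e", frozenset(), vram_gb=1.0)
print(len(compatibility_errors(other)))  # 3
```

The function returns all mismatches at once, so a user sees every problem with a target robot in a single pass instead of fixing them one at a time.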