Commit 73940ec
Duplicate from AdrianLlopart/rskill-diffusion-pusht

- .gitattributes +35 -0
- README.md +97 -0
- eval/README.md +14 -0
- eval/pusht.json +90 -0
- rskill.yaml +78 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,97 @@
---
tags:
- OpenRAL
- rskill
- diffusion-policy
- lerobot
- pusht
- manipulation
license: apache-2.0
language:
- en
---

# rskill-diffusion-pusht

> **OpenRAL rSkill** — Diffusion Policy (Chi et al., 2023) trained on
> the PushT 2-D pushing benchmark, packaged for `OpenRAL`.

This package wraps [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht)
with a `rskill.yaml` manifest. It does **not** copy model weights.

## Upstream model

| Field | Value |
| --- | --- |
| Source repo | [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht) |
| Paper | [arxiv:2303.04137](https://arxiv.org/abs/2303.04137) — *Diffusion Policy: Visuomotor Policy Learning via Action Diffusion* (Chi et al., 2023) |
| License | Apache-2.0 |
| Parameters | ~263 M (1-D U-Net) |
| Action chunk | 8 (within horizon 16) |
| Denoising | 100 DDPM steps per chunk |
| Benchmark | PushT (`gym_pusht`, `pymunk` 2-D rigid-body) |

Per-chunk inference is dominated by the 100-step denoising loop; cached
pops are essentially free, so this is the extreme test of the
queue-drain contract in `ChunkedExecutor`.

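The queue-drain behavior described above can be illustrated with a minimal sketch. It assumes a policy callable that returns one chunk of 8 actions per invocation. `ToyChunkedExecutor` and `fake_policy` are hypothetical names used only for illustration; they are not the OpenRAL `ChunkedExecutor` API.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Action = Tuple[float, float]  # 2-D (x, y) target, as in PushT


class ToyChunkedExecutor:
    """Illustrative only: drains a queue of cached actions, refilling it when empty."""

    def __init__(self, infer_chunk: Callable[[], List[Action]]) -> None:
        self._infer_chunk = infer_chunk
        self._queue: Deque[Action] = deque()
        self.inference_calls = 0

    def step(self) -> Action:
        if not self._queue:
            # Expensive path: one full denoising pass yields a whole chunk.
            self._queue.extend(self._infer_chunk())
            self.inference_calls += 1
        # Cheap path: pop a cached action.
        return self._queue.popleft()


def fake_policy() -> List[Action]:
    """Stand-in for the 100-step DDPM loop; returns 8 dummy actions."""
    return [(float(i), float(i)) for i in range(8)]


executor = ToyChunkedExecutor(fake_policy)
actions = [executor.step() for _ in range(20)]
print(executor.inference_calls)
```

With a chunk of 8, the 20 control steps trigger three inference calls (at steps 1, 9, and 17); the remaining 17 steps only pop from the queue.
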
## Supported robots

| Robot | Embodiment tag | Status | Notes |
| --- | --- | --- | --- |
| PushT 2-D pseudo-robot (`gym_pusht/PushT-v0`) | `pusht`, `lerobot` | ✓ sim | 2-D end-effector pushing a T block on a 512 × 512 px canvas |

## Sensors required

| Key | Type | Resolution | Format |
| --- | --- | --- | --- |
| `observation.image` | RGB camera | 96 × 96 | `float32` |

PushT predates the multi-cam `observation.images.cameraN` convention and
exposes the raw key `observation.image`.

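A caller can check an observation against this requirement before invoking the policy. The sketch below is a hypothetical helper, not part of OpenRAL: it assumes the observation is summarized as a mapping from key to a `(height, width, channels)` shape, and it treats 96 × 96 as a minimum size.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels); this layout is an assumption

# Requirement taken from the table above; the dict layout itself is hypothetical.
REQUIRED: Dict[str, Tuple[int, int]] = {"observation.image": (96, 96)}


def missing_or_small(observation_shapes: Dict[str, Shape]) -> List[str]:
    """Return human-readable problems; an empty list means the observation fits."""
    problems = []
    for key, (min_h, min_w) in REQUIRED.items():
        shape = observation_shapes.get(key)
        if shape is None:
            problems.append(f"missing key {key!r}")
        elif shape[0] < min_h or shape[1] < min_w or shape[2] != 3:
            problems.append(f"{key!r} has shape {shape}, expected at least {min_h}x{min_w} RGB")
    return problems


print(missing_or_small({"observation.image": (96, 96, 3)}))
print(missing_or_small({"observation.images.camera0": (96, 96, 3)}))
```

The second call uses the multi-cam key style and is reported as missing `observation.image`, which is the naming mismatch the note above warns about.
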
## Manifest summary

| Field | Value |
| --- | --- |
| `name` | `AdrianLlopart/rskill-diffusion-pusht` |
| `version` | `0.1.0` |
| `license` | `apache-2.0` |
| `role` | `s1` |
| `embodiment_tags` | `pusht`, `lerobot` |
| `runtime` / `quantization.dtype` | `pytorch` / `fp32` |
| `weights_uri` | `hf://lerobot/diffusion_pusht` |
| `latency_budget.per_chunk_ms` | 1 250 ms (warm full-chunk ≈ 1 756 ms on RTX 4070 Laptop, dominated by DDPM) |
| `latency_budget.warmup_ms` | 10 000 ms |
| `latency_budget.load_ms` | 30 000 ms |
| `commercial_use_allowed` | `true` |

Full schema: `openral_core.RSkillManifest` —
`python/core/src/openral_core/schemas.py`.

## Reproduction

```bash
# The clone directory takes the repository's lowercase name.
git clone https://github.com/AdrianLlopart/openral && cd openral
just bootstrap && uv sync --all-packages --group sim

# End-to-end via the canonical SimEnvironment config (CPU is enough):
just sim-diffusion-pusht
# which runs:
#   ral sim run --config examples/sim/diffusion_pusht.yaml --save-video

# Sim test (gym_pusht + pymunk):
uv run pytest tests/sim/test_pusht_2d_diffusion_pusht.py -v -m sim
```

## License

This rSkill package (`rskill.yaml`, `README.md`) is **Apache-2.0** to
match the upstream weights. Commercial use is allowed
(`commercial_use_allowed: true`).

## See also

- [`robots/pusht_2d/README.md`](../../robots/pusht_2d/README.md) — RobotDescription manifest.
- [`examples/sim/diffusion_pusht.yaml`](../../examples/sim/diffusion_pusht.yaml) — paired SimEnvironment config.
- [`docs/reference/vla_compatibility.md`](../../docs/reference/vla_compatibility.md) — VLA × Robot × Sim matrix.
eval/README.md
ADDED
@@ -0,0 +1,14 @@
# `rskills/diffusion-pusht/eval/` — benchmark results

`pusht.json` is the PushT mean-coverage-IoU benchmark result block for this
rSkill. Validated against
[`openral_core.RSkillEvalResult`](../../../docs/reference/schemas/RSkillEvalResult.json)
at load time by the `rSkill` loader and surfaced by `ral benchmark report`.

| Field | Value |
| --- | --- |
| Source | Chi et al., 2023 — *Diffusion Policy: Visuomotor Policy Learning via Action Diffusion* (arxiv:2303.04137) |
| Benchmark | PushT (`gym_pusht/PushT-v0`, pymunk 2-D rigid-body) |
| Robot | PushT 2-D pseudo-robot (single 2-D end-effector tip) |
| Reproduced locally? | ✗ — paper-only. `tests/sim/test_pusht_2d_diffusion_pusht.py` runs a single episode for IO + latency + VRAM verification. |
| Reproduce | `just sim-diffusion-pusht` (single episode); raise `--n-episodes 50` for the full paper protocol. |
eval/pusht.json
ADDED
@@ -0,0 +1,90 @@
{
  "schema_version": "1",
  "source": {
    "paper": "https://arxiv.org/abs/2303.04137",
    "arxiv": "https://arxiv.org/abs/2303.04137",
    "model_variant": "diffusion",
    "evaluated_by": "OpenRAL:ral benchmark run",
    "reproduced_locally": true,
    "reproduction_planned": null,
    "reproduction_cli": "ral benchmark run --suite pusht --rskill rskill://diffusion-pusht",
    "table": null,
    "status": "reproduced"
  },
  "benchmark": {
    "name": "PushT (gym-pusht)",
    "dataset": null,
    "protocol": "50 episodes per task, success_key=is_success, max_steps=300",
    "robot": "pusht_2d",
    "simulator": "gym-pusht (pymunk 2-D)"
  },
  "eval_config": {
    "n_episodes": 50,
    "seeds": [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      10,
      11,
      12,
      13,
      14,
      15,
      16,
      17,
      18,
      19,
      20,
      21,
      22,
      23,
      24,
      25,
      26,
      27,
      28,
      29,
      30,
      31,
      32,
      33,
      34,
      35,
      36,
      37,
      38,
      39,
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ],
    "success_key": "is_success",
    "max_steps": 300,
    "vla_id": "diffusion",
    "weights_uri": "rskill://rskills/diffusion-pusht"
  },
  "results": {
    "pusht/0_success_rate": 0.6,
    "avg_success_rate": 0.6,
    "n_tasks": 1,
    "n_episodes_per_task": 50,
    "n_episodes_total": 50,
    "mean_step_latency_ms_avg": 232.5852261891309,
    "mean_coverage_iou": 0.9496237652727986
  },
  "baselines": {}
}
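The counts in this result block can be cross-checked with a few lines of Python. This is a minimal sketch on a trimmed copy of the JSON above; `check_result` is a hypothetical helper, and the rule that `avg_success_rate` is the mean of the per-task rates is an assumption about the schema rather than something the file states.

```python
import json

# Trimmed copy of the result block above; the seeds list is rebuilt with range().
result = json.loads("""
{
  "eval_config": {"n_episodes": 50, "max_steps": 300},
  "results": {
    "pusht/0_success_rate": 0.6,
    "avg_success_rate": 0.6,
    "n_tasks": 1,
    "n_episodes_per_task": 50,
    "n_episodes_total": 50
  }
}
""")
result["eval_config"]["seeds"] = list(range(50))


def check_result(block: dict) -> list:
    """Return a list of inconsistencies; an empty list means the counts agree."""
    cfg, res = block["eval_config"], block["results"]
    issues = []
    if len(cfg["seeds"]) != cfg["n_episodes"]:
        issues.append("seed count differs from n_episodes")
    if res["n_tasks"] * res["n_episodes_per_task"] != res["n_episodes_total"]:
        issues.append("episode totals do not multiply out")
    # Per-task keys end in "_success_rate"; the average is excluded explicitly.
    task_rates = [
        value
        for key, value in res.items()
        if key.endswith("_success_rate") and key != "avg_success_rate"
    ]
    if abs(sum(task_rates) / len(task_rates) - res["avg_success_rate"]) > 1e-9:
        issues.append("avg_success_rate is not the mean of per-task rates")
    return issues


successes = round(result["results"]["avg_success_rate"] * result["results"]["n_episodes_total"])
print(check_result(result), successes)
```

The block passes all three checks, and a success rate of 0.6 over 50 episodes corresponds to 30 successful episodes.
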
rskill.yaml
ADDED
@@ -0,0 +1,78 @@
# rSkill manifest — OpenRAL packaging format V1 (CLAUDE.md §6.4)
# Wraps: lerobot/diffusion_pusht (Apache-2.0)
# Paper: Chi et al., 2023 — Diffusion Policy.

schema_version: "1"

name: "AdrianLlopart/rskill-diffusion-pusht"
version: "0.1.0"
license: "apache-2.0"
role: "s1"

model_family: "diffusion"

# 2-D PushT pseudo-robot (single end-effector pushing a T block). Used by
# tests/sim/test_pusht_2d_diffusion_pusht.py against gym_pusht/PushT-v0.
embodiment_tags:
  - "pusht"

capabilities_required: {}

# PushT exposes a single 96×96 RGB top-down stream (named
# observation.image, not images.cameraN — PushT predates the multi-cam
# convention used by SmolVLA/ACT).
sensors_required:
  - modality: "rgb"
    vla_feature_key: "observation.image"
    min_width: 96
    min_height: 96

# Output side (ADR-0013). The pusht_2d scene-pseudo-robot exposes a 2-D
# (x, y) absolute position; robots/pusht_2d/robot.yaml advertises
# `cartesian_pose` as its supported control mode (the codebase
# convention for the PushT 2-D action regardless of dimensionality).
# The loader auto-fills n_dof (2) + vla_action_key from the robot YAML.
actuators_required:
  - kind: "cartesian_pose"

runtime: "pytorch"

quantization:
  dtype: "fp32"
  backend: "pytorch"

weights_uri: "hf://lerobot/diffusion_pusht"

chunk_size: 8

latency_budget:
  # Reference-host measurement (RTX 4070 Laptop, CUDA 12.8, PyTorch 2.10)
  # of the warm full-chunk inference is 1756 ms — Diffusion Policy runs
  # 100 DDPM denoising steps per chunk, the dominant cost in the suite.
  # Pinning per_chunk_ms to 1250 ms with tolerance_pct=100 yields the
  # previous 2.5 s ceiling (_WARM_CHUNK_CEILING_S in the sim test).
  per_chunk_ms: 1250.0
  warmup_ms: 10000.0
  load_ms: 30000.0

fallback_skill_id: null

# Headline success rate from rskills/diffusion-pusht/eval/pusht.json.
benchmarks:
  pusht: 0.60

# PushT is a 2-DoF planar pushing benchmark; proprio state is 2-D
# (x, y) of the end effector.
policy_id: "diffusion"
state_contract:
  dim: 2

paper_url: "https://arxiv.org/abs/2303.04137"
source_repo: "hf://lerobot/diffusion_pusht"

description: >
  Diffusion Policy (~263M-param U-Net with 100-step DDPM denoiser) for
  the PushT 2-DoF pushing benchmark. Action chunks of length 8 within a
  horizon of 16. The chunk inference cost is dominated by the denoising
  loop, so cached pops are essentially free — this is the extreme test
  of the queue-drain contract.
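The latency arithmetic in the `latency_budget` comment can be written out explicitly. The sketch below assumes the ceiling is the pinned budget plus `tolerance_pct` percent of it, which reproduces the 2.5 s figure quoted in the comment; `chunk_ceiling_ms` is a hypothetical helper, not an OpenRAL function.

```python
def chunk_ceiling_ms(per_chunk_ms: float, tolerance_pct: float) -> float:
    """Budget plus tolerance, following the arithmetic in the manifest comment."""
    return per_chunk_ms * (1.0 + tolerance_pct / 100.0)


per_chunk_ms = 1250.0   # latency_budget.per_chunk_ms
tolerance_pct = 100.0   # value quoted in the manifest comment
measured_ms = 1756.0    # warm full-chunk measurement on the reference host
chunk_size = 8          # chunk_size from the manifest

ceiling = chunk_ceiling_ms(per_chunk_ms, tolerance_pct)
within_ceiling = measured_ms <= ceiling
amortized_ms = measured_ms / chunk_size  # cost spread over the 8 actions of a chunk

print(ceiling, within_ceiling, amortized_ms)
```

The measured 1756 ms exceeds the 1250 ms pin but stays under the 2500 ms ceiling. Spread over a chunk of 8 it works out to 219.5 ms per action, which is close to the 232.6 ms mean step latency recorded in `eval/pusht.json`, though the two figures were not necessarily measured the same way.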