AdrianLlopart committed · Commit 73940ec · 0 Parent(s)

Duplicate from AdrianLlopart/rskill-diffusion-pusht

Files changed (5)
  1. .gitattributes +35 -0
  2. README.md +97 -0
  3. eval/README.md +14 -0
  4. eval/pusht.json +90 -0
  5. rskill.yaml +78 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
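The patterns above route matching files through Git LFS. As a rough illustration only, the Python sketch below approximates that matching with `fnmatch` on the file's base name. It is not how Git itself evaluates `.gitattributes` (directory patterns such as `saved_model/**/*` are deliberately left out), and the helper name `tracked_by_lfs` and the pattern subset are assumptions made for this example.

```python
import fnmatch
import posixpath

# A small subset of the patterns from .gitattributes (slash-free ones only).
LFS_PATTERNS = ["*.safetensors", "*.tar.*", "*.gz", "*.zip", "*tfevents*"]


def tracked_by_lfs(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


# The files added by this commit are plain text, so none should match.
for candidate in ["README.md", "rskill.yaml", "eval/pusht.json", "model.safetensors"]:
    print(candidate, tracked_by_lfs(candidate))
```

Under this approximation none of the five files added in this commit would be stored in LFS, which is consistent with the package not copying model weights.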
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ tags:
+ - OpenRAL
+ - rskill
+ - diffusion-policy
+ - lerobot
+ - pusht
+ - manipulation
+ license: apache-2.0
+ language:
+ - en
+ ---
+
+ # rskill-diffusion-pusht
+
+ > **OpenRAL rSkill** — Diffusion Policy (Chi et al., 2023) trained on
+ > the PushT 2-D pushing benchmark, packaged for `OpenRAL`.
+
+ This package wraps [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht)
+ with a `rskill.yaml` manifest. It does **not** copy model weights.
+
+ ## Upstream model
+
+ | Field | Value |
+ | --- | --- |
+ | Source repo | [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht) |
+ | Paper | [arxiv:2303.04137](https://arxiv.org/abs/2303.04137) — *Diffusion Policy: Visuomotor Policy Learning via Action Diffusion* (Chi et al., 2023) |
+ | License | Apache-2.0 |
+ | Parameters | ~263 M (1-D U-Net) |
+ | Action chunk | 8 (within horizon 16) |
+ | Denoising | 100 DDPM steps per chunk |
+ | Benchmark | PushT (`gym_pusht`, `pymunk` 2-D rigid-body) |
+
+ Per-chunk inference is dominated by the 100-step denoising loop, while
+ popping the remaining cached actions is essentially free, so this rSkill
+ is the most extreme test of the queue-drain contract in `ChunkedExecutor`.
+
+ ## Supported robots
+
+ | Robot | Embodiment tag | Status | Notes |
+ | --- | --- | --- | --- |
+ | PushT 2-D pseudo-robot (`gym_pusht/PushT-v0`) | `pusht` | ✓ sim | 2-D end-effector pushing a T block on a 512 × 512 px canvas |
+
+ ## Sensors required
+
+ | Key | Type | Resolution | Format |
+ | --- | --- | --- | --- |
+ | `observation.image` | RGB camera | 96 × 96 | `float32` |
+
+ PushT predates the multi-cam `observation.images.cameraN` convention and
+ exposes the raw key `observation.image`.
+
+ ## Manifest summary
+
+ | Field | Value |
+ | --- | --- |
+ | `name` | `AdrianLlopart/rskill-diffusion-pusht` |
+ | `version` | `0.1.0` |
+ | `license` | `apache-2.0` |
+ | `role` | `s1` |
+ | `embodiment_tags` | `pusht` |
+ | `runtime` / `quantization.dtype` | `pytorch` / `fp32` |
+ | `weights_uri` | `hf://lerobot/diffusion_pusht` |
+ | `latency_budget.per_chunk_ms` | 1 250 ms (2.5 s ceiling with `tolerance_pct=100`; warm full-chunk ≈ 1 756 ms on RTX 4070 Laptop, dominated by DDPM) |
+ | `latency_budget.warmup_ms` | 10 000 ms |
+ | `latency_budget.load_ms` | 30 000 ms |
+ | `commercial_use_allowed` | `true` |
+
+ Full schema: `openral_core.RSkillManifest` —
+ `python/core/src/openral_core/schemas.py`.
+
+ ## Reproduction
+
+ ```bash
+ git clone https://github.com/AdrianLlopart/openral && cd openral
+ just bootstrap && uv sync --all-packages --group sim
+
+ # End-to-end via the canonical SimEnvironment config (CPU is enough):
+ just sim-diffusion-pusht
+ # which runs:
+ # ral sim run --config examples/sim/diffusion_pusht.yaml --save-video
+
+ # Sim test (gym_pusht + pymunk):
+ uv run pytest tests/sim/test_pusht_2d_diffusion_pusht.py -v -m sim
+ ```
+
+ ## License
+
+ This rSkill package (`rskill.yaml`, `README.md`) is **Apache-2.0** to
+ match the upstream weights. Commercial use is allowed
+ (`commercial_use_allowed: true`).
+
+ ## See also
+
+ - [`robots/pusht_2d/README.md`](../../robots/pusht_2d/README.md) — RobotDescription manifest.
+ - [`examples/sim/diffusion_pusht.yaml`](../../examples/sim/diffusion_pusht.yaml) — paired SimEnvironment config.
+ - [`docs/reference/vla_compatibility.md`](../../docs/reference/vla_compatibility.md) — VLA × Robot × Sim matrix.
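The README's "Upstream model" section says that the expensive denoising pass runs once per chunk and that the remaining actions are popped from a cache. A minimal Python sketch of that queue-drain idea follows. The class `ChunkQueue` and the stand-in `fake_policy` are hypothetical names invented for this illustration; this is not the code of OpenRAL's `ChunkedExecutor`, whose actual interface is not shown in this commit.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Action = Tuple[float, float]  # (x, y) target, as in the 2-D PushT action space


class ChunkQueue:
    """Hypothetical queue-drain wrapper: infer a chunk, then pop cached actions."""

    def __init__(self, infer_chunk: Callable[[object], List[Action]]) -> None:
        self._infer_chunk = infer_chunk
        self._queue: Deque[Action] = deque()
        self.inference_calls = 0

    def step(self, observation: object) -> Action:
        # Only an empty queue triggers the expensive policy call.
        if not self._queue:
            self._queue.extend(self._infer_chunk(observation))
            self.inference_calls += 1
        # Every other step is a cheap pop from the cached chunk.
        return self._queue.popleft()


def fake_policy(observation: object) -> List[Action]:
    # Stand-in for the 100-step DDPM loop: returns a chunk of 8 actions.
    return [(float(i), float(i)) for i in range(8)]


executor = ChunkQueue(fake_policy)
actions = [executor.step(observation=None) for _ in range(16)]
print(executor.inference_calls)  # 2 policy calls cover 16 environment steps
```

With a chunk size of 8, only one step in eight pays the full inference cost, which is why the per-chunk latency budget matters far more than the per-step average for this policy.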
eval/README.md ADDED
@@ -0,0 +1,14 @@
+ # `rskills/diffusion-pusht/eval/` — benchmark results
+
+ `pusht.json` is the PushT benchmark result block (success rate and mean
+ coverage IoU) for this rSkill. It is validated against
+ [`openral_core.RSkillEvalResult`](../../../docs/reference/schemas/RSkillEvalResult.json)
+ at load time by the `rSkill` loader and surfaced by `ral benchmark report`.
+
+ | Field | Value |
+ | --- | --- |
+ | Source | Chi et al., 2023 — *Diffusion Policy: Visuomotor Policy Learning via Action Diffusion* (arxiv:2303.04137) |
+ | Benchmark | PushT (`gym_pusht/PushT-v0`, pymunk 2-D rigid-body) |
+ | Robot | PushT 2-D pseudo-robot (single 2-D end-effector tip) |
+ | Reproduced locally? | ✓ (`pusht.json` records `reproduced_locally: true` over 50 episodes). `tests/sim/test_pusht_2d_diffusion_pusht.py` separately runs a single episode for IO + latency + VRAM verification. |
+ | Reproduce | `just sim-diffusion-pusht` (single episode); set `--n-episodes 50` for the full 50-episode protocol, or use the `reproduction_cli` recorded in `pusht.json`. |
eval/pusht.json ADDED
@@ -0,0 +1,90 @@
+ {
+   "schema_version": "1",
+   "source": {
+     "paper": "https://arxiv.org/abs/2303.04137",
+     "arxiv": "https://arxiv.org/abs/2303.04137",
+     "model_variant": "diffusion",
+     "evaluated_by": "OpenRAL:ral benchmark run",
+     "reproduced_locally": true,
+     "reproduction_planned": null,
+     "reproduction_cli": "ral benchmark run --suite pusht --rskill rskill://diffusion-pusht",
+     "table": null,
+     "status": "reproduced"
+   },
+   "benchmark": {
+     "name": "PushT (gym-pusht)",
+     "dataset": null,
+     "protocol": "50 episodes per task, success_key=is_success, max_steps=300",
+     "robot": "pusht_2d",
+     "simulator": "gym-pusht (pymunk 2-D)"
+   },
+   "eval_config": {
+     "n_episodes": 50,
+     "seeds": [
+       0,
+       1,
+       2,
+       3,
+       4,
+       5,
+       6,
+       7,
+       8,
+       9,
+       10,
+       11,
+       12,
+       13,
+       14,
+       15,
+       16,
+       17,
+       18,
+       19,
+       20,
+       21,
+       22,
+       23,
+       24,
+       25,
+       26,
+       27,
+       28,
+       29,
+       30,
+       31,
+       32,
+       33,
+       34,
+       35,
+       36,
+       37,
+       38,
+       39,
+       40,
+       41,
+       42,
+       43,
+       44,
+       45,
+       46,
+       47,
+       48,
+       49
+     ],
+     "success_key": "is_success",
+     "max_steps": 300,
+     "vla_id": "diffusion",
+     "weights_uri": "rskill://rskills/diffusion-pusht"
+   },
+   "results": {
+     "pusht/0_success_rate": 0.6,
+     "avg_success_rate": 0.6,
+     "n_tasks": 1,
+     "n_episodes_per_task": 50,
+     "n_episodes_total": 50,
+     "mean_step_latency_ms_avg": 232.5852261891309,
+     "mean_coverage_iou": 0.9496237652727986
+   },
+   "baselines": {}
+ }
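The counts in this result block can be cross-checked against each other. The Python sketch below is one possible sanity check, assuming only the field names visible in `pusht.json`; the function `check_result_block` is a hypothetical helper for this illustration and is not part of `openral_core` or the `RSkillEvalResult` schema validation.

```python
# Values copied from the relevant fields of eval/pusht.json.
result_block = {
    "eval_config": {"n_episodes": 50, "seeds": list(range(50))},
    "results": {
        "avg_success_rate": 0.6,
        "n_tasks": 1,
        "n_episodes_per_task": 50,
        "n_episodes_total": 50,
    },
}


def check_result_block(block: dict) -> int:
    """Return the implied number of successful episodes, or raise ValueError."""
    cfg, res = block["eval_config"], block["results"]
    if len(cfg["seeds"]) != cfg["n_episodes"]:
        raise ValueError("expected one seed per episode")
    if res["n_tasks"] * res["n_episodes_per_task"] != res["n_episodes_total"]:
        raise ValueError("episode totals do not add up")
    successes = res["avg_success_rate"] * res["n_episodes_total"]
    if abs(successes - round(successes)) > 1e-9:
        raise ValueError("success rate does not map to a whole episode count")
    return round(successes)


print(check_result_block(result_block))  # 30 successful episodes out of 50
```

A success rate of 0.6 over 50 episodes corresponds to 30 successes, so the reported figures are at least internally consistent.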
rskill.yaml ADDED
@@ -0,0 +1,78 @@
+ # rSkill manifest — OpenRAL packaging format V1 (CLAUDE.md §6.4)
+ # Wraps: lerobot/diffusion_pusht (Apache-2.0)
+ # Paper: Chi et al., 2023 — Diffusion Policy.
+
+ schema_version: "1"
+
+ name: "AdrianLlopart/rskill-diffusion-pusht"
+ version: "0.1.0"
+ license: "apache-2.0"
+ role: "s1"
+
+ model_family: "diffusion"
+
+ # 2-D PushT pseudo-robot (single end-effector pushing a T block). Used by
+ # tests/sim/test_pusht_2d_diffusion_pusht.py against gym_pusht/PushT-v0.
+ embodiment_tags:
+   - "pusht"
+
+ capabilities_required: {}
+
+ # PushT exposes a single 96×96 RGB top-down stream (named
+ # observation.image, not images.cameraN — PushT predates the multi-cam
+ # convention used by SmolVLA/ACT).
+ sensors_required:
+   - modality: "rgb"
+     vla_feature_key: "observation.image"
+     min_width: 96
+     min_height: 96
+
+ # Output side (ADR-0013). The pusht_2d scene-pseudo-robot exposes a 2-D
+ # (x, y) absolute position; robots/pusht_2d/robot.yaml advertises
+ # `cartesian_pose` as its supported control mode (the codebase
+ # convention for the PushT 2-D action regardless of dimensionality).
+ # The loader auto-fills n_dof (2) + vla_action_key from the robot YAML.
+ actuators_required:
+   - kind: "cartesian_pose"
+
+ runtime: "pytorch"
+
+ quantization:
+   dtype: "fp32"
+   backend: "pytorch"
+
+ weights_uri: "hf://lerobot/diffusion_pusht"
+
+ chunk_size: 8
+
+ latency_budget:
+   # Reference-host measurement (RTX 4070 Laptop, CUDA 12.8, PyTorch 2.10)
+   # of the warm full-chunk inference is 1756 ms — Diffusion Policy runs
+   # 100 DDPM denoising steps per chunk, the dominant cost in the suite.
+   # Pinning per_chunk_ms to 1250 ms with tolerance_pct=100 yields the
+   # previous 2.5 s ceiling (_WARM_CHUNK_CEILING_S in the sim test).
+   per_chunk_ms: 1250.0
+   warmup_ms: 10000.0
+   load_ms: 30000.0
+
+ fallback_skill_id: null
+
+ # Headline success rate from rskills/diffusion-pusht/eval/pusht.json.
+ benchmarks:
+   pusht: 0.60
+
+ # PushT is a 2-DoF planar pushing benchmark; proprio state is 2-D
+ # (x, y) of the end effector.
+ policy_id: "diffusion"
+ state_contract:
+   dim: 2
+
+ paper_url: "https://arxiv.org/abs/2303.04137"
+ source_repo: "hf://lerobot/diffusion_pusht"
+
+ description: >
+   Diffusion Policy (~263M-param U-Net with 100-step DDPM denoiser) for
+   the PushT 2-DoF pushing benchmark. Action chunks of length 8 within a
+   horizon of 16. The chunk inference cost is dominated by the denoising
+   loop, so cached pops are essentially free — this is the extreme test
+   of the queue-drain contract.
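The `latency_budget` comment states that `per_chunk_ms: 1250.0` with `tolerance_pct=100` yields a 2.5 s ceiling. The Python sketch below works through that arithmetic, assuming the tolerance is applied as a percentage on top of the budget; the function `latency_ceiling_ms` is a hypothetical name for this illustration, and the way OpenRAL actually applies the tolerance is not shown in this commit.

```python
def latency_ceiling_ms(per_chunk_ms: float, tolerance_pct: float) -> float:
    """Assumed rule: ceiling = budget * (1 + tolerance_pct / 100)."""
    return per_chunk_ms * (1.0 + tolerance_pct / 100.0)


per_chunk_ms = 1250.0        # latency_budget.per_chunk_ms from rskill.yaml
tolerance_pct = 100.0        # value quoted in the manifest comment
measured_chunk_ms = 1756.0   # warm full-chunk measurement from the comment
chunk_size = 8               # chunk_size from rskill.yaml

ceiling_ms = latency_ceiling_ms(per_chunk_ms, tolerance_pct)
within_budget = measured_chunk_ms <= ceiling_ms
amortized_step_ms = measured_chunk_ms / chunk_size

print(ceiling_ms)         # 2500.0
print(within_budget)      # True
print(amortized_step_ms)  # 219.5
```

Amortized over the 8 actions in a chunk, the measured 1756 ms is about 219.5 ms per step, which is close to the `mean_step_latency_ms_avg` of roughly 232.6 ms recorded in `eval/pusht.json`, although the two numbers may not have been measured in the same way.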