EXOKERN1 committed on
Commit 63bf94e · verified · 1 Parent(s): 85434f3

Improve model card metadata and benchmark clarity

Files changed (1)
  1. README.md +160 -97
README.md CHANGED
@@ -1,149 +1,212 @@
  ---
- pretty_name: "EXOKERN Skill v0.1.1 Robust Contact-Rich Peg Insertion (Diffusion Policy)"
  license: cc-by-nc-4.0
- task_categories:
- - robotics
- language:
- - en
  tags:
  - robotics
  - force-torque
  - contact-rich
  - manipulation
  - insertion
- - diffusion-policy
  - domain-randomization
  - sim-to-real
  - isaac-lab
  - franka
  - physical-ai
- - pretrained-model
- library_name: exokern
  ---

- # EXOKERN Skill v0.1.1 Robust Contact-Rich Peg Insertion
-
- Pre-trained Diffusion Policy for the contact-rich **Peg Insertion** task, trained on the domain-randomized [ContactBench v0.1.1 dataset](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1).
-
- This skill demonstrates **Level 1 (Sim-Validated + Out-of-Distribution)** of the EXOKERN Quality Pyramid. It achieves a 100% success rate under severe domain randomization, establishing a robust baseline for force-aware manipulation.
-
- Part of the [EXOKERN Skill Platform](https://huggingface.co/EXOKERN) — unlocking the "Kontakt-Foundation Model" for industrial assembly.
-
- ![Training Curves](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/training_curve_full_ft_seed42.png)
-
- ## Architecture
-
- This repository contains two variants of a 71.3M parameter Vision-Language-Action (VLA) style Visuomotor Diffusion Policy, adapted for low-dimensional proprietary state spaces:

- 1. **`full_ft`** (22-dim input): Includes the 6-axis end-effector wrench.
- 2. **`no_ft`** (16-dim input): Ablated baseline without force/torque data.

- Both networks use a 1D Temporal CNN backbone + DDIM (16 diffusion steps).

- ![Architecture](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/architecture.png)

- ## Evaluation Results: 600-Episode Multi-Seed Deep Dive

- We evaluated both the `full_ft` and `no_ft` policies across 3 distinct training seeds (42, 123, 7). Each policy was evaluated for 100 episodes under severe domain randomization (friction, mass, gains).

- ### Key Finding: Domain Randomization is the Great Equalizer

- | Metric | v0 (Fixed Env) | v0.1.1 (Severe DR) |
- |---|:---:|:---:|
- | **Success Rate** | 100% | **100%** |
- | `full_ft` Avg Force | 3.7 N | **3.67 N** |
- | `no_ft` Avg Force | 5.3 N | **3.37 N** |
- | **F/T Force Reduction** | 30% | **-9% (Inconclusive)** |

- **Analysis:**
- In our v0 baseline (fixed environment), providing force/torque data reduced contact forces by 30%. However, under v0.1.1's severe Domain Randomization, the `no_ft` policy was forced to learn significantly more robust, compliant behaviors from visual/proprioceptive cues alone.

- As a result, **DR reduced the `no_ft` contact force by 36%** (5.3N -> 3.37N), effectively neutralizing the F/T advantage on this specific, relatively simple task. This confirms our roadmap strategy: F/T sensing becomes critical primarily on harder tasks (e.g., tighter tolerances, screw driving, snap-fits), which is the focus of our upcoming v0.2 release.
-
- ### Detailed Multi-Seed Results

  | Seed | Condition | Success Rate | Avg Force (N) | Peak Force (N) | Avg Time (s) |
- |:---:|:---|:---:|:---:|:---:|:---:|
- | **42** | `full_ft` | 100.0% | 3.2 | 10.4 | 25.6 |
- | **42** | `no_ft` | 100.0% | 3.4 | 10.4 | 25.7 |
- | **123** | `full_ft` | 100.0% | 4.1 | 10.6 | 25.7 |
- | **123** | `no_ft` | 100.0% | 3.3 | 10.3 | 25.8 |
- | **7** | `full_ft` | 100.0% | 3.7 | 10.9 | 25.5 |
- | **7** | `no_ft` | 100.0% | 3.4 | 10.3 | 25.7 |
- | **Mean** | `full_ft` | **100.0%** | **3.67 ± 0.45** | **10.63** | **25.6** |
- | **Mean** | `no_ft` | **100.0%** | **3.37 ± 0.06** | **10.33** | **25.7** |
-
- *Note: The `no_ft` policy exhibited significantly lower variance across seeds (±0.06N vs ±0.45N), indicating higher training stability under these specific DR conditions.*

- ## Usage with `exokern-eval`

- Evaluate this exact checkpoint in your own Isaac Lab environment using our CLI tool:

  ```bash
  pip install exokern-eval

- # Download the model weights
  wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt

- # Run the evaluation
  exokern-eval \
  --policy full_ft_best_model.pt \
  --env Isaac-Forge-PegInsert-Direct-v0 \
  --episodes 100
  ```

- ## Inference Code Example

  ```python
- import torch
- import numpy as np
-
- # Use EXOKERN's safe_load utility to prevent unpickling vulnerabilities
- # https://github.com/Exokern/exokern/blob/main/Skill_Training/safe_load.py
- from safe_load import safe_load_checkpoint
-
- # Load the model directly from HF Hub
- from huggingface_hub import hf_hub_download
- model_path = hf_hub_download(repo_id="EXOKERN/skill-forge-peginsert-v0.1.1", filename="full_ft_best_model.pt")
-
- ckpt = safe_load_checkpoint(model_path, device="cuda")
- model = ckpt["model"]
-
- # Create dummy observation (22-dim)
- # See Dataset card for tensor layout
- obs_dict = {
-     "observation.state": torch.randn(1, 22).cuda()
- }
-
- # Run inference
- with torch.no_grad():
-     action = model(obs_dict)
-
- print(f"Predicted action (7-DOF): {action.shape}")
- ```

- ## Security Notice (`torch.load` Vulnerability)

- In PyTorch, `torch.load` with `weights_only=False` can execute arbitrary code during unpickling (CVE-2025-32434).

- **Always use `weights_only=True` when loading untrusted checkpoints.**
- This repository provides a `safe_load_checkpoint` script configured to safely allowlist necessary `numpy` types (like `np.dtypes.Float64DType`) required to read the EXOKERN checkpoint statistics. Download it from our GitHub repository.

- ## Intended Use

- **Primary Use Case:**
- Sim-to-real research, benchmarking imitation learning algorithms, and studying the effect of Force/Torque modalities under domain randomization.

- **Out of Scope:**
- Direct deployment on physical hardware without comprehensive safety bridging, limit definition, and real-world calibration. This model was trained exclusively in simulation.

- ## License

- - **Code & Architecture:** Apache 2.0
- - **Model Weights:** CC-BY-NC 4.0 (Free for research & non-commercial use)

- For commercial licensing of the "Kontakt-Foundation Model" or custom Skill porting, please contact EXOKERN.

- ---
- **EXOKERN — Bridging the Haptic Gap in Robotic Manipulation**
- [exokern.com](https://exokern.com) | [github.com/Exokern](https://github.com/Exokern)

  ---
+ pretty_name: "EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization"
  license: cc-by-nc-4.0
+ pipeline_tag: robotics
+ library_name: pytorch
  tags:
  - robotics
+ - diffusion-policy
  - force-torque
  - contact-rich
  - manipulation
  - insertion
  - domain-randomization
  - sim-to-real
  - isaac-lab
  - franka
  - physical-ai
+ - lerobot
+ datasets:
+ - EXOKERN/contactbench-forge-peginsert-v0.1.1
+ metrics:
+ - success_rate
+ - avg_contact_force_n
+ - peak_contact_force_n
+ model-index:
+ - name: EXOKERN Skill v0.1.1 - Peg Insertion (full_ft)
+   results:
+   - task:
+       type: robotics
+       name: Peg insertion
+     dataset:
+       name: EXOKERN ContactBench v0.1.1
+       type: EXOKERN/contactbench-forge-peginsert-v0.1.1
+     metrics:
+     - type: success_rate
+       value: 100.0
+       name: Success Rate (%)
+     - type: avg_contact_force_n
+       value: 3.67
+       name: Average Contact Force (N)
+     - type: peak_contact_force_n
+       value: 10.64
+       name: Peak Contact Force (N)
  ---

+ # EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization

+ `skill-forge-peginsert-v0.1.1` is the domain-randomized reference model release in the EXOKERN catalog. It is trained on [EXOKERN ContactBench v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1) and ships the same paired comparison structure as v0:

+ - `full_ft_best_model.pt`: primary checkpoint with 22D observations, including the 6-axis force/torque wrench
+ - `no_ft_best_model.pt`: ablation checkpoint with the same architecture and 16D observations without force/torque

+ This release should be read as a robustness benchmark first. Both policies reach a 100% success rate under severe domain randomization, and the card reports the mixed force-reduction result explicitly instead of overclaiming a force/torque benefit.

+ ## Quick Facts

+ | Item | Value |
+ | --- | --- |
+ | Task | Peg insertion in simulation under domain randomization |
+ | Dataset | [EXOKERN/contactbench-forge-peginsert-v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1) |
+ | Simulator | NVIDIA Isaac Lab (Isaac Sim 4.5) |
+ | Robot | Franka FR3 |
+ | Architecture | TemporalUNet1D diffusion policy |
+ | Parameters | 71.3M |
+ | Observation horizon | 10 frames |
+ | Prediction / execution horizon | 16 / 8 actions |
+ | Seeds evaluated | 42, 123, 7 |
+ | Total rollouts reported | 600 (2 checkpoints × 3 seeds × 100 episodes) |
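The observation and prediction/execution horizons above describe a receding-horizon control loop: keep the last 10 observations, ask the policy for a 16-action chunk, execute part of it, then re-plan. A framework-free sketch of that loop follows. `policy`, `get_obs`, and `apply_action` are hypothetical stand-ins (not the API of the repo's `inference.py`), and both the first-step padding rule and executing the first 8 actions of each chunk are assumptions about how the chunk is indexed.

```python
from collections import deque
from typing import Callable, List, Sequence

OBS_HORIZON = 10   # observation frames fed to the policy (Quick Facts)
PRED_HORIZON = 16  # actions predicted per denoising pass
EXEC_HORIZON = 8   # actions executed before re-planning

Obs = Sequence[float]
Action = Sequence[float]


def run_receding_horizon(
    policy: Callable[[List[Obs]], List[Action]],
    get_obs: Callable[[], Obs],
    apply_action: Callable[[Action], None],
    num_steps: int,
) -> int:
    """Run num_steps control steps; return how many times the policy was queried."""
    history: deque = deque(maxlen=OBS_HORIZON)  # rolling observation window
    pending: deque = deque()                    # actions still to execute
    queries = 0
    for _ in range(num_steps):
        obs = get_obs()
        if not history:
            # Assumption: pad the window by repeating the first observation.
            history.extend([obs] * OBS_HORIZON)
        else:
            history.append(obs)
        if not pending:
            chunk = policy(list(history))
            if len(chunk) != PRED_HORIZON:
                raise ValueError(f"expected {PRED_HORIZON} actions, got {len(chunk)}")
            # Assumption: execute the first EXEC_HORIZON actions of the chunk.
            pending.extend(chunk[:EXEC_HORIZON])
            queries += 1
        apply_action(pending.popleft())
    return queries


# Usage with stand-ins: a dummy 7-DoF policy and a 22D (full_ft-sized) observation.
executed = []
n_queries = run_receding_horizon(
    policy=lambda hist: [[0.0] * 7] * PRED_HORIZON,
    get_obs=lambda: [0.0] * 22,
    apply_action=executed.append,
    num_steps=100,
)
print(n_queries, len(executed))  # prints: 13 100
```

With 8 actions executed per query, the policy runs once every 8 control steps (about 1.9 Hz at the 15 Hz control rate listed under Training And Evaluation Setup), which amortizes the cost of 16-step DDIM sampling over several control ticks.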

+ ## Benchmark Summary

+ The `model-index` metadata in the front matter tracks only the primary `full_ft` checkpoint; the repo also includes the paired `no_ft` ablation for comparison. Both rows below are averaged over the three evaluation seeds.

+ | Checkpoint | Success Rate | Avg Contact Force (N) | Peak Force (N) | Avg Episode Time (s) |
+ | --- | ---: | ---: | ---: | ---: |
+ | `full_ft` | 100.0 | 3.67 +/- 0.45 | 10.63 | 25.63 |
+ | `no_ft` | 100.0 | 3.37 +/- 0.06 | 10.33 | 25.73 |

+ Per-seed results:

  | Seed | Condition | Success Rate | Avg Force (N) | Peak Force (N) | Avg Time (s) |
+ | --- | --- | ---: | ---: | ---: | ---: |
+ | 42 | `full_ft` | 100.0 | 3.24 | 10.44 | 25.61 |
+ | 42 | `no_ft` | 100.0 | 3.38 | 10.38 | 25.73 |
+ | 123 | `full_ft` | 100.0 | 4.12 | 10.57 | 25.74 |
+ | 123 | `no_ft` | 100.0 | 3.34 | 10.32 | 25.79 |
+ | 7 | `full_ft` | 100.0 | 3.69 | 10.93 | 25.54 |
+ | 7 | `no_ft` | 100.0 | 3.37 | 10.31 | 25.68 |
+
+ Interpretation:
+
+ - This release demonstrates robust task completion (100% success for both checkpoints) under a much harder data-collection regime than v0.
+ - On this particular peg-in-hole setup, domain randomization closed the force gap between `full_ft` and `no_ft`: `no_ft` averaged 3.37 N versus 3.67 N for `full_ft`, a difference smaller than the across-seed spread of `full_ft` (+/- 0.45 N). In v0, force/torque input had lowered average contact force from 5.3 N to 3.7 N.
+ - This does not show that force/torque sensing is unnecessary in general. It shows that this release is best used as a robustness benchmark and an honest reference point for harder future tasks.
+
+ ## What Changed Compared To v0
+
+ | Topic | v0 | v0.1.1 |
+ | --- | --- | --- |
+ | Dataset regime | Mostly fixed conditions | Multi-layer domain randomization |
+ | Dataset size | 2,221 episodes / 330,929 frames | 5,000 episodes / 745,000 frames |
+ | Robot | Franka Emika Panda | Franka FR3 |
+ | Force reduction takeaway | Clear F/T advantage | Inconclusive on this task |
+ | Best use | Clean baseline | Robustness benchmark |
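The "Force reduction takeaway" row rests on one ratio: how much lower the average contact force of `full_ft` is than that of `no_ft`. A minimal sketch of that arithmetic, assuming the reduction is measured relative to the `no_ft` force, which is the definition that reproduces the 30% figure the previous card reported for v0 (5.3 N vs 3.7 N). The function name is illustrative.

```python
def force_reduction(no_ft_force_n: float, full_ft_force_n: float) -> float:
    """Relative drop in average contact force when F/T input is added.

    Positive: full_ft presses more gently than no_ft.
    Negative: full_ft uses more force than no_ft.
    """
    if no_ft_force_n <= 0:
        raise ValueError("no_ft force must be positive")
    return (no_ft_force_n - full_ft_force_n) / no_ft_force_n


# Average contact forces (N): v0 values from the previous card, v0.1.1 from this one.
releases = {
    "v0": {"full_ft": 3.7, "no_ft": 5.3},
    "v0.1.1": {"full_ft": 3.67, "no_ft": 3.37},
}

for name, forces in releases.items():
    r = force_reduction(forces["no_ft"], forces["full_ft"])
    print(f"{name}: F/T force reduction = {r:+.1%}")
```

This prints `+30.2%` for v0 and `-8.9%` for v0.1.1, matching the 30% and -9% figures in the previous card; the sign flip is why the takeaway for this release is "inconclusive" rather than "F/T advantage".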

+ ## Architecture

+ This release uses the same 1D Temporal U-Net diffusion policy family as v0.
+
+ ![Architecture](architecture.png)
+
+ | Component | Value |
+ | --- | --- |
+ | Action dimension | 7 |
+ | Observation dimensions | 22 (`full_ft`) / 16 (`no_ft`) |
+ | Diffusion training steps | 100 |
+ | DDIM inference steps | 16 |
+ | Base channels | 256 |
+ | Channel multipliers | (1, 2, 4) |
+ | Normalization | Min-max to `[-1, 1]` |
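Min-max normalization to `[-1, 1]` rescales each observation or action dimension with statistics computed on the training data; predicted actions are mapped back to physical units with the inverse. A minimal NumPy sketch, assuming per-dimension statistics. The function names and the `eps` guard are illustrative, and the statistics stored in the checkpoints (and their key names) may be organized differently.

```python
import numpy as np


def fit_minmax(data: np.ndarray, eps: float = 1e-8):
    """Per-dimension min and range over a (num_samples, dim) array."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Guard constant dimensions so normalization never divides by zero.
    span = np.maximum(hi - lo, eps)
    return lo, span


def normalize(x: np.ndarray, lo: np.ndarray, span: np.ndarray) -> np.ndarray:
    """Map raw values into [-1, 1] using the fitted statistics."""
    return 2.0 * (x - lo) / span - 1.0


def unnormalize(y: np.ndarray, lo: np.ndarray, span: np.ndarray) -> np.ndarray:
    """Inverse map from [-1, 1] back to raw units (e.g. for predicted actions)."""
    return (y + 1.0) / 2.0 * span + lo


rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 22))  # stand-in for 22D full_ft observations
lo, span = fit_minmax(obs)
z = normalize(obs, lo, span)
print(z.min(), z.max())
```

Normalizing into a fixed range keeps the diffusion model's noise scale comparable across dimensions with very different units (positions, velocities, forces).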
+
+ ## Repository Contents
+
+ | File | Description |
+ | --- | --- |
+ | `full_ft_best_model.pt` | Best checkpoint with force/torque input |
+ | `no_ft_best_model.pt` | Ablation checkpoint without force/torque input |
+ | `inference.py` | Self-contained inference helper and model definition |
+ | `config.yaml` | Training, dataset, and environment configuration |
+ | `eval_seed42.json` | Seed 42 evaluation artifact |
+ | `eval_seed123.json` | Seed 123 evaluation artifact |
+ | `eval_seed7.json` | Seed 7 evaluation artifact |
+ | `training_curve_full_ft_seed42.png` | Training curve for `full_ft`, seed 42 |
+ | `training_curve_full_ft_seed123.png` | Training curve for `full_ft`, seed 123 |
+ | `training_curve_full_ft_seed7.png` | Training curve for `full_ft`, seed 7 |
+ | `training_curve_no_ft_seed42.png` | Training curve for `no_ft`, seed 42 |
+ | `training_curve_no_ft_seed123.png` | Training curve for `no_ft`, seed 123 |
+ | `training_curve_no_ft_seed7.png` | Training curve for `no_ft`, seed 7 |
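Because the two checkpoints expect different observation sizes (22D with force/torque, 16D without), a mis-sized vector is an easy mistake when switching between them. A small guard, sketched under the assumption that the observation is a flat list of floats; `pick_checkpoint` and `check_observation` are hypothetical helpers, not part of the repo's `inference.py`, and could be called before passing a vector to `add_observation` in the usage example.

```python
from typing import List, Sequence

# Hypothetical helper, not part of the repo's inference.py.
# Observation sizes per checkpoint, as listed in this card.
CHECKPOINT_OBS_DIM = {
    "full_ft_best_model.pt": 22,  # includes the 6-axis force/torque wrench
    "no_ft_best_model.pt": 16,    # no force/torque channels
}


def pick_checkpoint(has_ft_sensor: bool) -> str:
    """Choose the checkpoint that matches the sensing available at run time."""
    return "full_ft_best_model.pt" if has_ft_sensor else "no_ft_best_model.pt"


def check_observation(checkpoint: str, obs: Sequence[float]) -> List[float]:
    """Return obs as a list, or raise if its length does not match the checkpoint."""
    expected = CHECKPOINT_OBS_DIM[checkpoint]
    if len(obs) != expected:
        raise ValueError(f"{checkpoint} expects {expected}D observations, got {len(obs)}D")
    return list(obs)


ckpt = pick_checkpoint(has_ft_sensor=False)
obs = check_observation(ckpt, [0.0] * 16)
print(ckpt, len(obs))  # prints: no_ft_best_model.pt 16
```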
+
+ ## Usage
+
+ ### Reproduce evaluation with `exokern-eval`

  ```bash
  pip install exokern-eval

  wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt

  exokern-eval \
  --policy full_ft_best_model.pt \
  --env Isaac-Forge-PegInsert-Direct-v0 \
  --episodes 100
  ```

+ ### Load the repo helper locally

  ```python
+ import os
+ import sys

+ from huggingface_hub import snapshot_download

+ # Fetch only the checkpoints and the self-contained inference helper.
+ repo_dir = snapshot_download(
+     repo_id="EXOKERN/skill-forge-peginsert-v0.1.1",
+     allow_patterns=["*.pt", "inference.py"],
+ )
+ # Make inference.py importable from the downloaded snapshot.
+ sys.path.insert(0, repo_dir)

+ from inference import DiffusionPolicyInference

+ policy = DiffusionPolicyInference(
+     os.path.join(repo_dir, "full_ft_best_model.pt"),
+     device="cpu",
+ )

+ # full_ft expects a 22D observation vector (no_ft_best_model.pt expects 16D)
+ policy.add_observation([0.0] * 22)
+ actions = policy.get_actions()
+ print(len(actions))
+ ```

+ ## Training And Evaluation Setup

+ | Item | Value |
+ | --- | --- |
+ | Train / val split | 85% / 15% by episode |
+ | Epochs | 300 |
+ | Batch size | 256 |
+ | Optimizer | AdamW, `lr=1e-4`, `weight_decay=1e-4` |
+ | LR schedule | Cosine annealing to `1e-6` |
+ | EMA decay | 0.995 |
+ | Physics rate | 120 Hz |
+ | Control rate | 15 Hz |
+ | Domain randomization | Enabled in the training dataset |
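A pure-Python sketch of the schedule-related rows above: an 85/15 split over whole episodes, cosine annealing from `1e-4` to `1e-6` over 300 epochs, an EMA with decay 0.995, and the 120 Hz / 15 Hz ratio (8 physics steps per policy action). It assumes the cosine schedule is stepped once per epoch and uses scalar stand-ins for model parameters; the shuffle seed and helper names are hypothetical, and the actual training script may instead step per batch with `torch.optim.lr_scheduler.CosineAnnealingLR` and keep a tensor EMA.

```python
import math
import random
from typing import Dict, List, Tuple

# Values from the table above.
EPOCHS = 300
LR_MAX, LR_MIN = 1e-4, 1e-6
EMA_DECAY = 0.995
PHYSICS_HZ, CONTROL_HZ = 120, 15


def split_by_episode(num_episodes: int, val_frac: float = 0.15,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split whole episodes (not frames) so no trajectory leaks into validation."""
    ids = list(range(num_episodes))
    random.Random(seed).shuffle(ids)  # hypothetical seed
    n_val = round(num_episodes * val_frac)
    return ids[n_val:], ids[:n_val]


def cosine_lr(epoch: int) -> float:
    """Cosine annealing from LR_MAX at epoch 0 to LR_MIN at the final epoch."""
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1 + math.cos(math.pi * epoch / EPOCHS))


def ema_update(ema: Dict[str, float], params: Dict[str, float],
               decay: float = EMA_DECAY) -> None:
    """In-place exponential moving average of parameters (scalars for brevity)."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1 - decay) * value


train_ids, val_ids = split_by_episode(5000)   # dataset size from this card
decimation = PHYSICS_HZ // CONTROL_HZ         # physics steps per policy action
print(len(train_ids), len(val_ids), decimation)  # prints: 4250 750 8
```

Splitting by episode rather than by frame matters here because neighbouring frames of one rollout are nearly identical; a frame-level split would let validation frames leak from training trajectories.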
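
+ ## Security Note

+ The checkpoints in this repo are PyTorch pickles, and unpickling them with `torch.load` can execute arbitrary code. Load them only in a trusted or isolated environment after reviewing the repository contents, and prefer `torch.load(..., weights_only=True)` where the checkpoint contents allow it.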
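As a complementary check before any `torch.load` call, the pickle stream inside a checkpoint can be listed with the standard library's `pickletools` without executing it. A best-effort sketch, assuming the checkpoints use PyTorch's zip-based format (a `.../data.pkl` member inside a zip archive). It reports the `module name` globals the pickle would import, so unexpected entries such as `os system` or `builtins eval` stand out. It does not replace `weights_only=True` or a sandbox.

```python
import pickletools
import zipfile
from typing import List, Set

# Opcodes that push a string which a later STACK_GLOBAL may consume.
_STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def pickle_globals(data: bytes) -> Set[str]:
    """List 'module name' globals a pickle would import, without unpickling it."""
    found: Set[str] = set()
    strings: List[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3: pickletools already formats the arg as "module name".
            found.add(arg)
        elif opcode.name in _STRING_OPS:
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+: best effort, assumes module and name were pushed as
            # literal strings rather than fetched back from the memo.
            found.add(f"{strings[-2]} {strings[-1]}")
    return found


def checkpoint_globals(path: str) -> Set[str]:
    """Inspect every data.pkl inside a zip-format PyTorch checkpoint."""
    found: Set[str] = set()
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            if member.endswith("data.pkl"):
                found |= pickle_globals(zf.read(member))
    return found


# Usage (after downloading the checkpoint):
# for g in sorted(checkpoint_globals("full_ft_best_model.pt")):
#     print(g)
```

Entries outside `torch`, `collections`, and the NumPy types mentioned in the previous card deserve a closer look before the file is loaded.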

+ ## Limitations
+
+ - Simulation only. This release does not claim real-robot readiness.
+ - Reported robustness is specific to the peg-in-hole task and the randomization ranges documented in the paired dataset card.
+ - The ablation result is mixed: use this repo to study robustness, not to claim a universal force/torque effect.
+ - The repo ships paired checkpoints for research comparison; the primary reference checkpoint is `full_ft_best_model.pt`.
+
+ ## Related Resources
+
+ - Dataset: [EXOKERN/contactbench-forge-peginsert-v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1)
+ - Baseline predecessor: [EXOKERN/skill-forge-peginsert-v0](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0)
+ - Evaluation CLI: [github.com/Exokern/exokern_eval](https://github.com/Exokern/exokern_eval)
+ - Organization page: [huggingface.co/EXOKERN](https://huggingface.co/EXOKERN)