Improve model card metadata and benchmark clarity

README.md +160 -97

@@ -1,149 +1,212 @@
 ---
-pretty_name: "EXOKERN Skill v0.1.1 — Robust Contact-Rich Peg Insertion (Diffusion Policy)"
 license: cc-by-nc-4.0
-task_categories:
-  - robotics
-language:
-  - en
 tags:
   - robotics
   - force-torque
   - contact-rich
   - manipulation
   - insertion
-  - diffusion-policy
   - domain-randomization
   - sim-to-real
   - isaac-lab
   - franka
   - physical-ai
-  - pretrained-model
-library_name: exokern
 ---
-# EXOKERN Skill v0.1.1 — Robust Contact-Rich Peg Insertion
-Pre-trained Diffusion Policy for the contact-rich **Peg Insertion** task, trained on the domain-randomized [ContactBench v0.1.1 dataset](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1).
-This skill demonstrates **Level 1 (Sim-Validated + Out-of-Distribution)** of the EXOKERN Quality Pyramid. It achieves a 100% success rate under severe domain randomization, establishing a robust baseline for force-aware manipulation.
-Part of the [EXOKERN Skill Platform](https://huggingface.co/EXOKERN) — unlocking the "Kontakt-Foundation Model" for industrial assembly.
-![Training Curves](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/training_curve_full_ft_seed42.png)
-## Architecture
-This repository contains two variants of a 71.3M parameter Vision-Language-Action (VLA) style Visuomotor Diffusion Policy, adapted for low-dimensional proprietary state spaces:
-1. **`full_ft`** (22-dim input): Includes the 6-axis end-effector wrench.
-2. **`no_ft`** (16-dim input): Ablated baseline without force/torque data.
-Both networks use a 1D Temporal CNN backbone + DDIM (16 diffusion steps).
-![Architecture](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/architecture.png)
-## Evaluation Results: 600-Episode Multi-Seed Deep Dive
-We evaluated both the `full_ft` and `no_ft` policies across 3 distinct training seeds (42, 123, 7). Each policy was evaluated for 100 episodes under severe domain randomization (friction, mass, gains).
-### Key Finding: Domain Randomization is the Great Equalizer
-| Metric | v0 (Fixed Env) | v0.1.1 (Severe DR) |
-|---|:---:|:---:|
-| **Success Rate** | 100% | **100%** |
-| `full_ft` Avg Force | 3.7 N | **3.67 N** |
-| `no_ft` Avg Force | 5.3 N | **3.37 N** |
-| **F/T Force Reduction** | 30% | **-9% (Inconclusive)** |
-**Analysis:**
-In our v0 baseline (fixed environment), providing force/torque data reduced contact forces by 30%. However, under v0.1.1's severe Domain Randomization, the `no_ft` policy was forced to learn significantly more robust, compliant behaviors from visual/proprioceptive cues alone.
-As a result, **DR reduced the `no_ft` contact force by 36%** (5.3N -> 3.37N), effectively neutralizing the F/T advantage on this specific, relatively simple task. This confirms our roadmap strategy: F/T sensing becomes critical primarily on harder tasks (e.g., tighter tolerances, screw driving, snap-fits), which is the focus of our upcoming v0.2 release.
-### Detailed Multi-Seed Results
 | Seed | Condition | Success Rate | Avg Force (N) | Peak Force (N) | Avg Time (s) |
-|:---:|:---|:---:|:---:|:---:|:---:|
-| **42** | `full_ft` | 100.0% | 3.2 | 10.4 | 25.6 |
-| **42** | `no_ft` | 100.0% | 3.4 | 10.4 | 25.7 |
-| **123** | `full_ft` | 100.0% | 4.1 | 10.6 | 25.7 |
-| **123** | `no_ft` | 100.0% | 3.3 | 10.3 | 25.8 |
-| **7** | `full_ft` | 100.0% | 3.7 | 10.9 | 25.5 |
-| **7** | `no_ft` | 100.0% | 3.4 | 10.3 | 25.7 |
-| **Mean** | `full_ft` | **100.0%** | **3.67 ± 0.45** | **10.63** | **25.6** |
-| **Mean** | `no_ft` | **100.0%** | **3.37 ± 0.06** | **10.33** | **25.7** |
-*Note: The `no_ft` policy exhibited significantly lower variance across seeds (±0.06N vs ±0.45N), indicating higher training stability under these specific DR conditions.*
-## Usage with `exokern-eval`
-Evaluate this exact checkpoint in your own Isaac Lab environment using our CLI tool:
 ```bash
 pip install exokern-eval
-# Download the model weights
 wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt
-# Run the evaluation
 exokern-eval \
   --policy full_ft_best_model.pt \
   --env Isaac-Forge-PegInsert-Direct-v0 \
   --episodes 100
 ```
-## Inference Code Example
 ```python
-import torch
-import numpy as np
-# Use EXOKERN's safe_load utility to prevent unpickling vulnerabilities
-# https://github.com/Exokern/exokern/blob/main/Skill_Training/safe_load.py
-from safe_load import safe_load_checkpoint
-# Load the model directly from HF Hub
-from huggingface_hub import hf_hub_download
-model_path = hf_hub_download(repo_id="EXOKERN/skill-forge-peginsert-v0.1.1", filename="full_ft_best_model.pt")
-ckpt = safe_load_checkpoint(model_path, device="cuda")
-model = ckpt["model"]
-# Create dummy observation (22-dim)
-# See Dataset card for tensor layout
-obs_dict = {
-    "observation.state": torch.randn(1, 22).cuda()
-}
-# Run inference
-with torch.no_grad():
-    action = model(obs_dict)
-print(f"Predicted action (7-DOF): {action.shape}")
-```
-## Security Notice (`torch.load` Vulnerability)
-In PyTorch, `torch.load` with `weights_only=False` can execute arbitrary code during unpickling (CVE-2025-32434).
-**Always use `weights_only=True` when loading untrusted checkpoints.**
-This repository provides a `safe_load_checkpoint` script configured to safely allowlist necessary `numpy` types (like `np.dtypes.Float64DType`) required to read the EXOKERN checkpoint statistics. Download it from our GitHub repository.
-## Intended Use
-**Primary Use Case:**
-Sim-to-real research, benchmarking imitation learning algorithms, and studying the effect of Force/Torque modalities under domain randomization.
-**Out of Scope:**
-Direct deployment on physical hardware without comprehensive safety bridging, limit definition, and real-world calibration. This model was trained exclusively in simulation.
-## License
-- **Code & Architecture:** Apache 2.0
-- **Model Weights:** CC-BY-NC 4.0 (Free for research & non-commercial use)
-For commercial licensing of the "Kontakt-Foundation Model" or custom Skill porting, please contact EXOKERN.
----
-**EXOKERN — Bridging the Haptic Gap in Robotic Manipulation**
-[exokern.com](https://exokern.com) | [github.com/Exokern](https://github.com/Exokern)

 ---
+pretty_name: "EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization"
 license: cc-by-nc-4.0
+pipeline_tag: robotics
+library_name: pytorch
 tags:
   - robotics
+  - diffusion-policy
   - force-torque
   - contact-rich
   - manipulation
   - insertion
   - domain-randomization
   - sim-to-real
   - isaac-lab
   - franka
   - physical-ai
+  - lerobot
+datasets:
+  - EXOKERN/contactbench-forge-peginsert-v0.1.1
+metrics:
+  - success_rate
+  - avg_contact_force_n
+  - peak_contact_force_n
+model-index:
+  - name: EXOKERN Skill v0.1.1 - Peg Insertion (full_ft)
+    results:
+      - task:
+          type: robotics
+          name: Peg insertion
+        dataset:
+          name: EXOKERN ContactBench v0.1.1
+          type: EXOKERN/contactbench-forge-peginsert-v0.1.1
+        metrics:
+          - type: success_rate
+            value: 100.0
+            name: Success Rate (%)
+          - type: avg_contact_force_n
+            value: 3.67
+            name: Average Contact Force (N)
+          - type: peak_contact_force_n
+            value: 10.64
+            name: Peak Contact Force (N)
 ---
+# EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization
+`skill-forge-peginsert-v0.1.1` is the domain-randomized reference model release in the EXOKERN catalog. It is trained on [EXOKERN ContactBench v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1) and ships the same paired comparison structure as v0:
+- `full_ft_best_model.pt`: primary checkpoint with 22D observations, including force/torque input
+- `no_ft_best_model.pt`: ablation checkpoint with the same architecture and 16D state-only observations
+This release should be read as a robustness benchmark first. Both policies remain successful under severe domain randomization, and the repo is valuable precisely because it makes the mixed result on force reduction explicit.
+## Quick Facts
+| Item | Value |
+| --- | --- |
+| Task | Peg insertion in simulation under domain randomization |
+| Dataset | [EXOKERN/contactbench-forge-peginsert-v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1) |
+| Simulator | NVIDIA Isaac Lab (Isaac Sim 4.5) |
+| Robot | Franka FR3 |
+| Architecture | TemporalUNet1D diffusion policy |
+| Parameters | 71.3M |
+| Observation horizon | 10 frames |
+| Prediction / execution horizon | 16 / 8 actions |
+| Seeds evaluated | 42, 123, 7 |
+| Total rollouts reported | 600 |
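The observation and prediction / execution horizons in the table describe a receding-horizon control loop: the policy sees the last 10 frames, predicts 16 actions, and only the first 8 are executed before re-planning. The following is a minimal sketch of that loop, not the repo's implementation. `dummy_predict` is a hypothetical stand-in for the diffusion policy, and padding a short history by repeating the oldest frame is an assumption.

```python
from collections import deque

OBS_HORIZON = 10   # frames of history fed to the policy
PRED_HORIZON = 16  # actions predicted per inference call
EXEC_HORIZON = 8   # actions executed before re-planning
OBS_DIM = 22       # full_ft observation size
ACTION_DIM = 7


def dummy_predict(obs_window):
    """Hypothetical stand-in for the policy: returns PRED_HORIZON zero actions."""
    assert len(obs_window) == OBS_HORIZON
    return [[0.0] * ACTION_DIM for _ in range(PRED_HORIZON)]


def rollout(observations, predict=dummy_predict):
    """Run receding-horizon control over a stream of observations."""
    history = deque(maxlen=OBS_HORIZON)  # sliding window of recent frames
    pending = deque()                    # actions planned but not yet executed
    executed = []
    replans = 0
    for obs in observations:
        history.append(obs)
        if not pending:
            # Assumption: pad a short history by repeating the oldest frame.
            window = [history[0]] * (OBS_HORIZON - len(history)) + list(history)
            # Keep only the first EXEC_HORIZON of the PRED_HORIZON actions.
            pending.extend(predict(window)[:EXEC_HORIZON])
            replans += 1
        executed.append(pending.popleft())
    return executed, replans


executed, replans = rollout([[0.0] * OBS_DIM for _ in range(20)])
print(len(executed), replans)
```

Executing only half of each predicted chunk trades some inference cost for the ability to react to new observations every 8 control steps.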
+## Benchmark Summary
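+The Hub metadata for this repo tracks the primary `full_ft` checkpoint; the repo also includes the paired `no_ft` ablation for comparison. Each row aggregates the three seeds (100 evaluation episodes per seed and checkpoint), and `+/-` is the spread across seeds.
+| Checkpoint | Success Rate (%) | Avg Contact Force (N) | Peak Force (N) | Avg Episode Time (s) |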
+| --- | ---: | ---: | ---: | ---: |
+| `full_ft` | 100.0 | 3.67 +/- 0.45 | 10.63 | 25.63 |
+| `no_ft` | 100.0 | 3.37 +/- 0.06 | 10.33 | 25.73 |
+Per-seed results:
 | Seed | Condition | Success Rate | Avg Force (N) | Peak Force (N) | Avg Time (s) |
+| --- | --- | ---: | ---: | ---: | ---: |
+| 42 | `full_ft` | 100.0 | 3.24 | 10.44 | 25.61 |
+| 42 | `no_ft` | 100.0 | 3.38 | 10.38 | 25.73 |
+| 123 | `full_ft` | 100.0 | 4.12 | 10.57 | 25.74 |
+| 123 | `no_ft` | 100.0 | 3.34 | 10.32 | 25.79 |
+| 7 | `full_ft` | 100.0 | 3.69 | 10.93 | 25.54 |
+| 7 | `no_ft` | 100.0 | 3.37 | 10.31 | 25.68 |
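The cross-seed aggregates can be sanity-checked from the per-seed rows above. This sketch uses only the Python standard library and the two-decimal values copied from the table. Treating the spread as a sample standard deviation is an assumption, and results recomputed from rounded inputs may differ from the reported aggregates in the last digit or in the spread.

```python
from statistics import mean, stdev

# Per-seed values copied from the table: (avg force N, peak force N, avg time s)
PER_SEED = {
    "full_ft": {42: (3.24, 10.44, 25.61), 123: (4.12, 10.57, 25.74), 7: (3.69, 10.93, 25.54)},
    "no_ft": {42: (3.38, 10.38, 25.73), 123: (3.34, 10.32, 25.79), 7: (3.37, 10.31, 25.68)},
}


def summarize(rows):
    """Aggregate one checkpoint's per-seed metrics across seeds."""
    avg_force, peak_force, avg_time = zip(*rows.values())
    return {
        "avg_force_mean": mean(avg_force),
        "avg_force_std": stdev(avg_force),  # sample std (n - 1); an assumption
        "peak_force_mean": mean(peak_force),
        "avg_time_mean": mean(avg_time),
    }


SUMMARY = {name: summarize(rows) for name, rows in PER_SEED.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```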
+Interpretation:
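+- Both checkpoints reach a 100% success rate across all 600 reported rollouts, under a much harder data-collection regime (multi-layer domain randomization) than v0.
+- On this particular peg-in-hole setup, domain randomization closed the force gap between `full_ft` and `no_ft`: average contact force is 3.67 N versus 3.37 N, so the ablation is, if anything, slightly gentler.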
+- That does not prove force/torque is unnecessary in general. It shows that this release is best used as a robust benchmark and an honest reference point for harder future tasks.
+## What Changed Compared To v0
+| Topic | v0 | v0.1.1 |
+| --- | --- | --- |
+| Dataset regime | Mostly fixed conditions | Multi-layer domain randomization |
+| Dataset size | 2,221 episodes / 330,929 frames | 5,000 episodes / 745,000 frames |
+| Robot | Franka Emika Panda | Franka FR3 |
+| Force reduction takeaway | Clear F/T advantage | Inconclusive on this task |
+| Best use | Clean baseline | Robustness benchmark |
+## Architecture
+This release uses the same 1D Temporal U-Net diffusion policy family as v0.
+![Architecture](architecture.png)
+| Component | Value |
+| --- | --- |
+| Action dimension | 7 |
+| Observation dimensions | 22 (`full_ft`) / 16 (`no_ft`) |
+| Diffusion training steps | 100 |
+| DDIM inference steps | 16 |
+| Base channels | 256 |
+| Channel multipliers | (1, 2, 4) |
+| Normalization | Min-max to `[-1, 1]` |
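The min-max normalization row can be illustrated with a small pure-Python sketch. The per-dimension bounds below are made-up example values; the actual statistics used by this release are not shown here, and mapping a constant dimension to 0 is an assumption.

```python
def normalize(x, lo, hi):
    """Map each element of x from [lo, hi] to [-1, 1], per dimension."""
    out = []
    for value, low, high in zip(x, lo, hi):
        span = high - low
        # Assumption: a constant dimension (span == 0) maps to 0.
        out.append(0.0 if span == 0 else 2.0 * (value - low) / span - 1.0)
    return out


def denormalize(y, lo, hi):
    """Inverse mapping from [-1, 1] back to [lo, hi], per dimension."""
    return [(value + 1.0) / 2.0 * (high - low) + low
            for value, low, high in zip(y, lo, hi)]


# Hypothetical bounds for a 2D toy vector (not taken from the dataset).
LO, HI = [-1.0, 0.0], [1.0, 10.0]
normalized = normalize([0.5, 2.5], LO, HI)
print(normalized, denormalize(normalized, LO, HI))
```

The same pair of functions would typically be applied to observations before inference and, in reverse, to the predicted actions.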
+## Repository Contents
+| File | Description |
+| --- | --- |
+| `full_ft_best_model.pt` | Best checkpoint with force/torque input |
+| `no_ft_best_model.pt` | Ablation checkpoint without force/torque input |
+| `inference.py` | Self-contained inference helper and model definition |
+| `config.yaml` | Training, dataset, and environment configuration |
+| `eval_seed42.json` | Seed 42 evaluation artifact |
+| `eval_seed123.json` | Seed 123 evaluation artifact |
+| `eval_seed7.json` | Seed 7 evaluation artifact |
+| `training_curve_full_ft_seed42.png` | Training curve for `full_ft`, seed 42 |
+| `training_curve_full_ft_seed123.png` | Training curve for `full_ft`, seed 123 |
+| `training_curve_full_ft_seed7.png` | Training curve for `full_ft`, seed 7 |
+| `training_curve_no_ft_seed42.png` | Training curve for `no_ft`, seed 42 |
+| `training_curve_no_ft_seed123.png` | Training curve for `no_ft`, seed 123 |
+| `training_curve_no_ft_seed7.png` | Training curve for `no_ft`, seed 7 |
+## Usage
+### Reproduce evaluation with `exokern-eval`
 ```bash
 pip install exokern-eval
 wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt
 exokern-eval \
   --policy full_ft_best_model.pt \
   --env Isaac-Forge-PegInsert-Direct-v0 \
   --episodes 100
 ```
+### Load the repo helper locally
 ```python
+import os
+import sys
+from huggingface_hub import snapshot_download
+# Download only the checkpoints and the inference helper from the Hub
+repo_dir = snapshot_download(
+    repo_id="EXOKERN/skill-forge-peginsert-v0.1.1",
+    allow_patterns=["*.pt", "inference.py"],
+)
+# Make the downloaded inference.py importable
+sys.path.insert(0, repo_dir)
+from inference import DiffusionPolicyInference
+# Load the primary checkpoint; use no_ft_best_model.pt for the ablation
+policy = DiffusionPolicyInference(
+    os.path.join(repo_dir, "full_ft_best_model.pt"),
+    device="cpu",
+)
+# full_ft expects a 22D observation vector (no_ft expects 16D)
+policy.add_observation([0.0] * 22)
+actions = policy.get_actions()
+# Number of actions returned by the helper
+print(len(actions))
+```
+## Training And Evaluation Setup
+| Item | Value |
+| --- | --- |
+| Train / val split | 85% / 15% by episode |
+| Epochs | 300 |
+| Batch size | 256 |
+| Optimizer | AdamW, `lr=1e-4`, `weight_decay=1e-4` |
+| LR schedule | Cosine annealing to `1e-6` |
+| EMA decay | 0.995 |
+| Physics rate | 120 Hz |
+| Control rate | 15 Hz |
+| Domain randomization | Enabled in the training dataset |
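The learning-rate schedule and EMA rows can be made concrete with a short sketch. It uses the standard closed-form cosine annealing formula and a plain exponential moving average. Stepping the schedule once per epoch and the function names `cosine_lr` and `ema_update` are assumptions for illustration, not the training code of this release.

```python
import math

BASE_LR = 1e-4     # AdamW learning rate from the table
MIN_LR = 1e-6      # cosine annealing floor
EPOCHS = 300
EMA_DECAY = 0.995


def cosine_lr(epoch, base_lr=BASE_LR, min_lr=MIN_LR, total=EPOCHS):
    """Cosine annealing from base_lr at epoch 0 to min_lr at epoch == total."""
    progress = min(max(epoch, 0), total) / total
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_weights, weights, decay=EMA_DECAY):
    """One EMA step: keep `decay` of the old average, blend in the new weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


for epoch in (0, 150, 300):
    print(epoch, f"{cosine_lr(epoch):.3e}")
```

With a decay of 0.995, the averaged weights respond to changes over roughly the last 200 updates (1 / (1 - 0.995)), which smooths out step-to-step noise in the evaluated checkpoint.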
+## Security Note
+The checkpoints in this repo are pickle-based PyTorch files, and unpickling can execute arbitrary code. Load them only in a trusted or isolated environment after reviewing the repository contents.
+## Limitations
+- Simulation only. This release does not claim real-robot readiness.
+- Reported robustness is specific to the peg-in-hole task and the randomization ranges documented in the paired dataset card.
+- The ablation result is mixed: use this repo to study robustness, not to overclaim a universal force/torque effect.
+- The repo exposes paired checkpoints for research comparison; the primary reference checkpoint is `full_ft_best_model.pt`.
+## Related Resources
+- Dataset: [EXOKERN/contactbench-forge-peginsert-v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1)
+- Baseline predecessor: [EXOKERN/skill-forge-peginsert-v0](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0)
+- Evaluation CLI: [github.com/Exokern/exokern_eval](https://github.com/Exokern/exokern_eval)
+- Organization page: [huggingface.co/EXOKERN](https://huggingface.co/EXOKERN)