---
pretty_name: "EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization"
license: cc-by-nc-4.0
pipeline_tag: robotics
library_name: pytorch
tags:
  - robotics
  - diffusion-policy
  - force-torque
  - contact-rich
  - manipulation
  - insertion
  - domain-randomization
  - sim-to-real
  - isaac-lab
  - franka
  - physical-ai
  - lerobot
datasets:
  - EXOKERN/contactbench-forge-peginsert-v0.1.1
metrics:
  - success_rate
  - avg_contact_force_n
  - peak_contact_force_n
model-index:
  - name: EXOKERN Skill v0.1.1 - Peg Insertion (full_ft)
    results:
      - task:
          type: robotics
          name: Peg insertion
        dataset:
          name: EXOKERN ContactBench v0.1.1
          type: EXOKERN/contactbench-forge-peginsert-v0.1.1
        metrics:
          - type: success_rate
            value: 100.0
            name: Success Rate (%)
          - type: avg_contact_force_n
            value: 3.67
            name: Average Contact Force (N)
          - type: peak_contact_force_n
            value: 10.64
            name: Peak Contact Force (N)
---

# EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization

`skill-forge-peginsert-v0.1.1` is the domain-randomized reference model release in the EXOKERN catalog. It is trained on [EXOKERN ContactBench v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1) and ships the same paired comparison structure as v0:

- `full_ft_best_model.pt`: primary checkpoint with 22D observations, including force/torque input
- `no_ft_best_model.pt`: ablation checkpoint with the same architecture and 16D state-only observations

This release should be read as a robustness benchmark first. Both policies reach 100% success on every evaluated seed under the dataset's domain randomization, and the paired checkpoints make the mixed result on force reduction explicit.

## Quick Facts

| Item | Value |
| --- | --- |
| Task | Peg insertion in simulation under domain randomization |
| Dataset | [EXOKERN/contactbench-forge-peginsert-v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1) |
| Simulator | NVIDIA Isaac Lab (Isaac Sim 4.5) |
| Robot | Franka FR3 |
| Architecture | TemporalUNet1D diffusion policy |
| Parameters | 71.3M |
| Observation horizon | 10 frames |
| Prediction / execution horizon | 16 / 8 actions |
| Seeds evaluated | 42, 123, 7 |
| Total rollouts reported | 600 |
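
The horizon values above imply a receding-horizon control loop. The following is a minimal sketch of that loop, not the repo's implementation: `dummy_policy` is a hypothetical stand-in for the diffusion policy, and padding the history with the first observation and executing the first 8 of the 16 predicted actions are assumptions about details this card does not specify. The 15 Hz control rate is taken from the training setup table.

```python
from collections import deque

OBS_HORIZON = 10     # observation frames fed to the policy
PRED_HORIZON = 16    # actions predicted per inference call
EXEC_HORIZON = 8     # actions executed before replanning
CONTROL_HZ = 15      # control rate from the training setup table
ACTION_DIM = 7


def dummy_policy(obs_window):
    """Hypothetical stand-in for the diffusion policy: returns zero actions."""
    assert len(obs_window) == OBS_HORIZON
    return [[0.0] * ACTION_DIM for _ in range(PRED_HORIZON)]


def run_receding_horizon(num_steps, obs_dim=22):
    """Run num_steps control steps; return (executed actions, inference calls)."""
    first_obs = [0.0] * obs_dim
    # Assumption: pad the history with the first observation so the window is full at t=0.
    window = deque([first_obs] * OBS_HORIZON, maxlen=OBS_HORIZON)
    executed, calls = [], 0
    while len(executed) < num_steps:
        plan = dummy_policy(list(window))
        calls += 1
        # Assumption: execute the first EXEC_HORIZON actions of each plan.
        for action in plan[:EXEC_HORIZON]:
            if len(executed) == num_steps:
                break
            executed.append(action)
            # A real environment step would return a new observation here.
            window.append([0.0] * obs_dim)
    return executed, calls


executed, calls = run_receding_horizon(num_steps=40)
print(len(executed), calls)          # 40 control steps need 5 inference calls
print(EXEC_HORIZON / CONTROL_HZ)     # seconds of motion covered by one plan
```

Under these assumptions the policy is queried once every 8 control steps, so each inference call has roughly half a second of wall-clock budget at 15 Hz.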

## Benchmark Summary

The Hub metadata for this repo tracks the primary `full_ft` checkpoint. The full repo includes the paired `no_ft` ablation for comparison.

| Checkpoint | Success Rate (%) | Avg Contact Force (N) | Peak Contact Force (N) | Avg Episode Time (s) |
| --- | ---: | ---: | ---: | ---: |
| `full_ft` | 100.0 | 3.67 +/- 0.45 | 10.64 | 25.63 |
| `no_ft` | 100.0 | 3.37 +/- 0.06 | 10.33 | 25.73 |

![EXOKERN skill v0.1.1 benchmark summary](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/figures/benchmark_summary.png)

*Figure: multi-seed benchmark summary built from the published `eval_seed42.json`, `eval_seed123.json`, and `eval_seed7.json` artifacts.*

Per-seed results:

| Seed | Condition | Success Rate (%) | Avg Force (N) | Peak Force (N) | Avg Time (s) |
| --- | --- | ---: | ---: | ---: | ---: |
| 42 | `full_ft` | 100.0 | 3.24 | 10.44 | 25.61 |
| 42 | `no_ft` | 100.0 | 3.38 | 10.38 | 25.73 |
| 123 | `full_ft` | 100.0 | 4.12 | 10.57 | 25.74 |
| 123 | `no_ft` | 100.0 | 3.34 | 10.32 | 25.79 |
| 7 | `full_ft` | 100.0 | 3.69 | 10.93 | 25.54 |
| 7 | `no_ft` | 100.0 | 3.37 | 10.31 | 25.68 |

Interpretation:

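- This release demonstrates robust task completion under a harder data-collection regime than v0: multi-layer domain randomization instead of mostly fixed conditions.
- On this particular peg-in-hole setup, domain randomization largely closed the force gap between `full_ft` and `no_ft`. `full_ft` averages 3.67 N and `no_ft` 3.37 N, a 0.30 N difference that is smaller than the +/- 0.45 N reported for `full_ft`.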
- That does not prove force/torque is unnecessary in general. It shows that this release is best used as a robust benchmark and an honest reference point for harder future tasks.
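
The comparison can be recomputed from the per-seed results table. This is a minimal sketch that hard-codes the rounded table entries rather than parsing the `eval_seed*.json` files, whose schema is not described in this card. Using the sample standard deviation across seeds is an assumption about how the reported spread was computed.

```python
from statistics import mean, stdev

# Values copied from the per-seed results table:
# seed -> (avg force in N, peak force in N, avg time in s).
per_seed = {
    "full_ft": {42: (3.24, 10.44, 25.61), 123: (4.12, 10.57, 25.74), 7: (3.69, 10.93, 25.54)},
    "no_ft": {42: (3.38, 10.38, 25.73), 123: (3.34, 10.32, 25.79), 7: (3.37, 10.31, 25.68)},
}


def summarize(condition):
    """Aggregate one condition across seeds."""
    rows = list(per_seed[condition].values())
    avg_forces = [r[0] for r in rows]
    return {
        "avg_force_mean": mean(avg_forces),
        "avg_force_std": stdev(avg_forces),  # sample std (n - 1), an assumption
        "peak_force_mean": mean(r[1] for r in rows),
        "time_mean": mean(r[2] for r in rows),
    }


for condition in per_seed:
    summary = summarize(condition)
    print(condition, {k: round(v, 2) for k, v in summary.items()})

gap = summarize("full_ft")["avg_force_mean"] - summarize("no_ft")["avg_force_mean"]
print(f"full_ft minus no_ft average force: {gap:+.2f} N")
```

Because the inputs are rounded table entries, the recomputed statistics need not match the summary table exactly. The published `eval_seed*.json` artifacts remain the authoritative source.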

## What Changed Compared To v0

| Topic | v0 | v0.1.1 |
| --- | --- | --- |
| Dataset regime | Mostly fixed conditions | Multi-layer domain randomization |
| Dataset size | 2,221 episodes / 330,929 frames | 5,000 episodes / 745,000 frames |
| Robot | Franka Emika Panda | Franka FR3 |
| Force reduction takeaway | Clear F/T advantage | Inconclusive on this task |
| Best use | Clean baseline | Robustness benchmark |

## Architecture

This release uses the same 1D Temporal U-Net diffusion policy family as v0.

![Architecture](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/architecture.png)

| Component | Value |
| --- | --- |
| Action dimension | 7 |
| Observation dimensions | 22 (`full_ft`) / 16 (`no_ft`) |
| Diffusion training steps | 100 |
| DDIM inference steps | 16 |
| Base channels | 256 |
| Channel multipliers | (1, 2, 4) |
| Normalization | Min-max to `[-1, 1]` |
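
Min-max normalization to `[-1, 1]` can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the code in `inference.py`: the function names are hypothetical, the statistics are assumed to be per-dimension, and the `eps` guard for constant dimensions is an added safeguard.

```python
def fit_min_max(rows):
    """Per-dimension minimum and maximum over a list of equal-length vectors."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(x, lo, hi, eps=1e-8):
    """Map each dimension from [lo, hi] to [-1, 1]."""
    return [2.0 * (v - l) / max(h - l, eps) - 1.0 for v, l, h in zip(x, lo, hi)]


def unnormalize(y, lo, hi):
    """Inverse map from [-1, 1] back to the original range."""
    return [(v + 1.0) / 2.0 * (h - l) + l for v, l, h in zip(y, lo, hi)]


# Toy 2D data; real statistics would come from the training set.
data = [[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]]
lo, hi = fit_min_max(data)
scaled = normalize([2.0, 30.0], lo, hi)
print(scaled)                        # [0.0, 1.0]
print(unnormalize(scaled, lo, hi))   # [2.0, 30.0]
```

In a diffusion policy the same statistics are typically applied to observations before inference and inverted on the predicted actions afterwards.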

## Repository Contents

| File | Description |
| --- | --- |
| `full_ft_best_model.pt` | Best checkpoint with force/torque input |
| `no_ft_best_model.pt` | Ablation checkpoint without force/torque input |
| `inference.py` | Self-contained inference helper and model definition |
| `config.yaml` | Training, dataset, and environment configuration |
| `eval_seed42.json` | Seed 42 evaluation artifact |
| `eval_seed123.json` | Seed 123 evaluation artifact |
| `eval_seed7.json` | Seed 7 evaluation artifact |
| `training_curve_full_ft_seed42.png` | Training curve for `full_ft`, seed 42 |
| `training_curve_full_ft_seed123.png` | Training curve for `full_ft`, seed 123 |
| `training_curve_full_ft_seed7.png` | Training curve for `full_ft`, seed 7 |
| `training_curve_no_ft_seed42.png` | Training curve for `no_ft`, seed 42 |
| `training_curve_no_ft_seed123.png` | Training curve for `no_ft`, seed 123 |
| `training_curve_no_ft_seed7.png` | Training curve for `no_ft`, seed 7 |

## Usage

### Reproduce evaluation with `exokern-eval`

```bash
pip install exokern-eval

wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt

exokern-eval \
  --policy full_ft_best_model.pt \
  --env Isaac-Forge-PegInsert-Direct-v0 \
  --episodes 100
```

### Load the repo helper locally

```python
import os
import sys

from huggingface_hub import snapshot_download

# Download only the checkpoints and the inference helper from the Hub.
repo_dir = snapshot_download(
    repo_id="EXOKERN/skill-forge-peginsert-v0.1.1",
    allow_patterns=["*.pt", "inference.py"],
)
# Make the downloaded inference.py importable.
sys.path.insert(0, repo_dir)

from inference import DiffusionPolicyInference

policy = DiffusionPolicyInference(
    os.path.join(repo_dir, "full_ft_best_model.pt"),
    device="cpu",
)

# Push one placeholder 22D observation (the full_ft input size), then query actions.
policy.add_observation([0.0] * 22)
actions = policy.get_actions()
print(len(actions))
```

## Training And Evaluation Setup

| Item | Value |
| --- | --- |
| Train / val split | 85% / 15% by episode |
| Epochs | 300 |
| Batch size | 256 |
| Optimizer | AdamW, `lr=1e-4`, `weight_decay=1e-4` |
| LR schedule | Cosine annealing to `1e-6` |
| EMA decay | 0.995 |
| Physics rate | 120 Hz |
| Control rate | 15 Hz |
| Domain randomization | Enabled in the training dataset |
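
The schedule and rate entries above can be made concrete with a little arithmetic. This is a minimal sketch assuming the scheduler steps once per epoch over all 300 epochs, which this card does not state. The closed form is the standard cosine-annealing formula, and `ema_update` is a hypothetical helper illustrating the decay value on a single scalar.

```python
import math

BASE_LR = 1e-4      # AdamW learning rate
MIN_LR = 1e-6       # cosine annealing floor
EPOCHS = 300
PHYSICS_HZ = 120
CONTROL_HZ = 15


def cosine_lr(epoch, total=EPOCHS, base=BASE_LR, floor=MIN_LR):
    """Cosine-annealed learning rate, assuming one scheduler step per epoch."""
    return floor + 0.5 * (base - floor) * (1.0 + math.cos(math.pi * epoch / total))


def ema_update(ema_value, new_value, decay=0.995):
    """One exponential-moving-average update of a scalar weight."""
    return decay * ema_value + (1.0 - decay) * new_value


for epoch in (0, 150, 300):
    print(epoch, f"{cosine_lr(epoch):.3e}")

# Physics steps simulated per control step.
print(PHYSICS_HZ // CONTROL_HZ)   # 8
```

With a decay of 0.995, the averaged weights reflect roughly the last 1 / (1 - 0.995) = 200 updates.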

## Related Work

- FORGE: [Force-Guided Exploration for Robust Contact-Rich Manipulation under Uncertainty](https://arxiv.org/abs/2408.04587)
- Diffusion Policy: [Visuomotor Policy Learning via Action Diffusion](https://arxiv.org/abs/2303.04137)
- Factory: [Fast Contact for Robotic Assembly](https://arxiv.org/abs/2205.03532)

## Citation

```bibtex
@misc{exokern_skill_peginsert_v011_2026,
  title        = {EXOKERN Skill v0.1.1: Robust Peg Insertion Under Domain Randomization},
  author       = {{EXOKERN}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1}},
  note         = {Paired full_ft and no_ft diffusion-policy checkpoints}
}
```

## Security Note

The checkpoints in this repo are PyTorch pickles. Load them only in a trusted or isolated environment after reviewing the repository contents.
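
One way to reduce the risk is to confirm that a downloaded checkpoint is byte-for-byte the file you reviewed before unpickling it. The sketch below uses only the standard library. `verify_checkpoint` is a hypothetical helper, and the expected digest is assumed to come from a source you trust, for example the SHA-256 the Hub displays for LFS-tracked files.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_sha256):
    """Return path if the digest matches; raise ValueError otherwise."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: {actual}")
    return path
```

A checksum only establishes file identity, not safety. PyTorch's `torch.load` also accepts `weights_only=True`, which restricts unpickling to tensors and basic containers; whether these checkpoints load under that restriction is not stated in this card.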

## Limitations

- Simulation only. This release does not claim real-robot readiness.
- Reported robustness is specific to the peg-in-hole task and the randomization ranges documented in the paired dataset card.
- The ablation result is mixed: use this repo to study robustness, not to overclaim a universal force/torque effect.
- The repo exposes paired checkpoints for research comparison; the intended production-style reference in this repo is `full_ft_best_model.pt`.

## Related Resources

- Dataset: [EXOKERN/contactbench-forge-peginsert-v0.1.1](https://huggingface.co/datasets/EXOKERN/contactbench-forge-peginsert-v0.1.1)
- Baseline predecessor: [EXOKERN/skill-forge-peginsert-v0](https://huggingface.co/EXOKERN/skill-forge-peginsert-v0)
- Evaluation CLI: [github.com/Exokern/exokern_eval](https://github.com/Exokern/exokern_eval)
- Organization page: [huggingface.co/EXOKERN](https://huggingface.co/EXOKERN)