---
license: apache-2.0
library_name: robomimic
pipeline_tag: robotics
tags:
- robotics
- diffusion-policy
- imitation-learning
- lerobot
- variable-admittance
- force-control
- doosan
datasets:
- aleleanza/vac-pipe-dual-cam
---

# Diffusion Policy — VAC Pipe Insertion (input/output ablation)

Six **stock robomimic Diffusion Policy** checkpoints for the contact-rich
**pipe-insertion** task on a Doosan M0609 + Inspire RH56 hand under a **Variable
Admittance Controller (VAC)**. All were trained on
[`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
(202 episodes, 145,712 frames @ 50 Hz).

This repo is a clean **2 × 3 ablation**: two *output* representations (where the policy
sits relative to the admittance filter) × three *input* modalities.

## Variants (subfolders)

| Subfolder | Output (action) | Inputs (obs) | Action dim | Best val loss | @epoch |
|---|---|---|---:|---:|---:|
| `vac_pre_vis` | **pre**: `user_cmd[6]` + `K` + `ζ` + `hand_binary` | vision | 9 | 0.2445 | 67 |
| `vac_pre_vis_wrench` | pre | vision + wrench | 9 | 0.2583 | 8 |
| `vac_pre_vis_wrench_state` | pre | vision + wrench + state | 9 | 0.2739 | 11 |
| `vac_post_vis` | **post**: `vel_cmd[6]` + `hand_binary` | vision | 7 | 0.0580 | 48 |
| `vac_post_vis_wrench` | post | vision + wrench | 7 | 0.0572 | 48 |
| `vac_post_vis_wrench_state` | post | vision + wrench + state | 7 | **0.0547** | 48 |

- **pre** = predict the operator command *before* admittance, including the compliance
  command itself (`stiffness_cmd` K + damping ζ) → the policy *learns to set compliance*.
  Consumed downstream by the variable-admittance node.
- **post** = predict the Cartesian velocity the controller executed *after* admittance →
  the policy bypasses admittance and drives velocity directly.
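
As a rough illustration of the two action contracts, the sketch below splits a flat action vector into named components. The component order is assumed to match the table above; the authoritative order is the `action_components` entry in each `action_stats.json`. The names `ACTION_LAYOUT` and `split_action` are hypothetical helpers, not part of robomimic or the project code.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layouts, taken from the variants table (name, width).
# Check `action_components` in action_stats.json for the real order.
ACTION_LAYOUT: Dict[str, List[Tuple[str, int]]] = {
    "pre": [("user_cmd", 6), ("K", 1), ("zeta", 1), ("hand_binary", 1)],   # 9D
    "post": [("vel_cmd", 6), ("hand_binary", 1)],                          # 7D
}


def split_action(action: Sequence[float], family: str) -> Dict[str, List[float]]:
    """Split a flat action vector into named components for 'pre' or 'post'."""
    layout = ACTION_LAYOUT[family]
    expected = sum(width for _, width in layout)
    if len(action) != expected:
        raise ValueError(f"{family} action must have {expected} dims, got {len(action)}")

    parts: Dict[str, List[float]] = {}
    offset = 0
    for name, width in layout:
        parts[name] = list(action[offset:offset + width])
        offset += width
    return parts
```

For example, `split_action(a, "pre")["K"]` would hold the predicted stiffness command, while a `post` action carries no compliance terms at all.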

Each subfolder contains:

```
<variant>/
├── best.pth              # lowest validation loss
├── last.pth              # final epoch
├── config.json           # full robomimic training config
├── action_stats.json     # action normalization (min-max) + action_components + hand binarization
└── dataset_summary.json  # train/valid episode split + frame counts
```
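
A minimal sketch for sanity-checking a downloaded variant folder against the layout above. `missing_files` is a hypothetical helper and uses only the standard library.

```python
from pathlib import Path
from typing import List

# File names exactly as listed in the layout above.
EXPECTED_FILES = (
    "best.pth",
    "last.pth",
    "config.json",
    "action_stats.json",
    "dataset_summary.json",
)


def missing_files(variant_dir: str) -> List[str]:
    """Return the expected files that are absent from a variant folder."""
    root = Path(variant_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_files("/path/to/vac_post_vis")` means the folder is complete.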

## Architecture & training

- **Algorithm:** robomimic Diffusion Policy (DDPM noise-prediction loss; **no** anchor /
  additional loss — stock).
- **Backbone:** conditional UNet `[128, 256, 512]`; ~89.4–90.0M params.
- **Horizons:** observation 2, action 4, prediction 8 (frame stack 2, seq length 8).
- **Image:** D405 wrist stream (`observation.images.camera`) → `front_rgb`, **84×84**.
- **Common:** 300 epochs, batch size 16, learning rate 1e-4, 50 DDPM steps for both training and inference.
- **Split:** train episodes 0–171, valid 172–201.
- **Action normalization:** min-max to [−1, 1]; hand head is **binarized** from
  `action.absolute[:, 8:12]` (mean ≥ 0.85 → open). See each `action_stats.json`.
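
The horizons above imply a receding-horizon loop: the policy conditions on the last 2 observations, predicts 8 actions, executes 4 of them, then re-plans. The sketch below is illustrative only. `predict` is a hypothetical stand-in for the policy, and starting the executed chunk at index `obs_horizon - 1` is an assumption based on the convention common in Diffusion Policy implementations, not something read from this repo's `config.json`.

```python
from collections import deque
from typing import Callable, List, Sequence

OBS_HORIZON = 2     # observations the policy conditions on (frame stack)
ACTION_HORIZON = 4  # actions executed before re-planning
PRED_HORIZON = 8    # actions predicted per inference call


def executed_chunk(predicted: Sequence, obs_horizon: int = OBS_HORIZON,
                   action_horizon: int = ACTION_HORIZON) -> List:
    """Select the actions to execute from one predicted sequence.

    Assumption: the chunk starts at index obs_horizon - 1, i.e. at the
    prediction aligned with the most recent observation.
    """
    start = obs_horizon - 1
    return list(predicted[start:start + action_horizon])


def rollout(predict: Callable[[List], Sequence], observations: Sequence) -> List:
    """Run a receding-horizon loop over a stream of observations."""
    history: deque = deque(maxlen=OBS_HORIZON)
    pending: List = []   # actions from the current chunk not yet executed
    executed: List = []
    for obs in observations:
        history.append(obs)
        while len(history) < OBS_HORIZON:   # pad the first step by repetition
            history.append(obs)
        if not pending:                      # chunk exhausted: re-plan
            prediction = predict(list(history))
            if len(prediction) != PRED_HORIZON:
                raise ValueError("policy must return PRED_HORIZON actions")
            pending = executed_chunk(prediction)
        executed.append(pending.pop(0))
    return executed
```

If actions are executed at the dataset's 50 Hz, this corresponds to one re-plan every 4 frames, or 80 ms.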

### Reading the results

The **post** (`vel_cmd`) variants reach far lower validation loss (~0.055–0.058) than the
**pre** (`user_cmd` + K + ζ) variants (~0.24–0.27): predicting executed velocity is an
easier target than predicting raw operator intent plus compliance. For
`vac_pre_vis_wrench` and `vac_pre_vis_wrench_state` the best validation loss arrives very
early (epochs 8 and 11) while training loss keeps dropping, a sign of **overfitting on
the added wrench/state inputs** at this model/dataset size. Adding wrench/state did
**not** help the pre variants (0.2445 → 0.2583 → 0.2739) and lowered the post loss only
marginally (0.0580 → 0.0572 → 0.0547).
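
To put numbers on the effect of the extra inputs, the short script below computes the relative change in best validation loss against the vision-only variant of each family. The loss values are copied from the variants table; the helper name `relative_change` is hypothetical.

```python
from typing import Dict

# Best validation loss per variant, copied from the variants table.
BEST_VAL_LOSS: Dict[str, Dict[str, float]] = {
    "pre": {"vis": 0.2445, "vis_wrench": 0.2583, "vis_wrench_state": 0.2739},
    "post": {"vis": 0.0580, "vis_wrench": 0.0572, "vis_wrench_state": 0.0547},
}


def relative_change(family: str) -> Dict[str, float]:
    """Relative change in best val loss versus the vision-only baseline."""
    losses = BEST_VAL_LOSS[family]
    base = losses["vis"]
    return {name: (loss - base) / base
            for name, loss in losses.items() if name != "vis"}


for family in ("pre", "post"):
    for name, change in relative_change(family).items():
        print(f"vac_{family}_{name}: {change:+.1%}")
# vac_pre_vis_wrench: +5.6%
# vac_pre_vis_wrench_state: +12.0%
# vac_post_vis_wrench: -1.4%
# vac_post_vis_wrench_state: -5.7%
```

These are single-run numbers, so small differences (especially within the post family) should be read with caution.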

## Inference (ROS 2)

Run with the project's `robot_learning` real-time inference nodes (Doosan M0609 + Inspire
hand). The action contract determines the runner:

- **`pre` variants (9D)** → `diffusion_policy_vac_preimg_runner`. Publishes
  `/delta_pose_cmd` + `/predicted_K` + `/predicted_zeta`, consumed by
  `variable_admittance_node` (`variable_K:=true`). For the `*_wrench*` variants, also
  start the Bota force-torque sensor driver.
- **`post` variants (7D)** → `diffusion_policy_fixed_k_runner` with `mode:=vel_cmd`:
  streams `vel_cmd` directly via the DSR `speedl` interface (no admittance node).

```bash
# pre family (variable-stiffness path)
ros2 run robot_learning diffusion_policy_vac_preimg_runner \
  --ros-args -p checkpoint:=/path/to/vac_pre_vis_wrench_state/best.pth

# post family (direct velocity path)
ros2 run robot_learning diffusion_policy_fixed_k_runner \
  --ros-args -p checkpoint:=/path/to/vac_post_vis_wrench_state/best.pth -p mode:=vel_cmd
```
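
When scripting evaluations across all six variants, the runner choice can be derived from the subfolder name. The sketch below only assembles the same `ros2 run` command lines shown above; `runner_command` is a hypothetical helper, and the runner names and parameters are taken verbatim from this card.

```python
from typing import Dict, List, Tuple

# Runner executable and extra ROS parameters per family, as shown above.
RUNNERS: Dict[str, Tuple[str, List[str]]] = {
    "pre": ("diffusion_policy_vac_preimg_runner", []),
    "post": ("diffusion_policy_fixed_k_runner", ["-p", "mode:=vel_cmd"]),
}


def runner_command(checkpoint: str, variant: str) -> List[str]:
    """Build the `ros2 run` argument list for a variant such as 'vac_pre_vis'."""
    fields = variant.split("_")
    if len(fields) < 3 or fields[0] != "vac" or fields[1] not in RUNNERS:
        raise ValueError(f"unrecognized variant name: {variant!r}")
    runner, extra = RUNNERS[fields[1]]
    return [
        "ros2", "run", "robot_learning", runner,
        "--ros-args", "-p", f"checkpoint:={checkpoint}",
        *extra,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in a sourced ROS 2 environment, or printed with `shlex.join(cmd)` for inspection.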

The binary hand head publishes to `/inspire_hand/left/cmd`; RGB observations come from
the compressed camera topic; `*_wrench*` variants subscribe to `/bota_ft_sensor/wrench`
(tared per trial). `action_stats.json` provides the exact normalization to undo at
inference.
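
Undoing the normalization amounts to inverting a per-dimension min-max map. The sketch below shows both directions under the assumption of a standard linear mapping to [−1, 1]. The per-dimension `lo`/`hi` vectors are passed in directly because the JSON key names inside `action_stats.json` are not specified here; whether the binary hand dimension is normalized the same way should also be checked against that file.

```python
from typing import List, Sequence


def normalize(action: Sequence[float], lo: Sequence[float],
              hi: Sequence[float]) -> List[float]:
    """Map each action dimension linearly from [lo, hi] to [-1, 1]."""
    out = []
    for a, l, h in zip(action, lo, hi):
        span = h - l
        # Constant dimensions (span == 0) are mapped to 0 to avoid dividing by zero.
        out.append(0.0 if span == 0 else 2.0 * (a - l) / span - 1.0)
    return out


def denormalize(normalized: Sequence[float], lo: Sequence[float],
                hi: Sequence[float]) -> List[float]:
    """Invert `normalize`: map each dimension from [-1, 1] back to [lo, hi]."""
    return [l + (n + 1.0) / 2.0 * (h - l) for n, l, h in zip(normalized, lo, hi)]
```

Policy outputs can drift slightly outside [−1, 1], so clamping before `denormalize` may be worth considering before commands reach the robot.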

## Intended use & limitations

- **Use:** research on force-aware / compliance-predicting imitation learning for
  contact-rich insertion; a baseline ablation for VAC.
- **Limitations:** single task (`pipe_fixing`), single embodiment (M0609 + RH56), 84×84
  vision, binary hand. The `pre` variants overfit at this scale. Not validated for
  safety-critical or autonomous deployment.

## Related

- **Dataset:** [`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
- **Collection:** *VAC — Pipe Insertion*