Diffusion Policy — VAC Pipe Insertion (input/output ablation)

Six stock robomimic Diffusion Policy checkpoints for the contact-rich pipe-insertion task on a Doosan M0609 arm with an Inspire RH56 hand, running under a Variable Admittance Controller (VAC). All six were trained on aleleanza/vac-pipe-dual-cam (202 episodes, 145,712 frames at 50 Hz).

This repo is a clean 2 × 3 ablation: two output representations (where the policy sits relative to the admittance filter) × three input modalities.

Variants (subfolders)

| Subfolder | Output (action) | Inputs (obs) | Action dim | Best val loss | At epoch |
|---|---|---|---|---|---|
| `vac_pre_vis` | pre: `user_cmd[6]` + `K` + `ζ` + `hand_binary` | vision | 9 | 0.2445 | 67 |
| `vac_pre_vis_wrench` | pre | vision + wrench | 9 | 0.2583 | 8 |
| `vac_pre_vis_wrench_state` | pre | vision + wrench + state | 9 | 0.2739 | 11 |
| `vac_post_vis` | post: `vel_cmd[6]` + `hand_binary` | vision | 7 | 0.0580 | 48 |
| `vac_post_vis_wrench` | post | vision + wrench | 7 | 0.0572 | 48 |
| `vac_post_vis_wrench_state` | post | vision + wrench + state | 7 | 0.0547 | 48 |

pre = predict the operator command before admittance, including the compliance command itself (stiffness_cmd K + damping ζ) → the policy learns to set compliance. Consumed downstream by the variable-admittance node.
post = predict the Cartesian velocity the controller executed after admittance → the policy bypasses admittance and drives velocity directly.
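
The two action contracts can be illustrated with a small helper that splits a predicted vector into named parts. This is only a sketch: the component order is assumed to follow the order in which the variants table lists the parts, and the dictionary keys and the function name `split_action` are hypothetical. The `action_components` entry in each `action_stats.json` is the authoritative layout.

```python
import numpy as np

# Assumed layouts (order inferred from the variants table, not confirmed).
PRE_LAYOUT = {
    "user_cmd": slice(0, 6),     # operator command before admittance
    "K": slice(6, 7),            # stiffness command
    "zeta": slice(7, 8),         # damping
    "hand_binary": slice(8, 9),  # open/close flag
}
POST_LAYOUT = {
    "vel_cmd": slice(0, 6),      # Cartesian velocity executed after admittance
    "hand_binary": slice(6, 7),  # open/close flag
}


def split_action(action):
    """Split a 9D (pre) or 7D (post) action, or a chunk of them, into parts."""
    action = np.asarray(action, dtype=np.float64)
    dim = action.shape[-1]
    if dim == 9:
        layout = PRE_LAYOUT
    elif dim == 7:
        layout = POST_LAYOUT
    else:
        raise ValueError(f"expected action dim 7 or 9, got {dim}")
    return {name: action[..., sl] for name, sl in layout.items()}
```

Indexing on the last axis means the same helper works for a single action and for a whole predicted chunk of shape `(steps, dim)`.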

Each subfolder contains:

<variant>/
├── best.pth              # lowest validation loss
├── last.pth              # final epoch
├── config.json           # full robomimic training config
├── action_stats.json     # action normalization (min-max) + action_components + hand binarization
└── dataset_summary.json  # train/valid episode split + frame counts
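
Before pointing a runner at a downloaded variant, it can help to confirm that the folder matches the layout above. The following is a minimal sketch using only the standard library; the function name `missing_files` is hypothetical, and the file names are copied from the listing.

```python
from pathlib import Path

# File names taken from the per-variant listing.
EXPECTED_FILES = (
    "best.pth",
    "last.pth",
    "config.json",
    "action_stats.json",
    "dataset_summary.json",
)


def missing_files(variant_dir):
    """Return the expected files that are absent from a variant folder."""
    root = Path(variant_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means the folder is complete, for example `missing_files("/path/to/vac_post_vis")`.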

Architecture & training

Algorithm: robomimic Diffusion Policy (DDPM noise-prediction loss; no anchor / additional loss — stock).
Backbone: conditional UNet [128, 256, 512]; ~89.4–90.0M params.
Horizons: observation 2, action 4, prediction 8 (frame stack 2, seq length 8).
Image: D405 wrist stream (observation.images.camera) → front_rgb, 84×84.
Common: 300 epochs, batch 16, lr 1e-4, 50 DDPM steps for both training and inference.
Split: train episodes 0–171, valid 172–201.
Action normalization: min-max to [−1, 1]; hand head is binarized from action.absolute[:, 8:12] (mean ≥ 0.85 → open). See each action_stats.json.
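
The label preprocessing described in the last item can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the training code: the function names are hypothetical, the per-dimension bounds `lo` and `hi` stand in for whatever `action_stats.json` stores, the `eps` guard for constant dimensions is an addition, and encoding "open" as 1.0 is assumed.

```python
import numpy as np


def minmax_normalize(x, lo, hi, eps=1e-8):
    """Map each action dimension linearly from [lo, hi] to [-1, 1]."""
    x, lo, hi = (np.asarray(a, dtype=np.float64) for a in (x, lo, hi))
    span = np.maximum(hi - lo, eps)  # assumption: guard against constant dims
    return 2.0 * (x - lo) / span - 1.0


def binarize_hand(action_absolute, threshold=0.85):
    """Return 1.0 (open, assumed encoding) where the mean of columns 8:12
    is at least the threshold, else 0.0."""
    a = np.asarray(action_absolute, dtype=np.float64)
    return (a[:, 8:12].mean(axis=1) >= threshold).astype(np.float64)
```

`binarize_hand` expects a 2D array of shape `(frames, D)` with `D >= 12`, matching the `action.absolute[:, 8:12]` slice named above.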

Reading the results

The post (vel_cmd) variants reach far lower validation loss (~0.055–0.058) than the pre (user_cmd + K + ζ) variants (~0.24–0.27): predicting executed velocity is an easier target than predicting raw operator intent plus compliance. For vac_pre_vis_wrench and vac_pre_vis_wrench_state the best validation loss arrives very early (epoch 8–11) while training loss keeps dropping, a sign of overfitting on the wrench and state inputs at this model/dataset size. Adding wrench or state did not help the pre variants at this scale (best validation loss rose from 0.2445 to 0.2583 and 0.2739); for the post variants it lowered validation loss only marginally (0.0580 to 0.0572 and 0.0547).

Inference (ROS 2)

Run with the project's robot_learning real-time inference nodes (Doosan M0609 + Inspire hand). The action contract determines the runner:

pre variants (9D) → diffusion_policy_vac_preimg_runner. Publishes /delta_pose_cmd + /predicted_K + /predicted_zeta, consumed by variable_admittance_node (variable_K:=true). Also launch the Bota force/torque sensor driver for the *_wrench* variants.
post variants (7D) → diffusion_policy_fixed_k_runner with mode:=vel_cmd. Streams vel_cmd directly via the DSR speedl interface (no admittance node).

# pre family (variable-stiffness path)
ros2 run robot_learning diffusion_policy_vac_preimg_runner \
  --ros-args -p checkpoint:=/path/to/vac_pre_vis_wrench_state/best.pth

# post family (direct velocity path)
ros2 run robot_learning diffusion_policy_fixed_k_runner \
  --ros-args -p checkpoint:=/path/to/vac_post_vis_wrench_state/best.pth -p mode:=vel_cmd

The binary hand head publishes to /inspire_hand/left/cmd; RGB observations come from the compressed camera topic; *_wrench* variants subscribe to /bota_ft_sensor/wrench (tared per trial). action_stats.json provides the exact normalization to undo at inference.
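
Undoing the min-max normalization at inference could look like the sketch below. It assumes the policy output lies in [−1, 1], that `lo` and `hi` are the per-dimension bounds read from `action_stats.json` (the key names are not shown here because they are not documented in this card), and that the hand flag is the last component. The function names, the clipping step, and the 0.5 hand threshold are all assumptions for illustration.

```python
import numpy as np


def minmax_denormalize(x_norm, lo, hi):
    """Invert a min-max mapping from [-1, 1] back to [lo, hi]."""
    x_norm, lo, hi = (np.asarray(a, dtype=np.float64) for a in (x_norm, lo, hi))
    return (x_norm + 1.0) / 2.0 * (hi - lo) + lo


def postprocess_action(pred_norm, lo, hi, hand_index=-1, hand_threshold=0.5):
    """Denormalize a predicted action and snap the hand component to 0 or 1.

    hand_index and hand_threshold are assumptions, not values from the repo.
    """
    # Clip first so slightly out-of-range predictions stay within the bounds.
    clipped = np.clip(np.asarray(pred_norm, dtype=np.float64), -1.0, 1.0)
    action = minmax_denormalize(clipped, lo, hi)
    hand = action[..., hand_index]
    action[..., hand_index] = np.where(hand >= hand_threshold, 1.0, 0.0)
    return action
```

For a post variant, `postprocess_action(pred, lo, hi)` would return six velocity components in physical units followed by a 0/1 hand flag, given 7-element `lo` and `hi` arrays.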

Intended use & limitations

Use: research on force-aware / compliance-predicting imitation learning for contact-rich insertion; a baseline ablation for VAC.
Limitations: single task (pipe_fixing), single embodiment (M0609 + RH56), 84×84 vision, binary hand. The pre variants overfit at this scale. Not validated for safety-critical or autonomous deployment.

Dataset: aleleanza/vac-pipe-dual-cam
Collection: VAC — Pipe Insertion
