Diffusion Policy: VAC Pipe Insertion (input/output ablation)

Six stock robomimic Diffusion Policy checkpoints for the contact-rich pipe-insertion task on a Doosan M0609 + Inspire RH56 hand under a Variable Admittance Controller (VAC). All six were trained on aleleanza/vac-pipe-dual-cam (202 episodes, 145,712 frames at 50 Hz).

This repo is a clean 2 × 3 ablation: two output representations (where the policy sits relative to the admittance filter) × three input modalities.

Variants (subfolders)

| Subfolder | Output (action) | Inputs (obs) | Action dim | Best val loss | Best epoch |
|---|---|---|---|---|---|
| vac_pre_vis | pre: user_cmd[6] + K + ζ + hand_binary | vision | 9 | 0.2445 | 67 |
| vac_pre_vis_wrench | pre | vision + wrench | 9 | 0.2583 | 8 |
| vac_pre_vis_wrench_state | pre | vision + wrench + state | 9 | 0.2739 | 11 |
| vac_post_vis | post: vel_cmd[6] + hand_binary | vision | 7 | 0.0580 | 48 |
| vac_post_vis_wrench | post | vision + wrench | 7 | 0.0572 | 48 |
| vac_post_vis_wrench_state | post | vision + wrench + state | 7 | 0.0547 | 48 |
  • pre = predict the operator command before admittance, including the compliance command itself (stiffness_cmd K and damping ζ), so the policy learns to set compliance. This output is consumed downstream by the variable-admittance node.
  • post = predict the Cartesian velocity that the controller executed after admittance, so the policy bypasses admittance and drives velocity directly.
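As a sketch of what the two action contracts look like on the consumer side, the helper below splits one unnormalized action vector by its length. It assumes the component order is exactly as listed in the table (twist first, then K, ζ, and the hand bit); the action_components entry in each action_stats.json is the authoritative layout. DecodedAction and decode_action are hypothetical names, not part of robomimic or the robot_learning package.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Assumed component order (read off the variants table); the "action_components"
# entry in action_stats.json is the authoritative layout.
PRE_DIM = 9   # user_cmd[6] + K + zeta + hand_binary
POST_DIM = 7  # vel_cmd[6] + hand_binary


@dataclass
class DecodedAction:
    twist: List[float]             # user_cmd (pre) or vel_cmd (post), 6 values
    stiffness_K: Optional[float]   # pre only; None for post
    damping_zeta: Optional[float]  # pre only; None for post
    hand_binary: float             # binary hand head


def decode_action(action: Sequence[float]) -> DecodedAction:
    """Split one unnormalized action vector; its length selects pre vs post."""
    a = [float(v) for v in action]
    if len(a) == PRE_DIM:
        return DecodedAction(twist=a[0:6], stiffness_K=a[6], damping_zeta=a[7], hand_binary=a[8])
    if len(a) == POST_DIM:
        return DecodedAction(twist=a[0:6], stiffness_K=None, damping_zeta=None, hand_binary=a[6])
    raise ValueError(f"expected {PRE_DIM}D (pre) or {POST_DIM}D (post) action, got {len(a)}D")
```

On a 9D pre output this yields the K and ζ values the variable-admittance node would consume; on a 7D post output both stay None, reflecting that the post family never sets compliance.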

Each subfolder contains:

<variant>/
├── best.pth              # lowest validation loss
├── last.pth              # final epoch
├── config.json           # full robomimic training config
├── action_stats.json     # action normalization (min-max) + action_components + hand binarization
└── dataset_summary.json  # train/valid episode split + frame counts
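The subfolder names encode the ablation cell, so a loader can derive the expected action width and input modalities from the name alone. This is a minimal sketch with hypothetical helper names: the name-to-layout mapping is read off the variants table, and the file list mirrors the tree above.

```python
from pathlib import Path
from typing import Dict, NamedTuple, Tuple


class Variant(NamedTuple):
    family: str              # "pre" (user_cmd + K + zeta + hand) or "post" (vel_cmd + hand)
    inputs: Tuple[str, ...]  # observation modalities, always starting with "vision"
    action_dim: int          # 9 for pre, 7 for post


VARIANT_FILES = ("best.pth", "last.pth", "config.json",
                 "action_stats.json", "dataset_summary.json")
# Suffixes allowed after "vis"; "state" only appears together with "wrench".
ALLOWED_EXTRAS = {(), ("wrench",), ("wrench", "state")}


def parse_variant(name: str) -> Variant:
    """Parse a subfolder name such as 'vac_post_vis_wrench_state'."""
    parts = name.split("_")
    if len(parts) < 3 or parts[0] != "vac" or parts[1] not in ("pre", "post") or parts[2] != "vis":
        raise ValueError(f"unrecognized variant name: {name!r}")
    extras = tuple(parts[3:])
    if extras not in ALLOWED_EXTRAS:
        raise ValueError(f"unrecognized input suffix in {name!r}")
    family = parts[1]
    return Variant(family, ("vision",) + extras, 9 if family == "pre" else 7)


def variant_files(root: Path, name: str) -> Dict[str, Path]:
    """Expected file paths under <root>/<name>/ (existence is not checked)."""
    parse_variant(name)  # reject unknown names early
    return {fname: root / name / fname for fname in VARIANT_FILES}
```

Validating the name before touching best.pth catches a mismatch between the checkpoint and the runner (9D vs 7D) before any weights are loaded.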

Architecture & training

  • Algorithm: robomimic Diffusion Policy (DDPM noise-prediction loss; stock, with no anchor or additional loss).
  • Backbone: conditional UNet [128, 256, 512]; ~89.4–90.0M params.
  • Horizons: observation 2, action 4, prediction 8 (frame stack 2, seq length 8).
  • Image: D405 wrist stream (observation.images.camera) → front_rgb, 84×84.
  • Common: 300 epochs, batch 16, lr 1e-4, DDPM with 50 train / 50 inference steps.
  • Split: train episodes 0–171, valid 172–201.
  • Action normalization: min-max to [−1, 1]; the hand head is binarized from action.absolute[:, 8:12] (mean ≥ 0.85 → open). See each variant's action_stats.json.
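A minimal sketch of the normalization and hand binarization described above. It assumes per-dimension min and max bounds (the exact key names inside action_stats.json are not given here, so lo/hi are passed in directly) and assumes the binary hand target is 1.0 for open and 0.0 for closed. The absolute_row argument stands for one row of action.absolute; all function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def normalize(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Min-max map each dimension from [lo, hi] to [-1, 1]; constant dims map to 0."""
    out = []
    for v, a, b in zip(x, lo, hi):
        span = b - a
        out.append(0.0 if span == 0 else 2.0 * (v - a) / span - 1.0)
    return out


def unnormalize(xn: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Inverse map: policy output in [-1, 1] back to the original units."""
    return [(v + 1.0) / 2.0 * (b - a) + a for v, a, b in zip(xn, lo, hi)]


def binarize_hand(absolute_row: Sequence[float], threshold: float = 0.85) -> float:
    """Assumed encoding: 1.0 (open) if mean of columns 8:12 >= threshold, else 0.0 (closed)."""
    return 1.0 if fmean(absolute_row[8:12]) >= threshold else 0.0
```

At inference, unnormalize maps the policy's [−1, 1] output back to physical units using the same per-dimension bounds that were used in training.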

Reading the results

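The post (vel_cmd) variants reach far lower validation loss (0.055–0.058) than the pre (user_cmd + K + ζ) variants (0.24–0.27). A plausible reading is that executed velocity is an easier target than raw operator intent plus compliance, although the two families regress different, separately normalized targets (7D vs 9D), so their losses are not strictly comparable. For pre + wrench and pre + wrench + state, the best validation loss arrives very early (epochs 8 and 11) while training loss keeps dropping, a sign of overfitting to the wrench/state inputs at this model and dataset size. Adding wrench/state did not help at this scale: it raised the best validation loss of the pre family and lowered that of the post family only marginally (0.0580 → 0.0547).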
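The "did not help" reading can be made concrete as the relative change in best validation loss against each family's vision-only variant. This is plain arithmetic on the table values (best-epoch checkpoints only); RESULTS simply copies the table, and the helper names are illustrative.

```python
# Best validation loss and best epoch per variant, copied from the variants table.
RESULTS = {
    "vac_pre_vis": (0.2445, 67),
    "vac_pre_vis_wrench": (0.2583, 8),
    "vac_pre_vis_wrench_state": (0.2739, 11),
    "vac_post_vis": (0.0580, 48),
    "vac_post_vis_wrench": (0.0572, 48),
    "vac_post_vis_wrench_state": (0.0547, 48),
}
TOTAL_EPOCHS = 300


def relative_change(family: str) -> dict:
    """Percent change in best val loss vs. the family's vision-only variant."""
    base_loss, _ = RESULTS[f"vac_{family}_vis"]
    prefix = f"vac_{family}_vis_"  # trailing "_" excludes the baseline itself
    return {
        name: round(100.0 * (loss - base_loss) / base_loss, 1)
        for name, (loss, _epoch) in RESULTS.items()
        if name.startswith(prefix)
    }


def best_epoch_fraction(name: str) -> float:
    """Share of the 300-epoch run elapsed when validation loss bottomed out."""
    return RESULTS[name][1] / TOTAL_EPOCHS
```

relative_change("pre") gives about +5.6% (wrench) and +12.0% (wrench + state), while relative_change("post") gives about −1.4% and −5.7%. The two early-peaking pre variants hit their best epoch within the first 4% of the run.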

Inference (ROS 2)

Run with the project's robot_learning real-time inference nodes (Doosan M0609 + Inspire hand). The action contract determines the runner:

  • pre variants (9D) → diffusion_policy_vac_preimg_runner. It publishes /delta_pose_cmd, /predicted_K, and /predicted_zeta, which are consumed by variable_admittance_node (variable_K:=true). For the *_wrench* variants, also launch the Bota driver.
  • post variants (7D) → diffusion_policy_fixed_k_runner with mode:=vel_cmd, which streams vel_cmd directly via the DSR speedl interface (no admittance node).

# pre family (variable-stiffness path)
ros2 run robot_learning diffusion_policy_vac_preimg_runner \
  --ros-args -p checkpoint:=/path/to/vac_pre_vis_wrench_state/best.pth

# post family (direct velocity path)
ros2 run robot_learning diffusion_policy_fixed_k_runner \
  --ros-args -p checkpoint:=/path/to/vac_post_vis_wrench_state/best.pth -p mode:=vel_cmd

The binary hand head publishes to /inspire_hand/left/cmd, RGB observations come from the compressed camera topic, and the *_wrench* variants subscribe to /bota_ft_sensor/wrench (tared per trial). Each variant's action_stats.json provides the exact normalization to undo at inference.
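With the horizons listed under Architecture & training (observation 2, action 4, prediction 8), each inference call yields an 8-step action sequence of which only 4 steps are executed before the policy re-observes and re-plans. The sketch below follows the indexing of the original Diffusion Policy work, where the predicted sequence is aligned with the 2-frame observation window so execution starts at index To − 1; to our understanding robomimic's implementation uses the same slice, but check the runner before relying on it. The 50 Hz control rate is an assumption carried over from the dataset rate, and the helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

OBS_HORIZON = 2     # To: observation frames stacked as conditioning
ACTION_HORIZON = 4  # Ta: actions executed per prediction
PRED_HORIZON = 8    # Tp: length of each predicted action sequence
CONTROL_HZ = 50.0   # assumption: runner steps at the 50 Hz dataset rate


def executable_chunk(pred: Sequence[T], obs_h: int = OBS_HORIZON,
                     act_h: int = ACTION_HORIZON) -> List[T]:
    """Select the actions to execute from one predicted sequence.

    The sequence is assumed to be aligned with the observation window, so
    index obs_h - 1 is the current step; the next act_h actions are executed
    before re-observing and re-planning.
    """
    if len(pred) != PRED_HORIZON:
        raise ValueError(f"expected {PRED_HORIZON} predicted steps, got {len(pred)}")
    start = obs_h - 1
    return list(pred[start:start + act_h])


def replan_period_s(act_h: int = ACTION_HORIZON, hz: float = CONTROL_HZ) -> float:
    """Wall-clock time between policy calls if each executed action lasts 1/hz."""
    return act_h / hz
```

executable_chunk(list(range(8))) returns [1, 2, 3, 4], and replan_period_s() gives 0.08 s: if the runner does step at 50 Hz, one full 50-step DDPM sampling pass has to finish within roughly 80 ms.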

Intended use & limitations

  • Use: research on force-aware / compliance-predicting imitation learning for contact-rich insertion; a baseline ablation for VAC.
  • Limitations: single task (pipe_fixing), single embodiment (M0609 + RH56), 84×84 vision, binary hand. The pre variants overfit at this scale. Not validated for safety-critical or autonomous deployment.

Related

  • Training dataset for aleleanza/diffusion-policy-vac-pipe: aleleanza/vac-pipe-dual-cam