---
license: apache-2.0
library_name: robomimic
pipeline_tag: robotics
tags:
- robotics
- diffusion-policy
- imitation-learning
- lerobot
- variable-admittance
- force-control
- doosan
datasets:
- aleleanza/vac-pipe-dual-cam
---

# Diffusion Policy — VAC Pipe Insertion (input/output ablation)

Six **stock robomimic Diffusion Policy** checkpoints for the contact-rich **pipe-insertion** task on a Doosan M0609 arm with an Inspire RH56 hand under a **Variable Admittance Controller (VAC)**. All six were trained on [`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam) (202 episodes, 145,712 frames @ 50 Hz).

This repo is a clean **2 × 3 ablation**: two *output* representations (where the policy sits relative to the admittance filter) × three *input* modalities.

## Variants (subfolders)

| Subfolder | Output (action) | Inputs (obs) | Action dim | Best val loss | Best epoch |
|---|---|---|---:|---:|---:|
| `vac_pre_vis` | **pre**: `user_cmd[6]` + `K` + `ζ` + `hand_binary` | vision | 9 | 0.2445 | 67 |
| `vac_pre_vis_wrench` | pre | vision + wrench | 9 | 0.2583 | 8 |
| `vac_pre_vis_wrench_state` | pre | vision + wrench + state | 9 | 0.2739 | 11 |
| `vac_post_vis` | **post**: `vel_cmd[6]` + `hand_binary` | vision | 7 | 0.0580 | 48 |
| `vac_post_vis_wrench` | post | vision + wrench | 7 | 0.0572 | 48 |
| `vac_post_vis_wrench_state` | post | vision + wrench + state | 7 | **0.0547** | 48 |

- **pre** = predict the operator command *before* admittance, including the compliance command itself (`stiffness_cmd` K and damping ζ) → the policy *learns to set compliance*. Its output is consumed downstream by the variable-admittance node.
- **post** = predict the Cartesian velocity the controller actually executed *after* admittance → the policy bypasses admittance and drives velocity directly.

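
The table fixes the action contract for each family, and the card states that actions are min-max normalized to [−1, 1] with the hand target binarized (open when the mean of `action.absolute[:, 8:12]` is ≥ 0.85). The sketch below shows how a policy output could be mapped back to raw units and split into named components. It is a minimal illustration, not this repo's code: it *assumes* the components are concatenated in the order listed in the table and that per-dimension min/max arrays are available. The layouts, stats values, the 1.0 = open encoding, and all helper names are hypothetical. The authoritative ordering and statistics are in each subfolder's `action_stats.json` (`action_components`).

```python
import numpy as np

# Hypothetical layouts: the component order is ASSUMED to follow the variants
# table; the authoritative order is `action_components` in action_stats.json.
PRE_LAYOUT = [("user_cmd", 6), ("K", 1), ("zeta", 1), ("hand_binary", 1)]  # 9D
POST_LAYOUT = [("vel_cmd", 6), ("hand_binary", 1)]  # 7D

# Binarization rule stated in the model card: mean >= 0.85 -> open.
HAND_OPEN_THRESHOLD = 0.85


def _range(a_min, a_max):
    """Per-dimension range; zero-width dims are replaced by 1 to avoid division by zero."""
    return np.where(a_max > a_min, a_max - a_min, 1.0)


def minmax_normalize(x, a_min, a_max):
    """Map raw actions to [-1, 1] using per-dimension min/max."""
    return 2.0 * (x - a_min) / _range(a_min, a_max) - 1.0


def minmax_unnormalize(y, a_min, a_max):
    """Invert minmax_normalize: map policy outputs in [-1, 1] back to raw units."""
    return (y + 1.0) / 2.0 * _range(a_min, a_max) + a_min


def binarize_hand(hand_absolute):
    """Binarize a (T, 4) hand slice such as action.absolute[:, 8:12].

    Returns 1.0 (open, an assumed encoding) where the per-frame mean is
    >= HAND_OPEN_THRESHOLD, else 0.0.
    """
    return (hand_absolute.mean(axis=1) >= HAND_OPEN_THRESHOLD).astype(np.float32)


def split_action(action, layout):
    """Split the last axis of an action array into named components."""
    expected = sum(n for _, n in layout)
    if action.shape[-1] != expected:
        raise ValueError(f"expected {expected}-D action, got {action.shape[-1]}-D")
    parts, start = {}, 0
    for name, n in layout:
        parts[name] = action[..., start:start + n]
        start += n
    return parts


if __name__ == "__main__":
    # Hypothetical per-dimension stats; real values live in each action_stats.json.
    a_min = np.array([-0.1] * 6 + [100.0, 0.5, 0.0])
    a_max = np.array([0.1] * 6 + [2000.0, 1.5, 1.0])
    rng = np.random.default_rng(0)
    raw = rng.uniform(a_min, a_max, size=(4, 9))  # one chunk of 4 actions (action horizon 4)

    # Round-trip through the normalization the policy is trained on.
    norm = minmax_normalize(raw, a_min, a_max)
    restored = minmax_unnormalize(norm, a_min, a_max)
    assert np.allclose(restored, raw)

    parts = split_action(restored, PRE_LAYOUT)
    print({name: part.shape for name, part in parts.items()})
    print(binarize_hand(np.array([[0.9] * 4, [0.2] * 4])))
```

Keeping the layout as data rather than hard-coded slices makes it straightforward to switch between the 9D `pre` and 7D `post` contracts with the same decoding code.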

Each subfolder contains:

```
/
├── best.pth              # lowest validation loss
├── last.pth              # final epoch
├── config.json           # full robomimic training config
├── action_stats.json     # action normalization (min-max) + action_components + hand binarization
└── dataset_summary.json  # train/valid episode split + frame counts
```

## Architecture & training

- **Algorithm:** robomimic Diffusion Policy (DDPM noise-prediction loss only; stock, with **no** anchor or auxiliary loss).
- **Backbone:** conditional UNet `[128, 256, 512]`; ~89.4–90.0M parameters.
- **Horizons:** observation 2, action 4, prediction 8 (frame stack 2, sequence length 8).
- **Image:** D405 wrist stream (`observation.images.camera`), mapped to `front_rgb` at **84×84**.
- **Common:** 300 epochs, batch size 16, learning rate 1e-4, DDPM with 50 training and 50 inference steps.
- **Split:** train episodes 0–171 (172 episodes), validation episodes 172–201 (30 episodes).
- **Action normalization:** min-max to [−1, 1]; the hand head is **binarized** from `action.absolute[:, 8:12]` (open if the mean is ≥ 0.85). See each `action_stats.json`.

### Reading the results

The **post** (`vel_cmd`) variants reach far lower validation loss (~0.055–0.058) than the **pre** (`user_cmd` + K + ζ) variants (~0.24–0.27), which suggests that executed velocity is an easier target than raw operator intent plus compliance. The two families regress different normalized targets, so this comparison is indicative rather than exact.

For `pre + wrench` and `pre + wrench + state`, the best validation loss arrives very early (epochs 8 and 11) while training loss keeps dropping, a sign of **overfitting once wrench/state inputs are added** at this model/dataset size.

Adding wrench/state gave no clear benefit at this scale: for **pre** it raised the best validation loss (0.2445 → 0.2583 → 0.2739), and for **post** it lowered it only marginally (0.0580 → 0.0572 → 0.0547).

## Inference (ROS 2)

Run with the project's `robot_learning` real-time inference nodes (Doosan M0609 + Inspire hand). The action contract determines which runner to use:

- **`pre` variants (9D)** → `diffusion_policy_vac_preimg_runner`. Publishes `/delta_pose_cmd`, `/predicted_K`, and `/predicted_zeta`, which are consumed by `variable_admittance_node` (`variable_K:=true`). For the `*_wrench*` variants, also run the Bota driver.
- **`post` variants (7D)** → `diffusion_policy_fixed_k_runner` with `mode:=vel_cmd`: streams `vel_cmd` directly via the DSR `speedl` interface (no admittance node).

```bash
# pre family (variable-stiffness path)
ros2 run robot_learning diffusion_policy_vac_preimg_runner \
  --ros-args -p checkpoint:=/path/to/vac_pre_vis_wrench_state/best.pth

# post family (direct velocity path)
ros2 run robot_learning diffusion_policy_fixed_k_runner \
  --ros-args -p checkpoint:=/path/to/vac_post_vis_wrench_state/best.pth -p mode:=vel_cmd
```

The binary hand head publishes to `/inspire_hand/left/cmd`; RGB observations come from the compressed camera topic; the `*_wrench*` variants also subscribe to `/bota_ft_sensor/wrench` (tared per trial). Each variant's `action_stats.json` provides the exact normalization to invert at inference.

## Intended use & limitations

- **Use:** research on force-aware, compliance-predicting imitation learning for contact-rich insertion; a baseline ablation for VAC.
- **Limitations:** single task (`pipe_fixing`), single embodiment (M0609 + RH56), 84×84 vision, binary hand. The `pre` variants overfit at this scale. Not validated for safety-critical or autonomous deployment.

## Related

- **Dataset:** [`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
- **Collection:** *VAC — Pipe Insertion*