---
license: apache-2.0
library_name: robomimic
pipeline_tag: robotics
tags:
- robotics
- diffusion-policy
- imitation-learning
- lerobot
- variable-admittance
- force-control
- doosan
datasets:
- aleleanza/vac-pipe-dual-cam
---
# Diffusion Policy: VAC Pipe Insertion (input/output ablation)

Six **stock robomimic Diffusion Policy** checkpoints for the contact-rich
**pipe-insertion** task on a Doosan M0609 + Inspire RH56 hand under a **Variable
Admittance Controller (VAC)**. All are trained on
[`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
(202 episodes, 145,712 frames @ 50 Hz).

This repo is a clean **2 × 3 ablation**: two *output* representations (where the policy
sits relative to the admittance filter) × three *input* modalities.

## Variants (subfolders)

| Subfolder | Output (action) | Inputs (obs) | Action dim | Best val loss | Best epoch |
|---|---|---|---:|---:|---:|
| `vac_pre_vis` | **pre**: `user_cmd[6]` + `K` + `ζ` + `hand_binary` | vision | 9 | 0.2445 | 67 |
| `vac_pre_vis_wrench` | pre | vision + wrench | 9 | 0.2583 | 8 |
| `vac_pre_vis_wrench_state` | pre | vision + wrench + state | 9 | 0.2739 | 11 |
| `vac_post_vis` | **post**: `vel_cmd[6]` + `hand_binary` | vision | 7 | 0.0580 | 48 |
| `vac_post_vis_wrench` | post | vision + wrench | 7 | 0.0572 | 48 |
| `vac_post_vis_wrench_state` | post | vision + wrench + state | 7 | **0.0547** | 48 |

- **pre** = predict the operator command *before* admittance, including the compliance
  command itself (`stiffness_cmd` K + damping ζ), so the policy *learns to set compliance*.
  The output is consumed downstream by the variable-admittance node.
- **post** = predict the Cartesian velocity the controller executed *after* admittance, so
  the policy bypasses admittance and drives velocity directly.
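
The two action contracts can be illustrated with a small sketch that splits a raw action vector into named parts. It assumes the components are laid out in the order the table lists them (`user_cmd[6]`, `K`, `ζ`, `hand_binary` for **pre**; `vel_cmd[6]`, `hand_binary` for **post**). The class names, the `0.5` hand threshold, and the "high value means open" convention are hypothetical; the authoritative layout is the `action_components` entry in each `action_stats.json`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class PreAction:
    """9D action: operator command before the admittance filter (hypothetical container)."""
    user_cmd: List[float]   # 6D operator command
    stiffness: float        # K
    damping: float          # zeta
    hand_open: bool


@dataclass
class PostAction:
    """7D action: Cartesian velocity after the admittance filter (hypothetical container)."""
    vel_cmd: List[float]    # 6D Cartesian velocity
    hand_open: bool


def split_action(action: Sequence[float],
                 hand_threshold: float = 0.5) -> Union[PreAction, PostAction]:
    """Split a de-normalized action vector by its length (9 = pre, 7 = post).

    Assumes the component order shown in the variants table; the threshold
    and the open/closed polarity are illustrative assumptions.
    """
    a = [float(v) for v in action]
    if len(a) == 9:
        return PreAction(a[0:6], a[6], a[7], a[8] >= hand_threshold)
    if len(a) == 7:
        return PostAction(a[0:6], a[6] >= hand_threshold)
    raise ValueError(f"expected a 9D (pre) or 7D (post) action, got {len(a)}D")
```

Dispatching on the vector length keeps the sketch independent of the variant name, which is convenient when the same downstream code handles both families.
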
Each subfolder contains:
```
<variant>/
├── best.pth                # lowest validation loss
├── last.pth                # final epoch
├── config.json             # full robomimic training config
├── action_stats.json       # action normalization (min-max) + action_components + hand binarization
└── dataset_summary.json    # train/valid episode split + frame counts
```
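
Before running a checkpoint it can help to confirm that a downloaded variant folder is complete. The sketch below checks for the five files listed above and parses the three JSON files. The helper names are hypothetical, and nothing is assumed about the keys inside the JSON files, since their schema is not documented here.

```python
import json
from pathlib import Path

# File names taken from the folder listing above.
EXPECTED_FILES = (
    "best.pth",
    "last.pth",
    "config.json",
    "action_stats.json",
    "dataset_summary.json",
)


def missing_files(variant_dir):
    """Return the expected files that are absent from a variant folder."""
    root = Path(variant_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_metadata(variant_dir):
    """Parse the JSON side files of a variant into a dict keyed by file name."""
    root = Path(variant_dir)
    absent = missing_files(root)
    if absent:
        raise FileNotFoundError(f"{root} is missing: {', '.join(absent)}")
    return {
        name: json.loads((root / name).read_text())
        for name in EXPECTED_FILES
        if name.endswith(".json")
    }
```
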
## Architecture & training
- **Algorithm:** robomimic Diffusion Policy (DDPM noise-prediction loss; stock, with **no**
  anchor or additional loss).
- **Backbone:** conditional UNet `[128, 256, 512]`; ~89.4–90.0M params.
- **Horizons:** observation 2, action 4, prediction 8 (frame stack 2, seq length 8).
- **Image:** D405 wrist stream (`observation.images.camera`) → `front_rgb`, **84×84**.
- **Common:** 300 epochs, batch 16, lr 1e-4, DDPM with 50 training / 50 inference steps.
- **Split:** train episodes 0–171, valid 172–201.
- **Action normalization:** min-max to [−1, 1]; hand head is **binarized** from
  `action.absolute[:, 8:12]` (mean ≥ 0.85 → open). See each `action_stats.json`.
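
The action normalization and hand binarization described above can be sketched in a few lines. This is a minimal illustration that assumes per-dimension `lo`/`hi` statistics and a per-frame mean over the four hand channels; the function names are hypothetical, and the actual statistics and rule are the ones stored in each `action_stats.json`.

```python
def normalize(x, lo, hi):
    """Min-max map a value from [lo, hi] to [-1, 1]."""
    if hi == lo:
        return 0.0  # degenerate dimension: no range to scale
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def unnormalize(y, lo, hi):
    """Inverse of normalize: map a value from [-1, 1] back to [lo, hi]."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


def hand_is_open(finger_values, threshold=0.85):
    """Binarize the hand: open when the mean of the finger channels is >= threshold.

    finger_values stands in for one frame of action.absolute[:, 8:12].
    """
    values = list(finger_values)
    return sum(values) / len(values) >= threshold
```

At inference the policy output lives in [−1, 1], so `unnormalize` is applied with the same per-dimension statistics that were used for training.
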
### Reading the results
The **post** (vel_cmd) variants reach far lower validation loss (~0.055–0.058) than the
**pre** (user_cmd + K + ζ) variants (~0.24–0.27), which suggests that predicting executed
velocity is an easier target than predicting raw operator intent plus compliance. For
`pre + wrench` and `pre + wrench+state` the best validation loss arrives very early
(epochs 8–11 of 300) while training loss keeps dropping, a sign of **overfitting on
proprioceptive inputs** at this model/dataset size. Adding wrench/state did **not** help
the `pre` variants at this scale; for `post`, the gain is marginal (0.0580 → 0.0547).
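
The comparison above can be reproduced from the variants table. The sketch below hard-codes the reported best validation losses and epochs, averages them per family, and flags variants whose best epoch falls very early. The 20-epoch cutoff is an arbitrary illustrative threshold, not something defined by the training setup.

```python
from statistics import mean

# (best validation loss, epoch of best validation loss), copied from the variants table.
RESULTS = {
    "vac_pre_vis": (0.2445, 67),
    "vac_pre_vis_wrench": (0.2583, 8),
    "vac_pre_vis_wrench_state": (0.2739, 11),
    "vac_post_vis": (0.0580, 48),
    "vac_post_vis_wrench": (0.0572, 48),
    "vac_post_vis_wrench_state": (0.0547, 48),
}


def family_mean_loss(prefix):
    """Average best validation loss over all variants whose name starts with prefix."""
    return mean(loss for name, (loss, _) in RESULTS.items() if name.startswith(prefix))


def early_best(max_epoch=20):
    """Variants whose best validation epoch is at or before max_epoch (illustrative cutoff)."""
    return sorted(name for name, (_, epoch) in RESULTS.items() if epoch <= max_epoch)


pre_mean = family_mean_loss("vac_pre_")
post_mean = family_mean_loss("vac_post_")
ratio = pre_mean / post_mean
```

With these numbers the `pre` family averages about 0.259 and the `post` family about 0.057, a ratio of roughly 4.6. Because the two families predict different targets (9D vs 7D), the ratio is only a rough indication and not a like-for-like comparison.
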
## Inference (ROS 2)
Run with the project's `robot_learning` real-time inference nodes (Doosan M0609 + Inspire
hand). The action contract determines the runner:
- **`pre` variants (9D)** → `diffusion_policy_vac_preimg_runner`. Publishes
  `/delta_pose_cmd` + `/predicted_K` + `/predicted_zeta`, consumed by
  `variable_admittance_node` (`variable_K:=true`). Also start the Bota driver for the
  `*_wrench*` variants.
- **`post` variants (7D)** → velocity runner: streams `vel_cmd` directly via the DSR
  `speedl` interface (no admittance node).
```bash
# pre family (variable-stiffness path)
ros2 run robot_learning diffusion_policy_vac_preimg_runner \
  --ros-args -p checkpoint:=/path/to/vac_pre_vis_wrench_state/best.pth

# post family (direct velocity path)
ros2 run robot_learning diffusion_policy_fixed_k_runner \
  --ros-args -p checkpoint:=/path/to/vac_post_vis_wrench_state/best.pth -p mode:=vel_cmd
```
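
The mapping from variant to runner can be captured in a small helper that builds the same command lines as argument lists. This is only a sketch: the runner names and parameters are the ones shown in the commands above, while the helper functions themselves are hypothetical and not part of `robot_learning`.

```python
def runner_command(variant, checkpoint):
    """Build the `ros2 run` argument list for a variant subfolder name."""
    base = ["ros2", "run", "robot_learning"]
    if variant.startswith("vac_pre_"):
        # 9D contract: variable-stiffness path through the admittance node.
        return base + [
            "diffusion_policy_vac_preimg_runner",
            "--ros-args", "-p", f"checkpoint:={checkpoint}",
        ]
    if variant.startswith("vac_post_"):
        # 7D contract: direct velocity path, no admittance node.
        return base + [
            "diffusion_policy_fixed_k_runner",
            "--ros-args", "-p", f"checkpoint:={checkpoint}",
            "-p", "mode:=vel_cmd",
        ]
    raise ValueError(f"unknown variant: {variant!r}")


def needs_wrench(variant):
    """True for the *_wrench* variants, which also need the force/torque driver."""
    return "wrench" in variant
```

The returned list can be passed to `subprocess.run` on a machine where the ROS 2 workspace is sourced.
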
The binary hand head publishes to `/inspire_hand/left/cmd`; RGB observations come from
the compressed camera topic; `*_wrench*` variants subscribe to `/bota_ft_sensor/wrench`
(tared per trial). `action_stats.json` provides the exact normalization to undo at
inference.
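
The horizons listed under Architecture & training (observation 2, action 4, prediction 8) imply a receding-horizon loop at inference: the policy sees the last two observations, predicts eight actions, and only four are executed before re-planning. The sketch below illustrates that loop with a stand-in `policy` callable. It assumes the executed slice starts at index `observation horizon − 1`, as in the original Diffusion Policy convention, and that control runs at the dataset rate of 50 Hz; the real runners may differ on both points.

```python
from collections import deque

OBS_HORIZON = 2      # observations fed to the policy
ACTION_HORIZON = 4   # actions executed per prediction
PRED_HORIZON = 8     # actions predicted per call
CONTROL_HZ = 50      # assumed control rate (dataset frame rate)


def receding_horizon(policy, observations):
    """Run a policy over a stream of observations, re-planning every ACTION_HORIZON ticks.

    policy: callable taking a list of OBS_HORIZON observations and returning
            a sequence of PRED_HORIZON actions (stand-in for the real model).
    """
    obs_window = deque(maxlen=OBS_HORIZON)
    pending = deque()
    executed = []
    for obs in observations:
        obs_window.append(obs)
        while len(obs_window) < OBS_HORIZON:
            obs_window.append(obs)  # pad at episode start by repeating the first frame
        if not pending:
            plan = list(policy(list(obs_window)))
            if len(plan) != PRED_HORIZON:
                raise ValueError(f"expected {PRED_HORIZON} actions, got {len(plan)}")
            start = OBS_HORIZON - 1  # assumed alignment of the executed slice
            pending.extend(plan[start:start + ACTION_HORIZON])
        executed.append(pending.popleft())
    return executed


# Time covered by one executed chunk under the assumed control rate.
CHUNK_SECONDS = ACTION_HORIZON / CONTROL_HZ
```

Under these assumptions each prediction covers 80 ms of motion before the policy is queried again.
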
## Intended use & limitations
- **Use:** research on force-aware / compliance-predicting imitation learning for
  contact-rich insertion; a baseline ablation for VAC.
- **Limitations:** single task (`pipe_fixing`), single embodiment (M0609 + RH56), 84×84
  vision, binary hand. The `pre` variants with wrench/state inputs overfit at this scale.
  Not validated for safety-critical or autonomous deployment.
## Related
- **Dataset:** [`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
- **Collection:** *VAC – Pipe Insertion*