	---
	license: apache-2.0
	library_name: robomimic
	pipeline_tag: robotics
	tags:
	- robotics
	- diffusion-policy
	- imitation-learning
	- lerobot
	- variable-admittance
	- force-control
	- doosan
	datasets:
	- aleleanza/vac-pipe-dual-cam
	---

	# Diffusion Policy — VAC Pipe Insertion (input/output ablation)

	Six stock robomimic Diffusion Policy checkpoints for the contact-rich
	pipe-insertion task on a Doosan M0609 arm with an Inspire RH56 hand under a **Variable
	Admittance Controller (VAC)**. All six were trained on
	[`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
	(202 episodes, 145,712 frames @ 50 Hz).

	This repo is a clean 2 × 3 ablation: two output representations (where the policy
	sits relative to the admittance filter) × three input modalities.

	## Variants (subfolders)

	| Subfolder | Output (action) | Inputs (obs) | Action dim | Best val loss | @epoch |
	|---|---|---|---:|---:|---:|
	| `vac_pre_vis` | pre: `user_cmd[6]` + `K` + `ζ` + `hand_binary` | vision | 9 | 0.2445 | 67 |
	| `vac_pre_vis_wrench` | pre | vision + wrench | 9 | 0.2583 | 8 |
	| `vac_pre_vis_wrench_state` | pre | vision + wrench + state | 9 | 0.2739 | 11 |
	| `vac_post_vis` | post: `vel_cmd[6]` + `hand_binary` | vision | 7 | 0.0580 | 48 |
	| `vac_post_vis_wrench` | post | vision + wrench | 7 | 0.0572 | 48 |
	| `vac_post_vis_wrench_state` | post | vision + wrench + state | 7 | 0.0547 | 48 |

	- pre = predict the operator command before admittance, including the compliance
	command itself (`stiffness_cmd` K + damping ζ) → the policy learns to set compliance.
	Consumed downstream by the variable-admittance node.
	- post = predict the Cartesian velocity the controller executed after admittance →
	the policy bypasses admittance and drives velocity directly.
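
To make the two action contracts concrete, here is a minimal sketch that splits a flat action vector into named parts. The component order is an assumption read off the table above (`user_cmd[6]`, `K`, `ζ`, `hand_binary` for pre; `vel_cmd[6]`, `hand_binary` for post); the authoritative order is the `action_components` entry in each `action_stats.json`. The names `PRE_LAYOUT`, `POST_LAYOUT`, and `split_action` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layouts, copied from the table's column order. Check them against
# `action_components` in action_stats.json before relying on them.
PRE_LAYOUT: List[Tuple[str, int]] = [
    ("user_cmd", 6), ("K", 1), ("zeta", 1), ("hand_binary", 1),  # 9D
]
POST_LAYOUT: List[Tuple[str, int]] = [
    ("vel_cmd", 6), ("hand_binary", 1),  # 7D
]


def split_action(action: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat 7D or 9D action vector into named components."""
    layouts = {9: PRE_LAYOUT, 7: POST_LAYOUT}
    if len(action) not in layouts:
        raise ValueError(f"expected a 7D or 9D action, got {len(action)}D")
    parts: Dict[str, List[float]] = {}
    offset = 0
    for name, width in layouts[len(action)]:
        parts[name] = list(action[offset:offset + width])
        offset += width
    return parts
```

A 9D vector yields `user_cmd`, `K`, `zeta`, and `hand_binary`; a 7D vector yields `vel_cmd` and `hand_binary`.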

	Each subfolder contains:

	```
	<variant>/
	├── best.pth # lowest validation loss
	├── last.pth # final epoch
	├── config.json # full robomimic training config
	├── action_stats.json # action normalization (min-max) + action_components + hand binarization
	└── dataset_summary.json # train/valid episode split + frame counts
	```
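
As a sanity check after downloading a variant, the layout above can be verified and the JSON files parsed in one step. This is a sketch using only the standard library; `load_variant_metadata` is a hypothetical helper, and it makes no assumption about the keys inside the JSON files.

```python
import json
from pathlib import Path

# File names as listed in the tree above.
EXPECTED_FILES = (
    "best.pth",
    "last.pth",
    "config.json",
    "action_stats.json",
    "dataset_summary.json",
)


def load_variant_metadata(variant_dir):
    """Check that a variant folder is complete and return its parsed JSON files.

    The result is keyed by file stem: "config", "action_stats", "dataset_summary".
    The .pth checkpoints are only checked for existence, not loaded.
    """
    root = Path(variant_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    return {
        Path(name).stem: json.loads((root / name).read_text(encoding="utf-8"))
        for name in EXPECTED_FILES
        if name.endswith(".json")
    }
```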

	## Architecture & training

	- Algorithm: stock robomimic Diffusion Policy (DDPM noise-prediction loss only, with no
	anchor or other auxiliary loss).
	- Backbone: conditional UNet `[128, 256, 512]`; ~89.4–90.0M params.
	- Horizons: observation 2, action 4, prediction 8 (frame stack 2, seq length 8).
	- Image: D405 wrist stream (`observation.images.camera`) → `front_rgb`, 84×84.
	- Common: 300 epochs, batch size 16, learning rate 1e-4, 50 DDPM steps for both training and inference.
	- Split: train episodes 0–171, valid 172–201.
	- Action normalization: min-max to [−1, 1]; hand head is binarized from
	`action.absolute[:, 8:12]` (mean ≥ 0.85 → open). See each `action_stats.json`.
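
The last bullet can be sketched in plain Python. The 0.85 threshold and the [−1, 1] range come from the list above; the handling of constant dimensions is an assumption, and `minmax_normalize` and `hand_is_open` are hypothetical names rather than functions from the training code.

```python
from typing import Sequence

OPEN_THRESHOLD = 0.85  # from the model card: mean >= 0.85 -> open


def minmax_normalize(value: float, lo: float, hi: float) -> float:
    """Map a value from [lo, hi] to [-1, 1]."""
    if hi == lo:
        # Assumption: a constant dimension maps to the centre of the range.
        return 0.0
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def hand_is_open(finger_values: Sequence[float]) -> bool:
    """Binarize one frame of hand commands, e.g. action.absolute[t, 8:12].

    The hand counts as open when the mean of the values reaches the threshold.
    """
    return sum(finger_values) / len(finger_values) >= OPEN_THRESHOLD
```

For example, `minmax_normalize(5.0, 0.0, 10.0)` sits at the middle of the range, and a frame with finger values `[1.0, 1.0, 1.0, 0.5]` (mean 0.875) is labelled open.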

	### Reading the results

	The post (`vel_cmd`) variants reach far lower validation loss (~0.055–0.058) than the
	pre (`user_cmd` + K + ζ) variants (~0.24–0.27), which suggests that executed velocity is an
	easier target than raw operator intent plus compliance. For `vac_pre_vis_wrench`
	and `vac_pre_vis_wrench_state` the best validation loss arrives very early (epochs 8 and 11) while
	training loss keeps dropping, a sign of overfitting on proprioceptive (wrench/state) inputs at this
	model/dataset size. Adding wrench/state did not help the pre variants (best validation loss rose
	from 0.2445 to 0.2583 and 0.2739) and gave only a marginal gain for the post variants
	(0.0580 to 0.0547).

	## Inference (ROS 2)

	Run with the project's `robot_learning` real-time inference nodes (Doosan M0609 + Inspire
	hand). The action contract determines the runner:

	- `pre` variants (9D) → `diffusion_policy_vac_preimg_runner`. Publishes
	`/delta_pose_cmd` + `/predicted_K` + `/predicted_zeta`, consumed by
	`variable_admittance_node` (`variable_K:=true`). For `_wrench` variants, also start the
	Bota force/torque sensor driver.
	- `post` variants (7D) → `diffusion_policy_fixed_k_runner` with `mode:=vel_cmd`. Streams
	`vel_cmd` directly via the DSR `speedl` interface (no admittance node).

	```bash
	# pre family (variable-stiffness path)
	ros2 run robot_learning diffusion_policy_vac_preimg_runner \
	--ros-args -p checkpoint:=/path/to/vac_pre_vis_wrench_state/best.pth

	# post family (direct velocity path)
	ros2 run robot_learning diffusion_policy_fixed_k_runner \
	--ros-args -p checkpoint:=/path/to/vac_post_vis_wrench_state/best.pth -p mode:=vel_cmd
	```
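
If the runner is launched from a script, the action contract can be captured as a small lookup keyed by action dimension. This sketch only assembles the same `ros2 run` command lines shown above; `RUNNERS` and `ros2_run_command` are hypothetical names, and nothing beyond the executables and parameters already listed is assumed.

```python
from typing import Dict, List, Tuple

# Action dimension -> (executable, extra ROS parameters), as in the commands above.
RUNNERS: Dict[int, Tuple[str, Dict[str, str]]] = {
    9: ("diffusion_policy_vac_preimg_runner", {}),               # pre family
    7: ("diffusion_policy_fixed_k_runner", {"mode": "vel_cmd"}),  # post family
}


def ros2_run_command(action_dim: int, checkpoint: str) -> List[str]:
    """Build the argv list for the runner that matches a checkpoint's action dim."""
    if action_dim not in RUNNERS:
        raise ValueError(f"no runner for action dim {action_dim}")
    executable, params = RUNNERS[action_dim]
    cmd = ["ros2", "run", "robot_learning", executable,
           "--ros-args", "-p", f"checkpoint:={checkpoint}"]
    for key, value in params.items():
        cmd += ["-p", f"{key}:={value}"]
    return cmd
```

The returned list can be passed to `subprocess.run` in a sourced ROS 2 environment.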

	The binary hand head publishes to `/inspire_hand/left/cmd`; RGB observations come from
	the compressed camera topic; `_wrench` variants subscribe to `/bota_ft_sensor/wrench`
	(tared per trial). `action_stats.json` provides the exact normalization to undo at
	inference.
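
Undoing the normalization at inference could look like the sketch below, which also selects the executed part of a predicted sequence. It makes two assumptions: the per-dimension minima and maxima have already been read from `action_stats.json` (its key names are not assumed here), and the executed chunk starts at index `obs_horizon - 1` of the prediction, as in the reference Diffusion Policy implementation. `minmax_denormalize` and `executed_actions` are hypothetical names.

```python
from typing import List, Sequence


def minmax_denormalize(norm: float, lo: float, hi: float) -> float:
    """Map a value from [-1, 1] back to [lo, hi]."""
    return (norm + 1.0) / 2.0 * (hi - lo) + lo


def executed_actions(
    pred_seq: Sequence[Sequence[float]],
    mins: Sequence[float],
    maxs: Sequence[float],
    obs_horizon: int = 2,
    action_horizon: int = 4,
) -> List[List[float]]:
    """Pick the executed steps from a predicted sequence and denormalize them.

    pred_seq holds one normalized action vector per predicted step
    (8 steps for these checkpoints).
    """
    start = obs_horizon - 1  # assumed offset of the first executed step
    chunk = pred_seq[start:start + action_horizon]
    return [
        [
            # Clip to the normalized range before mapping back.
            minmax_denormalize(max(-1.0, min(1.0, v)), lo, hi)
            for v, lo, hi in zip(step, mins, maxs)
        ]
        for step in chunk
    ]
```

With the default horizons, an 8-step prediction yields the 4 denormalized actions at indices 1 to 4.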

	## Intended use & limitations

	- Use: research on force-aware / compliance-predicting imitation learning for
	contact-rich insertion; a baseline ablation for VAC.
	- Limitations: single task (`pipe_fixing`), single embodiment (M0609 + RH56), 84×84
	vision, binary hand. The `pre` variants, especially those with wrench/state inputs, show
	signs of overfitting at this scale. Not validated for safety-critical or autonomous deployment.

	## Related

	- Dataset: [`aleleanza/vac-pipe-dual-cam`](https://huggingface.co/datasets/aleleanza/vac-pipe-dual-cam)
	- Collection: VAC — Pipe Insertion