---
license: mit
tags:
- robotics
- diffusion-policy
- robomimic
- osc
- inverse-controller
---

# diffusion-policy-osc-adapters

Neural-network (NN) inverse-controller adapters for running a joint-space (Δq) Diffusion Policy on top of the standard Robomimic `OSC_POSE` controller. These are companion checkpoints to a fork of [columbia-ai-robotics/diffusion_policy](https://github.com/columbia-ai-robotics/diffusion_policy) for the Robomimic PH (proficient-human) lowdim benchmark.

The adapter maps `(state, desired Δq) → a_OSC`, so a policy trained to predict joint deltas can be rolled out through a Cartesian-space controller.
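
The composition described above can be sketched as a rollout loop. This is a minimal illustration, not the repository's eval code: `policy`, `adapter`, and `env_step` are hypothetical callables standing in for the Diffusion Policy, the NN adapter, and the environment step, and the sketch emits one action per step rather than the action sequences the real policy predicts.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def rollout(
    policy: Callable[[Vector], Vector],                 # state -> desired Δq (hypothetical)
    adapter: Callable[[Vector, Vector], Vector],        # (state, Δq) -> a_OSC (hypothetical)
    env_step: Callable[[Vector], Tuple[Vector, bool]],  # a_OSC -> (next state, done)
    initial_state: Vector,
    max_steps: int = 400,
) -> List[Vector]:
    """Roll a joint-delta policy out through an OSC-commanded environment."""
    state = initial_state
    actions: List[Vector] = []
    for _ in range(max_steps):
        dq = policy(state)           # policy output lives in joint space
        a_osc = adapter(state, dq)   # adapter translates it to a Cartesian command
        actions.append(a_osc)
        state, done = env_step(a_osc)
        if done:
            break
    return actions
```

The policy never sees the OSC action space; only the adapter depends on the controller, which is what lets the same joint-delta checkpoint be reused.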

## Layout

```
probe_nn_osc/                 Probe-based adapters (Brian-style env-probing sampler)
  nn_adapter_lift.pt
  nn_adapter_can.pt
  nn_adapter_square.pt
  nn_adapter_tool_hang.pt
  nn_adapter_transport.pt     (dual-arm)
demosup_nn_osc/               Demo-supervised adapters (one training pair per demo timestep)
  lift/    { best.pt, config.json, collect_metadata.json }
  can/     ...
  square/  ...
```

`config.json` is the hyperparameter dump emitted by the trainer; `collect_metadata.json` records the dataset, `obs_keys`, and demo IDs the adapter was trained on.

## Architecture (`demosup_nn_osc/*`)

Brian Zhang's `InverseControllerMLP`: three hidden layers of width 512 with SiLU activations and LayerNorm. The input is the concatenation of `(state, desired_Δq)` (33 + 7 = 40-D); the output is the normalized 7-D OSC command. Per-channel input and output normalizers are stored alongside the weights in `best.pt`.
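
A minimal PyTorch sketch of an MLP with this shape follows. It is not the repository's `InverseControllerMLP`: the class name `InverseMLPSketch`, the Linear → LayerNorm → SiLU ordering, and the plain linear output head are assumptions, and `reverse_controller/train_inverse_model.py` remains the reference.

```python
import torch
from torch import nn


class InverseMLPSketch(nn.Module):
    """Hypothetical stand-in: 40-D input, three 512-wide hidden layers, 7-D output."""

    def __init__(self, in_dim=40, out_dim=7, hidden_dims=(512, 512, 512)):
        super().__init__()
        layers = []
        prev = in_dim
        for width in hidden_dims:
            # Layer ordering is an assumption; the source only says "SiLU + LayerNorm".
            layers += [nn.Linear(prev, width), nn.LayerNorm(width), nn.SiLU()]
            prev = width
        layers.append(nn.Linear(prev, out_dim))  # normalized OSC command
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


model = InverseMLPSketch()
batch = torch.zeros(4, 40)  # four already-normalized (state, Δq) rows
out = model(batch)          # one 7-D normalized command per row
```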

State features (single-arm): `object, robot0_eef_pos, robot0_eef_quat, robot0_gripper_qpos, robot0_joint_pos`.
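
One way to assemble the adapter input from an observation dict is to concatenate the state keys in the order listed and then append the desired Δq. The helper name `build_adapter_input` and the concatenation order are assumptions for illustration; the `obs_keys` recorded in `collect_metadata.json` are authoritative.

```python
from typing import Dict, List, Sequence

STATE_KEYS = (
    "object",
    "robot0_eef_pos",
    "robot0_eef_quat",
    "robot0_gripper_qpos",
    "robot0_joint_pos",
)


def build_adapter_input(
    obs: Dict[str, Sequence[float]],
    desired_dq: Sequence[float],
    state_keys: Sequence[str] = STATE_KEYS,
) -> List[float]:
    """Flatten the state keys in order, then append the 7-D joint delta."""
    if len(desired_dq) != 7:
        raise ValueError("expected a 7-D joint delta for a single arm")
    state: List[float] = []
    for key in state_keys:
        state.extend(float(v) for v in obs[key])
    return state + [float(v) for v in desired_dq]
```

The result is still unnormalized; the per-channel input normalizer from `best.pt` would be applied before the vector reaches the network.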

## Results: full-pipeline rollout (DP + adapter)

Robomimic PH lowdim, `test_start_seed=100000`, `n_test=50`, default OSC `kp` unless noted. DP is the 5k-epoch joint-delta UNet checkpoint. Entries are success rates over the 50 test rollouts.

| Task   | Probe NN-OSC | **Demo-supervised NN-OSC** | FK→OSC reference |
|--------|--------------|----------------------------|------------------|
| lift   | 0.00         | **0.90**                   | 0.94             |
| can    | 0.02         | **0.64**                   | 0.88             |
| square | 0.00         | **0.48**                   | 0.50 (kp=3000)   |
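
To make the table concrete, the fraction of the probe-to-reference gap recovered by the demo-supervised adapter can be computed directly from the numbers above. The normalization `(demosup - probe) / (reference - probe)` is one illustrative way to read the table, not a metric defined in the source.

```python
results = {
    # task: (probe, demo-supervised, FK->OSC reference), copied from the table
    "lift": (0.00, 0.90, 0.94),
    "can": (0.02, 0.64, 0.88),
    "square": (0.00, 0.48, 0.50),
}


def gap_closed(probe, demosup, reference):
    """Share of the probe-to-reference gap recovered by the demo-supervised adapter."""
    return (demosup - probe) / (reference - probe)


fractions = {task: round(gap_closed(*row), 2) for task, row in results.items()}
print(fractions)  # {'lift': 0.96, 'can': 0.72, 'square': 0.96}
```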

The demo-supervised adapter closes most of the gap to the analytic FK→OSC adapter and trains in ~5 min per task on a single H100/H200. The probe-based adapter fails because OSC's Cartesian command is a many-to-one inverse of Δq; the demo-supervised dataset restricts training to the demo manifold, where teleoperation has already picked a consistent branch.
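
The branch argument can be illustrated with a toy analogy. This is not the OSC dynamics: it assumes the ambiguity takes the form of two equally valid commands for the same outcome, using a hypothetical scalar forward map `d = a²`. A least-squares fit over both branches averages them into a command that reproduces neither, while a fit restricted to one branch is exact.

```python
from collections import defaultdict
from statistics import mean


def forward(a):
    """Toy many-to-one "controller": a and -a produce the same outcome d."""
    return a * a


def fit_conditional_mean(pairs):
    """Least-squares-optimal predictor on discrete inputs: mean target per input."""
    buckets = defaultdict(list)
    for d, a in pairs:
        buckets[d].append(a)
    return {d: mean(targets) for d, targets in buckets.items()}


commands = [-1.0, -0.5, 0.5, 1.0]
probe_pairs = [(forward(a), a) for a in commands]             # both branches mixed
branch_pairs = [(forward(a), a) for a in commands if a > 0]   # one consistent branch

probe_fit = fit_conditional_mean(probe_pairs)
branch_fit = fit_conditional_mean(branch_pairs)

# Replay each predicted command through the forward map and compare to the target d.
probe_error = max(abs(forward(a_hat) - d) for d, a_hat in probe_fit.items())
branch_error = max(abs(forward(a_hat) - d) for d, a_hat in branch_fit.items())
```

Here `probe_error` is 1.0 and `branch_error` is 0.0: the mixed-branch fit predicts `a = 0` for every target.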

## Usage

```python
import torch

ckpt = torch.load("demosup_nn_osc/lift/best.pt", map_location="cpu")
# ckpt contains: state_dict, input_mean/std, command_mean/std, hidden_dims, ...
# See reverse_controller/train_inverse_model.py in the source repo for the full
# build_model + normalize pipeline.
```
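
A hedged sketch of how the stored normalizers might be applied at inference time follows. The key names `input_mean`, `input_std`, `command_mean`, and `command_std` are inferred from the comment above, and the standardize-then-unstandardize convention is an assumption; `predict_osc_command` is a hypothetical helper, so check the trainer script before relying on it.

```python
import torch


def predict_osc_command(model, ckpt, state, desired_dq):
    """Normalize (state, Δq), run the adapter, and unnormalize the OSC command."""
    x = torch.cat([state, desired_dq], dim=-1)
    # Assumed convention: inputs are standardized per channel.
    in_mean = torch.as_tensor(ckpt["input_mean"], dtype=x.dtype)
    in_std = torch.as_tensor(ckpt["input_std"], dtype=x.dtype)
    x_norm = (x - in_mean) / in_std
    with torch.no_grad():
        y_norm = model(x_norm)
    # Assumed convention: outputs are mapped back with the command statistics.
    out_mean = torch.as_tensor(ckpt["command_mean"], dtype=y_norm.dtype)
    out_std = torch.as_tensor(ckpt["command_std"], dtype=y_norm.dtype)
    return y_norm * out_std + out_mean
```

`model` is any callable mapping the normalized 40-D input to the normalized 7-D command, for example a network rebuilt from `hidden_dims` with `state_dict` loaded into it.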

## Source

- Code branch: https://github.com/sour5blue/diffusion_policy/tree/sour/obs-noise-param (`writeup.md`, sections D–F).
- Upstream: https://github.com/WubbLord/diffusion_policy
- Demo-supervised collector: `collect_demo_only_osc.py`.
- Trainer: `reverse_controller/train_inverse_model.py`.
- Eval entrypoint: `eval_nn_osc.py`.