diffusion-policy-osc-adapters

Neural-network (NN) inverse-controller adapters for running a joint-space (Δq) Diffusion Policy on top of the standard Robomimic OSC_POSE controller. These are companion checkpoints to a fork of columbia-ai-robotics/diffusion_policy, for the Robomimic PH lowdim benchmark.

The adapter maps (state, desired Δq) → a_OSC so a policy trained to predict joint deltas can be rolled out through a Cartesian-space controller.

Layout

probe_nn_osc/      Probe-based adapters (Brian-style env-probing sampler)
  nn_adapter_lift.pt
  nn_adapter_can.pt
  nn_adapter_square.pt
  nn_adapter_tool_hang.pt
  nn_adapter_transport.pt        (dual-arm)
demosup_nn_osc/    Demo-supervised adapters (one training pair per demo timestep)
  lift/   { best.pt, config.json, collect_metadata.json }
  can/    ...
  square/ ...

config.json is the trainer-emitted hyperparameter dump; collect_metadata.json records dataset, obs_keys, and demo IDs the adapter was trained on.

Architecture (`demosup_nn_osc/*`)

Brian Zhang's InverseControllerMLP: three hidden layers of width 512 with SiLU activations and LayerNorm. The input is the concatenation of state and desired_Δq (33 + 7 = 40-D); the output is the 7-D normalized OSC command. Per-channel input and output normalizers are stored alongside the weights in best.pt.
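A minimal sketch of an MLP with the shape described above, for orientation only. The class name InverseControllerMLPSketch, the argument names, and the Linear → LayerNorm → SiLU ordering inside each hidden block are assumptions; the actual layer layout lives in reverse_controller/train_inverse_model.py and may differ.

```python
import torch
import torch.nn as nn


class InverseControllerMLPSketch(nn.Module):
    """Hypothetical stand-in: (state, desired Δq) -> normalized OSC command."""

    def __init__(self, state_dim=33, dq_dim=7, command_dim=7,
                 hidden_dims=(512, 512, 512)):
        super().__init__()
        layers = []
        in_dim = state_dim + dq_dim  # 33 + 7 = 40-D input
        for h in hidden_dims:
            # Assumed block ordering: Linear -> LayerNorm -> SiLU.
            layers += [nn.Linear(in_dim, h), nn.LayerNorm(h), nn.SiLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, command_dim))  # 7-D output
        self.net = nn.Sequential(*layers)

    def forward(self, state, desired_dq):
        # Concatenate along the feature axis, as described in the text.
        x = torch.cat([state, desired_dq], dim=-1)
        return self.net(x)


model = InverseControllerMLPSketch()
state = torch.zeros(4, 33)       # batch of 4 dummy states
desired_dq = torch.zeros(4, 7)   # batch of 4 dummy joint deltas
command = model(state, desired_dq)
print(tuple(command.shape))
```

The sketch takes already-normalized inputs and returns a normalized command; applying the stored normalizers is a separate step.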

State features (single-arm): object, robot0_eef_pos, robot0_eef_quat, robot0_gripper_qpos, robot0_joint_pos.

Results — full-pipeline rollout (DP + adapter)

Robomimic PH lowdim, test_start_seed=100000, n_test=50, OSC default kp. DP is the 5k-epoch joint-delta UNet checkpoint.

Task	Probe NN-OSC	Demo-supervised NN-OSC	FK→OSC reference
lift	0.00	0.90	0.94
can	0.02	0.64	0.88
square	0.00	0.48	0.50 (kp=3000)

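Demo-supervised closes most of the gap to the analytic FK→OSC adapter (0.90 vs 0.94 on lift, 0.64 vs 0.88 on can, 0.48 vs 0.50 on square) and trains in ~5 min/task on a single H100/H200. The probe-based adapter fails because the inverse is ambiguous: several OSC Cartesian commands can produce the same Δq, so a regressor trained on probe samples is pulled between incompatible targets. The demo-supervised dataset restricts training to the demo manifold, where teleop already picked a consistent branch.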
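The ambiguity argument can be illustrated with a toy analogy. This is not the OSC dynamics: the forward map a → a² is a stand-in chosen only because two commands (a and -a) produce the same effect, and all function names are hypothetical. Under squared error, the best prediction for each effect is the mean of the commands that produced it, so sampling both branches collapses the prediction to zero, while sampling a single branch recovers the command exactly.

```python
from collections import defaultdict


def forward(a):
    # Toy "controller": commands a and -a produce the same effect.
    return a * a


def best_mse_inverse(commands):
    """MSE-optimal lookup table effect -> command for the given samples.

    For squared error, the optimal prediction for each effect value is the
    mean of all commands that produced that effect.
    """
    groups = defaultdict(list)
    for a in commands:
        groups[round(forward(a), 9)].append(a)
    return {y: sum(v) / len(v) for y, v in groups.items()}


def mean_abs_error(commands, table):
    errs = [abs(table[round(forward(a), 9)] - a) for a in commands]
    return sum(errs) / len(errs)


# "Probe" data: commands sampled across both branches of the inverse.
probe = [i / 10 for i in range(-10, 11)]
# "Demo" data: only one consistent branch (a >= 0).
demo = [a for a in probe if a >= 0]

probe_err = mean_abs_error(probe, best_mse_inverse(probe))
demo_err = mean_abs_error(demo, best_mse_inverse(demo))
print(f"probe error: {probe_err:.3f}, demo error: {demo_err:.3f}")
```

In the toy case the probe-trained inverse predicts the average of the two branches, which is neither of them; the single-branch dataset has a unique target per effect.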

Usage

import torch
ckpt = torch.load("demosup_nn_osc/lift/best.pt", map_location="cpu")
# ckpt contains: state_dict, input_mean/std, command_mean/std, hidden_dims, ...
# See reverse_controller/train_inverse_model.py in the source repo for the full
# build_model + normalize pipeline.
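A hedged sketch of what the normalize → predict → denormalize step might look like. The key names input_mean, input_std, command_mean, and command_std are read off the comment above; the standardization formula, the eps term, the helper name predict_osc_command, and the stand-in model are all assumptions. The source repo's build_model and normalization code are authoritative.

```python
import torch
import torch.nn as nn


def predict_osc_command(model, ckpt, state, desired_dq, eps=1e-8):
    """Hypothetical helper: raw (state, Δq) -> raw OSC command."""
    x = torch.cat([state, desired_dq], dim=-1)
    # Assumed per-channel standardization of the 40-D input.
    x_norm = (x - ckpt["input_mean"]) / (ckpt["input_std"] + eps)
    with torch.no_grad():
        y_norm = model(x_norm)
    # Assumed inverse transform back to the controller's command scale.
    return y_norm * ckpt["command_std"] + ckpt["command_mean"]


# Stand-in checkpoint and model so the sketch runs without the real files.
fake_ckpt = {
    "input_mean": torch.zeros(40),
    "input_std": torch.ones(40),
    "command_mean": torch.full((7,), 0.1),
    "command_std": torch.ones(7),
}
stand_in_model = nn.Linear(40, 7)
nn.init.zeros_(stand_in_model.weight)
nn.init.zeros_(stand_in_model.bias)

a_osc = predict_osc_command(
    stand_in_model, fake_ckpt, torch.zeros(1, 33), torch.zeros(1, 7)
)
print(tuple(a_osc.shape))
```

With the real checkpoint, the model would be rebuilt from hidden_dims and loaded from state_dict before calling a helper like this.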

Source

Code branch: https://github.com/sour5blue/diffusion_policy/tree/sour/obs-noise-param (writeup.md, sections D–F).
Upstream: https://github.com/WubbLord/diffusion_policy
Demo-supervised collector: collect_demo_only_osc.py.
Trainer: reverse_controller/train_inverse_model.py.
Eval entrypoint: eval_nn_osc.py.
