---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
- robotics
- diffusion-policy
- imitation-learning
- pusht
base_model: lerobot/diffusion_pusht
---

# Diffusion Policy: PushT with Obstacles

Diffusion Policy finetuned from [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht) for the PushT manipulation task **with random circular obstacles**: the agent pushes a T-shaped block to a goal pose while avoiding 1–3 obstacles per episode.

The base model handles standard PushT but has zero obstacle awareness (0% success, 55% obstacle-hit rate as zero-shot baseline). Finetuning on 101 obstacle-aware demonstrations recovers a working policy.

## Results

| Checkpoint | Success Rate | Obstacle-Hit Rate |
|---|---|---|
| Base (`lerobot/diffusion_pusht`, zero-shot on obstacle env) | 0% | 55% |
| **This model** (best of 30k finetune steps) | **95%** | **0%** |

Evaluated on `PushTObstacleEnv` with 20 episodes per checkpoint, 300 max steps, success threshold 0.95 coverage.

> Note: 20 episodes is a noisy estimator (Wilson 95% CI ≈ ±20%). Treat the
> 95% headline as approximate; a 100-episode re-evaluation is recommended.

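
The interval quoted in the note can be checked with a few lines of standard-library Python. This is a minimal sketch and not part of the original evaluation code: `wilson_interval` is a hypothetical helper name, and the example assumes the 95% headline corresponds to 19 successes out of 20 episodes.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    # The interval is centered on a value shrunk toward 0.5, not on p_hat itself.
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


# Assumed counts: 19/20 for the headline, 10/20 as the worst case at n = 20.
for successes, trials in [(19, 20), (10, 20)]:
    low, high = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: [{low:.3f}, {high:.3f}]")
```

Under these assumptions, the ±20% figure matches the worst case at 20 episodes (a success rate near 50%). For 19 of 20 successes the interval is asymmetric, roughly 76% to 99%.
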
## Architecture

Inherited from the base model (no architecture changes, only weight finetuning):

| Field | Value |
|---|---|
| Vision backbone | ResNet-18 |
| Image input | 3×96×96 (random-cropped to 84×84) |
| State input | 2 (agent_pos) |
| Action output | 2 |
| `n_obs_steps` | 2 |
| `horizon` | 16 |
| `n_action_steps` | 8 |
| Diffusion timesteps | 100 |
| Parameters | 262,709,026 |

## Training

- **Hardware**: 1× NVIDIA H100 (NCSA Delta AI), AMP enabled
- **Wall time**: ~3 hours for 30k steps
- **Optimizer**: AdamW, β=(0.95, 0.999), wd=1e-6
- **LR**: 3e-5 (constant after 100-step warmup)
- **Batch size**: 64
- **Dataset**: 101 episodes / 15,758 frames @ 10 fps, recorded with mouse teleop in `pusht_obstacle_env.py`
- **Normalization**: dataset stats recomputed locally (image mean/std differ from the base PushT distribution due to obstacle pixels)

## Usage

### Standard inference (recommended)

```python
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

# Loads config.json and model.safetensors from the repo root.
policy = DiffusionPolicy.from_pretrained("zengxy0624/diffusion-pusht-obstacles")
policy.eval()
```

Drop-in compatible with the base model: same input/output schema, just swap the repo id.

### Raw checkpoints (every 5k steps)

The full run trajectory is also stored under `raw_checkpoints/` in the repo for offline evaluation, ablations, or resuming training. These are PyTorch `.pt` files in the original training format (NOT safetensors):

| File | Contents | Size | Use case |
|---|---|---|---|
| `raw_checkpoints/best.pt` | model + cfg + success | ~1 GB | Inference at peak success |
| `raw_checkpoints/final.pt` | model + cfg + step | ~1 GB | Last training step |
| `raw_checkpoints/step_{10,15,20,25,30}000.pt` | model + optimizer + scheduler + step | ~3 GB each | Resume training; per-step ablation |

Download one, for example, via:

```python
from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    repo_id="zengxy0624/diffusion-pusht-obstacles",
    filename="raw_checkpoints/step_20000.pt",
)
# weights_only=False unpickles arbitrary Python objects; only load files you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# ckpt["model"]: state_dict, ckpt["model_cfg"]: dict, ckpt["step"]: int (or "success" for best.pt)
```

Note: the schema (`model`, `model_cfg`, `optimizer`, `scheduler`, `step`, `best_success`) is internal to the original training script ([finetune.py](https://github.com/Tool-as-Interface/Tool_as_Interface) in this fork). Standard LeRobot tooling does NOT understand `.pt` files; for LeRobot tooling, use the `config.json`/`model.safetensors` at the repo root.

## Limitations

- Trained only on circular obstacles with radius 15 px, 1–3 per episode. Out-of-distribution obstacle counts and shapes are not handled.
- Late-training evaluation showed high variance (occasional collapses to 10% success). The released checkpoint is a single best-of-N draw and may not exactly reproduce 95% on a fresh 100-episode evaluation.
- No EMA was used during training; the base `lerobot/diffusion_pusht` model was trained with EMA. Adding EMA is a known follow-up.

## Citation

If this checkpoint is useful, please cite the original Diffusion Policy work:

```bibtex
@article{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  journal={The International Journal of Robotics Research},
  year={2023},
}
```

And LeRobot:

```bibtex
@misc{cadene2024lerobot,
  title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
  author={Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
  year={2024},
  url={https://github.com/huggingface/lerobot},
}
```