---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
- robotics
- diffusion-policy
- imitation-learning
- pusht
base_model: lerobot/diffusion_pusht
---
# Diffusion Policy: PushT with Obstacles

Diffusion Policy finetuned from [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht)
for the PushT manipulation task **with random circular obstacles**: the agent
pushes a T-shaped block to a goal pose while avoiding 1–3 obstacles per episode.
The base model handles standard PushT but has no notion of obstacles: evaluated
zero-shot in the obstacle environment, it reaches 0% success with a 55%
obstacle-hit rate. Finetuning on 101 obstacle-aware demonstrations recovers a
working policy.
## Results

| Checkpoint | Success Rate | Obstacle-Hit Rate |
|---|---|---|
| Base (`lerobot/diffusion_pusht`, zero-shot on obstacle env) | 0% | 55% |
| **This model** (best of 30k finetune steps) | **95%** | **0%** |

Evaluated on `PushTObstacleEnv` with 20 episodes per checkpoint, a 300-step
episode limit, and a success threshold of 0.95 goal coverage.

> Note: 20 episodes is a noisy estimator. For 19/20 successes the Wilson 95%
> confidence interval spans roughly 76% to 99%, and it is widest (about ±20
> points) near 50% success. Treat the 95% headline as approximate. A
> 100-episode re-evaluation is recommended.
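
You can check the interval in the note with the Wilson score formula. Below is a minimal sketch in plain Python. `wilson_interval` is a hypothetical helper and not part of the training code. It assumes z = 1.96 for a 95% interval.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    # clamp tiny floating-point excursions outside [0, 1]
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # this model (19/20), the base model (0/20), and a hypothetical 100-episode rerun
    for k, n in [(19, 20), (0, 20), (95, 100)]:
        lo, hi = wilson_interval(k, n)
        print(f"{k}/{n}: {lo:.1%} to {hi:.1%}")
```

Under these assumptions, 19/20 gives about 76% to 99%. The base model's 0/20 gives 0% to about 16%. A hypothetical 95/100 would narrow the interval to about 89% to 98%, which is the precision a 100-episode re-evaluation buys.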

## Architecture

Inherited from the base model (no architecture changes; only the weights were finetuned):

| Field | Value |
|---|---|
| Vision backbone | ResNet-18 |
| Image input | 3×96×96 (cropped to 84×84: random crop in training, center crop at inference) |
| State input | 2 (agent_pos) |
| Action output | 2 |
| `n_obs_steps` | 2 |
| `horizon` | 16 |
| `n_action_steps` | 8 |
| Diffusion timesteps | 100 |
| Parameters | 262,709,026 |
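
The three step counts work together as follows. This follows the Diffusion Policy formulation and, as far as we understand it, LeRobot's `DiffusionPolicy`. Each denoising pass predicts `horizon` actions aligned with the oldest of the `n_obs_steps` observations. Only `n_action_steps` of them, starting at the current step, are executed before the policy replans. The sketch below shows that indexing with the table's values. `executed_action_window` is a hypothetical helper.

```python
def executed_action_window(n_obs_steps: int = 2, horizon: int = 16, n_action_steps: int = 8):
    """Return (predicted, executed) action timesteps relative to the current step t=0."""
    # observations cover t = -(n_obs_steps-1) .. 0; the action chunk starts at the oldest one
    first = -(n_obs_steps - 1)
    predicted = list(range(first, first + horizon))
    # executed slice: chunk indices [n_obs_steps-1, n_obs_steps-1+n_action_steps)
    start = n_obs_steps - 1
    executed = predicted[start:start + n_action_steps]
    if len(executed) != n_action_steps:
        raise ValueError("need n_action_steps <= horizon - n_obs_steps + 1")
    return predicted, executed


predicted, executed = executed_action_window()
print(predicted)  # t = -1 .. 14
print(executed)   # t = 0 .. 7
```

With these values, the policy predicts actions for t = -1..14, executes t = 0..7, and replans every 8 control steps.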

## Training

- **Hardware**: 1× NVIDIA H100 (NCSA Delta AI), automatic mixed precision (AMP) enabled
- **Wall time**: ~3 hours for 30k steps
- **Optimizer**: AdamW, betas=(0.95, 0.999), weight decay 1e-6
- **LR**: 3e-5, constant after a 100-step warmup
- **Batch size**: 64
- **Dataset**: 101 episodes / 15,758 frames at 10 fps, recorded by mouse teleoperation
  in `pusht_obstacle_env.py`
- **Normalization**: statistics recomputed on this dataset rather than reused from
  the base model, because the obstacle pixels shift the image mean/std away from
  the base PushT distribution
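
The learning-rate settings above can be expressed as a `LambdaLR`-style multiplier. This is a sketch, not the original `finetune.py`. The card does not say how the warmup ramps, so a linear ramp is assumed. `lr_lambda` and `approx_epochs` are hypothetical helpers. The epoch count assumes one training sample per frame, which is an approximation.

```python
BASE_LR = 3e-5
WARMUP_STEPS = 100


def lr_lambda(step: int) -> float:
    """Multiplier on BASE_LR: linear warmup (assumed shape), then constant 1.0."""
    return min(1.0, (step + 1) / WARMUP_STEPS)


def approx_epochs(steps: int = 30_000, batch_size: int = 64, n_frames: int = 15_758) -> float:
    """Approximate passes over the data, assuming one training sample per frame."""
    return steps * batch_size / n_frames


if __name__ == "__main__":
    # Wiring into PyTorch (requires torch; `model` is the policy being finetuned):
    #   opt = torch.optim.AdamW(model.parameters(), lr=BASE_LR,
    #                           betas=(0.95, 0.999), weight_decay=1e-6)
    #   sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    #   ...then call sched.step() after every opt.step()
    for s in (0, 49, 99, 5_000):
        print(s, BASE_LR * lr_lambda(s))
    print(f"~{approx_epochs():.0f} epochs")
```

Under these assumptions, 30k steps at batch size 64 amount to roughly 122 passes over the 15,758 frames.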
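
## Usage

### Standard inference (recommended)

```python
# Import path for recent LeRobot releases; older releases used
# lerobot.common.policies.diffusion.modeling_diffusion
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

# Loads config.json + model.safetensors from the repo root
policy = DiffusionPolicy.from_pretrained("zengxy0624/diffusion-pusht-obstacles")
policy.eval()  # inference mode: center crop instead of random crop
# call policy.reset() at the start of each episode to clear queued actions
```

Drop-in compatible with the base model: same input/output schema, just
swap the repo id.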

### Raw checkpoints (every 5k steps)

Intermediate checkpoints from the run are also stored under `raw_checkpoints/`
in the repo, for offline evaluation, ablations, or resuming training. These are
PyTorch `.pt` files in the original training format (NOT safetensors):

| File | Contents | Size | Use case |
|---|---|---|---|
| `raw_checkpoints/best.pt` | model + cfg + success | ~1 GB | Inference at peak success |
| `raw_checkpoints/final.pt` | model + cfg + step | ~1 GB | Last training step |
| `raw_checkpoints/step_{10,15,20,25,30}000.pt` | model + optimizer + scheduler + step | ~3 GB each | Resume training; per-step ablation |
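
The sizes in the table are consistent with fp32 storage of the 262,709,026 parameters listed under Architecture. One copy of the weights is about 1 GB. The resume checkpoints add AdamW's two per-parameter moment buffers (`exp_avg`, `exp_avg_sq`), which roughly triples the size. The back-of-the-envelope sketch below assumes fp32 throughout and ignores the small config and scheduler entries. `checkpoint_gb` is a hypothetical helper.

```python
N_PARAMS = 262_709_026  # from the Architecture table
BYTES_FP32 = 4          # assumption: weights and optimizer moments saved in fp32


def checkpoint_gb(n_params: int, copies: int) -> float:
    """Approximate size in GB (10**9 bytes) of `copies` fp32 values per parameter."""
    return n_params * BYTES_FP32 * copies / 1e9


model_only_gb = checkpoint_gb(N_PARAMS, copies=1)  # best.pt / final.pt: weights
resume_gb = checkpoint_gb(N_PARAMS, copies=3)      # weights + exp_avg + exp_avg_sq
print(f"{model_only_gb:.2f} GB, {resume_gb:.2f} GB")  # 1.05 GB, 3.15 GB
```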

To download one, for example:

```python
from huggingface_hub import hf_hub_download
import torch

# Download one raw checkpoint from the model repo (cached locally after the first call)
path = hf_hub_download(
    repo_id="zengxy0624/diffusion-pusht-obstacles",
    filename="raw_checkpoints/step_20000.pt",
)

# weights_only=False unpickles arbitrary Python objects: only use it on files you trust
ckpt = torch.load(path, map_location="cpu", weights_only=False)

# ckpt["model"]: state_dict, ckpt["model_cfg"]: dict, ckpt["step"]: int
# (best.pt stores its success value instead of a step; print the keys to check)
print(sorted(ckpt.keys()))
```

Note: the schema (`model`, `model_cfg`, `optimizer`, `scheduler`, `step`,
`best_success`) is internal to the original training script
([finetune.py](https://github.com/Tool-as-Interface/Tool_as_Interface) of this
fork). Standard LeRobot tooling cannot load these `.pt` files. For LeRobot, use
`config.json` and `model.safetensors` at the repo root instead.
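
The two kinds of raw checkpoint carry different keys, so a script that consumes them can branch on the schema listed in the note. The sketch below assumes the key names given there (`model`, `optimizer`, `scheduler`). `checkpoint_kind` is a hypothetical helper that operates on the dict returned by `torch.load`.

```python
def checkpoint_kind(ckpt: dict) -> str:
    """Classify a raw checkpoint dict by its keys (schema as listed on this card)."""
    if "model" not in ckpt:
        raise KeyError("not a raw checkpoint from this run: missing 'model'")
    if "optimizer" in ckpt and "scheduler" in ckpt:
        return "resumable"       # step_*.pt: weights + optimizer + scheduler + step
    return "inference-only"      # best.pt / final.pt: weights + config (+ step or success)


# Stand-in dicts; real ones come from torch.load(..., weights_only=False)
print(checkpoint_kind({"model": {}, "model_cfg": {}, "optimizer": {}, "scheduler": {}, "step": 20_000}))
print(checkpoint_kind({"model": {}, "model_cfg": {}, "step": 30_000}))
```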

## Limitations

- Trained only on circular obstacles of radius 15 px, 1–3 per episode. Other
  obstacle counts, sizes, or shapes are out of distribution and not handled.
- Late-training evaluation showed high variance (occasional collapses to
  10% success). The released checkpoint is the best of several noisy
  20-episode evaluations, so its 95% is optimistically selected and may not
  be reproduced on a fresh 100-episode evaluation.
- No EMA was used during training, whereas the base `lerobot/diffusion_pusht`
  model was trained with EMA. Adding EMA is a known follow-up.

## Citation

If this checkpoint is useful, please cite the original Diffusion Policy work:

```bibtex
@article{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  journal={The International Journal of Robotics Research},
  year={2023},
}
```

And LeRobot:

```bibtex
@misc{cadene2024lerobot,
  title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
  author={Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
  year={2024},
  url={https://github.com/huggingface/lerobot},
}
```