---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
  - robotics
  - diffusion-policy
  - imitation-learning
  - pusht
base_model: lerobot/diffusion_pusht
---

# Diffusion Policy — PushT with Obstacles

Diffusion Policy finetuned from [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht)
for the PushT manipulation task **with random circular obstacles**: the agent
pushes a T-shaped block to a goal pose while avoiding 1–3 obstacles per episode.

The base model solves standard PushT but has no obstacle awareness: evaluated
zero-shot on the obstacle environment, it reaches 0% success with a 55%
obstacle-hit rate. Finetuning on 101 obstacle-aware demonstrations recovers a
working policy.

## Results

| Checkpoint | Success Rate | Obstacle-Hit Rate |
|---|---|---|
| Base (`lerobot/diffusion_pusht`, zero-shot on obstacle env) | 0% | 55% |
| **This model** (best of 30k finetune steps) | **95%** | **0%** |

Evaluated on `PushTObstacleEnv` with 20 episodes per checkpoint, 300 max steps,
success threshold 0.95 coverage.

> Note: 20 episodes is a noisy estimator: at this sample size the Wilson 95% CI
> half-width can reach about ±20 percentage points. Treat the 95% headline as
> approximate; a 100-episode re-evaluation is recommended.
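
To make the uncertainty concrete, the Wilson score interval can be computed directly from the episode counts. The sketch below assumes the 95% headline corresponds to 19 successes out of 20 episodes; the function name `wilson_interval` is illustrative and not part of the evaluation code.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


# Assumption: 95% success over 20 episodes means 19 successful episodes.
low, high = wilson_interval(19, 20)
print(f"19/20:  [{low:.3f}, {high:.3f}]")

# The same rate over a hypothetical 100-episode run gives a tighter interval.
low100, high100 = wilson_interval(95, 100)
print(f"95/100: [{low100:.3f}, {high100:.3f}]")
```

For 19/20 the interval is asymmetric, spanning roughly 76% to 99%, because the estimate sits close to the upper bound of 100%.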

## Architecture

Inherited from the base model (no architecture changes, only weight finetuning):

| Field | Value |
|---|---|
| Vision backbone | ResNet-18 |
| Image input | 3×96×96 (random-cropped to 84×84) |
| State input | 2 (agent_pos) |
| Action output | 2 |
| `n_obs_steps` | 2 |
| `horizon` | 16 |
| `n_action_steps` | 8 |
| Diffusion timesteps | 100 |
| Parameters | 262,709,026 |
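
The three temporal fields interact: the policy conditions on `n_obs_steps` observations, predicts `horizon` actions, and executes only `n_action_steps` of them before re-planning. The sketch below assumes the indexing convention commonly used in Diffusion Policy implementations, where the executed chunk starts at the action aligned with the most recent observation; the helper name is hypothetical.

```python
import math


def executed_action_indices(n_obs_steps: int, horizon: int, n_action_steps: int) -> range:
    """Indices of the predicted horizon that are executed (assumed convention)."""
    start = n_obs_steps - 1  # action aligned with the latest observation
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the prediction horizon")
    return range(start, end)


indices = executed_action_indices(n_obs_steps=2, horizon=16, n_action_steps=8)
print(list(indices))

# With the 300-step episode limit used in evaluation, the denoising loop
# runs at most this many times per episode (one run per executed chunk).
max_policy_calls = math.ceil(300 / 8)
print(max_policy_calls)
```

Under this assumption, half of each 16-step prediction is executed, and the remaining predicted actions are discarded when the policy re-plans.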

## Training

- **Hardware**: 1× NVIDIA H100 (NCSA Delta AI), AMP enabled
- **Wall time**: ~3 hours for 30k steps
- **Optimizer**: AdamW, β=(0.95, 0.999), wd=1e-6
- **LR**: 3e-5 (constant after 100-step warmup)
- **Batch size**: 64
- **Dataset**: 101 episodes / 15,758 frames @ 10 fps, recorded with mouse teleop
  in `pusht_obstacle_env.py`
- **Normalization**: dataset stats recomputed locally (image mean/std differ
  from the base PushT distribution due to obstacle pixels)
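
As a rough sanity check on these settings, the learning-rate schedule and the amount of data seen can be worked out by hand. The sketch below assumes a linear warmup (the card only states "100-step warmup") and that each frame contributes one training sample; `lr_at` is an illustrative helper, not the training script's scheduler.

```python
BASE_LR = 3e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 30_000
BATCH_SIZE = 64
EPISODES = 101
FRAMES = 15_758
FPS = 10


def lr_at(step: int, base_lr: float = BASE_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a 0-indexed step: linear warmup (assumed), then constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


frames_per_episode = FRAMES / EPISODES           # average demonstration length in frames
seconds_per_episode = frames_per_episode / FPS   # average demonstration length in seconds
samples_seen = TOTAL_STEPS * BATCH_SIZE          # total samples drawn during finetuning
passes_over_data = samples_seen / FRAMES         # approximate number of epochs

print(f"lr at step 0: {lr_at(0):.2e}, at step 99: {lr_at(99):.2e}")
print(f"{frames_per_episode:.1f} frames (~{seconds_per_episode:.1f} s) per episode")
print(f"{samples_seen:,} samples, ~{passes_over_data:.0f} passes over the dataset")
```

The dataset is small relative to the training budget, so the run makes on the order of a hundred passes over the same demonstrations.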

## Usage

### Standard inference (recommended)

```python
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained("zengxy0624/diffusion-pusht-obstacles")
policy.eval()
```

The model is drop-in compatible with the base model: the input/output schema
is unchanged, so only the repo id needs to be swapped.

### Raw checkpoints (every 5k steps)

The full run trajectory is also stored under `raw_checkpoints/` in the repo
for offline evaluation, ablations, or resuming training. These are PyTorch
`.pt` files in the original training format (NOT safetensors):

| File | Contents | Size | Use case |
|---|---|---|---|
| `raw_checkpoints/best.pt` | model + cfg + success | ~1 GB | Inference at peak success |
| `raw_checkpoints/final.pt` | model + cfg + step | ~1 GB | Last training step |
| `raw_checkpoints/step_{10,15,20,25,30}000.pt` | model + optimizer + scheduler + step | ~3 GB each | Resume training; per-step ablation |

Download one, for example, via:

```python
from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    repo_id="zengxy0624/diffusion-pusht-obstacles",
    filename="raw_checkpoints/step_20000.pt",
)
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# ckpt["model"]: state_dict, ckpt["model_cfg"]: dict, ckpt["step"]: int (or "success" for best.pt)
```

Note: the schema (`model`, `model_cfg`, `optimizer`, `scheduler`, `step`,
`best_success`) is internal to the original training script
([finetune.py](https://github.com/Tool-as-Interface/Tool_as_Interface) in this
fork). Standard LeRobot tooling does NOT understand these `.pt` files; for
LeRobot, use the `config.json`/`model.safetensors` at the repo root instead.
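
Because the checkpoint files differ in what they contain, it can help to inspect a loaded dict before using it. The helper below is a minimal sketch that relies only on the key names listed in this card; `describe_checkpoint` is a hypothetical name, and it checks both `success` and `best_success` since both spellings appear above.

```python
from typing import Any, Mapping

# Keys that must be present to resume training, per the table above.
RESUME_KEYS = ("optimizer", "scheduler")


def describe_checkpoint(ckpt: Mapping[str, Any]) -> dict[str, Any]:
    """Summarize which parts of the training schema a loaded checkpoint has."""
    success = next(
        (ckpt[key] for key in ("success", "best_success") if key in ckpt), None
    )
    return {
        "keys": sorted(ckpt),
        "has_weights": "model" in ckpt,
        "resumable": all(key in ckpt for key in RESUME_KEYS),
        "step": ckpt.get("step"),
        "success": success,
    }


# Stand-in dicts with placeholder values; real files hold state dicts and tensors.
step_ckpt = {"model": {}, "model_cfg": {}, "optimizer": {}, "scheduler": {}, "step": 20000}
best_ckpt = {"model": {}, "model_cfg": {}, "success": 0.95}

print(describe_checkpoint(step_ckpt))
print(describe_checkpoint(best_ckpt))
```

In practice the dict returned by `torch.load` would be passed in place of the stand-in dicts.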

## Limitations

- Only trained on circular obstacles with radius 15 px and 1–3 per episode.
  Out-of-distribution obstacle counts/shapes are not handled.
- Late-training evaluation showed high variance (occasional collapses to
  10% success). The released checkpoint is a single best-of-N draw and may
  not exactly reproduce 95% on a fresh 100-episode evaluation.
- No EMA was used during training; the base `lerobot/diffusion_pusht` model
  was trained with EMA. Adding EMA is a known follow-up.
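
For reference, the EMA follow-up mentioned in the last item amounts to keeping a slowly moving copy of the weights and evaluating that copy. The sketch below shows the standard update rule on plain floats; the decay value is a hypothetical choice, and a real implementation would operate on tensors and may ramp the decay up over the first steps.

```python
def ema_update(
    ema_params: dict[str, float], params: dict[str, float], decay: float = 0.999
) -> dict[str, float]:
    """One exponential-moving-average step: ema = decay * ema + (1 - decay) * param."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


# Toy example: the live weight jumps around, the EMA copy changes slowly.
ema = {"w": 1.0}
for live_value in (0.0, 2.0, 0.0, 2.0):
    ema = ema_update(ema, {"w": live_value}, decay=0.9)
print(ema)
```

The averaged copy damps step-to-step fluctuations in the weights, which is why it is a natural candidate for reducing the late-training variance noted above.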

## Citation

If this checkpoint is useful, please cite the original Diffusion Policy work:

```bibtex
@article{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  journal={The International Journal of Robotics Research},
  year={2023},
}
```

And LeRobot:

```bibtex
@misc{cadene2024lerobot,
  title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
  author={Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
  year={2024},
  url={https://github.com/huggingface/lerobot},
}
```