	---
	license: apache-2.0
	library_name: lerobot
	pipeline_tag: robotics
	tags:
	- robotics
	- diffusion-policy
	- imitation-learning
	- pusht
	base_model: lerobot/diffusion_pusht
	---

	# Diffusion Policy — PushT with Obstacles

	Diffusion Policy finetuned from [`lerobot/diffusion_pusht`](https://huggingface.co/lerobot/diffusion_pusht)
	for the PushT manipulation task with random circular obstacles: the agent
	pushes a T-shaped block to a goal pose while avoiding 1–3 obstacles per episode.

	The base model solves standard PushT but has no notion of obstacles: run
	zero-shot on the obstacle environment, it reaches 0% success with a 55%
	obstacle-hit rate. Finetuning on 101 obstacle-aware demonstrations yields a
	policy with 95% success and no obstacle hits in a 20-episode evaluation.

	## Results

	| Checkpoint | Success rate | Obstacle-hit rate |
	|---|---|---|
	| Base (`lerobot/diffusion_pusht`, zero-shot on obstacle env) | 0% | 55% |
	| This model (best checkpoint from a 30k-step finetune) | 95% | 0% |

	Evaluated on `PushTObstacleEnv` with 20 episodes per checkpoint, at most 300
	steps per episode, and a success threshold of 0.95 goal coverage.

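	> Note: 20 episodes is a noisy estimator. The Wilson 95% interval for 19/20
	> successes is roughly 76% to 99%, and it widens to about ±20 points for rates
	> near 50%. Treat the 95% headline as approximate; a 100-episode re-evaluation
	> is recommended.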
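The interval above follows from the Wilson score formula. This is a minimal stdlib sketch for checking it (the function name is ours, not part of the evaluation code); it compares 19/20 successes, the 95% headline, with the same rate over 100 episodes, plus 10/20 as the widest case for a 20-episode run.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# 19/20 is the 95% headline; 95/100 is the same rate on a 100-episode re-run;
# 10/20 is the widest interval a 20-episode evaluation can produce.
for k, n in [(19, 20), (95, 100), (10, 20)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n}: [{lo:.3f}, {hi:.3f}]")
```

For 19/20 this gives roughly 0.76 to 0.99, while 95/100 narrows it to roughly 0.89 to 0.98, which is why the larger re-evaluation is worth running.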

	## Architecture

	Inherited from the base model (no architecture changes, only weight finetuning):

	| Field | Value |
	|---|---|
	| Vision backbone | ResNet-18 |
	| Image input | 3×96×96 (random-cropped to 84×84) |
	| State input | 2 (agent_pos) |
	| Action output | 2 |
	| `n_obs_steps` | 2 |
	| `horizon` | 16 |
	| `n_action_steps` | 8 |
	| Diffusion timesteps | 100 |
	| Parameters | 262,709,026 |
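To read `n_obs_steps`, `horizon`, and `n_action_steps` together: each policy call conditions on the last `n_obs_steps` observations, denoises a `horizon`-long action sequence, and executes `n_action_steps` of it before replanning. The stdlib sketch below assumes the LeRobot Diffusion Policy convention that the executed chunk starts at index `n_obs_steps - 1` of the predicted sequence; the helper `executed_action_slice` is hypothetical, not a LeRobot API.

```python
import math


def executed_action_slice(n_obs_steps: int, horizon: int, n_action_steps: int) -> range:
    """Indices of the predicted action sequence that get executed (hypothetical helper).

    Assumes the sequence is aligned so that index n_obs_steps - 1 is the
    current timestep, as in LeRobot's Diffusion Policy.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit in the predicted horizon")
    return range(start, end)


N_OBS_STEPS, HORIZON, N_ACTION_STEPS = 2, 16, 8  # values from the table above
MAX_EPISODE_STEPS = 300  # evaluation cap from the Results section

executed = executed_action_slice(N_OBS_STEPS, HORIZON, N_ACTION_STEPS)
# Upper bound on replanning calls per episode (episodes can end early on success).
policy_calls = math.ceil(MAX_EPISODE_STEPS / N_ACTION_STEPS)
print(list(executed), policy_calls)
```

With these values the policy executes predicted steps 1 through 8 and replans at most 38 times in a 300-step episode. Each replan runs the full reverse-diffusion sampler, so `n_action_steps` trades reactivity against inference cost.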

	## Training

	- Hardware: 1× NVIDIA H100 (NCSA Delta AI), AMP enabled
	- Wall time: ~3 hours for 30k steps
	- Optimizer: AdamW, betas=(0.95, 0.999), weight decay 1e-6
	- LR: 3e-5, constant after a 100-step warmup
	- Batch size: 64
	- Dataset: 101 episodes / 15,758 frames at 10 fps, recorded via mouse
	teleoperation in `pusht_obstacle_env.py`
	- Normalization: dataset stats recomputed locally (image mean/std differ
	from the base PushT distribution due to obstacle pixels)
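The schedule and data budget implied by these settings can be sketched in plain Python. The warmup shape is an assumption (the card only says "100-step warmup"); linear warmup to a constant, as below, matches the shape of `get_constant_schedule_with_warmup` in transformers/diffusers. The epoch estimate assumes roughly one training sample per dataset frame.

```python
PEAK_LR = 3e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 30_000
BATCH_SIZE = 64
DATASET_FRAMES = 15_758


def lr_multiplier(step: int) -> float:
    """Linear warmup from 0 to 1 over WARMUP_STEPS, then constant (warmup shape assumed)."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return 1.0


# With PyTorch, this function could be passed as `lr_lambda` to
# torch.optim.lr_scheduler.LambdaLR wrapped around
# torch.optim.AdamW(params, lr=PEAK_LR, betas=(0.95, 0.999), weight_decay=1e-6).
lr_at_50 = PEAK_LR * lr_multiplier(50)
lr_after_warmup = PEAK_LR * lr_multiplier(WARMUP_STEPS)

# Rough data budget, assuming about one training sample per dataset frame.
samples_seen = TOTAL_STEPS * BATCH_SIZE
approx_epochs = samples_seen / DATASET_FRAMES
print(lr_at_50, lr_after_warmup, round(approx_epochs, 1))
```

Under that assumption, 30k steps at batch size 64 is about 1.92M samples, or roughly 122 passes over the 101 demonstrations.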

	## Usage

	### Standard inference (recommended)

	```python
	from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

	policy = DiffusionPolicy.from_pretrained("zengxy0624/diffusion-pusht-obstacles")
	policy.eval()
	```

	Drop-in compatible with the base model — same input/output schema, just
	swap the repo id.

	### Raw checkpoints (every 5k steps)

	Intermediate checkpoints from the training run are also stored under
	`raw_checkpoints/` in the repo for offline evaluation, ablations, or resuming
	training. These are PyTorch `.pt` files in the original training format, not
	safetensors:

	| File | Contents | Size | Use case |
	|---|---|---|---|
	| `raw_checkpoints/best.pt` | model + cfg + success | ~1 GB | Inference at peak success |
	| `raw_checkpoints/final.pt` | model + cfg + step | ~1 GB | Last training step |
	| `raw_checkpoints/step_{10,15,20,25,30}000.pt` | model + optimizer + scheduler + step | ~3 GB each | Resume training; per-step ablation |
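The listed sizes line up with a back-of-the-envelope estimate from the parameter count. This sketch assumes fp32 weights (autocast-style AMP keeps parameters in fp32) and AdamW's two per-parameter moment buffers; scheduler state, config, and pickle overhead are ignored.

```python
PARAMS = 262_709_026  # from the Architecture table
BYTES_PER_FP32 = 4    # assumption: weights stored in fp32

weights_bytes = PARAMS * BYTES_PER_FP32
# AdamW keeps two fp32 buffers per parameter: first and second moment estimates.
adamw_bytes = 2 * weights_bytes
resume_ckpt_bytes = weights_bytes + adamw_bytes

print(f"model only:        {weights_bytes / 1e9:.2f} GB")
print(f"model + optimizer: {resume_ckpt_bytes / 1e9:.2f} GB")
```

This gives about 1.05 GB for the weights alone and about 3.15 GB with optimizer state, consistent with the ~1 GB and ~3 GB entries in the table.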

	For example, to download and inspect one checkpoint:

	```python
	from huggingface_hub import hf_hub_download
	import torch

	path = hf_hub_download(
	    repo_id="zengxy0624/diffusion-pusht-obstacles",
	    filename="raw_checkpoints/step_20000.pt",
	)
	# weights_only=False unpickles arbitrary Python objects (e.g. config types);
	# only use it on checkpoint files you trust.
	ckpt = torch.load(path, map_location="cpu", weights_only=False)
	# ckpt["model"] is the model state_dict; the remaining keys depend on the file (see the table above).
	print(sorted(ckpt.keys()))
	```

	Note: this schema (`model`, `model_cfg`, `optimizer`, `scheduler`, `step`,
	`best_success`) is internal to the original training script,
	[finetune.py](https://github.com/Tool-as-Interface/Tool_as_Interface) in this
	fork. Standard LeRobot tooling does not read these `.pt` files; for that, use
	`config.json` and `model.safetensors` at the repo root.

	## Limitations

	- Trained only on circular obstacles of radius 15 px, with 1–3 obstacles per
	episode. Out-of-distribution obstacle counts, sizes, or shapes are not handled.
	- Late-training evaluation showed high variance (occasional collapses to
	10% success). The released checkpoint is a single best-of-N draw and may
	not exactly reproduce 95% on a fresh 100-episode evaluation.
	- No EMA was used during training; the base `lerobot/diffusion_pusht` model
	was trained with EMA. Adding EMA is a known follow-up.
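For context on the EMA follow-up: EMA keeps a shadow copy of the weights, updates it after each optimizer step as `ema = decay * ema + (1 - decay) * param`, and evaluates with the shadow copy. This is a generic constant-decay sketch on plain floats, not the base model's exact EMA schedule; in PyTorch the same update would be applied to each floating-point parameter tensor.

```python
def ema_update(ema: dict[str, float], params: dict[str, float], decay: float) -> None:
    """In-place EMA: ema <- decay * ema + (1 - decay) * params."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


# Toy example: one scalar "parameter" held fixed at 1.0, shadow copy starting at 0.0.
params = {"w": 1.0}
ema = {"w": 0.0}
for _ in range(3):
    ema_update(ema, params, decay=0.5)
print(ema["w"])
```

After three updates the shadow value is 1 - 0.5³ = 0.875. Real runs use a decay much closer to 1, often ramped up early in training; the toy decay of 0.5 just keeps the arithmetic easy to check.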

	## Citation

	If this checkpoint is useful, please cite the original Diffusion Policy work:

	```bibtex
	@article{chi2023diffusion,
	  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
	  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
	  journal={The International Journal of Robotics Research},
	  year={2023},
	}
	```

	And LeRobot:

	```bibtex
	@misc{cadene2024lerobot,
	  title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
	  author={Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
	  year={2024},
	  url={https://github.com/huggingface/lerobot},
	}
	```