---
datasets:
- lerobot/pusht
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- pusht
- imitation-learning
- phase-1
---

# 🦾 Diffusion Policy for Push-T (Phase 1: 100k Steps)

[LeRobot](https://github.com/huggingface/lerobot)
[Dataset: lerobot/pusht](https://huggingface.co/datasets/lerobot/pusht)
[UESTC](https://www.uestc.edu.cn/)
[Model: Lemon-03/DP_PushT_test](https://huggingface.co/Lemon-03/DP_PushT_test)

> **Summary:** This model is the **initial training phase (0-100k steps)** of a Diffusion Policy on the Push-T task and serves as the pre-trained foundation for further fine-tuning. It demonstrates strong trajectory learning but has not yet converged to a high success rate.

- **🧩 Task**: Push-T (Simulated)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **📈 Training Steps**: 100,000 (Initial Phase)
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
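
At inference time, a DDPM head generates an action by iteratively denoising Gaussian noise. The following is a toy sketch of the reverse (sampling) process, not the policy's actual implementation: a 1-D action, 10 denoising steps, a linear beta schedule, and a stub in place of the trained U-Net are all illustrative assumptions.

```python
import math
import random

T = 10  # denoising steps (illustrative; real DDPMs typically use many more)
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)  # cumulative product of alphas

def denoise(eps_model, x, noise_scale=1.0, seed=0):
    """DDPM ancestral sampling: step t = T-1 down to 0.

    Each update is x <- (x - beta_t/sqrt(1-alpha_bar_t) * eps_hat) / sqrt(alpha_t),
    plus sqrt(beta_t) * z noise on every step except the last.
    """
    rng = random.Random(seed)
    for t in reversed(range(T)):
        eps_hat = eps_model(x, t)  # predicted noise
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / math.sqrt(alphas[t])
        if t > 0:
            x += noise_scale * math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x

# Stub "model" that predicts the current sample as the noise; with the added
# noise disabled, every update strictly contracts x toward 0.
action = denoise(lambda x, t: x, x=3.0, noise_scale=0.0)
```

The real policy denoises whole 16-step action sequences conditioned on image features, but the per-step arithmetic has this shape.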

---

## ⚠️ Note on Performance & Fine-tuning

This checkpoint represents an **intermediate state** of our research.
While it achieves high movement precision (**Avg Max Reward: 0.71**), the strict success threshold of the Push-T task yields a low success rate at this stage.

### 🚀 Upgrade Available

We resumed training (fine-tuning) from this checkpoint for another 100k steps and achieved significantly better results.
👉 **Check out the final model here:** [**Lemon-03/DP_PushT_test_Resume**](https://huggingface.co/Lemon-03/DP_PushT_test_Resume)

---

## 🔬 Benchmark Results (Phase 1)

Evaluated over **50 episodes** in the `Push-T` environment using LeRobot.

| Metric | Value | Status |
| :--- | :---: | :---: |
| **Success Rate** | **4.0%** | 🚧 (Under-trained) |
| **Avg Max Reward** | **0.71** | 🎯 (High Precision) |
| **Avg Sum Reward** | **115.03** | ✅ (Good Trajectory) |

> **Analysis:** The model has learned the multimodal distribution of the demonstration data and can push the T-block close to the target (max reward 0.71), but it still lacks the fine-grained final adjustments required by the >95% overlap success criterion. This motivated the subsequent **Phase 2 (Resume Training)**.
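
The three metrics above can be computed from per-episode reward traces. A minimal sketch in plain Python; the 0.95 threshold mirrors the ">95% overlap" criterion, and the toy episode data is illustrative, not from this evaluation:

```python
# Sketch: summarizing Push-T evaluation from per-step reward traces.
# Assumption: reward is target coverage in [0, 1], and an episode counts
# as a success if coverage ever reaches the threshold.

SUCCESS_THRESHOLD = 0.95  # illustrative; matches the >95% overlap criterion

def summarize(episodes):
    """episodes: list of per-step reward lists, one per rollout."""
    max_rewards = [max(ep) for ep in episodes]
    sum_rewards = [sum(ep) for ep in episodes]
    n = len(episodes)
    return {
        "success_rate": sum(r >= SUCCESS_THRESHOLD for r in max_rewards) / n,
        "avg_max_reward": sum(max_rewards) / n,
        "avg_sum_reward": sum(sum_rewards) / n,
    }

# Toy data: 1 of 4 episodes crosses the threshold -> 25% success rate.
toy = [
    [0.1, 0.5, 0.96],  # success
    [0.2, 0.7, 0.71],
    [0.1, 0.4, 0.68],
    [0.3, 0.6, 0.74],
]
print(summarize(toy)["success_rate"])  # 0.25
```

This also illustrates why success rate can lag the reward metrics: an episode that peaks at 0.74 coverage scores well on max reward yet still counts as a failure.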

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
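
These three settings interact in a receding-horizon loop: the policy conditions on the last 2 observations, predicts a 16-step action sequence, executes only the first 8 actions, then replans. A minimal sketch of that control loop (the `predict`/`step` callables are hypothetical stand-ins, not the lerobot API):

```python
# Sketch of the receding-horizon loop implied by the table above:
# condition on 2 observations, predict 16 actions, execute 8, replan.
N_OBS_STEPS = 2      # observation history
HORIZON = 16         # prediction horizon
N_ACTION_STEPS = 8   # actions executed per plan

def rollout(predict, step, first_obs, episode_len):
    """predict(obs_history) -> list of HORIZON actions; step(action) -> next obs."""
    obs_history = [first_obs] * N_OBS_STEPS
    executed = []
    while len(executed) < episode_len:
        plan = predict(obs_history[-N_OBS_STEPS:])
        for action in plan[:N_ACTION_STEPS]:
            obs_history.append(step(action))
            executed.append(action)
            if len(executed) == episode_len:
                break
    return executed

# Toy stand-ins: count how often the policy replans over a 24-step episode.
plans_made = 0
def toy_predict(obs_history):
    global plans_made
    plans_made += 1
    return list(range(HORIZON))

actions = rollout(toy_predict, step=lambda a: a, first_obs=0.0, episode_len=24)
print(plans_made)  # 3 replanning calls for 24 executed actions
```

Executing only half of each predicted horizon is the usual Diffusion Policy compromise between open-loop smoothness and closed-loop reactivity.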

---

## 🔧 Training Configuration (Reference)

For reproducibility, these are the key parameters used during this initial training session:

- **Batch Size**: 8 (Effective)
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with random crop (84x84)
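
The "cosine with warmup" schedule can be written as a step-to-learning-rate function: a linear ramp over the warmup steps, then a cosine decay to zero over the remaining steps. A sketch under stated assumptions: only `lr=1e-4` and the 100k total come from this card; the 500-step warmup is an illustrative guess.

```python
import math

BASE_LR = 1e-4         # from the config above
TOTAL_STEPS = 100_000  # training length of this phase
WARMUP_STEPS = 500     # illustrative assumption, not from this card

def lr_at(step):
    """Linear warmup to BASE_LR, then cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))        # 0.0  (start of warmup)
print(lr_at(500))      # 0.0001 (warmup complete, peak LR)
```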

#### Original Training Command

```bash
python -m lerobot.scripts.lerobot_train \
  --policy.type diffusion \
  --env.type pusht \
  --dataset.repo_id lerobot/pusht \
  --wandb.enable true \
  --job_name DP_PushT \
  --policy.repo_id Lemon-03/DP_PushT_test \
  --eval.batch_size 8
```

---

## 📊 Evaluate (Reproducing Phase 1)

You can evaluate this checkpoint to reproduce the Phase 1 results:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_PushT_test \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0
```