Lemon-03 committed on
Commit a10bc1b · verified
1 Parent(s): 62b6037

Create README.md

Files changed (1)
  1. README.md +105 -0
README.md ADDED
@@ -0,0 +1,105 @@
1
+ ---
2
+ datasets:
3
+ - lerobot/pusht
4
+ library_name: lerobot
5
+ license: apache-2.0
6
+ model_name: diffusion
7
+ pipeline_tag: robotics
8
+ tags:
9
+ - lerobot
10
+ - robotics
11
+ - diffusion
12
+ - pusht
13
+ - imitation-learning
14
+ - phase-1
15
+ ---
16
+
17
+ # 🦾 Diffusion Policy for Push-T (Phase 1: 100k Steps)
18
+
19
+ [![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
20
+ [![Task](https://img.shields.io/badge/Task-Push--T-blue)](https://huggingface.co/datasets/lerobot/pusht)
21
+ [![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
22
+ [![Phase](https://img.shields.io/badge/Training_Phase-Initial-orange)](https://huggingface.co/Lemon-03/DP_PushT_test)
23
+
24
+ > **Summary:** This checkpoint is the result of the **initial training phase (0 to 100k steps)** of a Diffusion Policy on the Push-T task and serves as the starting point for resumed training. It already moves the T-block close to the target (Avg Max Reward 0.71), but its success rate is still low (4.0% over 50 episodes).
25
+
26
+ - **🧩 Task**: Push-T (Simulated)
27
+ - **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
28
+ - **🔄 Training Steps**: 100,000 (Initial Phase)
29
+ - **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
30
+
31
+ ---
32
+
33
+ ## ⚠️ Note on Performance & Fine-tuning
34
+
35
+ This checkpoint is an **intermediate state** of our research, not the final model.
36
+ Although it reaches a high **Avg Max Reward of 0.71**, the strict success threshold of the Push-T task (more than 95% overlap with the target) keeps the success rate low at this stage (4.0%).
37
+
38
+ ### 🚀 **Upgrade Available:**
39
+ We resumed training from this checkpoint for another 100k steps (**Resume Training / Fine-tuning**), which gave significantly better results.
40
+ 👉 **Check out the final model here:** [**Lemon-03/DP_PushT_test_Resume**](https://huggingface.co/Lemon-03/DP_PushT_test_Resume)
41
+
42
+ ---
43
+
44
+ ## 🔬 Benchmark Results (Phase 1)
45
+
46
+ Evaluated on **50 episodes** in the `Push-T` environment using LeRobot.
47
+
48
+ | Metric | Value | Status |
49
+ | :--- | :---: | :---: |
50
+ | **Success Rate** | **4.0%** | 🚧 (Under-trained) |
51
+ | **Avg Max Reward** | **0.71** | 📈 (High Precision) |
52
+ | **Avg Sum Reward** | **115.03** | ✅ (Good Trajectory) |
53
+
54
+ > **Analysis:** The model appears to have learned the multimodal distribution of the demonstration data and can push the T-block close to the target (Avg Max Reward 0.71). However, it lacks the fine-grained final adjustments needed to meet the >95% overlap success criterion, so only 4.0% of the 50 evaluation episodes count as successes. This motivated the subsequent **Phase 2 (Resume Training)**.
55
+
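The three metrics in the table above can be related to per-episode rollouts with a small sketch. It assumes, as the metric names suggest, that "Avg Max Reward" and "Avg Sum Reward" are the per-episode maximum and sum of step rewards averaged over episodes, and that the success rate is the share of episodes flagged as successful. The function `summarize_episodes` and the toy records are hypothetical and are not taken from the LeRobot evaluation code or from the real rollouts.

```python
from statistics import mean


def summarize_episodes(episodes):
    """Aggregate per-episode records into the three reported metrics.

    Each record is a dict with a list of step rewards and a success flag
    (hypothetical layout, chosen only for this illustration).
    """
    n = len(episodes)
    return {
        "success_rate_pct": 100.0 * sum(ep["success"] for ep in episodes) / n,
        "avg_max_reward": mean(max(ep["rewards"]) for ep in episodes),
        "avg_sum_reward": mean(sum(ep["rewards"]) for ep in episodes),
    }


# Toy data for illustration only, not the actual evaluation rollouts.
toy_episodes = [
    {"rewards": [0.1, 0.5, 0.7], "success": False},
    {"rewards": [0.2, 0.9, 1.0], "success": True},
]
summary = summarize_episodes(toy_episodes)

# With 50 evaluation episodes, a 4.0% success rate corresponds to 2 episodes.
successful_episodes = round(0.04 * 50)
```

Under this reading, a high average maximum reward combined with a low success rate means most episodes get close to the target at some point without ever crossing the success threshold.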
56
+ ---
57
+
58
+ ## ⚙️ Model Details
59
+
60
+ | Parameter | Description |
61
+ | :--- | :--- |
62
+ | **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
63
+ | **Prediction Horizon** | 16 steps |
64
+ | **Observation History** | 2 steps |
65
+ | **Action Steps** | 8 steps |
66
+
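The horizon, observation history, and action step values in the table interact in a receding-horizon loop: the policy predicts 16 actions, executes 8 of them, then re-plans. The sketch below illustrates that bookkeeping only. It assumes the predicted sequence starts at the oldest observation in the history, so the action for the current step sits at index `n_obs_steps - 1`; this indexing convention and the 300-step episode length are assumptions for illustration and should be checked against the LeRobot source and environment config.

```python
import math

HORIZON = 16         # actions predicted per diffusion sample (from the table)
N_OBS_STEPS = 2      # observations the policy conditions on (from the table)
N_ACTION_STEPS = 8   # actions executed before re-planning (from the table)


def executed_slice(horizon, n_obs_steps, n_action_steps):
    """Return the [start, end) indices of the predicted actions that get executed.

    Assumption: index 0 of the prediction is aligned with the oldest observation.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("action steps do not fit inside the prediction horizon")
    return start, end


def policy_calls(episode_len, n_action_steps):
    """Number of diffusion sampling calls needed to cover an episode."""
    return math.ceil(episode_len / n_action_steps)


start, end = executed_slice(HORIZON, N_OBS_STEPS, N_ACTION_STEPS)
calls = policy_calls(300, N_ACTION_STEPS)  # 300 steps is a hypothetical episode length
```

Executing only part of each predicted sequence trades responsiveness against the cost of running the denoising loop: fewer action steps means more frequent re-planning.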
67
+ ---
68
+
69
+ ## 🔧 Training Configuration (Reference)
70
+
71
+ For reproducibility, here are the key parameters used during this initial training session:
72
+
73
+ - **Batch Size**: 8 (Effective)
74
+ - **Optimizer**: AdamW (`lr=1e-4`)
75
+ - **Scheduler**: Cosine with warmup
76
+ - **Vision**: ResNet18 with random crop (84x84)
77
+
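As a rough illustration of the "Cosine with warmup" schedule listed above, the following sketch computes the learning rate at a given step using linear warmup followed by cosine decay to zero. The base rate `1e-4` and the 100,000 total steps come from this card; the warmup length of 500 steps and the decay-to-zero endpoint are assumptions, since the card does not state them, and the helper `lr_at` is hypothetical rather than part of LeRobot.

```python
import math


def lr_at(step, base_lr=1e-4, warmup_steps=500, total_steps=100_000):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    warmup_steps=500 is an assumed value, not taken from the model card.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few points across training.
samples = {step: lr_at(step) for step in (0, 250, 500, 50_250, 100_000)}
```

The rate rises linearly during warmup, peaks at the base value, passes through half the base value midway through the decay, and reaches zero at the final step.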
78
+ ### Original Training Command
79
+
80
+ ```bash
81
+ python -m lerobot.scripts.lerobot_train \
82
+ --policy.type diffusion \
83
+ --env.type pusht \
84
+ --dataset.repo_id lerobot/pusht \
85
+ --wandb.enable true \
86
+ --job_name DP_PushT \
87
+ --policy.repo_id Lemon-03/DP_PushT_test \
88
+ --eval.batch_size 8
89
+ ```
90
+
91
+ ---
92
+
93
+ ## 🚀 Evaluate (My Evaluation Mode)
94
+
95
+ You can evaluate this checkpoint to reproduce the Phase 1 results:
96
+
97
+ ```bash
98
+ python -m lerobot.scripts.lerobot_eval \
99
+ --policy.type diffusion \
100
+ --policy.pretrained_path Lemon-03/DP_PushT_test \
101
+ --eval.n_episodes 50 \
102
+ --eval.batch_size 10 \
103
+ --env.type pusht \
104
+ --env.task PushT-v0
105
+ ```