---
datasets:
- lerobot/pusht
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- pusht
- imitation-learning
- phase-1
---

# 🦾 Diffusion Policy for Push-T (Phase 1: 100k Steps)

[LeRobot](https://github.com/huggingface/lerobot)
[Dataset: lerobot/pusht](https://huggingface.co/datasets/lerobot/pusht)
[UESTC](https://www.uestc.edu.cn/)
[Model: Lemon-03/DP_PushT_test](https://huggingface.co/Lemon-03/DP_PushT_test)

> **Summary:** This model is the **initial training phase (0-100k steps)** of a Diffusion Policy on the Push-T task and serves as the pre-trained foundation for further fine-tuning. It demonstrates strong trajectory learning but has not yet converged to a high success rate.

- **🧩 Task**: Push-T (Simulated)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **📈 Training Steps**: 100,000 (Initial Phase)
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
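
At inference time, a DDPM head generates an action by iteratively denoising Gaussian noise. The following is a toy sketch of the reverse (sampling) process, not the policy's actual implementation: a 1-D action, 10 denoising steps, a linear beta schedule, and a stub in place of the trained U-Net are all illustrative assumptions.

```python
import math
import random

T = 10  # denoising steps (illustrative; real DDPMs typically use many more)
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)  # cumulative product of alphas

def denoise(eps_model, x, noise_scale=1.0, seed=0):
    """DDPM ancestral sampling: step t = T-1 down to 0.

    Each update is x <- (x - beta_t/sqrt(1-alpha_bar_t) * eps_hat) / sqrt(alpha_t),
    plus sqrt(beta_t) * z noise on every step except the last.
    """
    rng = random.Random(seed)
    for t in reversed(range(T)):
        eps_hat = eps_model(x, t)  # predicted noise
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / math.sqrt(alphas[t])
        if t > 0:
            x += noise_scale * math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x

# Stub "model" that predicts the current sample as the noise; with the added
# noise disabled, every update strictly contracts x toward 0.
action = denoise(lambda x, t: x, x=3.0, noise_scale=0.0)
```

The real policy denoises whole 16-step action sequences conditioned on image features, but the per-step arithmetic has this shape.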

---

## ⚠️ Note on Performance & Fine-tuning

This checkpoint represents an **intermediate state** of our research.
While it achieves high movement precision (**Avg Max Reward: 0.71**), the strict success threshold of the Push-T task yields a low success rate at this stage.

### 🚀 Upgrade Available

We resumed training (fine-tuning) from this checkpoint for another 100k steps and achieved significantly better results.
👉 **Check out the final model here:** [**Lemon-03/DP_PushT_test_Resume**](https://huggingface.co/Lemon-03/DP_PushT_test_Resume)

---

## 🔬 Benchmark Results (Phase 1)

Evaluated over **50 episodes** in the `Push-T` environment using LeRobot.

| Metric | Value | Status |
| :--- | :---: | :---: |
| **Success Rate** | **4.0%** | 🚧 (Under-trained) |
| **Avg Max Reward** | **0.71** | 🎯 (High Precision) |
| **Avg Sum Reward** | **115.03** | ✅ (Good Trajectory) |

> **Analysis:** The model has learned the multimodal distribution of the demonstration data and can push the T-block close to the target (max reward 0.71), but it still lacks the fine-grained final adjustments required by the >95% overlap success criterion. This motivated the subsequent **Phase 2 (Resume Training)**.
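
The three metrics above can be computed from per-episode reward traces. A minimal sketch in plain Python; the 0.95 threshold mirrors the ">95% overlap" criterion, and the toy episode data is illustrative, not from this evaluation:

```python
# Sketch: summarizing Push-T evaluation from per-step reward traces.
# Assumption: reward is target coverage in [0, 1], and an episode counts
# as a success if coverage ever reaches the threshold.

SUCCESS_THRESHOLD = 0.95  # illustrative; matches the >95% overlap criterion

def summarize(episodes):
    """episodes: list of per-step reward lists, one per rollout."""
    max_rewards = [max(ep) for ep in episodes]
    sum_rewards = [sum(ep) for ep in episodes]
    n = len(episodes)
    return {
        "success_rate": sum(r >= SUCCESS_THRESHOLD for r in max_rewards) / n,
        "avg_max_reward": sum(max_rewards) / n,
        "avg_sum_reward": sum(sum_rewards) / n,
    }

# Toy data: 1 of 4 episodes crosses the threshold -> 25% success rate.
toy = [
    [0.1, 0.5, 0.96],  # success
    [0.2, 0.7, 0.71],
    [0.1, 0.4, 0.68],
    [0.3, 0.6, 0.74],
]
print(summarize(toy)["success_rate"])  # 0.25
```

This also illustrates why success rate can lag the reward metrics: an episode that peaks at 0.74 coverage scores well on max reward yet still counts as a failure.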

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
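
These three settings interact in a receding-horizon loop: the policy conditions on the last 2 observations, predicts a 16-step action sequence, executes only the first 8 actions, then replans. A minimal sketch of that control loop (the `predict`/`step` callables are hypothetical stand-ins, not the lerobot API):

```python
# Sketch of the receding-horizon loop implied by the table above:
# condition on 2 observations, predict 16 actions, execute 8, replan.
N_OBS_STEPS = 2      # observation history
HORIZON = 16         # prediction horizon
N_ACTION_STEPS = 8   # actions executed per plan

def rollout(predict, step, first_obs, episode_len):
    """predict(obs_history) -> list of HORIZON actions; step(action) -> next obs."""
    obs_history = [first_obs] * N_OBS_STEPS
    executed = []
    while len(executed) < episode_len:
        plan = predict(obs_history[-N_OBS_STEPS:])
        for action in plan[:N_ACTION_STEPS]:
            obs_history.append(step(action))
            executed.append(action)
            if len(executed) == episode_len:
                break
    return executed

# Toy stand-ins: count how often the policy replans over a 24-step episode.
plans_made = 0
def toy_predict(obs_history):
    global plans_made
    plans_made += 1
    return list(range(HORIZON))

actions = rollout(toy_predict, step=lambda a: a, first_obs=0.0, episode_len=24)
print(plans_made)  # 3 replanning calls for 24 executed actions
```

Executing only half of each predicted horizon is the usual Diffusion Policy compromise between open-loop smoothness and closed-loop reactivity.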

---

## 🔧 Training Configuration (Reference)

For reproducibility, these are the key parameters used during this initial training session:

- **Batch Size**: 8 (Effective)
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with random crop (84x84)
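
The "cosine with warmup" schedule can be written as a step-to-learning-rate function: a linear ramp over the warmup steps, then a cosine decay to zero over the remaining steps. A sketch under stated assumptions: only `lr=1e-4` and the 100k total come from this card; the 500-step warmup is an illustrative guess.

```python
import math

BASE_LR = 1e-4         # from the config above
TOTAL_STEPS = 100_000  # training length of this phase
WARMUP_STEPS = 500     # illustrative assumption, not from this card

def lr_at(step):
    """Linear warmup to BASE_LR, then cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))        # 0.0  (start of warmup)
print(lr_at(500))      # 0.0001 (warmup complete, peak LR)
```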

#### Original Training Command

```bash
python -m lerobot.scripts.lerobot_train \
  --policy.type diffusion \
  --env.type pusht \
  --dataset.repo_id lerobot/pusht \
  --wandb.enable true \
  --job_name DP_PushT \
  --policy.repo_id Lemon-03/DP_PushT_test \
  --eval.batch_size 8
```

---

## 📊 Evaluate (Reproducing Phase 1)

You can evaluate this checkpoint to reproduce the Phase 1 results:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_PushT_test \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0
```