---
datasets:
- lerobot/pusht
library_name: lerobot
license: apache-2.0
model_name: act
pipeline_tag: robotics
tags:
- lerobot
- robotics
- act
- pusht
- imitation-learning
- baseline
---

# ACT for Push-T (Baseline Benchmark)

[LeRobot](https://github.com/huggingface/lerobot)
[Dataset: lerobot/pusht](https://huggingface.co/datasets/lerobot/pusht)
[UESTC](https://www.uestc.edu.cn/)
[License: Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

> **Summary:** This model is the **ACT (Action Chunking with Transformers)** baseline trained on the Push-T task. It serves as a comparative benchmark for our research on Diffusion Policies. Even after 200,000 training steps, ACT struggled to model the multimodal action distribution required for the high-precision alignment this task demands.

- **Task**: Push-T (simulated)
- **Algorithm**: [ACT](https://arxiv.org/abs/2304.13705) (Action Chunking with Transformers)
- **Training Steps**: 200,000
- **Author**: Graduate student, **UESTC** (University of Electronic Science and Technology of China)

---

## Benchmark Results (Baseline)

This model establishes the baseline performance. Unlike Diffusion Policy, ACT tends to average over the multiple action modes present in the demonstrations, leading to "stiff" behavior and failures on the fine-grained adjustments needed at the target boundary.

### Evaluation Metrics (50 Episodes)

| Metric | Value | Interpretation | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **0.0%** | Failed to meet the strict >95% overlap criterion. | Fail |
| **Avg Max Reward** | **0.51** | Partially covers the target (~50%) but lacks precision. | Partial |
| **Avg Sum Reward** | **55.48** | Trajectories are valid but often stall or drift. | Partial |

> **Analysis:** The model learned the general reaching and pushing motion (reward > 0.5) but consistently failed the final stage of the task. This highlights ACT's limitation, relative to generative policies, on tasks that require high-precision correction learned from multimodal demonstrations.
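
To make the mode-averaging point concrete, here is the standard argument (a sketch in generic notation, not taken from the ACT paper): a policy trained with a unimodal regression loss is pulled toward a central statistic of the demonstration distribution rather than toward any single mode.

```latex
% Sketch: why regression-style imitation averages out action modes.
% Under an L2 loss, the pointwise-optimal prediction is the conditional mean:
\hat{a}^{*}(s)
  = \arg\min_{\hat{a}} \; \mathbb{E}\!\left[ \lVert a - \hat{a} \rVert_2^2 \mid s \right]
  = \mathbb{E}[\, a \mid s \,]
% Example: if demonstrators pass the T-block on either side with equal
% probability, p(a|s) = (1/2) N(+d, sigma^2) + (1/2) N(-d, sigma^2),
% then E[a|s] = 0, an action belonging to neither mode. ACT's L1
% reconstruction loss targets the conditional median instead, which fails
% the same way on symmetric bimodal data; the CVAE latent can in principle
% recover some multimodality, but a strong KL penalty (kl_weight: 10.0 here)
% limits how much it does in practice.
```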

---

## Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 backbone + Transformer encoder-decoder |
| **Action Chunking** | 100 steps |
| **VAE Enabled** | Yes (latent dim: 32) |
| **Input** | Single camera (84x84) + agent position |

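To make the I/O contract above concrete, the snippet below sketches a single inference call. It is a minimal, untested example: the import path follows recent lerobot releases (older releases use `lerobot.common.policies.act.modeling_act`), the observation keys follow lerobot's Push-T conventions, and the repo id is the one used in the evaluation command further down.

```python
# Minimal inference sketch; assumptions noted above; dummy observation data.
import torch
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("Lemon-03/pusht_ACT_PushT_test")
policy.eval()
policy.reset()  # clear the internal action queue at the start of an episode

batch = {
    # One RGB frame, channels-first, float in [0, 1] (random stand-in here).
    "observation.image": torch.rand(1, 3, 84, 84),
    # 2-D agent (end-effector) position.
    "observation.state": torch.rand(1, 2),
}
with torch.no_grad():
    action = policy.select_action(batch)  # shape (1, 2): next x/y target
print(action.shape)
```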

---

## Training Configuration

For reproducibility, here are the key parameters used during the training run.

- **Batch Size**: 64
- **Optimizer**: AdamW (`lr=2e-5`)
- **Scheduler**: Constant
- **Vision**: ResNet18 (pretrained on ImageNet)
- **Precision**: Mixed precision (AMP) enabled

### Original Training Command

```bash
python -m lerobot.scripts.lerobot_train \
  --config_path act_pusht.yaml \
  --dataset.repo_id lerobot/pusht \
  --job_name aloha_sim_insertion_human_ACT_PushT \
  --wandb.enable true \
  --policy.repo_id Lemon-03/ACT_PushT_test
```

### act_pusht.yaml

<details>
<summary><strong>Click to view the full <code>act_pusht.yaml</code> configuration</strong></summary>

```yaml
# @package _global_

# Basic settings
seed: 100000
job_name: ACT-PushT
steps: 200000
eval_freq: 10000
save_freq: 50000
log_freq: 250
batch_size: 64

# Dataset
dataset:
  repo_id: lerobot/pusht

# Evaluation
eval:
  n_episodes: 50
  batch_size: 8

# Environment
env:
  type: pusht
  task: PushT-v0
  fps: 10

# Policy configuration
policy:
  type: act

  # Vision backbone
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false

  # Transformer params
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  n_decoder_layers: 1

  # VAE params
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Action chunking
  chunk_size: 100
  n_action_steps: 100
  n_obs_steps: 1

  # Training & loss
  dropout: 0.1
  kl_weight: 10.0

  # Optimizer
  optimizer_lr: 2e-5
  optimizer_lr_backbone: 2e-5
  optimizer_weight_decay: 2e-4

  use_amp: true
```

</details>

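One detail worth flagging: `chunk_size` and `n_action_steps` are both 100, so each predicted chunk is executed to completion before the policy sees a new observation; the controller runs open-loop for 100 steps (10 seconds at `fps: 10`) at a time. The sketch below is a schematic of that queueing behavior only, not lerobot's implementation; `predict_action_chunk` is a hypothetical stand-in for the model call.

```python
# Schematic of chunked execution with chunk_size == n_action_steps == 100.
from collections import deque

import numpy as np

def predict_action_chunk(observation, chunk_size=100, action_dim=2):
    """Hypothetical stand-in: one model call yields a whole chunk of actions."""
    return np.zeros((chunk_size, action_dim))

action_queue: deque = deque()

def act(observation, n_action_steps: int = 100):
    # The model is queried only when the queue runs dry, so with
    # n_action_steps == chunk_size there is no receding-horizon overlap
    # and no mid-chunk correction.
    if not action_queue:
        action_queue.extend(predict_action_chunk(observation)[:n_action_steps])
    return action_queue.popleft()
```

Re-planning more often (lowering `n_action_steps`) or temporal ensembling as in the ACT paper would let the policy correct itself mid-chunk, which might help with the stalling and drift reported above.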

---

## Evaluation

Run the following command in your terminal to evaluate a local training checkpoint for 50 episodes and save the rollout videos:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type act \
    --policy.pretrained_path outputs/train/2025-12-02/00-28-32_pusht_ACT_PushT/checkpoints/last/pretrained_model \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```

To evaluate this model directly from the Hugging Face Hub, run:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type act \
    --policy.pretrained_path Lemon-03/pusht_ACT_PushT_test \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
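
If you prefer a scripted loop over the CLI, the rollout sketch below runs the policy against the gym-pusht environment. Same caveats as the earlier snippet: untested, with import paths and observation keys (`pixels` and `agent_pos`, as returned by `obs_type="pixels_agent_pos"`) that may differ across lerobot / gym-pusht versions.

```python
# Scripted rollout sketch; assumes `pip install "lerobot[pusht]"` or equivalent.
import gym_pusht  # noqa: F401  (registers gym_pusht/PushT-v0)
import gymnasium as gym
import torch
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("Lemon-03/pusht_ACT_PushT_test")
policy.eval()
policy.reset()  # clear the action queue before a new episode

env = gym.make("gym_pusht/PushT-v0", obs_type="pixels_agent_pos", max_episode_steps=300)
obs, _ = env.reset(seed=0)

done, max_reward = False, 0.0
while not done:
    batch = {
        # HWC uint8 image -> CHW float in [0, 1], with a batch dimension.
        "observation.image": torch.from_numpy(obs["pixels"]).permute(2, 0, 1)[None].float() / 255.0,
        "observation.state": torch.from_numpy(obs["agent_pos"])[None].float(),
    }
    with torch.no_grad():
        action = policy.select_action(batch)[0].numpy()
    obs, reward, terminated, truncated, _ = env.step(action)
    max_reward = max(max_reward, float(reward))
    done = terminated or truncated
print(f"Max reward this episode: {max_reward:.2f}")
```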