---
datasets:
- lerobot/pusht
library_name: lerobot
license: apache-2.0
model_name: act
pipeline_tag: robotics
tags:
- lerobot
- robotics
- act
- pusht
- imitation-learning
- baseline
---

# 🤖 ACT for Push-T (Baseline Benchmark)

[![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
[![Task](https://img.shields.io/badge/Task-Push--T-blue)](https://huggingface.co/datasets/lerobot/pusht)
[![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)

## 🎯 Research Purpose

**Important Note:** This model was trained primarily for **academic comparison**: evaluating the performance difference between **ACT** and **Diffusion Policy** under identical training conditions (using the `lerobot/pusht` dataset). It is a benchmark experiment designed to analyze how different algorithms learn this specific manipulation task, **not** an attempt to train a highly successful practical model.

> **Summary:** This model is the **ACT (Action Chunking with Transformers)** baseline trained on the Push-T task. It serves as a comparative benchmark for our research on Diffusion Policies. Despite 200k training steps, ACT struggled to model the multimodal action distribution required for the high-precision alignment this task demands.

- **🧩 Task**: Push-T (simulated)
- **🧠 Algorithm**: [ACT](https://arxiv.org/abs/2304.13705) (Action Chunking with Transformers)
- **🔄 Training Steps**: 200,000
- **🎓 Author**: Graduate student, **UESTC** (University of Electronic Science and Technology of China)

---

## 🔬 Benchmark Results (Baseline)

This model establishes the baseline performance. Unlike Diffusion Policy, ACT tends to average out multimodal action possibilities, leading to "stiff" behavior or a failure to make fine-grained adjustments near the target boundary.

### 📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Interpretation | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **0.0%** | Failed to meet the strict >95% overlap criterion. | ❌ |
| **Avg Max Reward** | **0.51** | Partially covers the target (~50%) but lacks precision. | 🚧 |
| **Avg Sum Reward** | **55.48** | Trajectories are valid but often stall or drift. | 📉 |

> **Analysis:** While the model learned the general reaching and pushing motion (reward > 0.5), it consistently failed the final stage of the task. This highlights ACT's limitation, relative to generative policies, on tasks that require high-precision correction from multimodal demonstrations (see the illustrative sketch at the end of this card).

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (backbone) + Transformer encoder-decoder |
| **Action Chunking** | 100 steps |
| **VAE Enabled** | Yes (latent dim: 32) |
| **Input** | Single camera (84x84) + agent position |

---

## 🔧 Training Configuration

For reproducibility, here are the key parameters used during the training session:

- **Batch Size**: 64
- **Optimizer**: AdamW (`lr=2e-5`)
- **Scheduler**: Constant
- **Vision**: ResNet18 (pretrained on ImageNet)
- **Precision**: Mixed precision (AMP) enabled

### Original Training Command (My Training Mode)

```bash
python -m lerobot.scripts.lerobot_train \
    --config_path act_pusht.yaml \
    --dataset.repo_id lerobot/pusht \
    --job_name aloha_sim_insertion_human_ACT_PushT \
    --wandb.enable true \
    --policy.repo_id Lemon-03/ACT_PushT_test
```
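The optimizer and precision settings listed above correspond to a standard PyTorch AMP training step. The following is a minimal sketch for illustration only: the `policy` module and MSE loss are placeholders (the real ACT objective in lerobot is an L1 reconstruction loss plus a KL term weighted by `kl_weight`, and the real loop lives in the lerobot trainer).

```python
import torch
from torch import nn

# Placeholder network standing in for the ACT policy (hypothetical).
policy = nn.Linear(10, 2).to("cuda")

# AdamW with the card's hyperparameters (lr=2e-5, weight_decay=2e-4).
optimizer = torch.optim.AdamW(policy.parameters(), lr=2e-5, weight_decay=2e-4)

# Gradient scaler for mixed-precision training (use_amp: true).
scaler = torch.cuda.amp.GradScaler()

def train_step(inputs, targets):
    optimizer.zero_grad()
    # The forward pass runs in reduced precision under autocast.
    with torch.autocast(device_type="cuda"):
        loss = nn.functional.mse_loss(policy(inputs), targets)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)         # unscales gradients, then steps AdamW
    scaler.update()                # adjusts the scale factor for the next step
    return loss.item()
```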
### act_pusht.yaml

<details>
<summary>📄 Click to view full act_pusht.yaml configuration</summary>

```yaml
# @package _global_

# Basic Settings
seed: 100000
job_name: ACT-PushT
steps: 200000
eval_freq: 10000
save_freq: 50000
log_freq: 250
batch_size: 64

# Dataset
dataset:
  repo_id: lerobot/pusht

# Evaluation
eval:
  n_episodes: 50
  batch_size: 8

# Environment
env:
  type: pusht
  task: PushT-v0
  fps: 10

# Policy Configuration
policy:
  type: act

  # Vision Backbone
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false

  # Transformer Params
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  n_decoder_layers: 1

  # VAE Params
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Action Chunking
  chunk_size: 100
  n_action_steps: 100
  n_obs_steps: 1

  # Training & Loss
  dropout: 0.1
  kl_weight: 10.0

  # Optimizer
  optimizer_lr: 2e-5
  optimizer_lr_backbone: 2e-5
  optimizer_weight_decay: 2e-4
  use_amp: true
```

</details>
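The chunking settings in this config (`chunk_size: 100`, `n_action_steps: 100`) mean each forward pass predicts a 100-step action sequence that is executed open-loop before the policy is queried again. Below is a schematic rollout loop; the `policy.predict_chunk` method and the gym-style `env` interface are hypothetical stand-ins, not lerobot's actual API.

```python
CHUNK_SIZE = 100      # chunk_size: actions predicted per forward pass
N_ACTION_STEPS = 100  # n_action_steps: actions executed before re-planning

def rollout(env, policy, max_steps=300):
    """Run one episode with open-loop action chunking."""
    obs, _ = env.reset()
    step = 0
    while step < max_steps:
        # One forward pass yields a whole chunk of future actions,
        # shape (CHUNK_SIZE, action_dim).
        action_chunk = policy.predict_chunk(obs)
        # Execute the chunk open-loop; intermediate observations are
        # ignored until the next prediction.
        for action in action_chunk[:N_ACTION_STEPS]:
            obs, reward, terminated, truncated, _ = env.step(action)
            step += 1
            if terminated or truncated or step >= max_steps:
                return
```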
---

## 🚀 Evaluate (My Evaluation Mode)

Run the following command in your terminal to evaluate a local training checkpoint for 50 episodes and save the visualization videos:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type act \
    --policy.pretrained_path outputs/train/2025-12-02/00-28-32_pusht_ACT_PushT/checkpoints/last/pretrained_model \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```

To evaluate the weights published on the Hub, run the following command:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type act \
    --policy.pretrained_path Lemon-03/pusht_ACT_PushT_test \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
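If you prefer Python over the CLI, the checkpoint can also be loaded directly. A minimal sketch follows; the import path matches recent lerobot releases and may differ in older versions, and the observation keys and image size follow this card's setup, so check them against your install.

```python
import torch
from lerobot.policies.act.modeling_act import ACTPolicy  # path may vary by lerobot version

# Load the published weights from the Hub (repo id from the training command above).
policy = ACTPolicy.from_pretrained("Lemon-03/ACT_PushT_test")
policy.eval()
policy.reset()  # clear the internal action queue before a new episode

# Dummy Push-T observation: one RGB frame plus the 2-D agent position.
batch = {
    "observation.image": torch.zeros(1, 3, 84, 84),  # 84x84 per the model details; adjust if your config differs
    "observation.state": torch.zeros(1, 2),
}
with torch.no_grad():
    action = policy.select_action(batch)  # pops one action from the current chunk
print(action.shape)
```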
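Finally, to make the mode-averaging argument from the analysis concrete: when demonstrations contain two equally valid strategies, a deterministic policy trained with a mean-style regression loss converges toward their average, which may match neither. The toy example below is a deliberately simplified illustration, not a claim about ACT's exact objective (ACT uses an L1 reconstruction loss with a VAE).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D expert actions: half the demos push from the left (-1),
# half push from the right (+1). Both strategies succeed.
expert_actions = rng.choice([-1.0, 1.0], size=10_000)

# The MSE-optimal deterministic prediction is the mean of the data...
prediction = expert_actions.mean()
print(f"averaged action: {prediction:+.3f}")  # ~0.0

# ...which is far from *both* demonstrated modes: the policy commits to
# neither strategy. Generative policies (e.g. diffusion) can instead
# sample from either mode.
distance_to_modes = min(abs(prediction - 1.0), abs(prediction + 1.0))
print(f"distance to nearest expert mode: {distance_to_modes:.3f}")  # ~1.0
```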