|
|
--- |
|
|
|
|
|
datasets: |
|
|
- lerobot/pusht |
|
|
library_name: lerobot |
|
|
license: apache-2.0 |
|
|
model_name: act |
|
|
pipeline_tag: robotics |
|
|
tags: |
|
|
- lerobot |
|
|
- robotics |
|
|
- act |
|
|
- pusht |
|
|
- imitation-learning |
|
|
- baseline |
|
|
|
|
|
--- |
|
|
|
|
|
# 🤖 ACT for Push-T (Baseline Benchmark)
|
|
|
|
|
[Code: LeRobot](https://github.com/huggingface/lerobot) · [Dataset: lerobot/pusht](https://huggingface.co/datasets/lerobot/pusht) · [Affiliation: UESTC](https://www.uestc.edu.cn/) · [License: Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
|
|
|
|
|
## 🎯 Research Purpose
|
|
|
|
|
**Important Note:** This model was trained primarily for **academic comparison**: evaluating the performance difference between **ACT** and **Diffusion Policy** under identical training conditions (using the `lerobot/pusht` dataset). It is a benchmark experiment designed to analyze how different algorithms learn this specific manipulation task, **not an attempt to produce a highly successful practical model**.
|
|
|
|
|
> **Summary:** This model represents the **ACT (Action Chunking with Transformers)** baseline trained on the Push-T task. It serves as a comparative benchmark for our research on Diffusion Policies. Despite 200k steps of training, ACT struggled to model the multimodal action distribution required for high-precision alignment in this task. |
|
|
|
|
|
- **🧩 Task**: Push-T (Simulated)
|
|
- **🧠 Algorithm**: [ACT](https://arxiv.org/abs/2304.13705) (Action Chunking with Transformers)
|
|
- **📈 Training Steps**: 200,000
|
|
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
|
|
|
|
|
--- |
|
|
|
|
|
## 🔬 Benchmark Results (Baseline)
|
|
|
|
|
This model establishes the baseline performance. Unlike Diffusion Policy, ACT tends to average out multimodal action possibilities, leading to "stiff" behavior or failure to perform fine-grained adjustments at the boundaries. |
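To make the mode-averaging failure concrete, here is a minimal, self-contained sketch (illustrative only, not taken from the training code): when demonstrations are bimodal, the action that minimizes an L2 objective is the mean of the modes, which may itself be an invalid action.

```python
# Illustrative sketch (not from the training code): with bimodal
# demonstrations, the constant action minimizing mean squared error
# is the average of the two modes -- an action no demonstrator took.
import numpy as np

rng = np.random.default_rng(0)

# From the same state, half the demos push left (-1), half push
# right (+1); both succeed, but their mean (~0) does not.
demo_actions = rng.choice([-1.0, 1.0], size=10_000)
demo_actions += 0.05 * rng.standard_normal(demo_actions.shape)

print(f"L2-optimal action: {demo_actions.mean():+.3f}")  # ~ +0.000
```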
|
|
|
|
|
### 📊 Evaluation Metrics (50 Episodes)
|
|
|
|
|
| Metric | Value | Interpretation | Status | |
|
|
| :--- | :---: | :--- | :---: | |
|
|
| **Success Rate** | **0.0%** | Failed to meet the strict >95% overlap criterion. | ❌ |
|
|
| **Avg Max Reward** | **0.51** | Partially covers the target (~50%), but lacks precision. | 🟧 |
|
|
| **Avg Sum Reward** | **55.48** | Trajectories are valid but often stall or drift. | 📉 |
|
|
|
|
|
> **Analysis:** While the model learned the general reaching and pushing motion (Reward > 0.5), it consistently failed the final stage of the task. This highlights ACT's limitation in handling tasks requiring high-precision correction from multimodal demonstrations compared to Generative Policies. |
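For clarity, the sketch below shows how these three numbers aggregate per-episode reward traces. The coverage-based reward and the >0.95 success threshold follow the Push-T convention; the helper function itself is hypothetical.

```python
# Hypothetical helper showing how the table's metrics aggregate
# per-episode reward traces (reward ~= fraction of target coverage).
import numpy as np

def summarize(episodes: list[np.ndarray]) -> dict[str, float]:
    max_rewards = [float(ep.max()) for ep in episodes]
    return {
        # An episode counts as a success if coverage ever exceeds 95%.
        "success_rate": float(np.mean([m > 0.95 for m in max_rewards])),
        "avg_max_reward": float(np.mean(max_rewards)),
        "avg_sum_reward": float(np.mean([ep.sum() for ep in episodes])),
    }

# Example: an episode stalling at ~0.5 coverage never counts as success.
print(summarize([np.linspace(0.0, 0.51, 300)]))
```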
|
|
|
|
|
--- |
|
|
|
|
|
## ⚙️ Model Details
|
|
|
|
|
| Parameter | Description | |
|
|
| :--- | :--- | |
|
|
| **Architecture** | ResNet18 (Backbone) + Transformer Encoder-Decoder | |
|
|
| **Action Chunking** | 100 steps | |
|
|
| **VAE Enabled** | Yes (Latent Dim: 32) | |
|
|
| **Input** | Single Camera (84x84) + Agent Position | |
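The same settings can also be expressed in code. The sketch below uses lerobot's `ACTConfig`; the import path matches older lerobot releases and may have moved in newer versions, so treat it as an assumption.

```python
# Sketch of the table above as a lerobot ACTConfig. The import path
# is an assumption (it has moved between lerobot releases).
from lerobot.common.policies.act.configuration_act import ACTConfig

config = ACTConfig(
    vision_backbone="resnet18",  # ResNet18 image backbone
    use_vae=True,                # CVAE-style latent encoder
    latent_dim=32,               # VAE latent dimension
    chunk_size=100,              # predict 100 actions per forward pass
    n_action_steps=100,          # execute the whole chunk before replanning
    n_obs_steps=1,               # condition on a single observation
)
```

Note that with `n_action_steps` equal to `chunk_size`, the policy runs open-loop for 100 environment steps between replans, which compounds any early imprecision.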
|
|
|
|
|
--- |
|
|
|
|
|
## 🔧 Training Configuration
|
|
|
|
|
For reproducibility, the key parameters from the training run are listed below; a plain-PyTorch sketch of the equivalent optimizer and AMP setup follows the list.
|
|
|
|
|
- **Batch Size**: 64 |
|
|
- **Optimizer**: AdamW (`lr=2e-5`) |
|
|
- **Scheduler**: Constant |
|
|
- **Vision**: ResNet18 (Pretrained ImageNet) |
|
|
- **Precision**: Mixed Precision (AMP) enabled |
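The following is a hedged, plain-PyTorch sketch of the optimizer and mixed-precision setup listed above; `policy`, `batch`, and the loss are stand-ins, not lerobot objects.

```python
# Plain-PyTorch sketch of the optimizer/AMP setup listed above;
# `policy` is a placeholder module, not the actual ACT network.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
policy = torch.nn.Linear(10, 2).to(device)  # placeholder for ACT
optimizer = torch.optim.AdamW(policy.parameters(), lr=2e-5, weight_decay=2e-4)
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")

def training_step(batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    # Mixed-precision forward pass (the `use_amp: true` flag below).
    with torch.autocast(device_type=device, enabled=device == "cuda"):
        loss = policy(batch).pow(2).mean()  # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

print(training_step(torch.randn(64, 10, device=device)))
```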
|
|
|
|
|
### Original Training Command
|
|
|
|
|
```bash |
|
|
python -m lerobot.scripts.lerobot_train \
    --config_path act_pusht.yaml \
    --dataset.repo_id lerobot/pusht \
    --job_name aloha_sim_insertion_human_ACT_PushT \
    --wandb.enable true \
    --policy.repo_id Lemon-03/ACT_PushT_test
|
|
``` |
|
|
|
|
|
### act_pusht.yaml |
|
|
|
|
|
<details> |
|
|
<summary>📄 <strong>Click to view the full <code>act_pusht.yaml</code> configuration</strong></summary>
|
|
|
|
|
```yaml |
|
|
# @package _global_ |
|
|
|
|
|
# Basic Settings |
|
|
seed: 100000 |
|
|
job_name: ACT-PushT |
|
|
steps: 200000 |
|
|
eval_freq: 10000 |
|
|
save_freq: 50000 |
|
|
log_freq: 250 |
|
|
batch_size: 64 |
|
|
|
|
|
# Dataset |
|
|
dataset:
  repo_id: lerobot/pusht
|
|
|
|
|
# Evaluation |
|
|
eval:
  n_episodes: 50
  batch_size: 8
|
|
|
|
|
# Environment |
|
|
env:
  type: pusht
  task: PushT-v0
  fps: 10
|
|
|
|
|
# Policy Configuration |
|
|
policy:
  type: act

  # Vision Backbone
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false

  # Transformer Params
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  n_decoder_layers: 1

  # VAE Params
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Action Chunking
  chunk_size: 100
  n_action_steps: 100
  n_obs_steps: 1

  # Training & Loss
  dropout: 0.1
  kl_weight: 10.0

  # Optimizer
  optimizer_lr: 2e-5
  optimizer_lr_backbone: 2e-5
  optimizer_weight_decay: 2e-4

  use_amp: true
|
|
``` |
|
|
</details> |
|
|
|
|
|
---
|
|
|
|
|
## 🚀 Evaluation
|
|
|
|
|
Run the following command to evaluate a local training checkpoint for 50 episodes and save the visualization videos (the `outputs/...` path below comes from the author's training run):
|
|
|
|
|
```bash |
|
|
python -m lerobot.scripts.lerobot_eval \ |
|
|
--policy.type act \ |
|
|
--policy.pretrained_path outputs/train/2025-12-02/00-28-32_pusht_ACT_PushT/checkpoints/last/pretrained_model \ |
|
|
--eval.n_episodes 50 \ |
|
|
--eval.batch_size 10 \ |
|
|
--env.type pusht \ |
|
|
--env.task PushT-v0 |
|
|
``` |
|
|
|
|
|
To evaluate the published checkpoint directly from the Hugging Face Hub instead, pass the repo id as the pretrained path:
|
|
|
|
|
```bash |
|
|
python -m lerobot.scripts.lerobot_eval \ |
|
|
--policy.type act \ |
|
|
--policy.pretrained_path Lemon-03/pusht_ACT_PushT_test \ |
|
|
--eval.n_episodes 50 \ |
|
|
--eval.batch_size 10 \ |
|
|
--env.type pusht \ |
|
|
--env.task PushT-v0 |
|
|
``` |
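For quick checks without the CLI, the policy can also be loaded in Python. The import path, the `reset()`/`select_action()` flow, and the observation key names below are assumptions based on the lerobot release and `lerobot/pusht` conventions used here.

```python
# Hedged sketch: query the uploaded checkpoint directly in Python.
# Import path and observation keys are assumptions (see above).
import torch
from lerobot.common.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("Lemon-03/pusht_ACT_PushT_test")
policy.eval()
policy.reset()  # clears the internal action-chunk queue

# Dummy observation: one RGB frame plus the 2-D agent position,
# shaped to match the model card's input description (84x84).
obs = {
    "observation.image": torch.rand(1, 3, 84, 84),
    "observation.state": torch.zeros(1, 2),
}
with torch.no_grad():
    action = policy.select_action(obs)
print(action.shape)  # expected (1, 2): target x/y for the pusher
```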