Lemon-03 committed · Commit fc3b271 · verified · 1 Parent(s): 47c9190

Create README.md

Files changed (1)
  1. README.md +179 -0
README.md ADDED
@@ -0,0 +1,179 @@
+ ---
+ datasets:
+ - lerobot/pusht
+ library_name: lerobot
+ license: apache-2.0
+ model_name: act
+ pipeline_tag: robotics
+ tags:
+ - lerobot
+ - robotics
+ - act
+ - pusht
+ - imitation-learning
+ - baseline
+ ---
+
+ # 🤖 ACT for Push-T (Baseline Benchmark)
+
+ [![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
+ [![Task](https://img.shields.io/badge/Task-Push--T-blue)](https://huggingface.co/datasets/lerobot/pusht)
+ [![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)
+
+ > **Summary:** This model represents the **ACT (Action Chunking with Transformers)** baseline trained on the Push-T task. It serves as a comparative benchmark for our research on Diffusion Policies. Despite 200k steps of training, ACT struggled to model the multimodal action distribution required for high-precision alignment in this task.
+
+ - **🧩 Task**: Push-T (Simulated)
+ - **🧠 Algorithm**: [ACT](https://arxiv.org/abs/2304.13705) (Action Chunking with Transformers)
+ - **🔄 Training Steps**: 200,000
+ - **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
+
+ ---
+
+ ## 🔬 Benchmark Results (Baseline)
+
+ This model establishes the baseline performance. Unlike Diffusion Policy, ACT tends to average out multimodal action possibilities, leading to "stiff" behavior or failure to perform fine-grained adjustments at the boundaries.
+
+ ### 📊 Evaluation Metrics (50 Episodes)
+
+ | Metric | Value | Interpretation | Status |
+ | :--- | :---: | :--- | :---: |
+ | **Success Rate** | **0.0%** | Failed to meet the strict >95% overlap criterion. | ❌ |
+ | **Avg Max Reward** | **0.51** | Partially covers the target (~50%), but lacks precision. | 🚧 |
+ | **Avg Sum Reward** | **55.48** | Trajectories are valid but often stall or drift. | 📉 |
+
+ > **Analysis:** While the model learned the general reaching and pushing motion (Avg Max Reward > 0.5), it consistently failed the final stage of the task. This highlights ACT's limitation in handling tasks that require high-precision correction from multimodal demonstrations, compared to generative policies.
+
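The three metrics in the table are aggregates over per-episode rollouts. The sketch below is a minimal illustration of that aggregation, not LeRobot's evaluation code: it assumes each episode is available as a list of per-step rewards plus a success flag reported by the environment. The function name `summarize_episodes`, the dictionary keys, and the toy rewards are hypothetical.

```python
from statistics import mean


def summarize_episodes(episodes):
    """Aggregate (step_rewards, success) pairs into the three reported metrics.

    `step_rewards` is the list of per-step rewards for one episode and
    `success` is the success flag for that episode (assumed to come from
    the environment).
    """
    max_rewards = [max(rewards) for rewards, _ in episodes]
    sum_rewards = [sum(rewards) for rewards, _ in episodes]
    successes = [1.0 if success else 0.0 for _, success in episodes]
    return {
        "success_rate_pct": 100.0 * mean(successes),
        "avg_max_reward": mean(max_rewards),
        "avg_sum_reward": mean(sum_rewards),
    }


# Toy data, made up for illustration (not this model's real rollouts).
toy_episodes = [
    ([0.0, 0.25, 0.5], False),
    ([0.0, 0.5, 0.75], False),
]
summary = summarize_episodes(toy_episodes)
print(summary)
```

Read this way, a high summed reward with a 0% success rate means the policy collects partial-coverage reward over many steps without ever meeting the success condition.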
+ ---
+
+ ## ⚙️ Model Details
+
+ | Parameter | Description |
+ | :--- | :--- |
+ | **Architecture** | ResNet18 (Backbone) + Transformer Encoder-Decoder |
+ | **Action Chunking** | 100 steps |
+ | **VAE Enabled** | Yes (Latent Dim: 32) |
+ | **Input** | Single Camera (84x84) + Agent Position |
+
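A 100-step action chunk means the policy predicts 100 future actions per query. Assuming all of them are executed before the next query (as `n_action_steps: 100` in the configuration suggests), the control loop behaves like a queue that is refilled only when empty. The sketch below illustrates that behavior; `ChunkedController`, `dummy_predict_chunk`, and the 300-step loop are hypothetical stand-ins, not LeRobot classes or the real episode length.

```python
from collections import deque


class ChunkedController:
    """Replays a predicted action chunk and re-queries the policy only when it runs out."""

    def __init__(self, predict_chunk, n_action_steps):
        self.predict_chunk = predict_chunk  # hypothetical callable: observation -> list of actions
        self.n_action_steps = n_action_steps
        self.queue = deque()
        self.n_queries = 0

    def select_action(self, observation):
        if not self.queue:
            # Queue is empty: ask the policy for a fresh chunk of actions.
            chunk = self.predict_chunk(observation)
            self.queue.extend(chunk[: self.n_action_steps])
            self.n_queries += 1
        return self.queue.popleft()


def dummy_predict_chunk(observation, chunk_size=100):
    # Stand-in for the policy: returns `chunk_size` identical 2-D actions.
    return [(0.0, 0.0)] * chunk_size


controller = ChunkedController(dummy_predict_chunk, n_action_steps=100)
for step in range(300):  # arbitrary number of control steps for this demo
    controller.select_action(observation=step)
print(controller.n_queries)
```

At the configured 10 fps, 100 steps correspond to 10 seconds of open-loop execution between observations, which may contribute to the missing fine-grained corrections described in the analysis.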
+ ---
+
+ ## 🔧 Training Configuration
+
+ For reproducibility, here are the key parameters used during the training session.
+
+ - **Batch Size**: 64
+ - **Optimizer**: AdamW (`lr=2e-5`)
+ - **Scheduler**: Constant
+ - **Vision**: ResNet18 (Pretrained ImageNet)
+ - **Precision**: Mixed Precision (AMP) enabled
+
+ ### Original Training Command (My Resume Mode)
+
+ ```bash
+ python -m lerobot.scripts.lerobot_train \
+   --config_path act_pusht.yaml \
+   --dataset.repo_id lerobot/pusht \
+   --job_name aloha_sim_insertion_human_ACT_PushT \
+   --wandb.enable true \
+   --policy.repo_id Lemon-03/ACT_PushT_test
+ ```
+
+ ### act_pusht.yaml
+
+ <details>
+ <summary>📄 <strong>Click to view full <code>act_pusht.yaml</code> configuration</strong></summary>
+
+ ```yaml
+ # @package _global_
+
+ # Basic Settings
+ seed: 100000
+ job_name: ACT-PushT
+ steps: 200000
+ eval_freq: 10000
+ save_freq: 50000
+ log_freq: 250
+ batch_size: 64
+
+ # Dataset
+ dataset:
+   repo_id: lerobot/pusht
+
+ # Evaluation
+ eval:
+   n_episodes: 50
+   batch_size: 8
+
+ # Environment
+ env:
+   type: pusht
+   task: PushT-v0
+   fps: 10
+
+ # Policy Configuration
+ policy:
+   type: act
+
+   # Vision Backbone
+   vision_backbone: resnet18
+   pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
+   replace_final_stride_with_dilation: false
+
+   # Transformer Params
+   pre_norm: false
+   dim_model: 512
+   n_heads: 8
+   dim_feedforward: 3200
+   feedforward_activation: relu
+   n_encoder_layers: 4
+   n_decoder_layers: 1
+
+   # VAE Params
+   use_vae: true
+   latent_dim: 32
+   n_vae_encoder_layers: 4
+
+   # Action Chunking
+   chunk_size: 100
+   n_action_steps: 100
+   n_obs_steps: 1
+
+   # Training & Loss
+   dropout: 0.1
+   kl_weight: 10.0
+
+   # Optimizer
+   optimizer_lr: 2e-5
+   optimizer_lr_backbone: 2e-5
+   optimizer_weight_decay: 2e-4
+
+ use_amp: true
+ ```
+ </details>
+
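The `kl_weight` entry scales the VAE regularization term in the ACT training objective, which combines an L1 action reconstruction loss with a KL divergence between the latent posterior and a standard normal prior. The sketch below is a minimal pure-Python illustration of that weighting, not LeRobot's implementation. The function names and toy numbers are hypothetical, and the reduction (mean for L1, sum over latent dimensions for KL) is an assumption.

```python
import math


def l1_loss(predicted, target):
    """Mean absolute error over a flat list of action values."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return sum(
        -0.5 * (1.0 + lv - m * m - math.exp(lv)) for m, lv in zip(mu, log_var)
    )


def act_style_loss(predicted, target, mu, log_var, kl_weight=10.0):
    """Reconstruction loss plus the KL term scaled by kl_weight."""
    return l1_loss(predicted, target) + kl_weight * kl_to_standard_normal(mu, log_var)


# Toy values (illustrative only): a 32-dimensional latent that already
# matches the prior contributes no KL penalty.
loss = act_style_loss(
    predicted=[0.5, 1.0],
    target=[0.0, 1.0],
    mu=[0.0] * 32,
    log_var=[0.0] * 32,
)
print(loss)
```

A larger `kl_weight` pulls the latent posterior harder toward the prior, which leaves the latent variable less room to encode which demonstration mode a given action chunk belongs to.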
+ ---
+
+ ## 🚀 Evaluate (My Evaluation Mode)
+
+ Run the following command in your terminal to evaluate the model for 50 episodes and save the visualization videos:
+
+ ```bash
+ python -m lerobot.scripts.lerobot_eval \
+   --policy.type act \
+   --policy.pretrained_path outputs/train/2025-12-02/00-28-32_pusht_ACT_PushT/checkpoints/last/pretrained_model \
+   --eval.n_episodes 50 \
+   --eval.batch_size 10 \
+   --env.type pusht \
+   --env.task PushT-v0
+ ```
+
+ To evaluate this model locally, run the following command:
+
+ ```bash
+ python -m lerobot.scripts.lerobot_eval \
+   --policy.type act \
+   --policy.pretrained_path Lemon-03/pusht_ACT_PushT_test \
+   --eval.n_episodes 50 \
+   --eval.batch_size 10 \
+   --env.type pusht \
+   --env.task PushT-v0
+ ```