gpudad committed
Commit d42372e · verified · Parent: 116d6b4

Update README with evaluation metrics and GIF

Files changed (1): README.md (+45 −12)
README.md CHANGED
@@ -16,24 +16,53 @@ pipeline_tag: robotics
 
 This is an Action Chunking Transformer (ACT) model trained on the SO-101 robot arm for a cube picking task.
 
+## Demo
+
+![Model Evaluation](https://huggingface.co/gpudad/act_so101_pick_cube/resolve/main/act_eval_500k.gif)
+
+*Visualization showing ground truth (green) vs predicted actions (blue) with mean absolute error per frame.*
+
+## Environment
+
 ![Environment Preview](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked/resolve/main/camera_angles.png)
 
 ## Model Details
 
-- **Architecture**: ACT (Action Chunking Transformer)
-- **Vision Backbone**: ResNet18
-- **Training Steps**: 500,000
-- **Chunk Size**: 100
-- **N Action Steps**: 1 (with temporal ensembling)
-- **Temporal Ensemble Coefficient**: 0.01
-- **KL Weight**: 10.0
-- **Batch Size**: 16
-- **Learning Rate**: 3e-5
+| Parameter | Value |
+|-----------|-------|
+| Architecture | ACT (Action Chunking Transformer) |
+| Vision Backbone | ResNet18 |
+| Training Steps | 500,000 |
+| Chunk Size | 100 |
+| N Action Steps | 1 (with temporal ensembling) |
+| Temporal Ensemble Coeff | 0.01 |
+| KL Weight | 10.0 |
+| Batch Size | 16 |
+| Learning Rate | 3e-5 |
+| Parameters | 51.6M |
+
+## Evaluation Metrics
+
+Evaluated on a sample episode from the training set:
+
+| Joint | MAE | MSE |
+|-------|-----|-----|
+| Joint 0 | 0.0374 | 0.0034 |
+| Joint 1 | 0.0342 | 0.0042 |
+| Joint 2 | 0.0394 | 0.0025 |
+| Joint 3 | 0.0216 | 0.0011 |
+| Joint 4 | 0.0264 | 0.0009 |
+| Joint 5 (gripper) | 0.0020 | 0.00001 |
+| **Overall** | **0.0268** | **0.0020** |
 
 ## Training Dataset
 
 Trained on [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked) - a chunked version of the SO-101 pick cube dataset with episode-level video files for efficient loading.
 
+- ~11k episodes
+- 3 camera views (front, overhead, wrist)
+- 30 FPS
+
 ## Camera Views
 
 The model uses 3 camera inputs:
@@ -45,8 +74,8 @@ The model uses 3 camera inputs:
 
 ```bash
 python -m roboport.train act \
-  /root/datasets/so101_pick_cube_chunked \
-  -o /root/outputs/act_pick_cube_chunked \
+  /path/to/so101_pick_cube_chunked \
+  -o /path/to/output \
   --steps 500000 \
   --chunk-size 100 \
   --n-action-steps 1 \
@@ -62,9 +91,13 @@ python -m roboport.train act \
 ## Usage
 
 ```python
-from lerobot.common.policies.act.modeling_act import ACTPolicy
+from lerobot.policies.act.modeling_act import ACTPolicy
 
 policy = ACTPolicy.from_pretrained("gpudad/act_so101_pick_cube")
+policy.eval()
+
+# Run inference
+action = policy.select_action(observation)
 ```
 
  ## Framework
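
The table above lists N Action Steps = 1 with temporal ensembling (coefficient 0.01): the policy predicts a 100-step chunk every timestep, and overlapping predictions for the current step are blended with exponential weights. A minimal sketch of that blending rule, assuming the ACT-paper convention `w_i = exp(-m * i)` where `i = 0` is the oldest prediction (this helper is illustrative, not part of the repo):

```python
import math

def temporal_ensemble(predictions, m=0.01):
    """Blend overlapping chunk predictions for the current timestep.

    predictions: actions for the current step, oldest chunk first.
    Weights w_i = exp(-m * i) give the oldest prediction the largest
    weight, as in the ACT paper's ensembling scheme (assumption).
    """
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    # Weighted average per action dimension
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]
```

With m = 0 all chunks are weighted equally; a larger coefficient discounts newer chunks more slowly relative to older ones.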
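
The per-joint MAE/MSE figures in the new Evaluation Metrics table can be computed from a ground-truth and a predicted joint trajectory; a minimal pure-Python sketch (no framework assumed, function name hypothetical):

```python
def joint_errors(gt, pred):
    """Per-joint MAE and MSE between trajectories.

    gt, pred: lists of frames, each frame a list of joint values
    (6 joints for the SO-101 arm). Returns (mae, mse) lists,
    one entry per joint, averaged over frames.
    """
    n_frames, n_joints = len(gt), len(gt[0])
    mae = [0.0] * n_joints
    mse = [0.0] * n_joints
    for g, p in zip(gt, pred):
        for j in range(n_joints):
            err = g[j] - p[j]
            mae[j] += abs(err) / n_frames   # mean absolute error
            mse[j] += err * err / n_frames  # mean squared error
    return mae, mse
```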
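
The updated Usage snippet passes an `observation` to `select_action` without defining it. A sketch of how such a batch might be assembled, assuming common LeRobot key conventions (`observation.state`, `observation.images.<camera>`); the exact image keys depend on this dataset's camera naming, and the helper below is hypothetical:

```python
def make_observation(state, images):
    """Assemble an observation dict for the policy.

    state: 6 joint positions for the SO-101 arm.
    images: camera name -> frame, keyed by the dataset's camera
    names (front, overhead, wrist per the README).
    NOTE: key names are assumptions based on LeRobot conventions.
    """
    obs = {"observation.state": state}
    for cam, frame in images.items():
        obs[f"observation.images.{cam}"] = frame
    return obs

obs = make_observation([0.0] * 6, {"front": None, "overhead": None, "wrist": None})
```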