Update README with evaluation metrics and GIF

README.md

This is an Action Chunking Transformer (ACT) model trained on the SO-101 robot arm for a cube picking task.

## Demo



*Visualization showing ground truth (green) vs predicted actions (blue) with mean absolute error per frame.*
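
The overlay in the GIF can be reproduced with a few lines of matplotlib; a minimal sketch, assuming hypothetical `(T, 6)` arrays of ground-truth and predicted joint values saved as `gt_actions.npy` and `pred_actions.npy`:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical (T, 6) arrays: one row per frame, one column per joint.
gt_actions = np.load("gt_actions.npy")
pred_actions = np.load("pred_actions.npy")

fig, axes = plt.subplots(6, 1, sharex=True, figsize=(8, 10))
for joint, ax in enumerate(axes):
    ax.plot(gt_actions[:, joint], color="green", label="ground truth")
    ax.plot(pred_actions[:, joint], color="blue", label="predicted")
    ax.set_ylabel(f"joint {joint}")
axes[0].legend()
axes[-1].set_xlabel("frame")

# Per-frame mean absolute error across joints, as displayed in the GIF.
mae_per_frame = np.abs(gt_actions - pred_actions).mean(axis=1)
print(f"overall MAE: {mae_per_frame.mean():.4f}")
plt.show()
```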

## Environment



## Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | ACT (Action Chunking Transformer) |
| Vision Backbone | ResNet18 |
| Training Steps | 500,000 |
| Chunk Size | 100 |
| N Action Steps | 1 (with temporal ensembling) |
| Temporal Ensemble Coeff | 0.01 |
| KL Weight | 10.0 |
| Batch Size | 16 |
| Learning Rate | 3e-5 |
| Parameters | 51.6M |
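
For readers who want the same configuration in code, the table maps onto lerobot's `ACTConfig` roughly as below. This is a sketch only: the field names are assumptions based on current lerobot releases, and the actual training ran through `roboport` (see the training command further down).

```python
# Sketch: mirrors the table above using lerobot's ACTConfig.
# Field names are assumptions based on current lerobot releases.
from lerobot.policies.act.configuration_act import ACTConfig

config = ACTConfig(
    vision_backbone="resnet18",
    chunk_size=100,
    n_action_steps=1,              # one step executed per inference call
    temporal_ensemble_coeff=0.01,  # enables temporal ensembling
    kl_weight=10.0,
    optimizer_lr=3e-5,
)
```
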
## Evaluation Metrics

Evaluated on a sample episode from the training set:

| Joint | MAE | MSE |
|-------|-----|-----|
| Joint 0 | 0.0374 | 0.0034 |
| Joint 1 | 0.0342 | 0.0042 |
| Joint 2 | 0.0394 | 0.0025 |
| Joint 3 | 0.0216 | 0.0011 |
| Joint 4 | 0.0264 | 0.0009 |
| Joint 5 (gripper) | 0.0020 | 0.00001 |
| **Overall** | **0.0268** | **0.0020** |
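
The metrics are plain regression errors between predicted and ground-truth joint positions over the episode; a minimal sketch of the computation, again assuming hypothetical `(T, 6)` arrays:

```python
import numpy as np

# Hypothetical (T, 6) arrays: one row per frame, one column per joint.
gt = np.load("gt_actions.npy")
pred = np.load("pred_actions.npy")

err = pred - gt
for joint in range(err.shape[1]):
    mae = np.abs(err[:, joint]).mean()
    mse = (err[:, joint] ** 2).mean()
    print(f"Joint {joint}: MAE={mae:.4f}  MSE={mse:.4f}")

print(f"Overall: MAE={np.abs(err).mean():.4f}  MSE={(err ** 2).mean():.4f}")
```
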
## Training Dataset

Trained on [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked), a chunked version of the SO-101 pick cube dataset with episode-level video files for efficient loading.

- ~11k episodes
- 3 camera views (front, overhead, wrist)
- 30 FPS
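
A sketch of loading the dataset for inspection with lerobot's `LeRobotDataset`; the import path and attribute names are assumptions matching the package layout used in the Usage section below:

```python
# Sketch: browse the dataset with lerobot's LeRobotDataset.
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("gpudad/so101_pick_cube_chunked")
print(dataset.num_episodes, dataset.fps)  # expect roughly 11k episodes at 30 FPS

sample = dataset[0]  # dict of tensors: camera frames, state, action, timestamps
print(list(sample.keys()))
```
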
## Camera Views

The model uses 3 camera inputs:

- front
- overhead
- wrist

## Training

```bash
python -m roboport.train act \
    /path/to/so101_pick_cube_chunked \
    -o /path/to/output \
    --steps 500000 \
    --chunk-size 100 \
    --n-action-steps 1 \
    ...
```

## Usage

```python
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("gpudad/act_so101_pick_cube")
policy.eval()

# Run inference
action = policy.select_action(observation)
```
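
Note that `observation` is left undefined above. A sketch of the structure it is expected to have, assuming lerobot's usual feature-key convention and the three camera names listed earlier; the exact keys and image resolutions are assumptions, so check `policy.config.input_features` on the loaded model:

```python
# Sketch: continues the snippet above and fills in `observation`.
# Key names follow lerobot's usual convention ("observation.images.<camera>",
# "observation.state"); the exact keys and resolutions here are assumptions.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
policy.to(device)

observation = {
    # float32 images in [0, 1], shape (batch, channels, height, width)
    "observation.images.front": torch.rand(1, 3, 480, 640, device=device),
    "observation.images.overhead": torch.rand(1, 3, 480, 640, device=device),
    "observation.images.wrist": torch.rand(1, 3, 480, 640, device=device),
    # 6-dimensional joint state for the SO-101 arm
    "observation.state": torch.zeros(1, 6, device=device),
}

action = policy.select_action(observation)  # one (1, 6) joint command
```
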
## Framework