r2owb0 committed
Commit 6a94aca · verified · 1 Parent(s): ea971a0

Upload README.md with huggingface_hub

README.md ADDED
@@ -0,0 +1,176 @@
---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
- robotics
- lerobot
- act
- imitation-learning
- so101
model_name: act
datasets: r2owb0/so101-DS1
base_model: lerobot/smolvla_base
---

# ACT Model for SO101 Robot

This is an Action Chunking Transformer (ACT) model trained for the SO101 robot using LeRobot. The model was trained on demonstration data collected from teleoperation sessions.

## Model Details

### Architecture
- **Model Type**: Action Chunking Transformer (ACT)
- **Vision Backbone**: ResNet18 with ImageNet pretrained weights
- **Transformer Configuration**:
  - Hidden dimension: 512
  - Number of heads: 8
  - Encoder layers: 4
  - Decoder layers: 1
  - Feedforward dimension: 3200
- **VAE**: Enabled with 32-dimensional latent space
- **Chunk Size**: 50 steps
- **Action Steps**: 15 steps per inference
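
As a quick sanity check, the hyperparameters above can be collected in one place and validated against each other. The sketch below is illustrative only: `ActHyperparams` and its field names are hypothetical and are not taken from this repository's configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActHyperparams:
    """Hypothetical container for the hyperparameters listed in this card."""

    vision_backbone: str = "resnet18"
    dim_model: int = 512        # transformer hidden dimension
    n_heads: int = 8
    n_encoder_layers: int = 4
    n_decoder_layers: int = 1
    dim_feedforward: int = 3200
    use_vae: bool = True
    latent_dim: int = 32
    chunk_size: int = 50        # actions predicted per forward pass
    n_action_steps: int = 15    # actions executed before predicting again

    def validate(self) -> None:
        # Multi-head attention splits the hidden dimension evenly across heads.
        if self.dim_model % self.n_heads != 0:
            raise ValueError("dim_model must be divisible by n_heads")
        # The policy cannot execute more actions than it predicted.
        if not 0 < self.n_action_steps <= self.chunk_size:
            raise ValueError("n_action_steps must be in (0, chunk_size]")

    @property
    def head_dim(self) -> int:
        return self.dim_model // self.n_heads


params = ActHyperparams()
params.validate()
print(params.head_dim)  # 512 / 8 = 64 dimensions per attention head
```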

### Camera Setup
The model uses a **dual-camera setup**:

1. **Wrist Camera** (`observation.images.wrist`):
   - Resolution: 240×320 pixels
   - Position: Mounted on the robot's wrist
   - Purpose: Provides a close-up, detailed view of manipulation tasks
   - Field of view: Narrow, focused on the immediate workspace

2. **Top Camera** (`observation.images.top`):
   - Resolution: 480×640 pixels
   - Position: Mounted above the workspace
   - Purpose: Provides broader context and an overview of the environment
   - Field of view: Wide, captures the entire workspace

### Input/Output Specifications

**Inputs:**
- **Robot State**: 6-dimensional joint positions
  - `shoulder_pan.pos`
  - `shoulder_lift.pos`
  - `elbow_flex.pos`
  - `wrist_flex.pos`
  - `wrist_roll.pos`
  - `gripper.pos`
- **Wrist Camera**: RGB image (240×320×3, height × width × channels)
- **Top Camera**: RGB image (480×640×3, height × width × channels)

**Outputs:**
- **Actions**: 6-dimensional joint commands (same structure as state)
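
Before passing data to the policy, it can help to confirm that each observation has the shape listed above. The following is a minimal sketch using plain tuples; `EXPECTED_SHAPES` and `check_shapes` are hypothetical helpers, and the shapes use the height × width × channels layout from this card (a given framework may expect a different layout, such as channel-first).

```python
EXPECTED_SHAPES = {
    "observation.state": (6,),
    "observation.images.wrist": (240, 320, 3),
    "observation.images.top": (480, 640, 3),
}


def check_shapes(shapes: dict) -> list:
    """Return a list of human-readable problems; an empty list means all shapes match."""
    problems = []
    for key, expected in EXPECTED_SHAPES.items():
        actual = shapes.get(key)
        if actual is None:
            problems.append(f"missing key: {key}")
        elif tuple(actual) != expected:
            problems.append(f"{key}: expected {expected}, got {tuple(actual)}")
    return problems


good = {
    "observation.state": (6,),
    "observation.images.wrist": (240, 320, 3),
    "observation.images.top": (480, 640, 3),
}
bad = dict(good, **{"observation.images.top": (240, 320, 3)})

print(check_shapes(good))  # no problems reported
print(check_shapes(bad))   # reports the mismatched top-camera shape
```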

## Training Details

### Dataset
- **Source**: `r2owb0/so101-DS1`
- **Episodes**: 10 demonstration episodes
- **Total Frames**: 5,990 frames
- **Frame Rate**: 30 FPS
- **Robot Type**: SO101 follower robot
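
The figures above imply how much demonstration time the dataset contains. The short calculation below derives it from the listed numbers only; the per-episode values are averages, since individual episode lengths are not stated here.

```python
episodes = 10
total_frames = 5_990
fps = 30

total_seconds = total_frames / fps              # total recorded time
frames_per_episode = total_frames / episodes    # average episode length in frames
seconds_per_episode = frames_per_episode / fps  # average episode length in seconds

print(round(total_seconds, 2))        # 199.67
print(frames_per_episode)             # 599.0
print(round(seconds_per_episode, 2))  # 19.97
```

In other words, the model was trained on roughly 200 seconds of teleoperation, about 20 seconds per episode on average.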

### Training Configuration
- **Training Steps**: 25,000
- **Batch Size**: 4
- **Learning Rate**: 1e-5
- **Optimizer**: AdamW with weight decay 1e-4
- **Validation Split**: 10% of episodes
- **Seed**: 1000
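
To put the step count in perspective, the numbers above can be combined with the dataset size (10 episodes, 5,990 frames). This is a rough estimate: it assumes each sample in a batch corresponds to one dataset frame, and the pass count is computed over all frames, so it is slightly higher if the validation episode is excluded from training.

```python
training_steps = 25_000
batch_size = 4
total_frames = 5_990
episodes = 10
validation_fraction = 0.10

samples_seen = training_steps * batch_size                    # frames sampled during training
passes_over_data = samples_seen / total_frames                # approximate number of epochs
validation_episodes = round(episodes * validation_fraction)   # episodes held out
training_episodes = episodes - validation_episodes

print(samples_seen)                 # 100000
print(round(passes_over_data, 1))   # 16.7
print(validation_episodes, training_episodes)  # 1 9
```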

### Data Augmentation
The following image augmentations were applied during training:
- Brightness adjustment (0.8-1.2x)
- Contrast adjustment (0.8-1.2x)
- Saturation adjustment (0.5-1.5x)
- Hue adjustment (±0.05)
- Sharpness adjustment (0.5-1.5x)
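
To illustrate what these ranges mean, the sketch below draws one random factor per augmentation. It assumes factors are sampled uniformly within each range, which this card does not state; `AUGMENTATION_RANGES` and `sample_augmentation` are hypothetical names, and the actual training pipeline may sample or combine the transforms differently.

```python
import random

# Ranges copied from the list above; hue is an additive shift, the rest are multipliers.
AUGMENTATION_RANGES = {
    "brightness": (0.8, 1.2),
    "contrast": (0.8, 1.2),
    "saturation": (0.5, 1.5),
    "hue": (-0.05, 0.05),
    "sharpness": (0.5, 1.5),
}


def sample_augmentation(rng: random.Random) -> dict:
    """Draw one factor per augmentation, uniformly within its range (an assumption)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in AUGMENTATION_RANGES.items()}


rng = random.Random(1000)  # seeded for reproducibility
factors = sample_augmentation(rng)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```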

## Usage

### Installation
```bash
pip install lerobot
```

### Loading the Model
```python
# Import path for recent LeRobot releases; older releases used
# lerobot.common.policies.act.modeling_act instead.
from lerobot.policies.act.modeling_act import ACTPolicy

# Download the checkpoint from the Hub and load it
policy = ACTPolicy.from_pretrained("r2owb0/act1")
policy.eval()  # switch to inference mode
```

### Evaluation
```bash
lerobot-eval \
  --policy.path=r2owb0/act1 \
  --env.type=your_env_type \
  --eval.n_episodes=10 \
  --eval.batch_size=10
```
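
### Inference
```python
import torch

policy.reset()  # clear any queued actions before starting a new episode
device = next(policy.parameters()).device

# Placeholder observation with a batch dimension; replace the zeros with real data.
# Images are assumed to be float32 in [0, 1] and channel-first (B, C, H, W).
# Depending on the LeRobot version, mean/std normalization is applied inside
# the policy or by a separate preprocessing step.
observation = {
    "observation.state": torch.zeros(1, 6),                   # 6D robot state
    "observation.images.wrist": torch.zeros(1, 3, 240, 320),  # wrist RGB
    "observation.images.top": torch.zeros(1, 3, 480, 640),    # top RGB
}
observation = {key: value.to(device) for key, value in observation.items()}

# Get the next action; a new chunk is predicted only when the queue is empty
with torch.no_grad():
    action = policy.select_action(observation)  # shape: (1, 6)
```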

## Hardware Requirements

### Robot Setup
- **Robot**: SO101 follower robot
- **Cameras**:
  - Wrist-mounted camera (240×320 resolution)
  - Top-mounted camera (480×640 resolution)
- **Control**: 5-DOF arm plus gripper (6 actuated joints, matching the 6-dimensional state)

### Computing Requirements
- **GPU**: CUDA-compatible GPU recommended
- **Memory**: At least 4 GB GPU memory
- **Storage**: ~200 MB for model weights

## Performance Notes

- The model uses action chunking: each inference predicts a chunk of 50 actions, and the first 15 are executed before the next prediction (about 0.5 s of motion at the dataset's 30 FPS)
- Temporal ensembling is disabled for real-time inference
- The model expects normalized inputs (mean/std normalization)
- VAE is enabled for better representation learning
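
The chunking behavior described in the first note can be sketched as a simple action queue. This is an illustration of the general pattern under stated assumptions, not LeRobot's implementation: `fake_predict_chunk` is a hypothetical stand-in for the policy's forward pass, and temporal ensembling is not modeled.

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per inference
N_ACTION_STEPS = 15   # actions executed before predicting again


def fake_predict_chunk(step: int) -> list:
    """Stand-in for the policy forward pass: returns CHUNK_SIZE labeled actions."""
    return [f"a{step}+{i}" for i in range(CHUNK_SIZE)]


def run(n_control_steps: int):
    queue = deque()
    executed = []
    n_inferences = 0
    for step in range(n_control_steps):
        if not queue:
            # Queue is empty: predict a full chunk, keep only the first N_ACTION_STEPS.
            chunk = fake_predict_chunk(step)
            queue.extend(chunk[:N_ACTION_STEPS])
            n_inferences += 1
        executed.append(queue.popleft())
    return executed, n_inferences


executed, n_inferences = run(45)
print(n_inferences)   # 3 inferences for 45 control steps
print(executed[14])   # a0+14  (last action taken from the first chunk)
print(executed[15])   # a15+0  (first action of the chunk predicted at step 15)
```

With these settings the policy network runs once every 15 control steps, which keeps per-step latency low at the cost of reacting to new observations only twice per second at 30 FPS.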

## Limitations

- Trained on a specific robot configuration (SO101)
- Requires the exact camera setup described above
- Performance may vary with different lighting conditions
- Limited to the task domain covered in the training dataset

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{r2owb0_act1,
  author = {Robert},
  title = {ACT Model for SO101 Robot},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/r2owb0/act1}
}
```

## License

This model is licensed under the Apache 2.0 License.