---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
- robotics
- lerobot
- act
- imitation-learning
- so101
model_name: act
datasets: r2owb0/so101-DS1
base_model: lerobot/smolvla_base
---

# ACT Model for SO101 Robot

This is an Action Chunking Transformer (ACT) model for the SO101 robot, trained with LeRobot on demonstration data collected during teleoperation sessions.

## Model Details

### Architecture
- **Model Type**: Action Chunking Transformer (ACT)
- **Vision Backbone**: ResNet18 with ImageNet-pretrained weights
- **Transformer Configuration**:
  - Hidden dimension: 512
  - Number of heads: 8
  - Encoder layers: 4
  - Decoder layers: 1
  - Feedforward dimension: 3200
- **VAE**: Enabled, with a 32-dimensional latent space
- **Chunk Size**: 50 steps
- **Action Steps**: 15 steps executed per inference call
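
For reference, these hyperparameters map onto LeRobot's `ACTConfig` roughly as follows. This is a sketch: the field names follow LeRobot's ACT configuration and the import path matches the one used in the Usage section below, but both may differ across releases.

```python
from lerobot.configs.policies import ACTConfig

# Sketch of the architecture above as an ACTConfig; field names assume
# LeRobot's ACT configuration and may vary between releases.
config = ACTConfig(
    vision_backbone="resnet18",
    pretrained_backbone_weights="ResNet18_Weights.IMAGENET1K_V1",
    dim_model=512,        # hidden dimension
    n_heads=8,
    n_encoder_layers=4,
    n_decoder_layers=1,
    dim_feedforward=3200,
    use_vae=True,
    latent_dim=32,
    chunk_size=50,
    n_action_steps=15,
)
```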

### Camera Setup
The model uses a **dual-camera setup** for robust perception:

1. **Wrist Camera** (`observation.images.wrist`):
   - Resolution: 240×320 pixels
   - Position: Mounted on the robot's wrist
   - Purpose: Provides a close-up, detailed view of manipulation tasks
   - Field of view: Narrow, focused on the immediate workspace

2. **Top Camera** (`observation.images.top`):
   - Resolution: 480×640 pixels
   - Position: Mounted above the workspace
   - Purpose: Provides broader context and an overview of the environment
   - Field of view: Wide, captures the entire workspace

### Input/Output Specifications

**Inputs:**
- **Robot State**: 6-dimensional joint positions
  - `shoulder_pan.pos`
  - `shoulder_lift.pos`
  - `elbow_flex.pos`
  - `wrist_flex.pos`
  - `wrist_roll.pos`
  - `gripper.pos`
- **Wrist Camera**: RGB image (240×320×3)
- **Top Camera**: RGB image (480×640×3)

**Outputs:**
- **Actions**: 6-dimensional joint commands (same structure as the state)
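
State and action vectors follow the joint order listed above. A minimal sketch of packing named joint readings into the expected tensor (the `readings` dict is a hypothetical stand-in for whatever your robot driver returns):

```python
import torch

# Joint order expected by the model (same order as the list above).
JOINT_KEYS = [
    "shoulder_pan.pos", "shoulder_lift.pos", "elbow_flex.pos",
    "wrist_flex.pos", "wrist_roll.pos", "gripper.pos",
]

def pack_state(readings: dict) -> torch.Tensor:
    """Pack named joint readings into a batched 6-D state tensor."""
    return torch.tensor([[readings[k] for k in JOINT_KEYS]], dtype=torch.float32)
```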

## Training Details

### Dataset
- **Source**: `r2owb0/so101-DS1`
- **Episodes**: 10 demonstration episodes
- **Total Frames**: 5,990 frames
- **Frame Rate**: 30 FPS
- **Robot Type**: SO101 follower robot
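
The dataset can be pulled and inspected with LeRobot's dataset API. A minimal sketch, assuming the `LeRobotDataset` import path used in recent LeRobot releases:

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Download the dataset from the Hub (or load it from the local cache).
dataset = LeRobotDataset("r2owb0/so101-DS1")
print(dataset.num_episodes, dataset.fps)  # expect 10 episodes at 30 FPS
```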

### Training Configuration
- **Training Steps**: 25,000
- **Batch Size**: 4
- **Learning Rate**: 1e-5
- **Optimizer**: AdamW with weight decay 1e-4
- **Validation Split**: 10% of episodes
- **Seed**: 1000
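
A training run with these settings would look roughly like the command below, in the same style as the `lerobot-eval` call later in this card. This is a sketch: the flag names assume recent LeRobot releases, and the output directory is illustrative.

```bash
lerobot-train \
    --policy.type=act \
    --dataset.repo_id=r2owb0/so101-DS1 \
    --steps=25000 \
    --batch_size=4 \
    --policy.optimizer_lr=1e-5 \
    --policy.optimizer_weight_decay=1e-4 \
    --seed=1000 \
    --output_dir=outputs/train/act_so101
```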

### Data Augmentation
The model was trained with comprehensive image augmentation:
- Brightness adjustment (0.8-1.2x)
- Contrast adjustment (0.8-1.2x)
- Saturation adjustment (0.5-1.5x)
- Hue adjustment (±0.05)
- Sharpness adjustment (0.5-1.5x)
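
For intuition, these ranges correspond roughly to the torchvision transform below. This is a sketch only: training used LeRobot's own image-transform configuration, and `RandomAdjustSharpness` merely approximates the 0.5-1.5x sharpness range with a fixed factor applied at random.

```python
import torchvision.transforms.v2 as v2

# Approximate torchvision equivalent of the augmentation ranges above.
augment = v2.Compose([
    v2.ColorJitter(
        brightness=(0.8, 1.2),
        contrast=(0.8, 1.2),
        saturation=(0.5, 1.5),
        hue=(-0.05, 0.05),
    ),
    # Rough stand-in for the 0.5-1.5x sharpness jitter.
    v2.RandomAdjustSharpness(sharpness_factor=1.5, p=0.5),
])
```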

## Usage

### Installation
```bash
pip install lerobot
```

### Loading the Model
```python
from lerobot.policies import ACTPolicy
from lerobot.configs.policies import ACTConfig

# Load the model
policy = ACTPolicy.from_pretrained("r2owb0/act1")
```

### Evaluation
```bash
lerobot-eval \
    --policy.path=r2owb0/act1 \
    --env.type=your_env_type \
    --eval.n_episodes=10 \
    --eval.batch_size=10
```

### Inference
```python
import torch

# `policy` is loaded as shown above. Images are channel-first float tensors
# in [0, 1] with a leading batch dimension; the zeros are placeholders for
# real sensor data.
observation = {
    "observation.state": torch.zeros(1, 6),                   # 6D robot state
    "observation.images.wrist": torch.zeros(1, 3, 240, 320),  # wrist RGB
    "observation.images.top": torch.zeros(1, 3, 480, 640),    # top RGB
}

# Get the next action
with torch.no_grad():
    action = policy.select_action(observation)
```

## Hardware Requirements

### Robot Setup
- **Robot**: SO101 follower robot
- **Cameras**:
  - Wrist-mounted camera (240×320 resolution)
  - Top-mounted camera (480×640 resolution)
- **Control**: 6-DOF arm with gripper

### Computing Requirements
- **GPU**: CUDA-compatible GPU recommended
- **Memory**: At least 4 GB of GPU memory
- **Storage**: ~200 MB for model weights

## Performance Notes

- The model uses action chunking: it predicts 50 steps ahead but executes only 15 before re-planning (see the sketch below)
- Temporal ensembling is disabled for real-time inference
- The model expects normalized inputs (mean/std normalization)
- The VAE is enabled for better representation learning
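
A minimal sketch of that receding-horizon loop; `predict_chunk` and `env_step` are hypothetical stand-ins for the policy call and the robot/environment interface, not LeRobot APIs:

```python
CHUNK_SIZE = 50      # actions predicted per policy call
N_ACTION_STEPS = 15  # actions executed before re-planning

def run_episode(predict_chunk, env_step, observation):
    """Execute 15 of every 50 predicted actions, then re-plan."""
    done = False
    while not done:
        chunk = predict_chunk(observation)     # shape: (CHUNK_SIZE, 6)
        for action in chunk[:N_ACTION_STEPS]:  # execute the first 15
            observation, done = env_step(action)
            if done:
                break
```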

## Limitations

- Trained on a specific robot configuration (SO101)
- Requires the exact camera setup described above
- Performance may vary with different lighting conditions
- Limited to the task domain covered in the training dataset

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{r2owb0_act1,
  author    = {Robert},
  title     = {ACT Model for SO101 Robot},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/r2owb0/act1}
}
```

## License

This model is licensed under the Apache 2.0 License.