Update model card: concise English version
README.md
CHANGED
|
@@ -5,180 +5,75 @@ pipeline_tag: robotics
|
|
| 5 |
tags:
|
| 6 |
- lerobot
|
| 7 |
- robotics
|
| 8 |
-
- vla
|
| 9 |
-
- pi0
|
| 10 |
- pi05
|
| 11 |
- so101
|
| 12 |
-
- manipulation
|
| 13 |
- imitation-learning
|
| 14 |
-
- behavior-cloning
|
| 15 |
datasets:
|
| 16 |
- CoRL2026-CSI/SO101-teleop_close_pot_lid_100epi
|
| 17 |
base_model: lerobot/pi05_base
|
| 18 |
-
language:
|
| 19 |
-
- en
|
| 20 |
-
model-index:
|
| 21 |
-
- name: pi05_close_pot
|
| 22 |
-
results: []
|
| 23 |
---
|
| 24 |
|
| 25 |
# π0.5 — SO-101 `close_pot_lid`
|
| 26 |
|
| 27 |
-
A π0.5 (PaliGemma-2B + Action Expert 300M) policy fine-tuned from `lerobot/pi05_base`
|
| 28 |
-
on the single task of **closing a pot lid (`close_pot_lid`)**, using 100 episodes (57,173 frames)
|
| 29 |
-
of teleoperation demonstrations.
|
| 30 |
|
| 31 |
-
|
| 32 |
-
Framework: [LeRobot](https://github.com/huggingface/lerobot)
|
| 33 |
|
| 34 |
-
---
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|---|---|
|
| 40 |
-
| Architecture | π0.5 (PaliGemma-2B VLM + Gemma-300M action expert, flow-matching head) |
|
| 41 |
-
| Base checkpoint | [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base) |
|
| 42 |
-
| Action chunk size | 50 |
|
| 43 |
-
| Inference steps (flow-matching) | 10 |
|
| 44 |
-
| Image resolution | 224 × 224 |
|
| 45 |
-
| Cameras | `base_0_rgb`, `left_wrist_0_rgb`, `right_wrist_0_rgb` |
|
| 46 |
-
| State dim (padded) | 32 |
|
| 47 |
-
| Action dim (effective / padded) | **6** / 32 |
|
| 48 |
-
| dtype | bfloat16 |
|
| 49 |
-
|
| 50 |
-
### Action / camera mapping
|
| 51 |
-
|
| 52 |
-
Dataset → policy input key rename:
|
| 53 |
-
|
| 54 |
-
```
|
| 55 |
-
observation.images.top → observation.images.base_0_rgb
|
| 56 |
-
observation.images.wrist → observation.images.left_wrist_0_rgb
|
| 57 |
-
```
|
| 58 |
-
|
| 59 |
-
> `right_wrist_0_rgb` is a model input slot, but on the single-arm SO-101 it is treated as an empty camera.
|
| 60 |
-
|
| 61 |
-
Action features (6 DoF, SO-101):
|
| 62 |
|
|
|
|
| 63 |
```
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
elbow_flex.pos
|
| 67 |
-
wrist_flex.pos
|
| 68 |
-
wrist_roll.pos
|
| 69 |
-
gripper.pos
|
| 70 |
```
|
|
|
|
| 71 |
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
---
|
| 75 |
-
|
| 76 |
-
## Training data
|
| 77 |
|
| 78 |
-
|
| 79 |
-
- **Episodes**: 100
|
| 80 |
-
- **Total frames**: 57,173
|
| 81 |
-
- **Robot / task**: SO-101, pick up the pot lid and close it onto the pot
|
| 82 |
-
- **Collection method**: human teleoperation
|
| 83 |
-
- **Cameras**: top + wrist (both resized to 224 × 224)
|
| 84 |
|
| 85 |
-
|
| 86 |
|
| 87 |
-
|
| 88 |
|
| 89 |
-
|
|
| 90 |
|---|---|
|
| 91 |
-
|
|
| 92 |
-
|
|
| 93 |
-
| ColorJitter saturation | `[0.5, 1.5]` |
|
| 94 |
-
| ColorJitter hue | `[-0.05, 0.05]` |
|
| 95 |
-
| SharpnessJitter | `[0.5, 1.5]` |
|
| 96 |
-
| RandomAffine | degrees `[-5, 5]`, translate `[0.05, 0.05]` |
|
| 97 |
-
|
| 98 |
-
---
|
| 99 |
-
|
| 100 |
-
## Training configuration
|
| 101 |
-
|
| 102 |
-
| Item | Value |
|
| 103 |
-
|---|---|
|
| 104 |
-
| Hardware | 4 × GPU (DDP via 🤗 Accelerate) |
|
| 105 |
-
| Per-device batch size | 32 |
|
| 106 |
| Gradient accumulation | 2 |
|
| 107 |
-
|
|
| 108 |
-
| Steps | 11,200 |
|
| 109 |
-
|
|
| 110 |
-
|
|
| 111 |
-
| Peak LR | 2.5e-5 |
|
| 112 |
-
| Decay LR | 2.5e-6 |
|
| 113 |
-
| Scheduler | cosine decay, warmup 1000, decay 30000 |
|
| 114 |
-
| Grad clip | 1.0 |
|
| 115 |
-
| Mixed precision | none (bf16 native) |
|
| 116 |
| Gradient checkpointing | on |
|
| 117 |
-
| `
|
| 118 |
-
| `freeze_vision_encoder` | off |
|
| 119 |
-
| `train_expert_only` | off |
|
| 120 |
| Seed | 1000 |
|
| 121 |
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
---
|
| 125 |
|
| 126 |
-
##
|
| 127 |
-
|
| 128 |
-
### 1. Load the model
|
| 129 |
|
| 130 |
```python
|
| 131 |
from lerobot.policies.pi05.modeling_pi05 import PI05Policy
|
| 132 |
|
| 133 |
-
policy = PI05Policy.from_pretrained("CoRL2026-CSI/pi05_close_pot")
|
| 134 |
-
policy.eval().to("cuda")
|
| 135 |
```
|
| 136 |
|
| 137 |
-
### 2. Inference (including the pre-/post-processing pipeline)
|
| 138 |
-
|
| 139 |
-
Use LeRobot's standard inference script:
|
| 140 |
-
|
| 141 |
```bash
|
| 142 |
-
lerobot-eval \
|
| 143 |
-
--policy.path=CoRL2026-CSI/pi05_close_pot \
|
| 144 |
-
--env.type=<your_env> \
|
| 145 |
-
--eval.n_episodes=20
|
| 146 |
```
|
| 147 |
|
| 148 |
-
|
| 149 |
-
You can follow the same pattern as [`scripts/infer_smolvla.py`](https://github.com/HyeonseokE/train_with_lerobot/blob/main/scripts/infer_smolvla.py),
|
| 150 |
-
swapping in `pi05`.
|
| 151 |
-
|
| 152 |
-
### 3. Camera key caveats
|
| 153 |
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
---
|
| 159 |
|
| 160 |
-
##
|
| 161 |
|
| 162 |
-
|
| 163 |
-
- **Single-arm (SO-101) assumption**: `right_wrist_0_rgb` was trained as an empty camera, so bimanual setups require retraining.
|
| 164 |
-
- **Sensitivity to camera placement / lighting**: trained with only 100 episodes plus image augmentation; performance may degrade under large domain shift.
|
| 165 |
-
- **No quantitative evaluation included**: this card does not report real-robot or simulation success rates. Run your own evaluation before use.
|
| 166 |
-
|
| 167 |
-
---
|
| 168 |
-
|
| 169 |
-
## License
|
| 170 |
-
|
| 171 |
-
Apache 2.0 (follows the license of the base model [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)).
|
| 172 |
-
|
| 173 |
-
## Citation
|
| 174 |
-
|
| 175 |
-
LeRobot project:
|
| 176 |
-
|
| 177 |
-
```bibtex
|
| 178 |
-
@misc{cadene2024lerobot,
|
| 179 |
-
author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
|
| 180 |
-
title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
|
| 181 |
-
howpublished = "\url{https://github.com/huggingface/lerobot}",
|
| 182 |
-
year = {2024}
|
| 183 |
-
}
|
| 184 |
-
```
|
|
|
|
| 5 |
tags:
|
| 6 |
- lerobot
|
| 7 |
- robotics
|
|
|
|
|
|
|
| 8 |
- pi05
|
| 9 |
- so101
|
|
|
|
| 10 |
- imitation-learning
|
|
|
|
| 11 |
datasets:
|
| 12 |
- CoRL2026-CSI/SO101-teleop_close_pot_lid_100epi
|
| 13 |
base_model: lerobot/pi05_base
|
| 14 |
---
|
| 15 |
|
| 16 |
# π0.5 — SO-101 `close_pot_lid`
|
| 17 |
|
| 18 |
+
Fine-tuned [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base) on 100 teleop episodes of the SO-101 `close_pot_lid` task.
|
|
|
|
|
|
|
| 19 |
|
| 20 |
+
## Model
|
|
|
|
| 21 |
|
| 22 |
+
- **Architecture**: π0.5 (PaliGemma-2B VLM + Gemma-300M action expert, flow matching, 10 inference steps)
|
| 23 |
+
- **Cameras**: `base_0_rgb`, `left_wrist_0_rgb`, `right_wrist_0_rgb` (224×224)
|
| 24 |
+
- **State / Action dim**: 32 (padded) / 6 (SO-101)
|
| 25 |
+
- **Action chunk**: 50
|
| 26 |
+
- **dtype**: bfloat16
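
The gap between the padded dimension (32) and the SO-101 dimension (6) can be pictured as follows. This is a minimal sketch, assuming the six joint values are right-padded with zeros and that only the first six output entries are meaningful; `pad_to_dim` and `unpad` are hypothetical helpers for illustration, not LeRobot functions.

```python
from typing import List

PADDED_DIM = 32  # padded state/action width listed above
SO101_DIM = 6    # real SO-101 action width listed above


def pad_to_dim(values: List[float], dim: int = PADDED_DIM) -> List[float]:
    """Right-pad a vector with zeros (assumed padding scheme)."""
    if len(values) > dim:
        raise ValueError(f"vector of length {len(values)} exceeds {dim}")
    return list(values) + [0.0] * (dim - len(values))


def unpad(values: List[float], dim: int = SO101_DIM) -> List[float]:
    """Keep only the leading entries that correspond to real joints."""
    return list(values[:dim])


# Example joint vector (made-up values, one per SO-101 joint).
joints = [0.1, -0.4, 0.8, 0.0, 0.3, 1.0]
padded = pad_to_dim(joints)
print(len(padded), unpad(padded) == joints)
```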
|
| 27 |
|
| 28 |
+
Camera key rename (dataset → policy):
|
| 29 |
```
|
| 30 |
+
observation.images.top → observation.images.base_0_rgb
|
| 31 |
+
observation.images.wrist → observation.images.left_wrist_0_rgb
|
| 32 |
```
|
| 33 |
+
`right_wrist_0_rgb` is an empty camera slot for this single-arm setup.
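
The rename above amounts to a key substitution on each observation dictionary. A minimal sketch, assuming observations arrive as a flat `dict`; `RENAME_MAP` and `rename_observation_keys` are hypothetical names, and how the empty `right_wrist_0_rgb` slot gets filled is left to the policy's own preprocessing.

```python
from typing import Any, Dict

# Mapping copied from the rename listing above (dataset key -> policy key).
RENAME_MAP = {
    "observation.images.top": "observation.images.base_0_rgb",
    "observation.images.wrist": "observation.images.left_wrist_0_rgb",
}


def rename_observation_keys(obs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of obs with dataset camera keys renamed for the policy."""
    return {RENAME_MAP.get(key, key): value for key, value in obs.items()}


# Placeholder values stand in for image tensors and joint states.
sample = {
    "observation.images.top": "top-frame",
    "observation.images.wrist": "wrist-frame",
    "observation.state": [0.0] * 6,
}
renamed = rename_observation_keys(sample)
print(sorted(renamed))
```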
|
| 34 |
|
| 35 |
+
Action features (SO-101): `shoulder_pan, shoulder_lift, elbow_flex, wrist_flex, wrist_roll, gripper` (`.pos`).
|
| 36 |
+
Normalization: `ACTION/STATE = MEAN_STD`, `VISUAL = IDENTITY`.
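
For readers unfamiliar with these modes, the sketch below shows what mean/std normalization and its inverse look like on a 6-dim action. It is an illustration only: the statistics are placeholder values rather than the dataset's real stats, the `EPS` guard is an assumption, and the function names are hypothetical rather than LeRobot's API.

```python
import math
from typing import List, Sequence

EPS = 1e-8  # assumed numerical guard; not stated in the model card


def mean_std_normalize(x: Sequence[float], mean: Sequence[float],
                       std: Sequence[float]) -> List[float]:
    """MEAN_STD: shift by the mean and scale by the standard deviation."""
    return [(xi - m) / (s + EPS) for xi, m, s in zip(x, mean, std)]


def mean_std_unnormalize(z: Sequence[float], mean: Sequence[float],
                         std: Sequence[float]) -> List[float]:
    """Inverse transform, mapping normalized values back to joint units."""
    return [zi * (s + EPS) + m for zi, m, s in zip(z, mean, std)]


def identity(x: Sequence[float]) -> List[float]:
    """IDENTITY: values pass through this stage unchanged."""
    return list(x)


# Placeholder statistics for illustration only.
mean = [0.0, 10.0, -5.0, 2.0, 0.5, 20.0]
std = [1.0, 4.0, 2.0, 0.5, 1.5, 10.0]
action = [0.5, 14.0, -7.0, 2.0, 2.0, 30.0]

normalized = mean_std_normalize(action, mean, std)
restored = mean_std_unnormalize(normalized, mean, std)
print(all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(action, restored)))
```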
|
| 37 |
|
| 38 |
+
## Data
|
| 39 |
|
| 40 |
+
[`CoRL2026-CSI/SO101-teleop_close_pot_lid_100epi`](https://huggingface.co/datasets/CoRL2026-CSI/SO101-teleop_close_pot_lid_100epi) — 100 episodes, 57,173 frames, human teleop.
|
| 41 |
|
| 42 |
+
## Training
|
| 43 |
|
| 44 |
+
| | |
|
| 45 |
|---|---|
|
| 46 |
+
| Hardware | 4 × GPU (DDP, 🤗 Accelerate) |
|
| 47 |
+
| Per-device batch | 32 |
|
| 48 |
| Gradient accumulation | 2 |
|
| 49 |
+
| Effective global batch | 256 |
|
| 50 |
+
| Steps | 11,200 (~50 epochs) |
|
| 51 |
+
| Optimizer | AdamW, β=(0.9, 0.95), wd=0.01, grad clip 1.0 |
|
| 52 |
+
| LR | cosine decay, peak 2.5e-5 → 2.5e-6, warmup 1000, decay 30000 |
|
| 53 |
| Gradient checkpointing | on |
|
| 54 |
+
| Image aug | ColorJitter (brightness/contrast/saturation/hue), SharpnessJitter, RandomAffine — `max_num=3`, random order |
|
|
|
|
|
|
|
| 55 |
| Seed | 1000 |
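
The effective batch size and epoch count in the table can be cross-checked with a few lines of arithmetic. This sketch assumes each optimizer step consumes one effective global batch and that one sample corresponds to one dataset frame.

```python
# Values taken from the training table and the dataset description.
num_gpus = 4
per_device_batch = 32
grad_accum = 2
steps = 11_200
dataset_frames = 57_173

# Samples consumed per optimizer step across all devices.
effective_batch = num_gpus * per_device_batch * grad_accum

# Total samples seen, expressed as passes over the dataset.
samples_seen = steps * effective_batch
epochs = samples_seen / dataset_frames

print(effective_batch)  # 256
print(f"{epochs:.2f}")
```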
|
| 56 |
|
| 57 |
+
Training script: [`scripts/train_pi05_close_pot_lid.sh`](https://github.com/HyeonseokE/train_with_lerobot/blob/main/scripts/train_pi05_close_pot_lid.sh).
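
To give a feel for the learning-rate row, here is a generic cosine-decay-with-warmup sketch using the listed values. It is not LeRobot's scheduler: the linear warmup from zero and measuring decay progress from the end of warmup are assumptions, and the real implementation may parameterize or rescale the schedule differently, especially since training stops at 11,200 steps, well before the 30,000-step decay horizon.

```python
import math

PEAK_LR = 2.5e-5
DECAY_LR = 2.5e-6
WARMUP_STEPS = 1_000
DECAY_STEPS = 30_000


def lr_at(step: int) -> float:
    """Learning rate at a given step under the assumed schedule shape."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak (assumed warmup shape).
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = min(step - WARMUP_STEPS, DECAY_STEPS - WARMUP_STEPS)
    progress /= DECAY_STEPS - WARMUP_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


for step in (0, 500, 1_000, 11_200, 30_000):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the rate at the final training step (11,200) is still well above the 2.5e-6 floor.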
|
|
|
|
|
|
|
| 58 |
|
| 59 |
+
## Usage
|
|
|
|
|
|
|
| 60 |
|
| 61 |
```python
|
| 62 |
from lerobot.policies.pi05.modeling_pi05 import PI05Policy
|
| 63 |
|
| 64 |
+
policy = PI05Policy.from_pretrained("CoRL2026-CSI/pi05_close_pot").to("cuda").eval()
|
|
|
|
| 65 |
```
|
| 66 |
|
| 67 |
```bash
|
| 68 |
+
lerobot-eval --policy.path=CoRL2026-CSI/pi05_close_pot --env.type=<env> --eval.n_episodes=20
|
| 69 |
```
|
| 70 |
|
| 71 |
+
## Limitations
|
| 72 |
|
| 73 |
+
- Single task, single seed; no quantitative success rate reported here.
|
| 74 |
+
- Trained on a single-arm SO-101; the right-wrist camera slot is empty.
|
| 75 |
+
- 100 episodes only — sensitive to camera/lighting domain shift.
|
|
|
|
|
|
|
| 76 |
|
| 77 |
+
## License
|
| 78 |
|
| 79 |
+
Apache 2.0 (inherits from [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)).