---
language:
- en
library_name: lerobot
license: gemma
pipeline_tag: robotics
tags:
- vision-language-action
- imitation-learning
- behavior-cloning
- lerobot
- pi05
- pi0.5
- openpi
- robotics
- isaaclab
- so101
- multi-task
- corl2026
- bfloat16
- full-finetune
- safetensors
datasets:
- CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi
base_model:
- lerobot/pi05_base
inference: false
---

# Pi0.5 IsaacLab Multi-Task 1 Epoch

This repository contains a Pi0.5 policy fine-tuned with LeRobot on the IsaacLab SO-101 multi-task dataset `CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi`.

## Model Details

- **Base model:** `lerobot/pi05_base`
- **Policy type:** `pi05`
- **Training type:** full fine-tuning
- **Vision encoder frozen:** no
- **Action expert only:** no
- **Checkpoint:** final checkpoint at step `13761`
- **Training length:** `1.00` epoch
- **Precision:** bfloat16
- **Format:** safetensors

## Dataset

- **Dataset:** `CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi`
- **Robot:** SO-101 follower
- **Episodes:** `3300`
- **Frames:** `3,522,774`
- **Tasks:** `800`
- **FPS:** `30`
- **Visual inputs:** `observation.images.top`, `observation.images.left_wrist`
- **State/action dimensions:** 6 DoF robot state/action, padded by the Pi0.5 policy configuration as needed
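
For a sense of scale, the episode and frame counts above imply an average episode of roughly 1,068 frames, or about 36 seconds at 30 FPS. The short check below uses only the numbers listed in this section; the variable names are illustrative.

```python
# Dataset figures copied from the list above.
episodes = 3300
frames = 3_522_774
fps = 30

# Derived quantities (simple arithmetic, not read from dataset metadata).
mean_frames_per_episode = frames / episodes
mean_seconds_per_episode = mean_frames_per_episode / fps
total_hours = frames / fps / 3600

print(f"mean frames per episode:  {mean_frames_per_episode:.1f}")
print(f"mean seconds per episode: {mean_seconds_per_episode:.1f}")
print(f"total recorded hours:     {total_hours:.1f}")
```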

## Training Hyperparameters

| Setting | Value |
|---|---:|
| Steps | `13761` |
| Epochs | `1.00` |
| Per-device batch size | `16` |
| GPUs | `2` |
| Gradient accumulation | `8` |
| Effective batch size | `256` |
| Mixed precision | `bf16` |
| Policy dtype | `bfloat16` |
| Chunk size | `16` |
| Action steps | `16` |
| Gradient checkpointing | `true` |
| Compile model | `false` |
| DataLoader workers | `8` |
| DataLoader prefetch factor | `2` |
| Persistent workers | `true` |
| Pin memory | `true` |
| Preprocess in workers | `true` |
| DDP find unused parameters | `true` |
| Seed | `1000` |
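
The effective batch size and the step count follow from the other values in the table. The sketch below assumes one training sample per dataset frame and that an epoch is rounded up to a whole number of optimizer steps; both are assumptions about the sampler rather than facts taken from the training code.

```python
import math

# Values copied from the hyperparameter table and the dataset section.
per_device_batch_size = 16
num_gpus = 2
gradient_accumulation_steps = 8
dataset_frames = 3_522_774
chunk_size = 16
fps = 30

# Samples consumed per optimizer step across all devices.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps

# Assumption: one sample per frame, last partial batch still counts as a step.
steps_per_epoch = math.ceil(dataset_frames / effective_batch_size)

# Time horizon covered by one predicted action chunk at the dataset frame rate.
chunk_seconds = chunk_size / fps

print(effective_batch_size, steps_per_epoch, round(chunk_seconds, 3))
```

Under these assumptions the computed step count matches the `13761` steps used for this run, which is why the checkpoint is described as exactly one epoch.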

### Optimizer and Scheduler

| Setting | Value |
|---|---:|
| Optimizer | AdamW |
| Learning rate | `2.5e-5` |
| Betas | `[0.9, 0.95]` |
| Epsilon | `1e-8` |
| Weight decay | `0.01` |
| Gradient clip norm | `1.0` |
| Scheduler | cosine decay with warmup |
| Configured warmup steps | `1000` |
| Effective warmup steps | `458` |
| Configured decay steps | `30000` |
| Effective decay steps | `13761` |
| Final decay LR | `2.5e-6` |

The scheduler was automatically scaled because `num_training_steps=13761` was smaller than the configured `num_decay_steps=30000`.
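
The effective values in the table are consistent with a proportional rescaling of the warmup length, truncated to an integer. The function below is a minimal sketch of that rule inferred from the reported numbers; `scale_schedule` is a hypothetical helper, not the LeRobot API.

```python
def scale_schedule(num_training_steps, num_warmup_steps, num_decay_steps):
    """Shrink warmup and decay to fit a run shorter than the configured decay.

    Assumed rule: scale warmup by num_training_steps / num_decay_steps,
    truncate to an integer, and end the decay at the last training step.
    """
    if num_training_steps < num_decay_steps:
        scale = num_training_steps / num_decay_steps
        return int(num_warmup_steps * scale), num_training_steps
    # Run is long enough: keep the configured schedule unchanged.
    return num_warmup_steps, num_decay_steps


effective_warmup, effective_decay = scale_schedule(
    num_training_steps=13761,
    num_warmup_steps=1000,
    num_decay_steps=30000,
)
print(effective_warmup, effective_decay)
```

With the configured values this reproduces the effective warmup of `458` steps and the effective decay of `13761` steps listed above.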

## Final Training Log Snapshot

The last training metrics logged before completion were:

- `step=13760/13761`
- `epoch=1.00`
- `loss=0.009`
- `grad_norm=0.259`
- `lr=2.5e-06`
- `updt_s=1.658`
- `data_s=0.017`

Training completed successfully on `2026-05-13 18:37:47 UTC`.

## Files

This repository includes only the inference/evaluation policy files from `pretrained_model`:

- `config.json`
- `model.safetensors`
- `train_config.json`
- `policy_preprocessor.json`
- `policy_preprocessor_step_2_normalizer_processor.safetensors`
- `policy_postprocessor.json`
- `policy_postprocessor_step_0_unnormalizer_processor.safetensors`

Optimizer state and other resumable training-state files are intentionally excluded.
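
After downloading the repository, a quick way to confirm that a local copy is complete is to compare it against the file list above. This is a minimal sketch using only the Python standard library; `missing_policy_files` and the `./pi05_isaaclab_checkpoint` path are hypothetical placeholders for wherever the files were saved.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_2_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_policy_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; replace with the actual download location.
    missing = missing_policy_files("./pi05_isaaclab_checkpoint")
    print("missing files:", missing if missing else "none")
```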

## Evaluation Status

No rollout or task-success evaluation metrics are included yet. This checkpoint is intended as a reproducible 1-epoch Pi0.5 fine-tuning artifact for IsaacLab SO-101 multi-task experiments.

## Reproducibility

Training was launched from the AutoDataCollector LeRobot workspace with a configuration equivalent to the following invocation of the Pi0.5 IsaacLab training script:

```bash
DATASET_REPO_ID=CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi \
POLICY_PATH=lerobot/pi05_base \
BATCH_SIZE=16 \
GRADIENT_ACCUMULATION_STEPS=8 \
NUM_GPUS=2 \
STEPS=13761 \
MIXED_PRECISION=bf16 \
POLICY_DTYPE=bfloat16 \
CHUNK_SIZE=16 \
N_ACTION_STEPS=16 \
GRADIENT_CHECKPOINTING=true \
FREEZE_VISION_ENCODER=false \
TRAIN_EXPERT_ONLY=false \
NUM_WORKERS=8 \
DATALOADER_PREFETCH_FACTOR=2 \
DATALOADER_PERSISTENT_WORKERS=true \
DATALOADER_PIN_MEMORY=true \
PREPROCESS_IN_WORKERS=true \
OPTIMIZER_LR=2.5e-5 \
OPTIMIZER_WEIGHT_DECAY=0.01 \
OPTIMIZER_GRAD_CLIP_NORM=1.0 \
SCHEDULER_WARMUP_STEPS=1000 \
SCHEDULER_DECAY_STEPS=30000 \
SCHEDULER_DECAY_LR=2.5e-6 \
./lerobot/scripts/train_pi05_isaaclab.sh
```