vpraise00 committed on
Commit 9c83b24 · verified · 1 Parent(s): 0c2cf55

Add model card

Files changed (1)
  1. README.md +138 -0
README.md ADDED
@@ -0,0 +1,138 @@
+ ---
+ language:
+ - en
+ library_name: lerobot
+ license: gemma
+ pipeline_tag: robotics
+ tags:
+ - vision-language-action
+ - imitation-learning
+ - behavior-cloning
+ - lerobot
+ - pi05
+ - pi0.5
+ - openpi
+ - robotics
+ - isaaclab
+ - so101
+ - multi-task
+ - corl2026
+ - bfloat16
+ - full-finetune
+ - safetensors
+ datasets:
+ - CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi
+ base_model:
+ - lerobot/pi05_base
+ inference: false
+ ---
+
+ # Pi0.5 IsaacLab Multi-Task 1 Epoch
+
+ This repository contains a Pi0.5 policy fine-tuned with LeRobot on the IsaacLab SO-101 multi-task dataset `CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi`.
+
+ ## Model Details
+
+ - **Base model:** `lerobot/pi05_base`
+ - **Policy type:** `pi05`
+ - **Training type:** full fine-tuning
+ - **Vision encoder frozen:** no
+ - **Action expert only:** no
+ - **Checkpoint:** final checkpoint at step `13761`
+ - **Training length:** `1.00` epoch
+ - **Precision:** bfloat16
+ - **Format:** safetensors
+
+ ## Dataset
+
+ - **Dataset:** `CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi`
+ - **Robot:** SO-101 follower
+ - **Episodes:** `3300`
+ - **Frames:** `3,522,774`
+ - **Tasks:** `800`
+ - **FPS:** `30`
+ - **Visual inputs:** `observation.images.top`, `observation.images.left_wrist`
+ - **State/action dimensions:** 6 DoF robot state/action, padded by the Pi0.5 policy configuration as needed
+
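The dataset figures above imply an average episode length and a total amount of demonstration data. The sketch below derives both from the listed counts. It assumes every episode is recorded at the stated 30 FPS; the variable names are illustrative and do not come from the dataset or from LeRobot.

```python
# Figures copied from the Dataset section of this model card.
EPISODES = 3300
FRAMES = 3_522_774
FPS = 30

# Derived quantities (assumption: constant 30 FPS across all episodes).
avg_frames_per_episode = FRAMES / EPISODES
avg_episode_seconds = avg_frames_per_episode / FPS
total_hours = FRAMES / FPS / 3600

print(f"average frames per episode: {avg_frames_per_episode:.1f}")
print(f"average episode length:     {avg_episode_seconds:.1f} s")
print(f"total demonstration data:   {total_hours:.1f} h")
```

These are averages only; the card does not state how episode lengths are distributed across tasks.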
+ ## Training Hyperparameters
+
+ | Setting | Value |
+ |---|---:|
+ | Steps | `13761` |
+ | Epochs | `1.00` |
+ | Per-device batch size | `16` |
+ | GPUs | `2` |
+ | Gradient accumulation | `8` |
+ | Effective batch size | `256` |
+ | Mixed precision | `bf16` |
+ | Policy dtype | `bfloat16` |
+ | Chunk size | `16` |
+ | Action steps | `16` |
+ | Gradient checkpointing | `true` |
+ | Compile model | `false` |
+ | DataLoader workers | `8` |
+ | DataLoader prefetch factor | `2` |
+ | Persistent workers | `true` |
+ | Pin memory | `true` |
+ | Preprocess in workers | `true` |
+ | DDP find unused parameters | `true` |
+ | Seed | `1000` |
+
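The effective batch size and the step count for one epoch follow from the other values in the table and the dataset's frame count. The sketch below reproduces that arithmetic. It assumes one training sample per frame and that a final partial batch still counts as a step; neither assumption is stated in the card, but together they match the reported `13761` steps.

```python
import math

# Values copied from the hyperparameter table and the Dataset section.
PER_DEVICE_BATCH = 16
NUM_GPUS = 2
GRAD_ACCUM_STEPS = 8
FRAMES = 3_522_774
CHUNK_SIZE = 16
FPS = 30

# Samples consumed per optimizer step across all GPUs and accumulation steps.
effective_batch = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM_STEPS

# Optimizer steps needed to see every frame once (assumption: one sample per frame).
steps_per_epoch = math.ceil(FRAMES / effective_batch)

# Time span covered by one predicted action chunk at the dataset frame rate.
chunk_seconds = CHUNK_SIZE / FPS

print(effective_batch, steps_per_epoch, round(chunk_seconds, 3))
```

With chunk size and action steps both set to `16`, each predicted chunk covers roughly half a second of motion at 30 FPS.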
+ ### Optimizer and Scheduler
+
+ | Setting | Value |
+ |---|---:|
+ | Optimizer | AdamW |
+ | Learning rate | `2.5e-5` |
+ | Betas | `[0.9, 0.95]` |
+ | Epsilon | `1e-8` |
+ | Weight decay | `0.01` |
+ | Gradient clip norm | `1.0` |
+ | Scheduler | cosine decay with warmup |
+ | Configured warmup steps | `1000` |
+ | Effective warmup steps | `458` |
+ | Configured decay steps | `30000` |
+ | Effective decay steps | `13761` |
+ | Final decay LR | `2.5e-6` |
+
+ The scheduler was automatically scaled because `num_training_steps=13761` was smaller than the configured `num_decay_steps=30000`.
+
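The effective warmup and decay values in the table are consistent with proportional scaling by `num_training_steps / num_decay_steps`, truncated to an integer. The sketch below illustrates that rule and a generic warmup-plus-cosine learning-rate curve. Both functions are hypothetical helpers written for this card, not LeRobot's scheduler code, and the exact warmup shape is an assumption.

```python
import math

# Values copied from the optimizer and scheduler table.
PEAK_LR = 2.5e-5
DECAY_LR = 2.5e-6
CONFIGURED_WARMUP = 1000
CONFIGURED_DECAY = 30000
TRAINING_STEPS = 13761


def scale_schedule(warmup, decay, training_steps):
    """Shrink warmup and decay proportionally when training is shorter than the decay horizon."""
    if training_steps < decay:
        factor = training_steps / decay
        return int(warmup * factor), training_steps
    return warmup, decay


def lr_at(step, warmup, decay):
    """Linear warmup to PEAK_LR, then cosine decay towards DECAY_LR (assumed shape)."""
    if step < warmup:
        return PEAK_LR * (step + 1) / warmup
    progress = min(step, decay) / decay
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


warmup_steps, decay_steps = scale_schedule(CONFIGURED_WARMUP, CONFIGURED_DECAY, TRAINING_STEPS)
print(warmup_steps, decay_steps)
print(f"{lr_at(TRAINING_STEPS - 1, warmup_steps, decay_steps):.2e}")
```

Under these assumptions the learning rate at the last step is approximately the configured final decay LR of `2.5e-6`.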
+ ## Final Training Log Snapshot
+
+ The final logged training metrics near completion were:
+
+ - `step=13760/13761`
+ - `epoch=1.00`
+ - `loss=0.009`
+ - `grad_norm=0.259`
+ - `lr=2.5e-06`
+ - `updt_s=1.658`
+ - `data_s=0.017`
+
+ Training completed successfully on `2026-05-13 18:37:47 UTC`.
+
+ ## Files
+
+ This repository includes only the inference/evaluation policy files from `pretrained_model`:
+
+ - `config.json`
+ - `model.safetensors`
+ - `train_config.json`
+ - `policy_preprocessor.json`
+ - `policy_preprocessor_step_2_normalizer_processor.safetensors`
+ - `policy_postprocessor.json`
+ - `policy_postprocessor_step_0_unnormalizer_processor.safetensors`
+
+ Optimizer state and other resumable training-state files are intentionally excluded.
+
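Because the policy, its preprocessor, and its postprocessor are split across several files, it can help to confirm a local copy is complete before loading it. The sketch below checks a directory against the file list above using only the standard library. The `missing_files` helper is hypothetical and is not part of LeRobot.

```python
from pathlib import Path

# File names copied from the Files section of this model card.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_2_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_files(checkpoint_dir):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files` on the directory that holds the downloaded checkpoint returns an empty list when all seven files are present.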
+ ## Evaluation Status
+
+ No rollout or task-success evaluation metrics are included yet. This checkpoint is intended as a reproducible 1-epoch Pi0.5 fine-tuning artifact for IsaacLab SO-101 multi-task experiments.
+
+ ## Reproducibility
+
+ Training was launched from the AutoDataCollector LeRobot workspace using the Pi0.5 IsaacLab training script configuration corresponding to:
+
+ ```bash
+ DATASET_REPO_ID=CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi POLICY_PATH=lerobot/pi05_base BATCH_SIZE=16 GRADIENT_ACCUMULATION_STEPS=8 NUM_GPUS=2 STEPS=13761 MIXED_PRECISION=bf16 POLICY_DTYPE=bfloat16 CHUNK_SIZE=16 N_ACTION_STEPS=16 GRADIENT_CHECKPOINTING=true FREEZE_VISION_ENCODER=false TRAIN_EXPERT_ONLY=false NUM_WORKERS=8 DATALOADER_PREFETCH_FACTOR=2 DATALOADER_PERSISTENT_WORKERS=true DATALOADER_PIN_MEMORY=true PREPROCESS_IN_WORKERS=true OPTIMIZER_LR=2.5e-5 OPTIMIZER_WEIGHT_DECAY=0.01 OPTIMIZER_GRAD_CLIP_NORM=1.0 SCHEDULER_WARMUP_STEPS=1000 SCHEDULER_DECAY_STEPS=30000 SCHEDULER_DECAY_LR=2.5e-6 ./lerobot/scripts/train_pi05_isaaclab.sh
+ ```