feat(layout): improving overall layout

#12
by CarolinePascal HF Staff - opened
app/src/content/chapters/folding/06-training.mdx CHANGED
@@ -28,9 +28,11 @@ The model generates actions through **flow matching**, a generative approach tha
28
  Flow matching is closely related to diffusion models but uses a simpler, more direct interpolation path between noise and data.
29
  </Sidenote>
30
 
 
 
31
  #### [Real-Time Chunking (RTC)](https://huggingface.co/docs/lerobot/rtc)
32
 
33
- RTC was crucial for real-world deployment. Instead of waiting for the predicted action chunk to finish before generating the next, RTC generates the next chunk while executing the current one. It "freezes" actions that are already committed and "inpaints" the remaining ones, producing smooth asynchronous motion. In practice, this sped up our rollouts by at least a factor of 2.
34
 
35
  ```mermaid
36
  sequenceDiagram
@@ -43,6 +45,35 @@ sequenceDiagram
43
  end
44
  ```
45
 
46
  ### Models
47
 
48
  We initially trained multiple architectures supported in LeRobot, but we ended up focusing on two VLA architectures for our cloth folding data:
 
28
  Flow matching is closely related to diffusion models but uses a simpler, more direct interpolation path between noise and data.
29
  </Sidenote>
30
 
31
+ Before diving further into the model architecture, let's introduce a few key ingredients: [RTC](https://huggingface.co/docs/lerobot/rtc), [SARM](https://huggingface.co/docs/lerobot/sarm) and [relative actions](https://huggingface.co/docs/lerobot/action_representations).
32
+
33
  #### [Real-Time Chunking (RTC)](https://huggingface.co/docs/lerobot/rtc)
34
 
35
+ RTC is crucial for real-world deployment. Instead of waiting for the predicted action chunk to finish before generating the next, RTC generates the next chunk while executing the current one. It "freezes" actions that are already committed and "inpaints" the remaining ones, producing smooth asynchronous motion. In practice, this sped up our rollouts by at least a factor of 2.
36
 
37
  ```mermaid
38
  sequenceDiagram
 
45
  end
46
  ```
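The "freeze and inpaint" idea can be illustrated with a toy sketch. This is not LeRobot's RTC implementation: in RTC the inpainting happens inside the flow-matching sampler, whereas the sketch below only blends two already-generated chunks. The function name `blend_next_chunk` and the parameters `start` and `delay` are hypothetical, and the linear decay is an assumption loosely inspired by the `LINEAR` prefix schedule shown in the config.

```python
import numpy as np


def blend_next_chunk(prev_chunk: np.ndarray, raw_next: np.ndarray,
                     start: int, delay: int) -> np.ndarray:
    """Toy illustration of freezing committed actions and blending the rest.

    prev_chunk: (H, D) chunk currently being executed.
    raw_next:   (H, D) newly predicted chunk; its index 0 aligns with prev_chunk[start].
    start:      index in prev_chunk at which inference for the next chunk began.
    delay:      control steps consumed by inference (these actions are committed).
    """
    horizon = prev_chunk.shape[0]
    overlap = horizon - start  # steps for which both chunks hold a prediction
    assert 0 <= start < horizon and 0 <= delay <= overlap

    # Weight given to the previous chunk at each step of the new chunk.
    weights = np.zeros(horizon)
    weights[:delay] = 1.0  # frozen: already committed while inference was running
    if overlap > delay:
        # Hypothetical linear decay from the old plan to the new one.
        weights[delay:overlap] = np.linspace(1.0, 0.0, overlap - delay + 2)[1:-1]

    # Align the unexecuted tail of the previous chunk with the new chunk's timeline.
    prev_aligned = np.zeros_like(raw_next)
    prev_aligned[:overlap] = prev_chunk[start:]

    w = weights[:, None]
    return w * prev_aligned + (1.0 - w) * raw_next
```

The first `delay` actions of the result are exactly those of the previous chunk, so the robot never sees a jump at the hand-over point; the remaining actions move progressively toward the new prediction.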
47
 
48
+ #### [Stage-Aware Reward Modeling (SARM)](https://huggingface.co/docs/lerobot/sarm)
49
+
50
+ SARM is a trained reward model that scores trajectories based on how well the robot is progressing toward task completion. It acts as a learned "critic" that predicts whether things are going well or badly.
51
+
52
+ SARM is trained on our demonstration data to predict 0-1 task progression. The takeaway: it correctly identifies **mistakes** (drops in value) and **progress** (increases) in real time.
53
+
54
+ <Wide>
55
+ <Stack layout="3-column" gap="small">
56
+ <img src={sarmEp300.src} alt="SARM annotation on episode 300" style="width:100%; border-radius: 8px;" />
57
+ <img src={sarmEp2500.src} alt="SARM annotation on episode 2500" style="width:100%; border-radius: 8px;" />
58
+ <img src={sarmEp2200.src} alt="SARM annotation on episode 2200" style="width:100%; border-radius: 8px;" />
59
+ </Stack>
60
+ </Wide>
61
+
62
+ We use SARM exclusively for **RABC** (Reward-Advantage-Based Conditioning): every episode is scored with a per-timestep quality signal, and during training, actions are weighted by their contribution to progress. High-reward actions contribute more to the loss, low-reward ones contribute less. Negative progress is clipped to 0. Unlike binary success/fail labels, SARM provides a continuous signal at every timestep.
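A minimal sketch of how a per-timestep progress curve could be turned into loss weights, assuming the weight is the clipped step-to-step progress gain, normalised to average 1. The exact weighting used in LeRobot is not given here, so `progress_weights`, `weighted_loss`, and the normalisation are assumptions for illustration only.

```python
import numpy as np


def progress_weights(progress: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map a 0-1 progress curve of shape (T,) to non-negative weights of shape (T,)."""
    # Progress gained by each step; the final step has no successor, so its gain is 0.
    gain = np.diff(progress, append=progress[-1])
    # Negative progress is clipped to 0, so regressions do not contribute.
    gain = np.clip(gain, 0.0, None)
    mean = gain.mean()
    if mean < eps:
        # Degenerate episode with no positive progress: fall back to uniform weights.
        return np.ones_like(gain)
    # Assumption: normalise so the weights average to 1 over the episode.
    return gain / mean


def weighted_loss(per_step_loss: np.ndarray, weights: np.ndarray) -> float:
    """Average the per-timestep losses using the progress-based weights."""
    return float(np.mean(per_step_loss * weights))
```

With this scheme, a timestep where the predicted progress drops gets weight 0, while a timestep with a large jump in progress dominates the loss.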
63
+
64
+ #### [Relative action representation](https://huggingface.co/docs/lerobot/action_representations)
65
+
66
+ Following the [UMI](https://arxiv.org/abs/2402.10329) approach, actions are expressed as **relative trajectories**: each action in the chunk is an offset from the robot's current state at prediction time, not from the previous action. This avoids error accumulation (unlike true delta) and doesn't require a global coordinate frame (unlike absolute).
67
+
68
+ <HtmlEmbed
69
+ id="action-representations"
70
+ src="folding/action-representations.html"
71
+ title="Action Representations"
72
+ desc="Relative trajectory (blue) references all actions to the current state. Delta (yellow) chains each action to the previous one, accumulating error. Absolute (red) requires a global coordinate frame. Diagram adapted from UMI (Chi et al., 2024)."
73
+ />
74
+
75
+ See the [LeRobot action representations docs](https://huggingface.co/docs/lerobot/action_representations) for a full guide on how to use relative actions in your own training.
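The difference between the three representations can be sketched in a few lines of NumPy. This is an illustration, not LeRobot's conversion code: the helper names are hypothetical, and actions are treated as plain vectors (for example joint positions), which ignores the special handling that rotations would need.

```python
import numpy as np


def to_relative(chunk_abs: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Relative trajectory: every action is an offset from the current state."""
    return chunk_abs - state


def from_relative(chunk_rel: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Recover absolute targets by adding the current state to each offset."""
    return state + chunk_rel


def to_delta(chunk_abs: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Delta: every action is an offset from the previous action in the chunk."""
    previous = np.concatenate([state[None, :], chunk_abs[:-1]], axis=0)
    return chunk_abs - previous


def from_delta(chunk_delta: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Recover absolute targets by chaining the deltas from the current state."""
    return state + np.cumsum(chunk_delta, axis=0)
```

If every predicted action carries the same small bias of 0.1, `from_relative` is off by 0.1 at every step of the chunk, while `from_delta` is off by 0.1, 0.2, 0.3, and so on, because the chained sum accumulates the error.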
76
+
77
  ### Models
78
 
79
  We initially trained multiple architectures supported in LeRobot, but we ended up focusing on two VLA architectures for our cloth folding data:
app/src/content/chapters/folding/08-ablations.mdx CHANGED
@@ -30,36 +30,9 @@ We ran 11 experiments to understand what *actually* matters. **Series 1** trains
30
 
31
  </Wide>
32
 
33
- All experiments use **[RTC](https://huggingface.co/docs/lerobot/rtc)** (Real-Time Chunking) and **action interpolation** (upsampling from 30 Hz to 90 Hz). The RTC settings used across all experiments:
34
 
35
- ```python
36
- policy_cfg.rtc_config = RTCConfig(
37
- enabled=True,
38
- execution_horizon=20,
39
- max_guidance_weight=5.0,
40
- prefix_attention_schedule=RTCAttentionSchedule.LINEAR,
41
- )
42
- ```
43
-
44
- with an action queue size of 30 and a maximum action horizon of 20.
45
-
46
- RTC gave us a ~2x speedup (sometimes even 2.5x) and action interpolation made the robot much quieter and smoother. Both are now available on [LeRobot main](https://github.com/huggingface/lerobot).
47
-
48
- ### SARM: Our Reward Model
49
-
50
- Before diving into the experiments further, let's introduce a key ingredient: **[SARM](https://huggingface.co/docs/lerobot/sarm)** (Stage-Aware Reward Modeling). SARM is a trained reward model that scores trajectories based on how well the robot is progressing toward task completion, it acts as a learned "critic" that predicts whether things are going well or badly.
51
-
52
- SARM is trained on our demonstration data to predict 0-1 task progression. The takeaway: it correctly identifies **mistakes** (drops in value) and **progress** (increases) in real time.
53
-
54
- <Wide>
55
- <Stack layout="3-column" gap="small">
56
- <img src={sarmEp300.src} alt="SARM annotation on episode 300" style="width:100%; border-radius: 8px;" />
57
- <img src={sarmEp2500.src} alt="SARM annotation on episode 2500" style="width:100%; border-radius: 8px;" />
58
- <img src={sarmEp2200.src} alt="SARM annotation on episode 2200" style="width:100%; border-radius: 8px;" />
59
- </Stack>
60
- </Wide>
61
-
62
- We use SARM exclusively for **RABC** (Reward-Advantage-Based Conditioning): every episode is scored with a per-timestep quality signal, and during training, actions are weighted by their contribution to progress. High-reward actions contribute more to the loss, low-reward ones contribute less. Negative progress is clipped to 0. Unlike binary success/fail labels, SARM provides a continuous signal at every timestep.
63
 
64
  ---
65
 
@@ -141,32 +114,6 @@ We hypothesise that the root cause is the difference in **multi-modality** betwe
141
 
142
  #### 2. Relative actions improve performance consistently
143
 
144
- We use **relative trajectory** actions as defined by [UMI](https://arxiv.org/abs/2402.10329): each action in the chunk is an offset from the robot's current state at prediction time, not from the previous action. This avoids error accumulation (unlike true delta) and doesn't require a global coordinate frame (unlike absolute). LeRobot uses absolute actions by default — switching to relative trajectory was one of our key improvements. See the [LeRobot action representations docs](https://huggingface.co/docs/lerobot/action_representations) for a full guide on how to use relative actions in your own training.
145
-
146
- <HtmlEmbed
147
- id="action-representations"
148
- src="folding/action-representations.html"
149
- title="Action Representations"
150
- desc="Relative trajectory (blue) references all actions to the current state. Delta (yellow) chains each action to the previous one, accumulating error. Absolute (red) requires a global coordinate frame. Diagram adapted from UMI (Chi et al., 2024)."
151
- />
152
-
153
- To enable relative actions for π0/π0.5, first precompute the relative action statistics for your dataset, then train with the flag enabled:
154
-
155
- ```bash
156
- # Precompute relative action stats
157
- lerobot-edit-dataset \
158
- --repo_id your_dataset \
159
- --operation.type recompute_stats \
160
- --operation.relative_action true \
161
- --operation.chunk_size 50
162
-
163
- # Train with relative actions
164
- lerobot-train \
165
- --dataset.repo_id=your_dataset \
166
- --policy.type=pi0 \
167
- --policy.use_relative_actions=true
168
- ```
169
-
170
  Comparing π0.5 without relative actions (1.2: 20% total SR, 40% Level 1) to π0.5 with relative actions and quantile normalization (1.3: 35% total SR, 70% Level 1), and then to the full combination in 1.7 (40% total SR, 80% Level 1), shows that training with relative actions consistently improves performance. The trend is clear and shows up in every comparison we made.
171
 
172
  With only 20 rollouts, the exact gap between experiments is hard to pin down — but the improvement is consistent across every comparison. **Caveat:** π0.5 is likely pretrained with relative actions, so 1.3 and 1.7 fine-tune in a regime consistent with pretraining, while 1.2 fine-tunes against it.
 
30
 
31
  </Wide>
32
 
33
+ All experiments use **[RTC](https://huggingface.co/docs/lerobot/rtc)** (Real-Time Chunking) with an action queue size of 30 and a maximum action horizon of 20, along with **action interpolation** (upsampling from 30 Hz to 90 Hz).
34
 
35
+ RTC gave us a ~2x speedup (sometimes even 2.5x) and action interpolation made the robot much quieter and smoother. Both are now available in the [LeRobot repository](https://github.com/huggingface/lerobot).
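As a rough illustration of what upsampling from 30 Hz to 90 Hz means, the sketch below linearly interpolates between consecutive actions with a multiplier of 3. Linear interpolation is an assumption here, and `upsample_actions` is a hypothetical helper, not the LeRobot implementation.

```python
import numpy as np


def upsample_actions(actions: np.ndarray, multiplier: int) -> np.ndarray:
    """Linearly interpolate a (T, D) action sequence to ((T - 1) * multiplier + 1, D)."""
    num_steps, num_dims = actions.shape
    source_t = np.arange(num_steps)
    # Place `multiplier` sub-steps between each pair of consecutive source actions.
    target_t = np.linspace(0, num_steps - 1, (num_steps - 1) * multiplier + 1)
    columns = [np.interp(target_t, source_t, actions[:, d]) for d in range(num_dims)]
    return np.stack(columns, axis=1)
```

For a 30 Hz policy and a 90 Hz control loop, `multiplier=3` fills in two intermediate targets between every pair of predicted actions, which shortens each commanded step and is one plausible reason for the quieter, smoother motion.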
36
 
37
  ---
38
 
 
114
 
115
  #### 2. Relative actions improve performance consistently
116
 
117
  Comparing π0.5 without relative actions (1.2: 20% total SR, 40% Level 1) to π0.5 with relative actions and quantile normalization (1.3: 35% total SR, 70% Level 1), and then to the full combination in 1.7 (40% total SR, 80% Level 1), shows that training with relative actions consistently improves performance. The trend is clear and shows up in every comparison we made.
118
 
119
  With only 20 rollouts, the exact gap between experiments is hard to pin down — but the improvement is consistent across every comparison. **Caveat:** π0.5 is likely pretrained with relative actions, so 1.3 and 1.7 fine-tune in a regime consistent with pretraining, while 1.2 fine-tunes against it.
app/src/content/chapters/folding/09-learnings.mdx CHANGED
@@ -22,31 +22,59 @@ If you're training a policy for a new manipulation task with LeRobot, **here's t
22
  2. **Collect 50–100 clean demonstrations.** Quality over volume. Consistent technique, good camera angles, deliberate motions. This is your foundation, everything else builds on it.
23
  3. **Train a reward model.** Use [SARM](https://huggingface.co/docs/lerobot/sarm) to score your episodes and enable RABC during training. This allows the policy to focus on the best demonstrations, which is crucial for longer tasks.
24
  4. **Train a baseline and watch it fail.** Film the rollouts. Understanding *how* and *where* it breaks tells you exactly what kind of data to collect next.
25
- 5. **Use DAgger for targeted improvement.** Once you have a model that mostly works, collect correction data for its specific failure modes. LeRobot's [HIL scripts](https://github.com/huggingface/lerobot/tree/main/examples/hil) handle the full loop, the operator watches the policy run, pauses on failure, teleoperates a recovery, and hands control back:
26
 
27
  ```bash
28
- python examples/hil/hil_data_collection.py \
 
29
  --robot.type=bi_openarm_follower \
30
- --teleop.type=openarm_mini \
31
- --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
32
- --dataset.repo_id=your-username/hil-dataset \
33
  --rtc.enabled=true \
34
- --rtc.execution_horizon=20
 
35
  ```
36
 
37
- 6. **Enable action interpolation and RTC.** This smooths transitions and speeds up execution. Action interpolation upsamples the policy's 30 Hz output to your robot's control frequency (e.g. 90 Hz), and RTC overlaps inference with execution. Both features are flags on `lerobot-eval`:
 
 
 
 
 
 
 
 
 
 
 
38
 
39
  ```bash
40
- lerobot-eval \
41
- --policy.path=outputs/checkpoints/last/pretrained_model \
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
42
  --robot.type=bi_openarm_follower \
43
- --policy.device=cuda \
 
 
44
  --rtc.enabled=true \
45
- --rtc.execution_horizon=20 \
46
- --interpolation_multiplier=3
47
  ```
48
 
49
- 7. **Film every evaluation.** Metrics alone won't tell the full story. Video reveals subtle failure modes that success rate misses, and lets you score quality.
50
 
51
  <Note variant="info">
52
  All the innovations from this project [SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), DAgger, [OpenArm](https://huggingface.co/docs/lerobot/openarm), and OpenArm Mini are merged into [LeRobot repository](https://github.com/huggingface/lerobot). You can use our full pipeline as a starting point and swap in your own task.
 
22
  2. **Collect 50–100 clean demonstrations.** Quality over volume. Consistent technique, good camera angles, deliberate motions. This is your foundation, everything else builds on it.
23
  3. **Train a reward model.** Use [SARM](https://huggingface.co/docs/lerobot/sarm) to score your episodes and enable RABC during training. This allows the policy to focus on the best demonstrations, which is crucial for longer tasks.
24
  4. **Train a baseline and watch it fail.** Film the rollouts. Understanding *how* and *where* it breaks tells you exactly what kind of data to collect next.
25
+ 5. **Enable action interpolation and [RTC](https://huggingface.co/docs/lerobot/rtc).** This smooths transitions and speeds up execution. Action interpolation upsamples the policy's 30 Hz output to your robot's control frequency (e.g. 90 Hz), and RTC overlaps inference with execution. Both features are available at inference time with flags on `lerobot-eval`:
26
 
27
  ```bash
28
+ lerobot-eval \
29
+ --policy.path=outputs/checkpoints/last/pretrained_model \
30
  --robot.type=bi_openarm_follower \
31
+ --policy.device=cuda \
 
 
32
  --rtc.enabled=true \
33
+ --rtc.execution_horizon=20 \
34
+ --interpolation_multiplier=3
35
  ```
36
 
37
+ RTC can also be enabled at training time with the appropriate configuration:
38
+
39
+ ```python
40
+ policy_cfg.rtc_config = RTCConfig(
41
+ enabled=True,
42
+ execution_horizon=20,
43
+ max_guidance_weight=5.0,
44
+ prefix_attention_schedule=RTCAttentionSchedule.LINEAR,
45
+ )
46
+ ```
47
+
48
+ 6. **Find the right [action representation](https://huggingface.co/docs/lerobot/action_representations).** LeRobot uses absolute actions by default. Switching to relative trajectory actions was one of our key improvements, and it likely keeps fine-tuning consistent with π0.5 pretraining. To enable relative actions for π0/π0.5 in LeRobot, first precompute the relative action statistics for your dataset, then train with the flag enabled:
49
 
50
  ```bash
51
+ # Precompute relative action stats
52
+ lerobot-edit-dataset \
53
+ --repo_id your_dataset \
54
+ --operation.type recompute_stats \
55
+ --operation.relative_action true \
56
+ --operation.chunk_size 50
57
+
58
+ # Train with relative actions
59
+ lerobot-train \
60
+ --dataset.repo_id=your_dataset \
61
+ --policy.type=pi0 \
62
+ --policy.use_relative_actions=true
63
+ ```
64
+
65
+ 7. **Use DAgger for targeted improvement.** Once you have a model that mostly works, collect correction data for its specific failure modes. LeRobot's [HIL scripts](https://github.com/huggingface/lerobot/tree/main/examples/hil) handle the full loop: the operator watches the policy run, pauses on failure, teleoperates a recovery, and hands control back:
66
+
67
+ ```bash
68
+ python examples/hil/hil_data_collection.py \
69
  --robot.type=bi_openarm_follower \
70
+ --teleop.type=openarm_mini \
71
+ --policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
72
+ --dataset.repo_id=your-username/hil-dataset \
73
  --rtc.enabled=true \
74
+ --rtc.execution_horizon=20
 
75
  ```
76
 
77
+ 8. **Film every evaluation.** Metrics alone won't tell the full story. Video reveals subtle failure modes that success rate misses, and lets you score quality.
78
 
79
  <Note variant="info">
80
  All the innovations from this project [SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), DAgger, [OpenArm](https://huggingface.co/docs/lerobot/openarm), and OpenArm Mini are merged into [LeRobot repository](https://github.com/huggingface/lerobot). You can use our full pipeline as a starting point and swap in your own task.