feat(layout): improving overall layout
#12
by CarolinePascal HF Staff - opened
app/src/content/chapters/folding/06-training.mdx
CHANGED
|
@@ -28,9 +28,11 @@ The model generates actions through **flow matching**, a generative approach tha
|
|
| 28 |
Flow matching is closely related to diffusion models but uses a simpler, more direct interpolation path between noise and data.
|
| 29 |
</Sidenote>
|
| 30 |
|
|
|
|
|
|
|
| 31 |
#### [Real-Time Chunking (RTC)](https://huggingface.co/docs/lerobot/rtc)
|
| 32 |
|
| 33 |
-
RTC
|
| 34 |
|
| 35 |
```mermaid
|
| 36 |
sequenceDiagram
|
|
@@ -43,6 +45,35 @@ sequenceDiagram
|
|
| 43 |
end
|
| 44 |
```
|
| 45 |
|
| 46 |
### Models
|
| 47 |
|
| 48 |
We initially trained multiple architectures supported in LeRobot, but we ended up focusing on two VLA architectures for our cloth folding data:
|
|
|
|
| 28 |
Flow matching is closely related to diffusion models but uses a simpler, more direct interpolation path between noise and data.
|
| 29 |
</Sidenote>
|
| 30 |
|
| 31 |
+
Before diving further into the model architecture, let's introduce a few key ingredients: [RTC](https://huggingface.co/docs/lerobot/rtc), [SARM](https://huggingface.co/docs/lerobot/sarm) and [relative actions](https://huggingface.co/docs/lerobot/action_representations).
|
| 32 |
+
|
| 33 |
#### [Real-Time Chunking (RTC)](https://huggingface.co/docs/lerobot/rtc)
|
| 34 |
|
| 35 |
+
RTC is crucial for real-world deployment. Instead of waiting for the predicted action chunk to finish before generating the next, RTC generates the next chunk while executing the current one. It "freezes" actions that are already committed and "inpaints" the remaining ones, producing smooth asynchronous motion. In practice, this sped up our rollouts by at least a factor of 2.
|
| 36 |
|
| 37 |
```mermaid
|
| 38 |
sequenceDiagram
|
|
|
|
| 45 |
end
|
| 46 |
```
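The freeze-then-inpaint idea can be illustrated with a toy sketch. This is not LeRobot's RTC implementation: actual RTC performs the inpainting inside the flow-matching denoising loop, whereas the hypothetical `freeze_and_blend` helper below only copies the committed actions and linearly fades from the previous chunk to the new one. Both chunks are assumed to be aligned on the same timesteps.

```python
import numpy as np


def freeze_and_blend(prev_chunk, new_chunk, n_frozen, n_blend):
    """Toy stand-in for RTC: keep committed actions, fade into the new chunk.

    prev_chunk, new_chunk: arrays of shape (T,) or (T, D) on the same timesteps.
    n_frozen: number of leading actions already committed for execution.
    n_blend: number of following actions blended between the two plans.
    """
    prev_chunk = np.asarray(prev_chunk, dtype=float)
    new_chunk = np.asarray(new_chunk, dtype=float)
    if prev_chunk.shape != new_chunk.shape:
        raise ValueError("chunks must cover the same timesteps")

    out = new_chunk.copy()
    # Committed actions are frozen: copy them verbatim from the previous plan.
    out[:n_frozen] = prev_chunk[:n_frozen]

    # Fade linearly from the previous plan to the new one over the blend window.
    end = min(n_frozen + n_blend, len(out))
    for i in range(n_frozen, end):
        w = (i - n_frozen + 1) / (n_blend + 1)
        out[i] = (1.0 - w) * prev_chunk[i] + w * new_chunk[i]
    return out


merged = freeze_and_blend(np.zeros(6), np.ones(6), n_frozen=2, n_blend=2)
```

The frozen prefix guarantees that actions already sent to the robot are never revised, and the blend window avoids a visible jump where the two plans meet.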
|
| 47 |
|
| 48 |
+
#### [Stage-Aware Reward Modeling (SARM)](https://huggingface.co/docs/lerobot/sarm)
|
| 49 |
+
|
| 50 |
+
SARM is a trained reward model that scores trajectories based on how well the robot is progressing toward task completion. It acts as a learned "critic" that predicts whether things are going well or badly.
|
| 51 |
+
|
| 52 |
+
SARM is trained on our demonstration data to predict task progression on a 0-1 scale. The takeaway: it correctly identifies **mistakes** (drops in value) and **progress** (increases) in real time.
|
| 53 |
+
|
| 54 |
+
<Wide>
|
| 55 |
+
<Stack layout="3-column" gap="small">
|
| 56 |
+
<img src={sarmEp300.src} alt="SARM annotation on episode 300" style="width:100%; border-radius: 8px;" />
|
| 57 |
+
<img src={sarmEp2500.src} alt="SARM annotation on episode 2500" style="width:100%; border-radius: 8px;" />
|
| 58 |
+
<img src={sarmEp2200.src} alt="SARM annotation on episode 2200" style="width:100%; border-radius: 8px;" />
|
| 59 |
+
</Stack>
|
| 60 |
+
</Wide>
|
| 61 |
+
|
| 62 |
+
We use SARM exclusively for **RABC** (Reward-Aligned Behavior Cloning): every episode is scored with a per-timestep quality signal, and during training, actions are weighted by their contribution to progress. High-reward actions contribute more to the loss and low-reward ones contribute less. Negative progress is clipped to 0. Unlike binary success/fail labels, SARM provides a continuous signal at every timestep.
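As a rough illustration of this weighting, here is a minimal sketch. It assumes the per-timestep weight is simply the clipped one-step change in predicted progress, and the helper names `progress_weights` and `weighted_loss` are hypothetical. The exact weighting used in LeRobot may differ.

```python
import numpy as np


def progress_weights(progress):
    """Per-timestep weights from a 0-1 progress curve.

    The weight of step t is the progress gained between t and t+1,
    with negative progress clipped to 0. The last step gets weight 0
    because there is no following step to compare against.
    """
    progress = np.asarray(progress, dtype=float)
    delta = np.diff(progress, append=progress[-1])
    return np.clip(delta, 0.0, None)


def weighted_loss(per_step_loss, weights, eps=1e-8):
    """Weighted mean of per-step losses; steps with zero weight are ignored."""
    per_step_loss = np.asarray(per_step_loss, dtype=float)
    return float(np.sum(weights * per_step_loss) / (np.sum(weights) + eps))


# Progress rises, dips (a mistake), then recovers.
weights = progress_weights([0.0, 0.2, 0.1, 0.5])
loss = weighted_loss([1.0, 2.0, 3.0, 4.0], weights)
```

In this example the step that precedes the dip receives zero weight, so the action that caused the mistake does not contribute to the loss.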
|
| 63 |
+
|
| 64 |
+
#### [Relative action representation](https://huggingface.co/docs/lerobot/action_representations)
|
| 65 |
+
|
| 66 |
+
Following the [UMI](https://arxiv.org/abs/2402.10329) approach, actions are expressed as **relative trajectories**: each action in the chunk is an offset from the robot's current state at prediction time, not from the previous action. This avoids error accumulation (unlike true delta actions) and doesn't require a global coordinate frame (unlike absolute actions).
|
| 67 |
+
|
| 68 |
+
<HtmlEmbed
|
| 69 |
+
id="action-representations"
|
| 70 |
+
src="folding/action-representations.html"
|
| 71 |
+
title="Action Representations"
|
| 72 |
+
desc="Relative trajectory (blue) references all actions to the current state. Delta (yellow) chains each action to the previous one, accumulating error. Absolute (red) requires a global coordinate frame. Diagram adapted from UMI (Chi et al., 2024)."
|
| 73 |
+
/>
|
| 74 |
+
|
| 75 |
+
See the [LeRobot action representations docs](https://huggingface.co/docs/lerobot/action_representations) for a full guide on how to use relative actions in your own training.
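To make the difference between the three representations concrete, here is a small sketch. It assumes actions are plain vectors (for example joint positions) so that offsets are simple subtractions; end-effector poses would need proper frame transforms. The helper names are illustrative and are not LeRobot APIs.

```python
import numpy as np


def to_relative(abs_chunk, state):
    """Relative trajectory: every action is an offset from the current state."""
    return np.asarray(abs_chunk, dtype=float) - np.asarray(state, dtype=float)


def to_delta(abs_chunk, state):
    """Delta: every action is an offset from the previous action."""
    abs_chunk = np.asarray(abs_chunk, dtype=float)
    state = np.asarray(state, dtype=float)
    previous = np.vstack([state[None, :], abs_chunk[:-1]])
    return abs_chunk - previous


def from_relative(rel_chunk, state):
    """Decode a relative trajectory back to absolute targets."""
    return np.asarray(rel_chunk, dtype=float) + np.asarray(state, dtype=float)


def from_delta(delta_chunk, state):
    """Decode deltas by chaining them, which also chains their errors."""
    return np.asarray(state, dtype=float) + np.cumsum(delta_chunk, axis=0)


state = np.array([1.0])
abs_chunk = np.array([[1.5], [2.0], [2.5]])

# Add the same small prediction error to every predicted action.
err = 0.1
rel_error = from_relative(to_relative(abs_chunk, state) + err, state) - abs_chunk
delta_error = from_delta(to_delta(abs_chunk, state) + err, state) - abs_chunk
```

With the same per-action error, the decoded relative trajectory stays off by a constant amount, while the decoded delta trajectory drifts further with every step.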
|
| 76 |
+
|
| 77 |
### Models
|
| 78 |
|
| 79 |
We initially trained multiple architectures supported in LeRobot, but we ended up focusing on two VLA architectures for our cloth folding data:
|
app/src/content/chapters/folding/08-ablations.mdx
CHANGED
|
@@ -30,36 +30,9 @@ We ran 11 experiments to understand what *actually* matters. **Series 1** trains
|
|
| 30 |
|
| 31 |
</Wide>
|
| 32 |
|
| 33 |
-
All experiments use **[RTC](https://huggingface.co/docs/lerobot/rtc)** (Real-Time Chunking) and **action interpolation** (upsampling from 30 Hz to 90 Hz).
|
| 34 |
|
| 35 |
-
|
| 36 |
-
policy_cfg.rtc_config = RTCConfig(
|
| 37 |
-
enabled=True,
|
| 38 |
-
execution_horizon=20,
|
| 39 |
-
max_guidance_weight=5.0,
|
| 40 |
-
prefix_attention_schedule=RTCAttentionSchedule.LINEAR,
|
| 41 |
-
)
|
| 42 |
-
```
|
| 43 |
-
|
| 44 |
-
with an action queue size of 30 and a maximum action horizon of 20.
|
| 45 |
-
|
| 46 |
-
RTC gave us a ~2x speedup (sometimes even 2.5x) and action interpolation made the robot much quieter and smoother. Both are now available on [LeRobot main](https://github.com/huggingface/lerobot).
|
| 47 |
-
|
| 48 |
-
### SARM: Our Reward Model
|
| 49 |
-
|
| 50 |
-
Before diving into the experiments further, let's introduce a key ingredient: **[SARM](https://huggingface.co/docs/lerobot/sarm)** (Stage-Aware Reward Modeling). SARM is a trained reward model that scores trajectories based on how well the robot is progressing toward task completion, it acts as a learned "critic" that predicts whether things are going well or badly.
|
| 51 |
-
|
| 52 |
-
SARM is trained on our demonstration data to predict 0-1 task progression. The takeaway: it correctly identifies **mistakes** (drops in value) and **progress** (increases) in real time.
|
| 53 |
-
|
| 54 |
-
<Wide>
|
| 55 |
-
<Stack layout="3-column" gap="small">
|
| 56 |
-
<img src={sarmEp300.src} alt="SARM annotation on episode 300" style="width:100%; border-radius: 8px;" />
|
| 57 |
-
<img src={sarmEp2500.src} alt="SARM annotation on episode 2500" style="width:100%; border-radius: 8px;" />
|
| 58 |
-
<img src={sarmEp2200.src} alt="SARM annotation on episode 2200" style="width:100%; border-radius: 8px;" />
|
| 59 |
-
</Stack>
|
| 60 |
-
</Wide>
|
| 61 |
-
|
| 62 |
-
We use SARM exclusively for **RABC** (Reward-Advantage-Based Conditioning): every episode is scored with a per-timestep quality signal, and during training, actions are weighted by their contribution to progress. High-reward actions contribute more to the loss, low-reward ones contribute less. Negative progress is clipped to 0. Unlike binary success/fail labels, SARM provides a continuous signal at every timestep.
|
| 63 |
|
| 64 |
---
|
| 65 |
|
|
@@ -141,32 +114,6 @@ We hypothesise that the root cause is the difference in **multi-modality** betwe
|
|
| 141 |
|
| 142 |
#### 2. Relative actions improve performance consistently
|
| 143 |
|
| 144 |
-
We use **relative trajectory** actions as defined by [UMI](https://arxiv.org/abs/2402.10329): each action in the chunk is an offset from the robot's current state at prediction time, not from the previous action. This avoids error accumulation (unlike true delta) and doesn't require a global coordinate frame (unlike absolute). LeRobot uses absolute actions by default — switching to relative trajectory was one of our key improvements. See the [LeRobot action representations docs](https://huggingface.co/docs/lerobot/action_representations) for a full guide on how to use relative actions in your own training.
|
| 145 |
-
|
| 146 |
-
<HtmlEmbed
|
| 147 |
-
id="action-representations"
|
| 148 |
-
src="folding/action-representations.html"
|
| 149 |
-
title="Action Representations"
|
| 150 |
-
desc="Relative trajectory (blue) references all actions to the current state. Delta (yellow) chains each action to the previous one, accumulating error. Absolute (red) requires a global coordinate frame. Diagram adapted from UMI (Chi et al., 2024)."
|
| 151 |
-
/>
|
| 152 |
-
|
| 153 |
-
To enable relative actions for π0/π0.5, first precompute the relative action statistics for your dataset, then train with the flag enabled:
|
| 154 |
-
|
| 155 |
-
```bash
|
| 156 |
-
# Precompute relative action stats
|
| 157 |
-
lerobot-edit-dataset \
|
| 158 |
-
--repo_id your_dataset \
|
| 159 |
-
--operation.type recompute_stats \
|
| 160 |
-
--operation.relative_action true \
|
| 161 |
-
--operation.chunk_size 50
|
| 162 |
-
|
| 163 |
-
# Train with relative actions
|
| 164 |
-
lerobot-train \
|
| 165 |
-
--dataset.repo_id=your_dataset \
|
| 166 |
-
--policy.type=pi0 \
|
| 167 |
-
--policy.use_relative_actions=true
|
| 168 |
-
```
|
| 169 |
-
|
| 170 |
Comparing π0.5 without relative actions (1.2: 20% total SR, 40% Level 1) to π0.5 with relative actions and quantile normalization (1.3: 35% total SR, 70% Level 1), and then to the full combination in 1.7 (40% total SR, 80% Level 1), shows that training with relative actions consistently improves performance. The trend is clear and shows up in every comparison we made.
|
| 171 |
|
| 172 |
With only 20 rollouts, the exact gap between experiments is hard to pin down — but the improvement is consistent across every comparison. **Caveat:** π0.5 is likely pretrained with relative actions, so 1.3 and 1.7 fine-tune in a regime consistent with pretraining, while 1.2 fine-tunes against it.
|
|
|
|
| 30 |
|
| 31 |
</Wide>
|
| 32 |
|
| 33 |
+
All experiments use **[RTC](https://huggingface.co/docs/lerobot/rtc)** (Real-Time Chunking) with an action queue size of 30 and a maximum action horizon of 20, along with **action interpolation** (upsampling from 30 Hz to 90 Hz).
|
| 34 |
|
| 35 |
+
RTC gave us a ~2x speedup (sometimes even 2.5x) and action interpolation made the robot much quieter and smoother. Both are now available in the [LeRobot repository](https://github.com/huggingface/lerobot).
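For intuition, upsampling from 30 Hz to 90 Hz corresponds to a multiplier of 3. The sketch below assumes plain linear interpolation between consecutive actions; the scheme used in LeRobot may differ, and `upsample_actions` is a hypothetical helper.

```python
import numpy as np


def upsample_actions(actions, multiplier):
    """Linearly interpolate a (T, D) action chunk to `multiplier` times the rate.

    The output has (T - 1) * multiplier + 1 rows and keeps every original
    action as one of its rows.
    """
    actions = np.asarray(actions, dtype=float)
    n_steps, n_dims = actions.shape
    t_src = np.arange(n_steps)
    t_dst = np.arange((n_steps - 1) * multiplier + 1) / multiplier
    columns = [np.interp(t_dst, t_src, actions[:, d]) for d in range(n_dims)]
    return np.stack(columns, axis=1)


# Three actions at 30 Hz become seven actions at 90 Hz.
dense = upsample_actions([[0.0], [3.0], [6.0]], multiplier=3)
```

Smaller steps between consecutive commands are what make the motion quieter: the controller receives two intermediate targets between each pair of policy outputs instead of jumping directly from one to the next.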
|
| 36 |
|
| 37 |
---
|
| 38 |
|
|
|
|
| 114 |
|
| 115 |
#### 2. Relative actions improve performance consistently
|
| 116 |
|
| 117 |
Comparing π0.5 without relative actions (1.2: 20% total SR, 40% Level 1) to π0.5 with relative actions and quantile normalization (1.3: 35% total SR, 70% Level 1), and then to the full combination in 1.7 (40% total SR, 80% Level 1), shows that training with relative actions consistently improves performance. The trend is clear and shows up in every comparison we made.
|
| 118 |
|
| 119 |
With only 20 rollouts, the exact gap between experiments is hard to pin down — but the improvement is consistent across every comparison. **Caveat:** π0.5 is likely pretrained with relative actions, so 1.3 and 1.7 fine-tune in a regime consistent with pretraining, while 1.2 fine-tunes against it.
|
app/src/content/chapters/folding/09-learnings.mdx
CHANGED
|
@@ -22,31 +22,59 @@ If you're training a policy for a new manipulation task with LeRobot, **here's t
|
|
| 22 |
2. **Collect 50–100 clean demonstrations.** Quality over volume. Consistent technique, good camera angles, deliberate motions. This is your foundation, everything else builds on it.
|
| 23 |
3. **Train a reward model.** Use [SARM](https://huggingface.co/docs/lerobot/sarm) to score your episodes and enable RABC during training. This allows the policy to focus on the best demonstrations, which is crucial for longer tasks.
|
| 24 |
4. **Train a baseline and watch it fail.** Film the rollouts. Understanding *how* and *where* it breaks tells you exactly what kind of data to collect next.
|
| 25 |
-
5. **
|
| 26 |
|
| 27 |
```bash
|
| 28 |
-
|
|
|
|
| 29 |
--robot.type=bi_openarm_follower \
|
| 30 |
-
--
|
| 31 |
-
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
|
| 32 |
-
--dataset.repo_id=your-username/hil-dataset \
|
| 33 |
--rtc.enabled=true \
|
| 34 |
-
--rtc.execution_horizon=20
|
|
|
|
| 35 |
```
|
| 36 |
|
| 37 |
-
|
| 38 |
|
| 39 |
```bash
|
| 40 |
-
|
| 41 |
-
|
| 42 |
--robot.type=bi_openarm_follower \
|
| 43 |
-
--
|
|
|
|
|
|
|
| 44 |
--rtc.enabled=true \
|
| 45 |
-
--rtc.execution_horizon=20
|
| 46 |
-
--interpolation_multiplier=3
|
| 47 |
```
|
| 48 |
|
| 49 |
-
|
| 50 |
|
| 51 |
<Note variant="info">
|
| 52 |
All the innovations from this project ([SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), DAgger, [OpenArm](https://huggingface.co/docs/lerobot/openarm), and OpenArm Mini) are merged into the [LeRobot repository](https://github.com/huggingface/lerobot). You can use our full pipeline as a starting point and swap in your own task.
|
|
|
|
| 22 |
2. **Collect 50–100 clean demonstrations.** Quality over volume. Consistent technique, good camera angles, deliberate motions. This is your foundation, everything else builds on it.
|
| 23 |
3. **Train a reward model.** Use [SARM](https://huggingface.co/docs/lerobot/sarm) to score your episodes and enable RABC during training. This allows the policy to focus on the best demonstrations, which is crucial for longer tasks.
|
| 24 |
4. **Train a baseline and watch it fail.** Film the rollouts. Understanding *how* and *where* it breaks tells you exactly what kind of data to collect next.
|
| 25 |
+
5. **Enable action interpolation and [RTC](https://huggingface.co/docs/lerobot/rtc).** This smooths transitions and speeds up execution. Action interpolation upsamples the policy's 30 Hz output to your robot's control frequency (e.g. 90 Hz), and RTC overlaps inference with execution. Both features are available at inference time with flags on `lerobot-eval`:
|
| 26 |
|
| 27 |
```bash
|
| 28 |
+
lerobot-eval \
|
| 29 |
+
--policy.path=outputs/checkpoints/last/pretrained_model \
|
| 30 |
--robot.type=bi_openarm_follower \
|
| 31 |
+
--policy.device=cuda \
|
|
|
|
|
|
|
| 32 |
--rtc.enabled=true \
|
| 33 |
+
--rtc.execution_horizon=20 \
|
| 34 |
+
--interpolation_multiplier=3
|
| 35 |
```
|
| 36 |
|
| 37 |
+
RTC can also be enabled at training time with the appropriate configuration:
|
| 38 |
+
|
| 39 |
+
```python
|
| 40 |
+
policy_cfg.rtc_config = RTCConfig(
|
| 41 |
+
enabled=True,
|
| 42 |
+
execution_horizon=20,
|
| 43 |
+
max_guidance_weight=5.0,
|
| 44 |
+
prefix_attention_schedule=RTCAttentionSchedule.LINEAR,
|
| 45 |
+
)
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
6. **Find the right [action representation](https://huggingface.co/docs/lerobot/action_representations).** LeRobot uses absolute actions by default. Switching to relative trajectories was one of our key improvements, and it likely keeps fine-tuning consistent with how π0.5 was pretrained. To enable relative actions for π0/π0.5 in LeRobot, first precompute the relative action statistics for your dataset, then train with the flag enabled:
|
| 49 |
|
| 50 |
```bash
|
| 51 |
+
# Precompute relative action stats
|
| 52 |
+
lerobot-edit-dataset \
|
| 53 |
+
--repo_id your_dataset \
|
| 54 |
+
--operation.type recompute_stats \
|
| 55 |
+
--operation.relative_action true \
|
| 56 |
+
--operation.chunk_size 50
|
| 57 |
+
|
| 58 |
+
# Train with relative actions
|
| 59 |
+
lerobot-train \
|
| 60 |
+
--dataset.repo_id=your_dataset \
|
| 61 |
+
--policy.type=pi0 \
|
| 62 |
+
--policy.use_relative_actions=true
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
7. **Use DAgger for targeted improvement.** Once you have a model that mostly works, collect correction data for its specific failure modes. LeRobot's [HIL scripts](https://github.com/huggingface/lerobot/tree/main/examples/hil) handle the full loop: the operator watches the policy run, pauses on failure, teleoperates a recovery, and hands control back:
|
| 66 |
+
|
| 67 |
+
```bash
|
| 68 |
+
python examples/hil/hil_data_collection.py \
|
| 69 |
--robot.type=bi_openarm_follower \
|
| 70 |
+
--teleop.type=openarm_mini \
|
| 71 |
+
--policy.path=outputs/pretrain/checkpoints/last/pretrained_model \
|
| 72 |
+
--dataset.repo_id=your-username/hil-dataset \
|
| 73 |
--rtc.enabled=true \
|
| 74 |
+
--rtc.execution_horizon=20
|
|
|
|
| 75 |
```
|
| 76 |
|
| 77 |
+
8. **Film every evaluation.** Metrics alone won't tell the full story. Video reveals subtle failure modes that success rate misses, and lets you score quality.
|
| 78 |
|
| 79 |
<Note variant="info">
|
| 80 |
All the innovations from this project ([SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), DAgger, [OpenArm](https://huggingface.co/docs/lerobot/openarm), and OpenArm Mini) are merged into the [LeRobot repository](https://github.com/huggingface/lerobot). You can use our full pipeline as a starting point and swap in your own task.
|