Refine blog intro, videos, resource table, and add HF Buckets mention
- Update intro links (build boxes, clean offices, household tasks)
- Replace hero video with Folding_Final.mp4
- Add descriptive captions under Level 1/Level 2 videos
- Fix white screen on video pause (black background)
- Move read time to ToC sidebar
- Simplify resource table (combine datasets, remove LeRobot code/docs)
- Replace DAgger with HIL (Human-in-the-Loop) in intro
- Add intro sentences for key metrics table in ablations
- Add HF Storage Buckets paragraph in training section
Made-with: Cursor
- app/src/components/Hero.astro +1 -1
- app/src/components/TableOfContents.astro +9 -1
- app/src/components/Video.astro +1 -1
- app/src/content/article.mdx +0 -3
- app/src/content/assets/image/{Folding_V1.mp4 → Folding_Final.mp4} +2 -2
- app/src/content/assets/image/Folding_V2.mp4 +0 -3
- app/src/content/chapters/folding/01-hero.mdx +27 -20
- app/src/content/chapters/folding/03-hardware.mdx +1 -1
- app/src/content/chapters/folding/06-training.mdx +2 -0
- app/src/content/chapters/folding/08-ablations.mdx +21 -0
- app/src/pages/index.astro +1 -0
app/src/components/Hero.astro
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
import HtmlEmbed from "./HtmlEmbed.astro";
|
| 3 |
-
import announcementVideo from "../content/assets/image/
|
| 4 |
|
| 5 |
interface Props {
|
| 6 |
title: string; // may contain HTML (e.g., <br/>)
|
|
|
|
| 1 |
---
|
| 2 |
import HtmlEmbed from "./HtmlEmbed.astro";
|
| 3 |
+
import announcementVideo from "../content/assets/image/Folding_Final.mp4";
|
| 4 |
|
| 5 |
interface Props {
|
| 6 |
title: string; // may contain HTML (e.g., <br/>)
|
app/src/components/TableOfContents.astro
CHANGED
|
@@ -1,8 +1,9 @@
|
|
| 1 |
---
|
| 2 |
export interface Props {
|
| 3 |
tableOfContentAutoCollapse?: boolean;
|
|
|
|
| 4 |
}
|
| 5 |
-
const { tableOfContentAutoCollapse = false } = Astro.props as Props;
|
| 6 |
---
|
| 7 |
|
| 8 |
<nav
|
|
@@ -10,6 +11,7 @@ const { tableOfContentAutoCollapse = false } = Astro.props as Props;
|
|
| 10 |
aria-label="Table of Contents"
|
| 11 |
data-auto-collapse={tableOfContentAutoCollapse ? "1" : "0"}
|
| 12 |
>
|
|
|
|
| 13 |
<div class="title">Table of Contents</div>
|
| 14 |
<div id="article-toc-placeholder"></div>
|
| 15 |
</nav>
|
|
@@ -867,6 +869,12 @@ const { tableOfContentAutoCollapse = false } = Astro.props as Props;
|
|
| 867 |
font-size: 13px;
|
| 868 |
}
|
| 869 |
|
| 870 |
.table-of-contents .title {
|
| 871 |
font-weight: 600;
|
| 872 |
font-size: 14px;
|
|
|
|
| 1 |
---
|
| 2 |
export interface Props {
|
| 3 |
tableOfContentAutoCollapse?: boolean;
|
| 4 |
+
readTime?: string;
|
| 5 |
}
|
| 6 |
+
const { tableOfContentAutoCollapse = false, readTime } = Astro.props as Props;
|
| 7 |
---
|
| 8 |
|
| 9 |
<nav
|
|
|
|
| 11 |
aria-label="Table of Contents"
|
| 12 |
data-auto-collapse={tableOfContentAutoCollapse ? "1" : "0"}
|
| 13 |
>
|
| 14 |
+
{readTime && <div class="toc-read-time">{readTime}</div>}
|
| 15 |
<div class="title">Table of Contents</div>
|
| 16 |
<div id="article-toc-placeholder"></div>
|
| 17 |
</nav>
|
|
|
|
| 869 |
font-size: 13px;
|
| 870 |
}
|
| 871 |
|
| 872 |
+
.toc-read-time {
|
| 873 |
+
font-size: 13px;
|
| 874 |
+
color: var(--muted-color);
|
| 875 |
+
margin-bottom: 8px;
|
| 876 |
+
}
|
| 877 |
+
|
| 878 |
.table-of-contents .title {
|
| 879 |
font-weight: 600;
|
| 880 |
font-size: 14px;
|
app/src/components/Video.astro
CHANGED
|
@@ -7,7 +7,7 @@ const id = `video-${Math.random().toString(36).slice(2, 9)}`;
|
|
| 7 |
---
|
| 8 |
|
| 9 |
<div class="video-player" data-video-player={id}>
|
| 10 |
-
<video id={id} src={src} controls muted preload="auto" playsinline style="width:100%; border-radius: 8px; display: block;" />
|
| 11 |
<div class="speed-controls">
|
| 12 |
<span class="speed-label">Speed:</span>
|
| 13 |
<button class="speed-btn active" data-speed="1">1x</button>
|
|
|
|
| 7 |
---
|
| 8 |
|
| 9 |
<div class="video-player" data-video-player={id}>
|
| 10 |
+
<video id={id} src={src} controls muted preload="auto" playsinline style="width:100%; border-radius: 8px; display: block; background: #000;" />
|
| 11 |
<div class="speed-controls">
|
| 12 |
<span class="speed-label">Speed:</span>
|
| 13 |
<button class="speed-btn active" data-speed="1">1x</button>
|
app/src/content/article.mdx
CHANGED
|
@@ -51,7 +51,6 @@ showPdf: false
|
|
| 51 |
---
|
| 52 |
|
| 53 |
import Hero from "./chapters/folding/01-hero.mdx";
|
| 54 |
-
import Results from "./chapters/folding/02-results.mdx";
|
| 55 |
import Hardware from "./chapters/folding/03-hardware.mdx";
|
| 56 |
import DataCollection from "./chapters/folding/04-data-collection.mdx";
|
| 57 |
import Training from "./chapters/folding/06-training.mdx";
|
|
@@ -62,8 +61,6 @@ import References from "./chapters/folding/12-references.mdx";
|
|
| 62 |
|
| 63 |
<Hero />
|
| 64 |
|
| 65 |
-
<Results />
|
| 66 |
-
|
| 67 |
<Hardware />
|
| 68 |
|
| 69 |
<DataCollection />
|
|
|
|
| 51 |
---
|
| 52 |
|
| 53 |
import Hero from "./chapters/folding/01-hero.mdx";
|
|
|
|
| 54 |
import Hardware from "./chapters/folding/03-hardware.mdx";
|
| 55 |
import DataCollection from "./chapters/folding/04-data-collection.mdx";
|
| 56 |
import Training from "./chapters/folding/06-training.mdx";
|
|
|
|
| 61 |
|
| 62 |
<Hero />
|
| 63 |
|
| 64 |
<Hardware />
|
| 65 |
|
| 66 |
<DataCollection />
|
app/src/content/assets/image/{Folding_V1.mp4 → Folding_Final.mp4}
RENAMED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:70f8daa647ddcc38350875ef4ad35c8b46318a01e3b555f2cf8eb49c7857a0e1
|
| 3 |
+
size 38519349
|
app/src/content/assets/image/Folding_V2.mp4
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:0834c68cdcfbffa09759b59d938909e55e07ae230a46ea033c56d0a024c29a46
|
| 3 |
-
size 37729508
|
app/src/content/chapters/folding/01-hero.mdx
CHANGED
|
@@ -1,35 +1,42 @@
|
|
| 1 |
import Sidenote from "../../../components/Sidenote.astro";
|
| 2 |
import Note from "../../../components/Note.astro";
|
| 3 |
import Wide from "../../../components/Wide.astro";
|
| 4 |
-
import
|
| 5 |
|
| 6 |
-
|
| 7 |
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
|
| 12 |
-
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
Everything we built for this project ([SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc),
|
| 19 |
|
| 20 |
-
|
| 21 |
|
| 22 |
-
#### Links
|
| 23 |
|
| 24 |
-
|
| 25 |
-
<a href="https://huggingface.co/lerobot-data-collection/folding_final" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>Model</strong><br/>HF Hub</a>
|
| 26 |
-
<a href="https://huggingface.co/lerobot-data-collection/folding_sarm_reward" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>SARM Reward</strong><br/>HF Hub</a>
|
| 27 |
-
<a href="https://huggingface.co/datasets/lerobot/high_quality_folding" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>HQ Dataset</strong><br/>HF Hub</a>
|
| 28 |
-
<a href="https://huggingface.co/datasets/lerobot/full_folding" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>Full Dataset</strong><br/>HF Hub</a>
|
| 29 |
-
<a href="https://github.com/pkooij/open-arms-mini" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>OpenArm Mini</strong><br/>Repo</a>
|
| 30 |
-
<a href="https://github.com/huggingface/lerobot" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>LeRobot</strong><br/>Code</a>
|
| 31 |
-
<a href="https://huggingface.co/docs/lerobot/index" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>LeRobot</strong><br/>Documentation</a>
|
| 32 |
-
</Stack>
|
| 33 |
|
| 34 |
<Sidenote>
|
| 35 |
If you have questions, join our <a href="https://discord.com/invite/q8Dzzpym3f" target="_blank">Discord</a>!
|
|
|
|
| 1 |
import Sidenote from "../../../components/Sidenote.astro";
|
| 2 |
import Note from "../../../components/Note.astro";
|
| 3 |
import Wide from "../../../components/Wide.astro";
|
| 4 |
+
import Video from "../../../components/Video.astro";
|
| 5 |
|
| 6 |
+
Flashy demos of robotic systems are popping up on our feeds almost every day, showing robots that can [build boxes](https://www.youtube.com/watch?v=_GjunG1aGi4), [clean offices](https://www.youtube.com/watch?v=h6hTw6_7NlA), and [do household tasks](https://www.youtube.com/watch?v=jjOfpsMRhL4). But we typically don't know how these systems were actually built and trained, or, in some cases, whether it's really the robot operating or a teleoperator behind the scenes.
|
| 7 |
|
| 8 |
+
How does a field collaboratively learn to build better and more trustworthy robots if most systems are shrouded in mystery?
|
| 9 |
+
|
| 10 |
+
To change this, we trained a robot on a challenging but highly requested task: **cloth folding**. We built and trained a bimanual robot that achieves **90% success rate** on folding a random t-shirt.
|
| 11 |
+
|
| 12 |
+
<Video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/level2.mp4" />
|
| 13 |
+
|
| 14 |
+
<p style="text-align: center; color: var(--muted-color); font-size: 0.85rem; margin-top: -8px;">Autonomous folding of crumpled t-shirts (Level 2)</p>
|
| 15 |
+
|
| 16 |
+
--------
|
| 17 |
|
| 18 |
+
To get there we used **8 bimanual robot setups**, spent **~131 hours** collecting demonstrations, and ran **dozens of training runs** on a GPU cluster. And to lift the veil on building end-to-end realistic robotic use-cases, this blog walks through every step:
|
| 19 |
|
| 20 |
+
- **Hardware** — which robot, cameras, and teleop system to use
|
| 21 |
+
- **Data collection** — how to collect and filter high-quality demonstrations
|
| 22 |
+
- **Training recipes** — which model architecture and hyperparameters work
|
| 23 |
+
- **Experiments** — careful ablations to improve the overall pipeline
|
| 24 |
+
- **Evaluation** — what metrics give good signal and are reliable enough
|
| 25 |
+
- **Takeaways** — what we learned and what we'd do differently next time
|
| 26 |
|
| 27 |
+
This post aims to serve as a **blueprint for anyone who wants to get started in robotics** and move beyond toy examples. You'll see how to build a real robotic system with all its challenges.
|
| 28 |
|
| 29 |
+
Everything we built for this project ([SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), HIL (Human-in-the-Loop), [OpenArm](https://huggingface.co/docs/lerobot/openarm), and [OpenArm Mini](https://github.com/pkooij/open-arms-mini)) is now merged into [LeRobot](https://github.com/huggingface/lerobot) and ready for the community to use. All resources from this project:
|
| 30 |
|
| 31 |
+
| Resource | Link |
|
| 32 |
+
|:---|:---|
|
| 33 |
+
| **Model** | [HF Hub](https://huggingface.co/lerobot-data-collection/folding_final) |
|
| 34 |
+
| **SARM Reward** | [HF Hub](https://huggingface.co/lerobot-data-collection/folding_sarm_reward) |
|
| 35 |
+
| **Dataset** | [Full](https://huggingface.co/datasets/lerobot/full_folding) / [HQ](https://huggingface.co/datasets/lerobot/high_quality_folding) |
|
| 36 |
+
| **OpenArm Mini** | [GitHub](https://github.com/pkooij/open-arms-mini) |
|
| 37 |
|
|
|
|
| 38 |
|
| 39 |
+
So we want to build a robot to fold clothes, but what kind of robot should we use? A humanoid? A single arm? Or something else? Let’s have a look at the design choices around the hardware.
|
| 40 |
|
| 41 |
<Sidenote>
|
| 42 |
If you have questions, join our <a href="https://discord.com/invite/q8Dzzpym3f" target="_blank">Discord</a>!
|
app/src/content/chapters/folding/03-hardware.mdx
CHANGED
|
@@ -66,6 +66,6 @@ We use **three cameras**, each serving a purpose. The **base camera** is mounted
|
|
| 66 |
|
| 67 |
### LeRobot Integration
|
| 68 |
|
| 69 |
-
Integrating OpenArm into LeRobot required adding **CAN-bus** support. CAN-bus is the communication protocol the arm's motors use, think of it as a shared wire where LeRobot sends position commands ("move joint 3 to 45 degrees") and reads back the current joint angles.
|
| 70 |
|
| 71 |
With the hardware in place, the next step was the hardest and most time-consuming part of the entire project: collecting good data. And "good" is much harder to define than it sounds.
|
|
|
|
| 66 |
|
| 67 |
### LeRobot Integration
|
| 68 |
|
| 69 |
+
Integrating OpenArm into LeRobot required adding **CAN-bus** support. CAN-bus is the communication protocol the arm's motors use: think of it as a shared wire where LeRobot sends position commands ("move joint 3 to 45 degrees") and reads back the current joint angles. The CAN-bus driver is the thin bridge between software and hardware. This integration is now available in the [LeRobot repository](https://github.com/huggingface/lerobot). We also created a UI for non-technical robot operators, so they don't need the CLI to start and stop episodes.
|
| 70 |
|
| 71 |
With the hardware in place, the next step was the hardest and most time-consuming part of the entire project: collecting good data. And "good" is much harder to define than it sounds.
|
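To make the "shared wire" description in the hardware paragraph concrete, here is a minimal TypeScript sketch of packing a position command into a classic CAN frame and reading it back. The frame layout (a base identifier of `0x100` plus the joint index, with the angle stored as a little-endian float32) and the names `CanFrame`, `POSITION_COMMAND_BASE`, `encodePositionCommand`, and `decodePositionCommand` are assumptions made up for illustration. This is not the OpenArm motor protocol or LeRobot's driver API.

```typescript
// A classic CAN frame: an identifier plus up to 8 data bytes.
interface CanFrame {
  id: number;
  data: Uint8Array;
}

// Hypothetical base identifier for position commands (illustration only).
const POSITION_COMMAND_BASE = 0x100;

// Encode "move joint N to X degrees" into a frame.
// Assumed layout: id = base + joint index, data = angle as little-endian float32.
function encodePositionCommand(joint: number, degrees: number): CanFrame {
  const buffer = new ArrayBuffer(4);
  new DataView(buffer).setFloat32(0, degrees, true);
  return { id: POSITION_COMMAND_BASE + joint, data: new Uint8Array(buffer) };
}

// Decode a frame produced by encodePositionCommand back into joint and angle.
function decodePositionCommand(frame: CanFrame): { joint: number; degrees: number } {
  const view = new DataView(
    frame.data.buffer,
    frame.data.byteOffset,
    frame.data.byteLength,
  );
  return {
    joint: frame.id - POSITION_COMMAND_BASE,
    degrees: view.getFloat32(0, true),
  };
}
```

A real driver would also handle bus arbitration, motor-specific scaling, and feedback frames; the sketch only shows the encode/decode round trip.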
app/src/content/chapters/folding/06-training.mdx
CHANGED
|
@@ -49,6 +49,8 @@ All experiments use **RTC** with an action queue size of 30 and a maximum action
|
|
| 49 |
|
| 50 |
We fine-tune two variants of this architecture: **π0**, the base flow-matching VLA, and **π0.5**, an improved version with more pretraining data and refinements to the denoising process. Both start from pretrained checkpoints. Training runs on **8× H100 GPUs** with a per-GPU batch size of 32 (a total batch size of 256), gradient accumulation, and using **AdamW** with a learning rate of **1e-4** (warmup + cosine decay). The large batch size is important for stable VLA training, and it's what drives the multi-GPU requirement.
|
| 51 |
|
| 52 |
---
|
| 53 |
|
| 54 |
### Evaluation Protocol
|
|
|
|
| 49 |
|
| 50 |
We fine-tune two variants of this architecture: **π0**, the base flow-matching VLA, and **π0.5**, an improved version with more pretraining data and refinements to the denoising process. Both start from pretrained checkpoints. Training runs on **8× H100 GPUs** with a per-GPU batch size of 32 (a total batch size of 256), gradient accumulation, and using **AdamW** with a learning rate of **1e-4** (warmup + cosine decay). The large batch size is important for stable VLA training, and it's what drives the multi-GPU requirement.
|
| 51 |
|
| 52 |
+
With ~131 hours of video-encoded demonstrations, keeping data close to compute matters. [Hugging Face Storage Buckets](https://huggingface.co/storage) now make this straightforward: they provide S3-like object storage with a built-in CDN that can be pre-warmed in your training region, and Xet deduplication means re-uploading a slightly modified dataset only transfers the diff. Datasets stored on the Hub or in buckets can be streamed directly to GPUs with no extra infrastructure.
|
| 53 |
+
|
| 54 |
---
|
| 55 |
|
| 56 |
### Evaluation Protocol
|
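The optimizer settings in the training paragraph (peak learning rate of 1e-4 with warmup and cosine decay, 8 GPUs with a per-GPU batch size of 32) can be sketched as plain arithmetic. This is a minimal TypeScript illustration, not the LeRobot training code: the function names are hypothetical, and the warmup length and total step count are left as parameters because the post does not state them. Since 8 × 32 already gives 256, the sketch assumes an accumulation factor of 1 when reproducing the total batch size.

```typescript
// Linear warmup followed by cosine decay, as a pure function of the step.
// peakLr = 1e-4 comes from the post; warmupSteps and totalSteps are
// hypothetical parameters the caller has to choose.
function learningRate(
  step: number,
  peakLr: number,
  warmupSteps: number,
  totalSteps: number,
): number {
  if (step < warmupSteps) {
    // Ramp linearly from 0 up to peakLr.
    return (peakLr * step) / warmupSteps;
  }
  // Fraction of the decay phase completed, clamped to at most 1.
  const progress = Math.min(1, (step - warmupSteps) / (totalSteps - warmupSteps));
  // Cosine decay from peakLr down to 0.
  return 0.5 * peakLr * (1 + Math.cos(Math.PI * progress));
}

// Effective batch size: GPUs x per-GPU batch x gradient-accumulation steps.
function effectiveBatchSize(gpus: number, perGpuBatch: number, accumSteps: number): number {
  return gpus * perGpuBatch * accumSteps;
}
```

For example, `effectiveBatchSize(8, 32, 1)` reproduces the total batch size of 256 quoted above, and `learningRate(step, 1e-4, warmupSteps, totalSteps)` peaks at 1e-4 exactly when `step` equals `warmupSteps`.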
app/src/content/chapters/folding/08-ablations.mdx
CHANGED
|
@@ -4,6 +4,7 @@ import Wide from "../../../components/Wide.astro";
|
|
| 4 |
import Stack from "../../../components/Stack.astro";
|
| 5 |
import Accordion from "../../../components/Accordion.astro";
|
| 6 |
import HtmlEmbed from "../../../components/HtmlEmbed.astro";
|
|
|
|
| 7 |
|
| 8 |
import sarmEp300 from "../../assets/image/lerobot-data-collection_level2_final_quality3_ep300_progress.gif";
|
| 9 |
import sarmEp2500 from "../../assets/image/lerobot-data-collection_level12_rac_2_2026-02-08_1_ep2500_progress.gif";
|
|
@@ -159,6 +160,26 @@ The jump was dramatic. Experiment 2.5 reached **90% total success rate**: 100% L
|
|
| 159 |
|
| 160 |
Both 2.2 and 2.5 used the same recipe (HQ + RABC + Relative Actions), but 2.5 fine-tuned from 1.7 (the stronger base with relative actions + RABC already baked in) while 2.2 fine-tuned from 1.3. The difference (75% → 90%) likely reflects this stronger starting point. Data quality was the single biggest lever, and RABC's effect was strongest on **Level 2**, the longer, harder task where emphasizing the best demonstrations mattered most.
|
| 161 |
|
| 162 |
---
|
| 163 |
|
| 164 |
### What didn't work
|
|
|
|
| 4 |
import Stack from "../../../components/Stack.astro";
|
| 5 |
import Accordion from "../../../components/Accordion.astro";
|
| 6 |
import HtmlEmbed from "../../../components/HtmlEmbed.astro";
|
| 7 |
+
import Video from "../../../components/Video.astro";
|
| 8 |
|
| 9 |
import sarmEp300 from "../../assets/image/lerobot-data-collection_level2_final_quality3_ep300_progress.gif";
|
| 10 |
import sarmEp2500 from "../../assets/image/lerobot-data-collection_level12_rac_2_2026-02-08_1_ep2500_progress.gif";
|
|
|
|
| 160 |
|
| 161 |
Both 2.2 and 2.5 used the same recipe (HQ + RABC + Relative Actions), but 2.5 fine-tuned from 1.7 (the stronger base with relative actions + RABC already baked in) while 2.2 fine-tuned from 1.3. The difference (75% → 90%) likely reflects this stronger starting point. Data quality was the single biggest lever, and RABC's effect was strongest on **Level 2**, the longer, harder task where emphasizing the best demonstrations mattered most.
|
| 162 |
|
| 163 |
+
Here is an uncut Level 1 evaluation run from Experiment 2.5 — 15 minutes of continuous folding, no human intervention:
|
| 164 |
+
|
| 165 |
+
<Video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/level1.mp4" />
|
| 166 |
+
|
| 167 |
+
<p style="text-align: center; color: var(--muted-color); font-size: 0.85rem; margin-top: -8px;">Autonomous folding from flat state (Level 1)</p>
|
| 168 |
+
|
| 169 |
+
-------------
|
| 170 |
+
|
| 171 |
+
We evaluate on two difficulty levels. Level 1 starts from a laid-out t-shirt; Level 2 starts from a crumpled mess and requires spreading, folding, and placing the shirt aside. Results are from our best model (Experiment 2.5), evaluated over 20 rollouts:
|
| 172 |
+
|
| 173 |
+
| Task | Success Rate | Avg. Completion Time |
|
| 174 |
+
|:---|:---:|:---:|
|
| 175 |
+
| **Level 1** Laid-out to Fold | **100%** | **40.8 s** |
|
| 176 |
+
| **Level 2** Messy to Spread to Fold to Place aside | **80%** | **95.9 s** |
|
| 177 |
+
| **Combined** (Total SR) | **90%** | |
|
| 178 |
+
|
| 179 |
+
<Sidenote>
|
| 180 |
+
All evaluations filmed and scored from video. 20 rollouts per experiment (10 per level). Full methodology in the [Model and Evaluation Setup](#model-and-evaluation-setup) section.
|
| 181 |
+
</Sidenote>
|
| 182 |
+
|
| 183 |
---
|
| 184 |
|
| 185 |
### What didn't work
|
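The combined success rate in the metrics table follows from the per-level numbers: with 10 rollouts per level, 100% and 80% correspond to 10 and 8 successes, so the total is 18 of 20. A small TypeScript sketch of that arithmetic is below. The type and function names are hypothetical, and the success counts are inferred from the reported percentages rather than taken from raw evaluation logs.

```typescript
// One row of the evaluation table, expressed as raw counts.
interface LevelResult {
  name: string;
  successes: number;
  rollouts: number;
}

// Success rate for a single level.
function successRate(result: LevelResult): number {
  return result.successes / result.rollouts;
}

// Combined rate pools successes and rollouts across levels,
// rather than averaging the per-level percentages.
function combinedRate(results: LevelResult[]): number {
  const successes = results.reduce((sum, r) => sum + r.successes, 0);
  const rollouts = results.reduce((sum, r) => sum + r.rollouts, 0);
  return successes / rollouts;
}

// Counts inferred from the table: 100% and 80% over 10 rollouts each.
const level1: LevelResult = { name: "Level 1", successes: 10, rollouts: 10 };
const level2: LevelResult = { name: "Level 2", successes: 8, rollouts: 10 };
const totalSuccessRate = combinedRate([level1, level2]);
```

Because both levels use the same number of rollouts, pooling the counts gives the same result as averaging the two percentages; the two would differ if the levels were evaluated on different numbers of rollouts.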
app/src/pages/index.astro
CHANGED
|
@@ -329,6 +329,7 @@ const licence =
|
|
| 329 |
<section class="content-grid">
|
| 330 |
<TableOfContents
|
| 331 |
tableOfContentAutoCollapse={tableOfContentAutoCollapse}
|
|
|
|
| 332 |
/>
|
| 333 |
<main>
|
| 334 |
<Article />
|
|
|
|
| 329 |
<section class="content-grid">
|
| 330 |
<TableOfContents
|
| 331 |
tableOfContentAutoCollapse={tableOfContentAutoCollapse}
|
| 332 |
+
readTime="~25 min read"
|
| 333 |
/>
|
| 334 |
<main>
|
| 335 |
<Article />
|