pepijn223 HF Staff committed on
Commit
765c156
·
unverified ·
1 Parent(s): 8bf1f05

Refine blog intro, videos, resource table, and add HF Buckets mention

- Update intro links (build boxes, clean offices, household tasks)
- Replace hero video with Folding_Final.mp4
- Add descriptive captions under Level 1/Level 2 videos
- Fix white screen on video pause (black background)
- Move read time to ToC sidebar
- Simplify resource table (combine datasets, remove LeRobot code/docs)
- Replace DAgger with HIL (Human-in-the-Loop) in intro
- Add intro sentences for key metrics table in ablations
- Add HF Storage Buckets paragraph in training section

Made-with: Cursor

app/src/components/Hero.astro CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  import HtmlEmbed from "./HtmlEmbed.astro";
3
- import announcementVideo from "../content/assets/image/Folding_V2.mp4";
4
 
5
  interface Props {
6
  title: string; // may contain HTML (e.g., <br/>)
 
1
  ---
2
  import HtmlEmbed from "./HtmlEmbed.astro";
3
+ import announcementVideo from "../content/assets/image/Folding_Final.mp4";
4
 
5
  interface Props {
6
  title: string; // may contain HTML (e.g., <br/>)
app/src/components/TableOfContents.astro CHANGED
@@ -1,8 +1,9 @@
1
  ---
2
  export interface Props {
3
  tableOfContentAutoCollapse?: boolean;
 
4
  }
5
- const { tableOfContentAutoCollapse = false } = Astro.props as Props;
6
  ---
7
 
8
  <nav
@@ -10,6 +11,7 @@ const { tableOfContentAutoCollapse = false } = Astro.props as Props;
10
  aria-label="Table of Contents"
11
  data-auto-collapse={tableOfContentAutoCollapse ? "1" : "0"}
12
  >
 
13
  <div class="title">Table of Contents</div>
14
  <div id="article-toc-placeholder"></div>
15
  </nav>
@@ -867,6 +869,12 @@ const { tableOfContentAutoCollapse = false } = Astro.props as Props;
867
  font-size: 13px;
868
  }
869
 
870
  .table-of-contents .title {
871
  font-weight: 600;
872
  font-size: 14px;
 
1
  ---
2
  export interface Props {
3
  tableOfContentAutoCollapse?: boolean;
4
+ readTime?: string;
5
  }
6
+ const { tableOfContentAutoCollapse = false, readTime } = Astro.props as Props;
7
  ---
8
 
9
  <nav
 
11
  aria-label="Table of Contents"
12
  data-auto-collapse={tableOfContentAutoCollapse ? "1" : "0"}
13
  >
14
+ {readTime && <div class="toc-read-time">{readTime}</div>}
15
  <div class="title">Table of Contents</div>
16
  <div id="article-toc-placeholder"></div>
17
  </nav>
 
869
  font-size: 13px;
870
  }
871
 
872
+ .toc-read-time {
873
+ font-size: 13px;
874
+ color: var(--muted-color);
875
+ margin-bottom: 8px;
876
+ }
877
+
878
  .table-of-contents .title {
879
  font-weight: 600;
880
  font-size: 14px;
app/src/components/Video.astro CHANGED
@@ -7,7 +7,7 @@ const id = `video-${Math.random().toString(36).slice(2, 9)}`;
7
  ---
8
 
9
  <div class="video-player" data-video-player={id}>
10
- <video id={id} src={src} controls muted preload="auto" playsinline style="width:100%; border-radius: 8px; display: block;" />
11
  <div class="speed-controls">
12
  <span class="speed-label">Speed:</span>
13
  <button class="speed-btn active" data-speed="1">1x</button>
 
7
  ---
8
 
9
  <div class="video-player" data-video-player={id}>
10
+ <video id={id} src={src} controls muted preload="auto" playsinline style="width:100%; border-radius: 8px; display: block; background: #000;" />
11
  <div class="speed-controls">
12
  <span class="speed-label">Speed:</span>
13
  <button class="speed-btn active" data-speed="1">1x</button>
app/src/content/article.mdx CHANGED
@@ -51,7 +51,6 @@ showPdf: false
51
  ---
52
 
53
  import Hero from "./chapters/folding/01-hero.mdx";
54
- import Results from "./chapters/folding/02-results.mdx";
55
  import Hardware from "./chapters/folding/03-hardware.mdx";
56
  import DataCollection from "./chapters/folding/04-data-collection.mdx";
57
  import Training from "./chapters/folding/06-training.mdx";
@@ -62,8 +61,6 @@ import References from "./chapters/folding/12-references.mdx";
62
 
63
  <Hero />
64
 
65
- <Results />
66
-
67
  <Hardware />
68
 
69
  <DataCollection />
 
51
  ---
52
 
53
  import Hero from "./chapters/folding/01-hero.mdx";
 
54
  import Hardware from "./chapters/folding/03-hardware.mdx";
55
  import DataCollection from "./chapters/folding/04-data-collection.mdx";
56
  import Training from "./chapters/folding/06-training.mdx";
 
61
 
62
  <Hero />
63
 
 
 
64
  <Hardware />
65
 
66
  <DataCollection />
app/src/content/assets/image/{Folding_V1.mp4 → Folding_Final.mp4} RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:329351f8eb794af365639aac14a99fd988168ecd2988860ce46078451fda7d25
3
- size 50627708
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:70f8daa647ddcc38350875ef4ad35c8b46318a01e3b555f2cf8eb49c7857a0e1
3
+ size 38519349
app/src/content/assets/image/Folding_V2.mp4 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:0834c68cdcfbffa09759b59d938909e55e07ae230a46ea033c56d0a024c29a46
3
- size 37729508
 
 
 
 
app/src/content/chapters/folding/01-hero.mdx CHANGED
@@ -1,35 +1,42 @@
1
  import Sidenote from "../../../components/Sidenote.astro";
2
  import Note from "../../../components/Note.astro";
3
  import Wide from "../../../components/Wide.astro";
4
- import Stack from "../../../components/Stack.astro";
5
 
6
- > We trained an open-source bimanual robot to fold t-shirts autonomously, reaching 90% success rate. The biggest lever was data quality, not the model, not the architecture.
7
 
8
- <Sidenote>
9
- Read time: ~25 minutes.
10
- </Sidenote>
 
 
 
 
 
 
11
 
12
- This post walks through the complete journey: hardware choices, data collection, training recipes, and different experiments that show what actually matters. We cover the mistakes and dead ends alongside the things that worked, because the messy middle is where most of the learning happens.
13
 
14
- Some of what we found: cheap 3D-printed leader arms outperformed the expensive ones for teleoperation. Early data collection was more wasteful than expected. A trained reward model turned out to be essential for separating useful demonstrations from harmful ones. And curating a small, high-quality dataset did more than any algorithmic improvement on the full dataset.
 
 
 
 
 
15
 
16
- By sharing this we hope to contribute to our bigger vision: **democratize robotics and robot learning**. By open-sourcing every piece (tools, data, models, and knowledge) we want to enable a community that pushes this technology further. We've tried to avoid just listing what we did in favor of telling the story of how we got here. We hope being this open will help close the gap between closed-lab demos and what the open-source community can achieve.
17
 
18
- Everything we built for this project ([SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), DAgger, [OpenArm](https://huggingface.co/docs/lerobot/openarm), and [OpenArm Mini](https://github.com/pkooij/open-arms-mini)) is now merged into [LeRobot](https://github.com/huggingface/lerobot) and ready for the community to use.
19
 
20
- _Let's start with the results, does it actually work?_
 
 
 
 
 
21
 
22
- #### Links
23
 
24
- <Stack layout="4-column" gap="small" class="links-centered">
25
- <a href="https://huggingface.co/lerobot-data-collection/folding_final" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>Model</strong><br/>HF Hub</a>
26
- <a href="https://huggingface.co/lerobot-data-collection/folding_sarm_reward" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>SARM Reward</strong><br/>HF Hub</a>
27
- <a href="https://huggingface.co/datasets/lerobot/high_quality_folding" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>HQ Dataset</strong><br/>HF Hub</a>
28
- <a href="https://huggingface.co/datasets/lerobot/full_folding" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>Full Dataset</strong><br/>HF Hub</a>
29
- <a href="https://github.com/pkooij/open-arms-mini" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>OpenArm Mini</strong><br/>Repo</a>
30
- <a href="https://github.com/huggingface/lerobot" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>LeRobot</strong><br/>Code</a>
31
- <a href="https://huggingface.co/docs/lerobot/index" className="card" style="padding: 12px 16px; text-align: center; text-decoration: none;"><strong>LeRobot</strong><br/>Documentation</a>
32
- </Stack>
33
 
34
  <Sidenote>
35
  If you have questions, join our <a href="https://discord.com/invite/q8Dzzpym3f" target="_blank">Discord</a>!
 
1
  import Sidenote from "../../../components/Sidenote.astro";
2
  import Note from "../../../components/Note.astro";
3
  import Wide from "../../../components/Wide.astro";
4
+ import Video from "../../../components/Video.astro";
5
 
6
+ Flashy demos of robotic systems pop up on our feeds almost every day, showing robots that can [build boxes](https://www.youtube.com/watch?v=_GjunG1aGi4), [clean offices](https://www.youtube.com/watch?v=h6hTw6_7NlA), and [do household tasks](https://www.youtube.com/watch?v=jjOfpsMRhL4). But we typically don't know how these systems were actually built and trained, or, in some cases, whether it's really the robot operating or a teleoperator behind the scenes.
7
 
8
+ How does a field collaboratively learn to build better and more trustworthy robots if most systems are shrouded in mystery?
9
+
10
+ To change this, we trained a robot on a challenging but highly requested task: **cloth folding**. We built and trained a bimanual robot that achieves **90% success rate** on folding a random t-shirt.
11
+
12
+ <Video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/level2.mp4" />
13
+
14
+ <p style="text-align: center; color: var(--muted-color); font-size: 0.85rem; margin-top: -8px;">Autonomous folding of crumpled t-shirts (Level 2)</p>
15
+
16
+ --------
17
 
18
+ To get there we used **8 bimanual robot setups**, spent **~131 hours** collecting demonstrations, and ran **dozens of training runs** on a GPU cluster. And to lift the veil on building end-to-end realistic robotic use-cases, this blog walks through every step:
19
 
20
+ - **Hardware**: which robot, cameras, and teleop system to use
21
+ - **Data collection** — how to collect and filter high-quality demonstrations
22
+ - **Training recipes** — which model architecture and hyperparameters work
23
+ - **Experiments** — careful ablations to improve the overall pipeline
24
+ - **Evaluation** — what metrics give good signal and are reliable enough
25
+ - **Takeaways** — what we learned and what we'd do differently next time
26
 
27
+ This post aims to serve as a **blueprint for anyone who wants to get started in robotics** and move beyond toy examples. You'll see how to build a real robotic system with all its challenges.
28
 
29
+ Everything we built for this project ([SARM](https://huggingface.co/docs/lerobot/sarm), [RTC](https://huggingface.co/docs/lerobot/rtc), HIL (Human-in-the-Loop), [OpenArm](https://huggingface.co/docs/lerobot/openarm), and [OpenArm Mini](https://github.com/pkooij/open-arms-mini)) is now merged into [LeRobot](https://github.com/huggingface/lerobot) and ready for the community to use. All resources from this project:
30
 
31
+ | Resource | Link |
32
+ |:---|:---|
33
+ | **Model** | [HF Hub](https://huggingface.co/lerobot-data-collection/folding_final) |
34
+ | **SARM Reward** | [HF Hub](https://huggingface.co/lerobot-data-collection/folding_sarm_reward) |
35
+ | **Dataset** | [Full](https://huggingface.co/datasets/lerobot/full_folding) / [HQ](https://huggingface.co/datasets/lerobot/high_quality_folding) |
36
+ | **OpenArm Mini** | [GitHub](https://github.com/pkooij/open-arms-mini) |
37
 
 
38
 
39
+ So we want to build a robot to fold clothes, but what kind of robot should we use? A humanoid? A single arm? Or something else? Let’s have a look at the design choices around the hardware.
40
 
41
  <Sidenote>
42
  If you have questions, join our <a href="https://discord.com/invite/q8Dzzpym3f" target="_blank">Discord</a>!
app/src/content/chapters/folding/03-hardware.mdx CHANGED
@@ -66,6 +66,6 @@ We use **three cameras**, each serving a purpose. The **base camera** is mounted
66
 
67
  ### LeRobot Integration
68
 
69
- Integrating OpenArm into LeRobot required adding **CAN-bus** support. CAN-bus is the communication protocol the arm's motors use, think of it as a shared wire where LeRobot sends position commands ("move joint 3 to 45 degrees") and reads back the current joint angles. Everything else: capturing camera images, running the model, converting predictions into those joint positions, happens in Python inside LeRobot. The CAN-bus driver is the thin bridge between software and hardware. This integration can now be found in the [LeRobot repository](https://github.com/huggingface/lerobot). We also created a UI for non-technical robot operators, so the CLI doesn't need to be used to start and stop episodes.
70
 
71
  With the hardware in place, the next step was the hardest and most time-consuming part of the entire project: collecting good data. And "good" is much harder to define than it sounds.
 
66
 
67
  ### LeRobot Integration
68
 
69
+ Integrating OpenArm into LeRobot required adding **CAN-bus** support. CAN-bus is the communication protocol the arm's motors use; think of it as a shared wire where LeRobot sends position commands ("move joint 3 to 45 degrees") and reads back the current joint angles. The CAN-bus driver is the thin bridge between software and hardware. This integration can now be found in the [LeRobot repository](https://github.com/huggingface/lerobot). We also created a UI for non-technical robot operators, so the CLI doesn't need to be used to start and stop episodes.
70
 
71
  With the hardware in place, the next step was the hardest and most time-consuming part of the entire project: collecting good data. And "good" is much harder to define than it sounds.
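
To make the "shared wire" description in the hunk above concrete, here is a minimal sketch of how a position command could be packed into a CAN frame and how a feedback payload could be read back. It is illustrative only: the base identifier `POSITION_CMD_BASE_ID`, the float32 payload layout, and both function names are hypothetical and are not the OpenArm motor protocol or the LeRobot driver. The only facts relied on are that a classic CAN frame carries an identifier and up to 8 data bytes.

```python
import struct

# Hypothetical base identifier; real motor protocols define their own IDs.
POSITION_CMD_BASE_ID = 0x100


def encode_position_command(joint_index: int, angle_deg: float) -> tuple[int, bytes]:
    """Build an (identifier, payload) pair for a 'move joint N to X degrees' command.

    Assumed layout: little-endian float32 angle in bytes 0-3, zero padding up to
    the 8-byte maximum of a classic CAN data field.
    """
    arbitration_id = POSITION_CMD_BASE_ID + joint_index
    payload = struct.pack("<f", angle_deg).ljust(8, b"\x00")
    return arbitration_id, payload


def decode_position_feedback(payload: bytes) -> float:
    """Read the joint angle back from a payload that uses the same assumed layout."""
    (angle_deg,) = struct.unpack("<f", payload[:4])
    return angle_deg


# "Move joint 3 to 45 degrees", as in the example from the text.
frame_id, frame_data = encode_position_command(3, 45.0)
print(hex(frame_id), frame_data.hex(), decode_position_feedback(frame_data))
```

A real driver would hand the identifier and payload to a CAN interface and handle motor-specific scaling, limits, and error frames; none of that is modelled here.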
app/src/content/chapters/folding/06-training.mdx CHANGED
@@ -49,6 +49,8 @@ All experiments use **RTC** with an action queue size of 30 and a maximum action
49
 
50
  We fine-tune two variants of this architecture: **π0**, the base flow-matching VLA, and **π0.5**, an improved version with more pretraining data and refinements to the denoising process. Both start from pretrained checkpoints. Training runs on **8× H100 GPUs** with a per-GPU batch size of 32 (a total batch size of 256), gradient accumulation, and using **AdamW** with a learning rate of **1e-4** (warmup + cosine decay). The large batch size is important for stable VLA training, and it's what drives the multi-GPU requirement.
51
 
 
 
52
  ---
53
 
54
  ### Evaluation Protocol
 
49
 
50
  We fine-tune two variants of this architecture: **π0**, the base flow-matching VLA, and **π0.5**, an improved version with more pretraining data and refinements to the denoising process. Both start from pretrained checkpoints. Training runs on **8× H100 GPUs** with a per-GPU batch size of 32 (a total batch size of 256), gradient accumulation, and using **AdamW** with a learning rate of **1e-4** (warmup + cosine decay). The large batch size is important for stable VLA training, and it's what drives the multi-GPU requirement.
51
 
52
+ With ~131 hours of video-encoded demonstrations, keeping data close to compute matters. [Hugging Face Storage Buckets](https://huggingface.co/storage) now make this straightforward: they provide S3-like object storage with a built-in CDN that can be pre-warmed in your training region, and Xet deduplication means re-uploading a slightly modified dataset only transfers the diff. Datasets stored on the Hub or in buckets can be streamed directly to GPUs with no extra infrastructure.
53
+
54
  ---
55
 
56
  ### Evaluation Protocol
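
The training paragraph in the hunk above states 8 GPUs, a per-GPU batch size of 32, a total batch size of 256, and AdamW at 1e-4 with warmup and cosine decay. The sketch below works through that arithmetic and one common form of such a schedule. The warmup length, total step count, floor learning rate, and the choice of one gradient-accumulation step are assumptions for illustration; the post does not state them, and this is not the LeRobot training code.

```python
import math


def effective_batch_size(num_gpus: int, per_gpu_batch: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step across all devices."""
    return num_gpus * per_gpu_batch * grad_accum_steps


def lr_at(step: int, base_lr: float = 1e-4, warmup_steps: int = 1000,
          total_steps: int = 30000, min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay down to min_lr.

    warmup_steps, total_steps, and min_lr are hypothetical values.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# 8 GPUs x 32 samples matches the stated total of 256 when accumulation is 1.
print(effective_batch_size(num_gpus=8, per_gpu_batch=32))
for s in (0, 500, 1000, 15500, 30000):
    print(s, lr_at(s))
```

With fewer GPUs, raising `grad_accum_steps` is the usual way to keep the same effective batch size, at the cost of longer wall-clock time per optimizer step.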
app/src/content/chapters/folding/08-ablations.mdx CHANGED
@@ -4,6 +4,7 @@ import Wide from "../../../components/Wide.astro";
4
  import Stack from "../../../components/Stack.astro";
5
  import Accordion from "../../../components/Accordion.astro";
6
  import HtmlEmbed from "../../../components/HtmlEmbed.astro";
 
7
 
8
  import sarmEp300 from "../../assets/image/lerobot-data-collection_level2_final_quality3_ep300_progress.gif";
9
  import sarmEp2500 from "../../assets/image/lerobot-data-collection_level12_rac_2_2026-02-08_1_ep2500_progress.gif";
@@ -159,6 +160,26 @@ The jump was dramatic. Experiment 2.5 reached **90% total success rate**: 100% L
159
 
160
  Both 2.2 and 2.5 used the same recipe (HQ + RABC + Relative Actions), but 2.5 fine-tuned from 1.7 (the stronger base with relative actions + RABC already baked in) while 2.2 fine-tuned from 1.3. The difference (75% → 90%) likely reflects this stronger starting point. Data quality was the single biggest lever, and RABC's effect was strongest on **Level 2**, the longer, harder task where emphasizing the best demonstrations mattered most.
161
 
162
  ---
163
 
164
  ### What didn't work
 
4
  import Stack from "../../../components/Stack.astro";
5
  import Accordion from "../../../components/Accordion.astro";
6
  import HtmlEmbed from "../../../components/HtmlEmbed.astro";
7
+ import Video from "../../../components/Video.astro";
8
 
9
  import sarmEp300 from "../../assets/image/lerobot-data-collection_level2_final_quality3_ep300_progress.gif";
10
  import sarmEp2500 from "../../assets/image/lerobot-data-collection_level12_rac_2_2026-02-08_1_ep2500_progress.gif";
 
160
 
161
  Both 2.2 and 2.5 used the same recipe (HQ + RABC + Relative Actions), but 2.5 fine-tuned from 1.7 (the stronger base with relative actions + RABC already baked in) while 2.2 fine-tuned from 1.3. The difference (75% → 90%) likely reflects this stronger starting point. Data quality was the single biggest lever, and RABC's effect was strongest on **Level 2**, the longer, harder task where emphasizing the best demonstrations mattered most.
162
 
163
+ Here is an uncut Level 1 evaluation run from Experiment 2.5 — 15 minutes of continuous folding, no human intervention:
164
+
165
+ <Video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/level1.mp4" />
166
+
167
+ <p style="text-align: center; color: var(--muted-color); font-size: 0.85rem; margin-top: -8px;">Autonomous folding from flat state (Level 1)</p>
168
+
169
+ -------------
170
+
171
+ We evaluate on two difficulty levels. Level 1 starts from a laid-out t-shirt; Level 2 starts from a crumpled mess and requires spreading, folding, and placing the shirt aside. Results are from our best model (Experiment 2.5), evaluated over 20 rollouts:
172
+
173
+ | Task | Success Rate | Avg. Completion Time |
174
+ |:---|:---:|:---:|
175
+ | **Level 1** Laid-out to Fold | **100%** | **40.8 s** |
176
+ | **Level 2** Messy to Spread to Fold to Place aside | **80%** | **95.9 s** |
177
+ | **Combined** (Total SR) | **90%** | |
178
+
179
+ <Sidenote>
180
+ All evaluations were filmed and scored from video. 20 rollouts per experiment (10 per level). Full methodology in the [Model and Evaluation Setup](#model-and-evaluation-setup) section.
181
+ </Sidenote>
182
+
183
  ---
184
 
185
  ### What didn't work
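
The combined figure in the metrics table added above follows from the per-level results and the stated 10 rollouts per level. A short sketch of that arithmetic is below; the success counts (10 and 8) are derived from the reported 100% and 80% rather than quoted from the post, and the helper name is hypothetical.

```python
def success_rate(successes: int, rollouts: int) -> float:
    """Fraction of rollouts that ended in a successful fold."""
    return successes / rollouts


# Derived from the table: 100% and 80% over 10 rollouts per level.
level1_successes, level1_rollouts = 10, 10
level2_successes, level2_rollouts = 8, 10

level1_sr = success_rate(level1_successes, level1_rollouts)
level2_sr = success_rate(level2_successes, level2_rollouts)

# Total SR pools all 20 rollouts; with equal rollouts per level this equals
# the plain average of the two per-level rates.
total_sr = success_rate(level1_successes + level2_successes,
                        level1_rollouts + level2_rollouts)

print(f"Level 1: {level1_sr:.0%}, Level 2: {level2_sr:.0%}, Total: {total_sr:.0%}")
```

With only 10 rollouts per level, each individual rollout moves a per-level rate by 10 percentage points, which is worth keeping in mind when comparing experiments.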
app/src/pages/index.astro CHANGED
@@ -329,6 +329,7 @@ const licence =
329
  <section class="content-grid">
330
  <TableOfContents
331
  tableOfContentAutoCollapse={tableOfContentAutoCollapse}
 
332
  />
333
  <main>
334
  <Article />
 
329
  <section class="content-grid">
330
  <TableOfContents
331
  tableOfContentAutoCollapse={tableOfContentAutoCollapse}
332
+ readTime="~25 min read"
333
  />
334
  <main>
335
  <Article />