Update app/src/content/chapters/folding/04-data-collection.mdx

#5
app/src/content/chapters/folding/04-data-collection.mdx CHANGED
@@ -10,7 +10,7 @@ We ran **8 setups** in parallel, optimizing for **maximum diversity**: 25+ diffe
10
 
11
  ### Learning to Teleoperate
12
 
13
- Here's an honest truth: **early data is worse than the final data**. Teleoperating a bimanual robot is a genuine skill, and it takes practice. The first episodes are slow, not deliberate, and full of failed attempts. Over hours of practice, operators get dramatically better: smoother motions, faster execution, and more consistent grasps.
14
 
15
  This creates one of the most important practical decisions of the project: **when do you start recording data for the final model?** Too early and you pollute the dataset with low-quality demonstrations that the model will faithfully reproduce, hesitations, fumbles, and all. Too late and you've wasted precious time.
16
 
@@ -18,11 +18,11 @@ Another important part is aligning the strategy between operators. Since some pa
18
 
19
  ### Tips for Good Data Collection
20
 
21
- 1. **Practice before you record.** Smooth, deliberate motions beat fast, sloppy ones.
22
- 2. **Quality over speed. Always.** A fast but messy episode teaches bad habits that are hard to untrain.
23
- 3. **Each action should make sense from the current observation alone.** Most models don't have history, so avoid motions that only work because *you* remember what happened 5 seconds ago.
24
  4. **Be consistent within episodes.** The model learns a coherent strategy more easily than movements that vary wildly each time.
25
- 5. **Start small, then extend.** Train a quick model, see what fails, then add diversity. Don't try to collect the perfect dataset on day one.
26
  6. **Speed comes last.** Once you've dialed in quality and a consistent strategy, optimize for speed. But never sacrifice quality for it.
27
 
28
  After learning all these things and collecting data for multiple weeks we ended up with 5,688 episodes across 8 setups.
 
10
 
11
  ### Learning to Teleoperate
12
 
13
+ Teleoperating a bimanual robot is a genuine skill that takes practice, which unfortunately means that **early data is worse than the final data**. The first episodes are slow, not deliberate, and full of failed attempts. Over hours of practice, operators develop smoother motions, faster execution, and more consistent grasps.
14
 
15
  This creates one of the most important practical decisions of the project: **when do you start recording data for the final model?** Too early and you pollute the dataset with low-quality demonstrations that the model will faithfully reproduce, hesitations, fumbles, and all. Too late and you've wasted precious time.
16
 
 
18
 
19
  ### Tips for Good Data Collection
20
 
21
+ 1. **Practice before you record.** Smooth and deliberate beats fast and sloppy.
22
+ 2. **Quality over speed.** Once learned, bad habits are hard to untrain.
23
+ 3. **Each action should make sense from the current observation alone.** Most models are Markovian: they keep no history, so avoid motions that only work because *you* remember what happened 5 seconds ago.
24
  4. **Be consistent within episodes.** The model learns a coherent strategy more easily than movements that vary wildly each time.
25
+ 5. **Start small, then extend.** Rather than trying to collect the perfect dataset on day one, train a quick model, see what fails, then add diversity.
26
  6. **Speed comes last.** Once you've dialed in quality and a consistent strategy, optimize for speed. But never sacrifice quality for it.
27
 
28
  After learning all these things and collecting data for multiple weeks we ended up with 5,688 episodes across 8 setups.