pepijn223 HF Staff committed on
Commit 699a084 · unverified · 1 Parent(s): 18ab968

Refine data collection operator effort description

app/src/content/chapters/folding/04-data-collection.mdx CHANGED
@@ -35,7 +35,7 @@ After weeks of collecting data across operators, these are the guidelines we fou
 
  ### What we ended up with
 
- After multiple weeks of collection across 8 setups, we had **5,688 episodes**, the **full dataset**. Those ~131 hours of recorded demonstrations represent far more than 131 hours of actual work. Setups broke and needed repair, operators had to practice teleoperation before producing useful data, and the repetitive motions are genuinely tiring. Realistically, productive recording filled less than a third of each operator's workday. Not all episodes are equally useful either: some contain hesitations, inconsistent strategies, or poor fold quality. Later in the project, we built a smaller **high-quality dataset** of 1,200 episodes by selecting the best recordings from the full set and adding new demonstrations collected with a more unified strategy.
+ After multiple weeks of collection across 8 setups, we had **5,688 episodes**, the **full dataset**. Those ~131 hours of recorded demonstrations represent a fraction of the total time operators spent on the project, which also included practicing teleoperation, setting up and repairing robots, and aligning on strategies between sessions. This shows that data collection is a lot more than just recording demonstrations, and being very efficient with your time is key. Not all episodes are equally useful either: some contain hesitations, inconsistent strategies, or poor fold quality. Later in the project, we built a smaller **high-quality dataset** of 1,200 episodes by selecting the best recordings from the full set and adding new demonstrations collected with a more unified strategy.
 
  | Metric | Full dataset | High-quality dataset |
  |:---|:---:|:---:|