joelniklaus HF Staff committed on
Commit
5df08f8
·
1 Parent(s): 7c8644c

add overview of the findings in the conclusions

app/src/content/chapters/conclusions.mdx CHANGED
@@ -1,6 +1,31 @@
 ## Conclusions
 
-TODO: Table with answers to the questions (ablation sections)
+Here are the key takeaways from our experiments:
+
+- **Q: How do existing datasets compare?**<br/>
+  A: DCLM, Nemotron-HQ-Synth, and REWIRE lead. Most synthetic baselines fall behind.
+- **Q: Which individual prompts from the synthetic baselines match DCLM?**<br/>
+  A: Only Diverse QA Pairs and REWIRE's Guided Rewrite.
+- **Q: Can new prompts beat DCLM?**<br/>
+  A: Yes. Math, Table, FAQ, and Tutorial all outperform DCLM.
+- **Q: Does model size matter?**<br/>
+  A: Not much. 1B is sufficient for simple prompts, 4B for complex ones.
+- **Q: Do we need better models for low-quality data?**<br/>
+  A: No. Larger models show no consistent advantage on low-quality sources.
+- **Q: Does the model family matter?**<br/>
+  A: Yes. SmolLM2 dominates across all prompts.
+- **Q: Does the model generation matter?**<br/>
+  A: Slightly. Newer Qwen versions trend better.
+- **Q: Is synthetic data enough?**<br/>
+  A: No. Always mix synthetic with original data.
+- **Q: Does the mix-in dataset matter?**<br/>
+  A: Yes. It is a major performance driver, sometimes more important than the synthetic data itself.
+- **Q: Does the source dataset matter?**<br/>
+  A: Not with a strong mix-in. Even low-quality sources produce competitive results.
+- **Q: Does increased diversity help?**<br/>
+  A: No. Performance averages rather than compounds.
+- **Q: Do typos in the prompt hurt?**<br/>
+  A: No. Typos have no negative effect on downstream performance.
 
 ### Next Steps
 