Commit e0ccb24 · Parent(s): f7eff68
add todos and rephrased conclusion paragraph
app/src/content/chapters/3-experiments.mdx
CHANGED
@@ -10,6 +10,10 @@ import FigRef from "../../components/FigRef.astro";
 {/* TODO: Integrate decay experiment as another analysis for proxy */}
 {/* TODO: share on a bunch of discords/slacks/hackernews/locallama */}
 {/* TODO: brainstorm better banner, be artsy */}
+{/* TODO: run variance experiments with pretraining from scratch */}
+{/* TODO: run scaling experiments with longer pretraining phase */}
+{/* TODO: filter docs before/after rephrasing (non-mathematical document for math prompt) */}
+{/* TODO: try multiple rollouts and scoring */}
 {/* TODO: banner idea: 1T tokens = 8M books
 5cm pro buech = 400km
 
@@ -599,4 +603,4 @@ Here are the key takeaways from our experiments:
 - **Q: Do typos in the prompt hurt?**<br/>
 A: No. Typos have no negative effect on downstream performance.
 
-
+So what actually matters? Prompt design, above all else. Structured formats like Math, Table, FAQ, and Tutorial consistently beat curated baselines. Everything else is surprisingly forgiving. A 1B model handles simple prompts just fine, 4B covers the complex ones, and going bigger buys you nothing. Source data quality barely matters either, as long as you mix in strong original data. That last point is worth emphasizing: low-quality sources with a good mix-in match high-quality sources, which means you can draw from a much larger and more diverse data pool. The recipe we landed on is simple: pick a structured prompt, use the smallest model that handles it, blend with high-quality original data, and pour the saved compute into volume.