joelniklaus HF Staff committed on
Commit
18b5be6
·
1 Parent(s): 33e5bb8

renamed chapter

app/src/content/chapters/6-finephrase.mdx CHANGED
@@ -7,7 +7,7 @@ import Wide from "../../components/Wide.astro";
 import datasetCardImg from "../assets/image/auto-dataset-card.png";
 import finephraseProgressImg from "../assets/image/finephrase-progress.png";
 
-## Applying the Recipe at Scale
+## Building FinePhrase
 
 With the experiments done and the infrastructure battle-tested, it's time to put everything together. We take our findings and build [FinePhrase](https://huggingface.co/datasets/HuggingFaceFW/finephrase), a large-scale synthetic dataset that rephrases 339 million documents from [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (sample-350BT) into four structured formats, producing 1.35 billion samples and 486 billion completion tokens of synthetic pretraining data.
 