finephrase


joelniklaus (HF Staff) committed 15 days ago

Commit e46c12a (1 parent: 462a612)

reordered finephrase section to end with the dataset

Files changed (1)

app/src/content/chapters/6-finephrase.mdx CHANGED

@@ -104,18 +104,6 @@ datacard_pipeline = [
 ]
 ```
-### What's in the Dataset?
-<FigRef target="finephrase-explorer" /> lets you browse real examples from FinePhrase. Each sample shows the original FineWeb-Edu source document alongside all four rephrased versions. Navigate through samples to see how the same web document becomes a FAQ, a math problem, a structured table, and a step-by-step tutorial.
-<Wide>
-<HtmlEmbed
-  id="finephrase-explorer"
-  src="finephrase-explorer.html"
-  caption="Browse real examples from the FinePhrase dataset. Each sample shows the original source document alongside all four rephrased versions (FAQ, Math, Table, Tutorial). Use the arrows or Random button to navigate between samples."
-/>
-</Wide>
 ### Improvements to DataTrove
 Building FinePhrase was not just about running inference at scale. It required hardening DataTrove's inference pipeline to handle the realities of processing 339 million documents across 100 parallel workers over two weeks. Every failure mode you can imagine showed up: documents that crash the model, workers racing to commit to the same repo, Slurm jobs dying on startup, and caches corrupting under contention. We merged over a dozen PRs to make this work. Here are the most impactful ones.
@@ -171,3 +159,15 @@ SlurmPipelineExecutor(
     ...
 )
 ```

+### What's in the Dataset?
+<FigRef target="finephrase-explorer" /> lets you browse real examples from FinePhrase. Each sample shows the original FineWeb-Edu source document alongside all four rephrased versions. Navigate through samples to see how the same web document becomes a FAQ, a math problem, a structured table, and a step-by-step tutorial.
+<Wide>
+<HtmlEmbed
+  id="finephrase-explorer"
+  src="finephrase-explorer.html"
+  caption="Browse real examples from the FinePhrase dataset. Each sample shows the original source document alongside all four rephrased versions (FAQ, Math, Table, Tutorial). Use the arrows or Random button to navigate between samples."
+/>
+</Wide>