Commit bc1432f (parent: 6554803)
Add auto-numbering for figures
app/src/content/chapters/experiments.mdx (changed)

@@ -29,7 +29,7 @@ DCLM, Nemotron-HQ-Synth, and REWIRE lead by a significant margin (see [Baseline
   id="baselines-comparison"
   src="d3-benchmark-comparison.html"
   title="Baseline Comparison"
-  desc="
+  desc="Comparison of baseline datasets across different evaluation metrics. Use the dropdown to switch metrics."
   config={{
     baselines: [],
     datasetNames: {
@@ -65,7 +65,7 @@ Only [diverse_qa_pairs](#diverse_qa_pairs) (driven by very strong SQuAD performa
   id="dissecting-baselines"
   src="d3-benchmark-comparison.html"
   title="Dissecting Synthetic Baselines"
-  desc="
+  desc="Individual prompt performance from existing synthetic datasets compared to DCLM and FineWeb-Edu-HQ."
   config={{
     baselines: ["dclm", "nemotron_hq_synth", "rewire"],
     datasetNames: {
@@ -110,7 +110,7 @@ Four prompts ([math](#math), [table](#table), [faq](#faq), [tutorial](#tutorial)
   id="new-prompts"
   src="d3-benchmark-comparison.html"
   title="New Prompt Performance"
-  desc="
+  desc="Seven new prompts compared against DCLM and FineWeb-Edu-HQ."
   config={{
     datasetNames: {
       "mix-fw_edu_hq-math_1b_hq": "Math",
@@ -148,7 +148,7 @@ It is possible that larger models produce richer or more nuanced rephrasings tha
   id="model-size"
   src="d3-benchmark-comparison.html"
   title="Model Size"
-  desc="
+  desc="Gemma-3 model sizes (270M to 27B). Use the Setup dropdown to compare across prompts."
   config={{
     setups: {
       "Tutorial Prompt": {
@@ -191,7 +191,7 @@ The results are mixed: for some prompts 12B helps slightly with LQ data, but for
   id="size-quality"
   src="d3-benchmark-comparison.html"
   title="Model Size vs Data Quality"
-  desc="
+  desc="1B vs 12B model on HQ vs LQ data. Use the Setup dropdown to compare across prompts."
   config={{
     setups: {
       "Continue Prompt": {
@@ -256,7 +256,7 @@ We hypothesize that SmolLM2's consistently strong rephrasing performance origina
   id="model-family"
   src="d3-benchmark-comparison.html"
   title="Model Family"
-  desc="
+  desc="Model families compared at ~1B scale. Use the Setup dropdown to compare across prompts."
   config={{
     setups: {
       "Tutorial Prompt": {
@@ -325,7 +325,7 @@ While the differences are small, we find a consistent trend: newer versions lead
   id="model-generation"
   src="d3-benchmark-comparison.html"
   title="Model Generation: Qwen Tutorial"
-  desc="
+  desc="Qwen model generations (1.5 to 3) on the tutorial prompt."
   config={{
     datasetNames: {
       "mix-fw_edu_hq-tutorial_qwen3_1.7b_hq": "Qwen3 (1.7B)",
@@ -363,7 +363,7 @@ Synthetic-only training beats FineWeb-Edu-HQ but falls short of both DCLM and mi
   id="synthetic-only"
   src="d3-benchmark-comparison.html"
   title="Is Synthetic Data Enough?"
-  desc="
+  desc="Synthetic-only vs mixed training. Use the Setup dropdown to compare across source datasets."
   config={{
     setups: {
       "DCLM Source": {
@@ -404,7 +404,7 @@ DCLM and FineWeb-Edu-HQ outperform Cosmopedia and FineWeb-Edu-LQ as mix-in datas
   id="mixin-dataset"
   src="d3-benchmark-comparison.html"
   title="Mix-in Dataset Effect"
-  desc="
+  desc="Effect of different mix-in datasets. Use the Setup dropdown to compare HQ vs LQ source data."
   config={{
     setups: {
       "HQ Source": {
@@ -450,7 +450,7 @@ When mix-in varies with source, source quality appears to matter: FineWeb-Edu-HQ
   id="source-dataset-mixin-source"
   src="d3-benchmark-comparison.html"
   title="Source Dataset (Mix-in = Source)"
-  desc="
+  desc="Effect of source dataset when mix-in equals source. Use the Setup dropdown to compare prompts."
   config={{
     setups: {
       "Tutorial Prompt": {
@@ -481,7 +481,7 @@ When mix-in varies with source, source quality appears to matter: FineWeb-Edu-HQ
   id="source-dataset-fixed-mixin"
   src="d3-benchmark-comparison.html"
   title="Source Dataset (Fixed Mix-in: FineWeb-Edu-HQ)"
-  desc="
+  desc="Effect of source dataset with FineWeb-Edu-HQ as fixed mix-in. Use the Setup dropdown to compare prompts."
   config={{
     setups: {
       "Tutorial Prompt": {
@@ -528,7 +528,7 @@ Interestingly, when mixing enough different prompts together, we don't seem to n
   id="diversity"
   src="d3-benchmark-comparison.html"
   title="Diversity"
-  desc="
+  desc="Different diversity strategies. Use the Setup dropdown to compare approaches."
   config={{
     setups: {
       "Mixing Prompts": {
@@ -584,7 +584,7 @@ Surprisingly, typos don't have a negative effect on downstream model performance
   id="typos-effect"
   src="d3-benchmark-comparison.html"
   title="Effect of Typos in Prompt"
-  desc="
+  desc="REWIRE prompt with original typos vs improved version at 1B and 12B scale."
   config={{
     datasetNames: {
       "mix-fw_edu_hq-guided_rewrite_original_12b_hq": "Original (12B)",
app/src/content/chapters/introduction.mdx (changed)

@@ -37,7 +37,7 @@ Here's a preview of where we end up: FinePhrase, our best configuration, clearly
   id="finephrase-vs-baselines"
   src="d3-benchmark-comparison.html"
   title="FinePhrase vs Synthetic Baselines"
-  desc="
+  desc="FinePhrase compared against synthetic data baselines across evaluation metrics."
   config={{
     defaultView: "line",
     pinnedColors: { "FinePhrase": "#EBA937" },
app/src/styles/_base.css (changed)

@@ -178,6 +178,22 @@ html {
   opacity: 1;
 }
 
+/* ===== Auto-numbering for figures ===== */
+.content-grid main {
+  counter-reset: figure;
+}
+
+.content-grid main figure:not(.table-figure) {
+  counter-increment: figure;
+}
+
+/* Prepend "Figure N: " to description figcaptions (skip title-only figcaptions) */
+.content-grid main figure:not(.table-figure) > figcaption.html-embed__desc::before,
+.content-grid main figure:not(.table-figure):not(.html-embed) > figcaption::before {
+  content: "Figure " counter(figure) ": ";
+  font-weight: 600;
+}
+
 .katex .tag {
   background: none;
   border: none;
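For reference, the CSS counter rules in this commit key off markup shaped roughly like the sketch below. Only the class names (`content-grid`, `html-embed`, `html-embed__desc`, `table-figure`) come from the selectors in the diff; the element structure and content are illustrative assumptions about how the site renders an embedded figure, not the actual generated markup.

```html
<div class="content-grid">
  <main> <!-- counter-reset: figure starts numbering here -->
    <figure class="html-embed" id="baselines-comparison">
      <!-- each non-table figure increments the counter -->
      <iframe src="d3-benchmark-comparison.html"></iframe>
      <!-- the ::before rule prepends "Figure 1: " to this caption -->
      <figcaption class="html-embed__desc">
        Comparison of baseline datasets across different evaluation metrics.
      </figcaption>
    </figure>
  </main>
</div>
```

Because numbering is done purely in CSS (`counter-increment` plus a `::before` on the caption), figures renumber automatically when one is added or removed, with no change to the MDX content.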