finephrase

joelniklaus HF Staff committed on Feb 10

Commit

cfb5e0c

1 Parent(s): 7aaccb8

improve coloring

Files changed (2)

app/src/content/chapters/experiments.mdx +16 -4
app/src/content/chapters/introduction.mdx +1 -1

app/src/content/chapters/experiments.mdx CHANGED

@@ -57,7 +57,7 @@ Prior synthetic datasets bundle multiple prompts together. We want to understand
 We isolate each prompt from Nemotron-HQ-Synth ([diverse_qa_pairs](#diverse_qa_pairs), [extract_knowledge](#extract_knowledge), [distill](#distill), [wikipedia_style_rephrasing](#wikipedia_style_rephrasing), [knowledge_list](#knowledge_list)), the REWIRE [guided_rewrite](#guided_rewrite_original) prompt, and the two prompts from BeyondWeb [@beyondweb] ([continue](#continue), [summarize](#summarize)), all using Gemma-3-1B on FineWeb-Edu-HQ as source.
 <Sidenote>
-We don't have access to the final BeyondWeb dataset, so we reimplemented their [continue](#continue) and [summarize](#summarize) prompts ourselves.
+The BeyondWeb dataset was never released and the paper omits key details, yet claims strong performance. We tested their [continue](#continue) and [summarize](#summarize) prompts to verify those claims and make the knowledge publicly available.
 </Sidenote>
 Only [diverse_qa_pairs](#diverse_qa_pairs) (driven by very strong SQuAD performance) and REWIRE's [guided_rewrite](#guided_rewrite_original) match DCLM (see [Dissecting Synthetic Baselines](#dissecting-baselines)). The BeyondWeb-inspired [continue](#continue) and [summarize](#summarize) prompts do not reach DCLM level. <mark>TLDR: Apart from two prompts, no existing synthetic method outperforms the DCLM baseline.</mark>
@@ -73,14 +73,26 @@ Only [diverse_qa_pairs](#diverse_qa_pairs) (driven by very strong SQuAD performa
       "mix-fw_edu_hq-diverse_qa_pairs_1b_hq": "Diverse QA Pairs",
       dclm: "DCLM",
       "mix-fw_edu_hq-extract_knowledge_1b_hq": "Extract Knowledge",
-      "mix-fw_edu_hq-guided_rewrite_original_1b_hq": "Guided Rewrite (REWIRE)",
+      "mix-fw_edu_hq-guided_rewrite_original_1b_hq": "Guided Rewrite",
       nemotron_hq_synth: "Nemotron-HQ-Synth",
       rewire: "REWIRE",
       "mix-fw_edu_hq-distill_1b_hq": "Distill",
       "mix-fw_edu_hq-wikipedia_style_rephrasing_1b_hq": "Wikipedia Rephrasing",
       "mix-fw_edu_hq-knowledge_list_1b_hq": "Knowledge List",
-      "mix-fw_edu_hq-continue_1b_hq": "Continue (BeyondWeb)",
-      "mix-fw_edu_hq-summarize_1b_hq": "Summarize (BeyondWeb)"
+      "mix-fw_edu_hq-continue_1b_hq": "Continue",
+      "mix-fw_edu_hq-summarize_1b_hq": "Summarize"
+    },
+    pinnedColors: {
+      "Nemotron-HQ-Synth": "#76b900",
+      "Diverse QA Pairs": "#c5e384",
+      "Distill": "#a0c95c",
+      "Wikipedia Rephrasing": "#7fb034",
+      "Knowledge List": "#5e960e",
+      "Extract Knowledge": "#3d6b00",
+      "REWIRE": "#1877F2",
+      "Guided Rewrite": "#6aabff",
+      "Continue (BeyondWeb)": "#e8713a",
+      "Summarize (BeyondWeb)": "#c4451c"
     }
   }}
 />
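
Review note: this hunk renames the display labels to "Continue" and "Summarize", while `pinnedColors` still uses "Continue (BeyondWeb)" and "Summarize (BeyondWeb)" as keys. If the chart component looks up pinned colors by display name (an assumption suggested by the `"FinePhrase"` key in introduction.mdx, not confirmed by this diff), those two colors may not be applied. A minimal sketch of a check for this, where `ChartConfig` and `unmatchedPinnedColors` are hypothetical names and the data contains only the entries visible in the hunk:

```typescript
// Hypothetical helper, not part of the repository.
// Assumption: pinnedColors keys are meant to match datasetNames display values.
type ChartConfig = {
  datasetNames: Record<string, string>;
  pinnedColors: Record<string, string>;
};

// Returns pinnedColors keys that do not correspond to any display name.
function unmatchedPinnedColors(config: ChartConfig): string[] {
  const displayNames = new Set(Object.values(config.datasetNames));
  return Object.keys(config.pinnedColors).filter((name) => !displayNames.has(name));
}

// Only the entries visible in the hunk; the real config may contain more.
const experimentsConfig: ChartConfig = {
  datasetNames: {
    "mix-fw_edu_hq-diverse_qa_pairs_1b_hq": "Diverse QA Pairs",
    dclm: "DCLM",
    "mix-fw_edu_hq-extract_knowledge_1b_hq": "Extract Knowledge",
    "mix-fw_edu_hq-guided_rewrite_original_1b_hq": "Guided Rewrite",
    nemotron_hq_synth: "Nemotron-HQ-Synth",
    rewire: "REWIRE",
    "mix-fw_edu_hq-distill_1b_hq": "Distill",
    "mix-fw_edu_hq-wikipedia_style_rephrasing_1b_hq": "Wikipedia Rephrasing",
    "mix-fw_edu_hq-knowledge_list_1b_hq": "Knowledge List",
    "mix-fw_edu_hq-continue_1b_hq": "Continue",
    "mix-fw_edu_hq-summarize_1b_hq": "Summarize",
  },
  pinnedColors: {
    "Nemotron-HQ-Synth": "#76b900",
    "Diverse QA Pairs": "#c5e384",
    "Distill": "#a0c95c",
    "Wikipedia Rephrasing": "#7fb034",
    "Knowledge List": "#5e960e",
    "Extract Knowledge": "#3d6b00",
    "REWIRE": "#1877F2",
    "Guided Rewrite": "#6aabff",
    "Continue (BeyondWeb)": "#e8713a",
    "Summarize (BeyondWeb)": "#c4451c",
  },
};

const unmatched = unmatchedPinnedColors(experimentsConfig);
console.log(unmatched);
```

With the entries above, the check reports the two "(BeyondWeb)" keys. Whether that matters depends on how the component resolves colors, which this commit does not show.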

app/src/content/chapters/introduction.mdx CHANGED

@@ -40,7 +40,7 @@ Here's a preview of where we end up: FinePhrase, our best configuration, clearly
   desc="Figure: FinePhrase compared against synthetic data baselines across evaluation metrics."
   config={{
     defaultView: "line",
-    pinnedColors: { "FinePhrase": "#ff6d00" },
+    pinnedColors: { "FinePhrase": "#EBA937" },
     baselines: ["cosmopedia", "nemotron_hq_synth", "rewire", "synth_query_reasoning_answer"],
     datasetNames: {
       cosmopedia: "Cosmopedia",