joelniklaus (HF Staff) committed
Commit 455a326 · 1 Parent(s): 9fab25e

added article and discussion results

app/src/content/assets/data/benchmark-results.csv CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ff88dedc4e0c1d7dd13f29a3bd9a68072119f1c2c5a9c48f7a6f2c893778615
-size 1245861
+oid sha256:27dd686263a9217a306811036fd361d7616dc6231393f311387d1b5dd065f595
+size 1334642
app/src/content/assets/data/rephrasing_metadata.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:19a1b032f82449d0c9dcaa9cda0c0db42fea5bc11e5007234bc0a2d27e45ff8c
-size 130560
+oid sha256:cac779aca41bc6f868d99a7c7fcc43343591b40ace727098341d52285c1ff856
+size 152802
app/src/content/chapters/3-experiments.mdx CHANGED
@@ -232,7 +232,7 @@ Since model size barely matters, does the model family make a difference?
 
 #### Does the model family matter?
 
-We test six model families (SmolLM2, Falcon3 [@falcon3], Qwen3, Gemma-3, Granite3 [@granite3], Llama-3.2) at ~1B scale on four prompts. Use the Setup dropdown to compare across prompts. SmolLM2 consistently and clearly outperforms all others across all four prompts (see <FigRef target="model-family" />).
+We test six model families (SmolLM2, Falcon3 [@falcon3], Qwen3, Gemma-3, Granite3 [@granite3], Llama-3.2) at ~1B scale on six prompts. Use the Setup dropdown to compare across prompts. SmolLM2 consistently and clearly outperforms all others across all six prompts (see <FigRef target="model-family" />).
 
 <Sidenote>
 We hypothesize that SmolLM2's consistently strong rephrasing performance originates from explicit [rewrite tasks](https://huggingface.co/datasets/HuggingFaceTB/smoltalk/viewer/smol-rewrite?row=0&views%5B%5D=smol_rewrite_train) in its instruction tuning data (smoltalk). This would mean the model already "knows" how to rewrite well before we even prompt it.
@@ -244,6 +244,28 @@ We hypothesize that SmolLM2's consistently strong rephrasing performance origina
   desc="Model families compared at ~1B scale. Use the Setup dropdown to compare across prompts."
   config={{
     setups: {
+      "Article Prompt": {
+        datasets: {
+          "mix-fw_edu_hq-article_smollm2_1.7b_hq": "SmolLM2",
+          "mix-fw_edu_hq-article_falcon3_1b_hq": "Falcon3",
+          "mix-fw_edu_hq-article_granite3_1b_hq": "Granite3",
+          "mix-fw_edu_hq-article_1b_hq": "Gemma-3",
+          "mix-fw_edu_hq-article_llama3.2_1b_hq": "Llama-3.2",
+          "mix-fw_edu_hq-article_qwen3_1.7b_hq": "Qwen3",
+          dclm: { display: "Baseline (DCLM)", color: "#8b8b8b", baseline: true }
+        }
+      },
+      "Discussion Prompt": {
+        datasets: {
+          "mix-fw_edu_hq-discussion_smollm2_1.7b_hq": "SmolLM2",
+          "mix-fw_edu_hq-discussion_falcon3_1b_hq": "Falcon3",
+          "mix-fw_edu_hq-discussion_granite3_1b_hq": "Granite3",
+          "mix-fw_edu_hq-discussion_1b_hq": "Gemma-3",
+          "mix-fw_edu_hq-discussion_llama3.2_1b_hq": "Llama-3.2",
+          "mix-fw_edu_hq-discussion_qwen3_1.7b_hq": "Qwen3",
+          dclm: { display: "Baseline (DCLM)", color: "#8b8b8b", baseline: true }
+        }
+      },
       "Tutorial Prompt": {
         datasets: {
           "mix-fw_edu_hq-tutorial_smollm2_1.7b_hq": "SmolLM2",
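The dataset keys added in this commit follow a regular naming scheme in which the prompt name is embedded in each key (e.g. `mix-fw_edu_hq-article_falcon3_1b_hq` under "Article Prompt"). A quick way to catch copy-paste mistakes when extending this config is a consistency check like the sketch below. This is a hypothetical helper, not part of the app; the `setups` object here is an abbreviated stand-in for the real config, and the `checkSetups` function name is an assumption.

```typescript
// A non-baseline dataset key in a setup should embed that setup's
// prompt name, e.g. "mix-fw_edu_hq-article_falcon3_1b_hq" under
// "Article Prompt". The baseline "dclm" entry is exempt.
type DatasetEntry = string | { display: string; color?: string; baseline?: boolean };
type Setups = Record<string, { datasets: Record<string, DatasetEntry> }>;

// Abbreviated stand-in for the config added in this commit.
const setups: Setups = {
  "Article Prompt": {
    datasets: {
      "mix-fw_edu_hq-article_smollm2_1.7b_hq": "SmolLM2",
      "mix-fw_edu_hq-article_falcon3_1b_hq": "Falcon3",
      dclm: { display: "Baseline (DCLM)", color: "#8b8b8b", baseline: true },
    },
  },
  "Discussion Prompt": {
    datasets: {
      "mix-fw_edu_hq-discussion_qwen3_1.7b_hq": "Qwen3",
      dclm: { display: "Baseline (DCLM)", color: "#8b8b8b", baseline: true },
    },
  },
};

// Returns [setupName, key] pairs whose key does not embed the prompt name.
function checkSetups(s: Setups): [string, string][] {
  const bad: [string, string][] = [];
  for (const [name, cfg] of Object.entries(s)) {
    const prompt = name.split(" ")[0].toLowerCase(); // "Article Prompt" -> "article"
    for (const [key, val] of Object.entries(cfg.datasets)) {
      if (typeof val === "object" && val.baseline) continue; // skip baseline entry
      if (!key.includes(`-${prompt}_`)) bad.push([name, key]);
    }
  }
  return bad;
}

console.log(checkSetups(setups)); // -> [] when all keys are consistent
```

A mis-pasted key such as `mix-fw_edu_hq-article_qwen3_1.7b_hq` left inside the "Discussion Prompt" block would show up in the returned list.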