Spaces: Running on CPU Upgrade

Commit 455a326 · Parent(s): 9fab25e

added article and discussion results
app/src/content/assets/data/benchmark-results.csv CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:27dd686263a9217a306811036fd361d7616dc6231393f311387d1b5dd065f595
+size 1334642
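Both data files in this commit are tracked with Git LFS, so the diff shows only the three-line pointer file (spec version, content SHA-256, byte size), not the data itself. As a minimal sketch of that format, the hypothetical helper below (not part of this repo) builds the pointer text for a local file, which could be compared against the `oid`/`size` lines above:

```python
import hashlib
import os


def lfs_pointer(path):
    """Build the Git LFS pointer text for a local file.

    A pointer file records the LFS spec version, the SHA-256 of the
    file's contents, and its size in bytes -- the three lines changed
    in this commit. This is an illustrative sketch, not the git-lfs CLI.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large data files are not read at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{h.hexdigest()}\n"
        f"size {os.path.getsize(path)}\n"
    )
```

If the computed `oid` matches the pointer committed here, the local file is the same object LFS stored.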
app/src/content/assets/data/rephrasing_metadata.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:cac779aca41bc6f868d99a7c7fcc43343591b40ace727098341d52285c1ff856
+size 152802
app/src/content/chapters/3-experiments.mdx CHANGED

@@ -232,7 +232,7 @@ Since model size barely matters, does the model family make a difference?
 
 #### Does the model family matter?
 
-We test six model families (SmolLM2, Falcon3 [@falcon3], Qwen3, Gemma-3, Granite3 [@granite3], Llama-3.2) at ~1B scale on
+We test six model families (SmolLM2, Falcon3 [@falcon3], Qwen3, Gemma-3, Granite3 [@granite3], Llama-3.2) at ~1B scale on six prompts. Use the Setup dropdown to compare across prompts. SmolLM2 consistently and clearly outperforms all others across all six prompts (see <FigRef target="model-family" />).
 
 <Sidenote>
 We hypothesize that SmolLM2's consistently strong rephrasing performance originates from explicit [rewrite tasks](https://huggingface.co/datasets/HuggingFaceTB/smoltalk/viewer/smol-rewrite?row=0&views%5B%5D=smol_rewrite_train) in its instruction tuning data (smoltalk). This would mean the model already "knows" how to rewrite well before we even prompt it.

@@ -244,6 +244,28 @@ We hypothesize that SmolLM2's consistently strong rephrasing performance origina
   desc="Model families compared at ~1B scale. Use the Setup dropdown to compare across prompts."
   config={{
     setups: {
+      "Article Prompt": {
+        datasets: {
+          "mix-fw_edu_hq-article_smollm2_1.7b_hq": "SmolLM2",
+          "mix-fw_edu_hq-article_falcon3_1b_hq": "Falcon3",
+          "mix-fw_edu_hq-article_granite3_1b_hq": "Granite3",
+          "mix-fw_edu_hq-article_1b_hq": "Gemma-3",
+          "mix-fw_edu_hq-article_llama3.2_1b_hq": "Llama-3.2",
+          "mix-fw_edu_hq-article_qwen3_1.7b_hq": "Qwen3",
+          dclm: { display: "Baseline (DCLM)", color: "#8b8b8b", baseline: true }
+        }
+      },
+      "Discussion Prompt": {
+        datasets: {
+          "mix-fw_edu_hq-discussion_smollm2_1.7b_hq": "SmolLM2",
+          "mix-fw_edu_hq-discussion_falcon3_1b_hq": "Falcon3",
+          "mix-fw_edu_hq-discussion_granite3_1b_hq": "Granite3",
+          "mix-fw_edu_hq-discussion_1b_hq": "Gemma-3",
+          "mix-fw_edu_hq-discussion_llama3.2_1b_hq": "Llama-3.2",
+          "mix-fw_edu_hq-discussion_qwen3_1.7b_hq": "Qwen3",
+          dclm: { display: "Baseline (DCLM)", color: "#8b8b8b", baseline: true }
+        }
+      },
       "Tutorial Prompt": {
         datasets: {
           "mix-fw_edu_hq-tutorial_smollm2_1.7b_hq": "SmolLM2",