Update README.md
README.md
CHANGED
```diff
@@ -218,9 +218,9 @@ In this section, we report the evaluation results of SmolLM3 model. All evaluati
 
 We highlight the best score in bold and underline the second-best score.
 
-
+### Instruction Model
 
-
+#### No Extended Thinking
 Evaluation results of non reasoning models and reasoning models in no thinking mode. We highlight the best and second-best scores in bold.
 | Category | Metric | SmoLLM3-3B | Qwen2.5-3B | Llama3.1-3B | Qwen3-1.7B | Qwen3-4B |
 |---------|--------|------------|------------|-------------|------------|----------|
@@ -235,7 +235,7 @@ Evaluation results of non reasoning models and reasoning models in no thinking m
 
 (*): this is a tool calling finetune
 
-
+#### Extended Thinking
 Evaluation results in reasoning mode for SmolLM3 and Qwen3 models:
 | Category | Metric | SmoLLM3-3B | Qwen3-1.7B | Qwen3-4B |
 |---------|--------|------------|------------|----------|
@@ -249,10 +249,10 @@ Evaluation results in reasoning mode for SmolLM3 and Qwen3 models:
 | Multilingual Q&A | Global MMLU | <u>64.1</u> | 62.3 | **73.3** |
 
 
-
+### Base Pre-Trained Model
 For Ruler 64k evaluation, we apply YaRN to the Qwen models with 32k context to extrapolate the context length.
 
-
+#### English benchmarks
 Note: All evaluations are zero-shot unless stated otherwise.
 
 | Category | Metric | SmolLM3-3B | Qwen2.5-3B | Llama3-3.2B | Qwen3-1.7B-Base | Qwen3-4B-Base |
@@ -276,7 +276,7 @@ Note: All evaluations are zero-shot unless stated otherwise.
 | | Ruler 32k context | 76.35 | 75.93 | <u>77.58</u> | 70.63 | **83.98** |
 | | Ruler 64k context | 67.85 | 64.90 | **72.93** | 57.18 | 60.29 |
 
-
+#### Multilingual benchmarks
 
 
 | Category | Metric | SmolLM3 3B Base | Qwen2.5-3B | Llama3.2 3B | Qwen3 1.7B Base | Qwen3 4B Base |
```
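The YaRN step mentioned for the Ruler 64k evaluation (extrapolating the 32k-context Qwen models to 64k) can be sketched as a RoPE-scaling override in the Hugging Face `transformers` config style. This is a minimal sketch under stated assumptions, not the authors' actual evaluation harness: the native 32k window, the 64k target, and the `Qwen/Qwen3-4B-Base` checkpoint name are illustrative choices.

```python
# Hedged sketch: YaRN context extrapolation expressed as a
# transformers-style rope_scaling dict. Assumption: extrapolating a
# model with a native 32k window to the 64k Ruler length means a
# YaRN factor of 64k / 32k.
ORIGINAL_CTX = 32_768  # assumed native context of the 32k Qwen models
TARGET_CTX = 65_536    # assumed Ruler 64k evaluation length

rope_scaling = {
    "rope_type": "yarn",
    "factor": TARGET_CTX / ORIGINAL_CTX,  # 2.0
    "original_max_position_embeddings": ORIGINAL_CTX,
}

# The dict would be applied to the model config before loading, e.g.
# (model name is a hypothetical example, not from the README):
#   config = AutoConfig.from_pretrained("Qwen/Qwen3-4B-Base")
#   config.rope_scaling = rope_scaling
#   model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Base", config=config)
print(rope_scaling["factor"])
```

Because YaRN rescales rotary position embeddings at inference time, this kind of config override lets a 32k-trained model be scored at 64k without retraining, which is why it only appears for the long-context row of the table.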