eliebak (HF Staff) committed
Commit 035e4ab · verified · 1 parent: e627cfb

Update README.md

Files changed (1): README.md (+6 −6)
```diff
--- a/README.md
+++ b/README.md
@@ -72,9 +72,9 @@ In this section, we report the evaluation results of SmolLM3 model. All evaluati
 
 We highlight the best score in bold and underline the second-best score.
 
-## Base Pre-Trained Model
+### Base Pre-Trained Model
 
-### English benchmarks
+#### English benchmarks
 Note: All evaluations are zero-shot unless stated otherwise.
 
 | Category | Metric | SmolLM3-3B | Qwen2.5-3B | Llama3-3.2B | Qwen3-1.7B-Base | Qwen3-4B-Base |
@@ -98,7 +98,7 @@ Note: All evaluations are zero-shot unless stated otherwise.
 | | Ruler 32k context | 76.35 | 75.93 | <u>77.58</u> | 70.63 | **83.98** |
 | | Ruler 64k context | 67.85 | 64.90 | **72.93** | 57.18 | 60.29 |
 
-### Multilingual benchmarks
+#### Multilingual benchmarks
 
 
 
@@ -141,9 +141,9 @@ The model has also been trained on Arabic (standard), Chinese and Russian data,
 | | Flores-200 (5-shot) | 47.13 | 48.74 | 50.74 | <u>54.70</u> | **60.53** |
 
 
-## Instruction Model
+### Instruction Model
 
-### No Extended Thinking
+#### No Extended Thinking
 Evaluation results of non reasoning models and reasoning models in no thinking mode. We highlight the best and second-best scores in bold.
 | Category | Metric | SmoLLM3-3B | Qwen2.5-3B | Llama3.1-3B | Qwen3-1.7B | Qwen3-4B |
 |---------|--------|------------|------------|-------------|------------|----------|
@@ -158,7 +158,7 @@ Evaluation results of non reasoning models and reasoning models in no thinking m
 
 (*): this is a tool calling finetune
 
-### Extended Thinking
+#### Extended Thinking
 Evaluation results in reasoning mode for SmolLM3 and Qwen3 models:
 | Category | Metric | SmoLLM3-3B | Qwen3-1.7B | Qwen3-4B |
 |---------|--------|------------|------------|----------|
```