Commit a55c9e2 by davidlms (verified) · 1 parent: a07cc9a

Add model-index with instruction benchmark evaluations

Added structured evaluation results from README benchmark table:

**Instruction Model Benchmarks (No Extended Thinking):**
- AIME 2025 (High school math): 9.3
- GSM-Plus (Math problem-solving): 72.8
- LiveCodeBench v4 (Competitive programming): 15.2
- GPQA Diamond (Graduate-level reasoning): 35.7
- IFEval (Instruction following): 76.7
- MixEval Hard (Alignment): 26.9
- BFCL (Tool Calling): 92.3
- Global MMLU (Multilingual Q&A): 53.5

Total: 8 benchmarks covering reasoning, math, coding, instruction-following, alignment, tool use, and multilingual capabilities.
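
As a minimal sketch of how the table above maps onto the `model-index` structure, the scores can be assembled into plain Python dicts before being written out as YAML. The helper name `build_model_index` is hypothetical and not part of this PR; the names, type slugs, and values are copied from the commit message and diff.

```python
# Benchmark rows as (display name, metric type slug, score).
# Values are copied from the commit message; slugs match the diff.
BENCHMARKS = [
    ("AIME 2025", "aime_2025", 9.3),
    ("GSM-Plus", "gsm_plus", 72.8),
    ("LiveCodeBench v4", "live_code_bench_v4", 15.2),
    ("GPQA Diamond", "gpqa_diamond", 35.7),
    ("IFEval", "ifeval", 76.7),
    ("MixEval Hard", "mixeval_hard", 26.9),
    ("BFCL", "bfcl", 92.3),
    ("Global MMLU", "global_mmlu", 53.5),
]


def build_model_index(model_name, benchmarks):
    """Return a model-index structure as nested dicts and lists.

    Hypothetical helper for illustration only; it mirrors the layout
    added to the README front matter in this commit.
    """
    metrics = [
        {"name": name, "type": slug, "value": value}
        for name, slug, value in benchmarks
    ]
    return [
        {
            "name": model_name,
            "results": [
                {
                    "task": {"type": "text-generation"},
                    "dataset": {
                        "name": "Instruction Benchmarks",
                        "type": "benchmark",
                    },
                    "metrics": metrics,
                    "source": {
                        "name": "Model README - Instruction Benchmarks",
                        "url": "https://huggingface.co/HuggingFaceTB/SmolLM3-3B",
                    },
                }
            ],
        }
    ]


model_index = build_model_index("SmolLM3-3B", BENCHMARKS)
metrics = model_index[0]["results"][0]["metrics"]
print(len(metrics), "metrics")
```

Keeping the rows in one list makes it easy to check that the count matches the eight benchmarks listed above before the front matter is edited.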

This enables the model to appear in leaderboards and makes it easier to compare with other models.

Note: This PR adds benchmark metadata to the model card frontmatter and should not conflict with existing PRs #43, #32, and #16, which only modify the chat template.

Files changed (1)
  1. README.md +37 -1
README.md CHANGED
@@ -11,7 +11,43 @@ language:
 - ar
 - ru
 base_model:
-- HuggingFaceTB/SmolLM3-3B-Base
+- HuggingFaceTB/SmolLM3-3B-Base
+model-index:
+- name: SmolLM3-3B
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: Instruction Benchmarks
+      type: benchmark
+    metrics:
+    - name: AIME 2025
+      type: aime_2025
+      value: 9.3
+    - name: GSM-Plus
+      type: gsm_plus
+      value: 72.8
+    - name: LiveCodeBench v4
+      type: live_code_bench_v4
+      value: 15.2
+    - name: GPQA Diamond
+      type: gpqa_diamond
+      value: 35.7
+    - name: IFEval
+      type: ifeval
+      value: 76.7
+    - name: MixEval Hard
+      type: mixeval_hard
+      value: 26.9
+    - name: BFCL
+      type: bfcl
+      value: 92.3
+    - name: Global MMLU
+      type: global_mmlu
+      value: 53.5
+    source:
+      name: Model README - Instruction Benchmarks
+      url: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
 ---
 
 
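
To sanity-check a change like this one, a rough sketch along the following lines could pull the front matter out of a model card and list the metric values it declares. It uses only the standard library and a deliberately naive line scan, so it assumes the `- name:` / `value:` layout shown in the diff; a real YAML parser would be more robust. Both function names and the `SAMPLE` text are hypothetical, and the sample is a shortened excerpt rather than the full README.

```python
def split_front_matter(readme_text):
    """Split a model card into (front_matter, body).

    The front matter is taken to be the block between the first two
    lines that consist solely of '---'.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", readme_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", readme_text


def metric_values(front_matter):
    """Collect {metric name: value} from '- name:' / 'value:' line pairs.

    Naive scan: a '- name:' entry only counts once a 'value:' line
    follows it, so the model-level '- name:' entry is skipped.
    """
    pairs = {}
    name = None
    for line in front_matter.splitlines():
        stripped = line.strip()
        if stripped.startswith("- name:"):
            name = stripped[len("- name:"):].strip()
        elif stripped.startswith("value:") and name is not None:
            pairs[name] = float(stripped[len("value:"):].strip())
            name = None
    return pairs


# Shortened, illustrative excerpt of the front matter from this commit.
SAMPLE = """---
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
model-index:
- name: SmolLM3-3B
  results:
  - task:
      type: text-generation
    metrics:
    - name: AIME 2025
      type: aime_2025
      value: 9.3
    - name: BFCL
      type: bfcl
      value: 92.3
---

# SmolLM3
"""

front, body = split_front_matter(SAMPLE)
scores = metric_values(front)
print(scores)
```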