rakshith-writer committed on
Commit 566ddc8 · verified · 1 parent: ccde6c1

Add files using upload-large-folder tool

Files changed (1)
  1. README.md +39 -41
README.md CHANGED
@@ -5,11 +5,10 @@ tags:
  - qwen2
  - thinking
  - reasoning
- base_model: nvidia/OpenReasoning-Nemotron-1.5B
  model-index:
  - name: Palmyra-mini-thinking-b
  results: []
- license: cc-by-4.0
+ license: apache-2.0
  language:
  - en
  ---
@@ -44,52 +43,51 @@ The model's mathematical abilities are particularly noteworthy. It achieves an i

  Beyond mathematics, Palmyra-mini-thinking-b demonstrates strong performance in the competitive programming arena. Its score of 0.6343 on the Codeforces (pass_rate) benchmark underscores its ability to understand complex algorithmic problems and generate correct, efficient code. This capability suggests the model is well-suited for tasks involving code generation, debugging, and algorithmic design, making it a valuable asset for software developers and computer science researchers.

- ## Benchmark Scores
+ ## Benchmark Scores (sampling params: temperature:0.6, top_p:0.95)

  Pass@1(avg-of-64)

- | Benchmark | Pass@1 (avg-of-64) | Majority@64 |
- | :-------- | :----------------- | :---------- |
- | AIME24 | 59.43 | 71.67 |
- | AIME25 | 49.69 | 60.00 |
- | gpqa | 42.01 | 47.22 |
- | hmmt | 27.86 | 30.00 |
- | hle | 5.22 | N/A |
- | mmlu-pro | 55.49 | 60.60 |
- | math500 | 93.80 | 95.40 |
- | LCB | 34.51 | N/A |
-
+ | Benchmark | Pass@1 (avg-of-64) | Majority@64 |
+ | :-------- | :------------------- | :----------- |
+ | AIME24 | 59.43% | 71.67% |
+ | AIME25 | 49.69% | 60.00% |
+ | GPQA | 42.01% | 47.22% |
+ | HMMT25 | 27.86% | 30.00% |
+ | HLE | 5.22% | N/A |
+ | MMLU-PRO | 55.49% | 60.60% |
+ | MATH500 | 93.80% | 95.40% |
+ | LCB | 34.51% | N/A |

+ LCB here is version v6_2408_2505


  Pass@1(avg-of-1)

- | Benchmark | Score |
- |:-----------------------------------------------------------------|---------:|
- | gsm8k (strict-match) | 0.4268 |
- | minerva_math(exact_match) | 0.0708 |
- | mmlu_pro(exact_match) | 0.2926 |
- | hendrycks_math | 0.0016 |
- | ifeval (inst_level_loose_acc) | 0.3297 |
- | mathqa (acc) | 0.3045 |
- | humaneval (pass@1) | 0.0732 |
- | BBH (get-answer)(exact_match) | 0.288 |
- | mbpp | 0.168 |
- | leadboard_musr (acc_norm) | 0.3796 |
- | gpqa lighteval gpqa diamond_pass@1:8_samples | 0.3958 |
- | AIME24(pass@1)(avg-of-1) | 0.6 |
- | AIME25(pass@1)(avg-of-1) | 0.5 |
- | Livecodebench-codegen (livecodebench/code_generation_lite v4_v5) | 0.2873 |
- | AMC23 | 0.925 |
- | MATH500 | 0.882 |
- | Minerva | 0.2941 |
- | Olympiadbench (extractive_match) | 0.5733 |
- | Codecontests (pass_rate) | 0.2018 |
- | Codeforces (pass_rate) | 0.6343 |
- | Taco (pass_rate) | 0.3456 |
- | APPS (all_levels) | 0.0584 |
- | HMMT23 (extractive_match) | 0.2333 |
- | Average | 0.359378 |
+ | Benchmark | Score (%) |
+ |:-----------------------------------------------------------------|------------:|
+ | GSM8K (strict-match) | 42.68% |
+ | Minerva Math (exact match) | 7.08% |
+ | MMLU-PRO (exact match) | 29.26% |
+ | MATH (Hendrycks) | 0.16% |
+ | IFEval (inst_level_loose_acc) | 32.97% |
+ | MathQA (acc) | 30.45% |
+ | HumanEval (pass@1) | 7.32% |
+ | BBH (get-answer)(exact match) | 28.80% |
+ | MBPP | 16.80% |
+ | GPQA (diamond, pass@1: 8 samples) | 39.58% |
+ | AIME24 (pass@1)(avg-of-1) | 60.00% |
+ | AIME25 (pass@1)(avg-of-1) | 50.00% |
+ | Livecodebench-codegen (livecodebench/code_generation_lite v4_v5) | 28.73% |
+ | AMC23 | 92.50% |
+ | MATH500 | 88.20% |
+ | Minerva | 29.41% |
+ | Olympiadbench (extractive_match) | 57.33% |
+ | Codecontests (pass_rate) | 20.18% |
+ | Codeforces (pass_rate) | 63.43% |
+ | Taco (pass_rate) | 34.56% |
+ | APPS (all_levels) | 5.84% |
+ | HMMT (Feb 2025) (extractive_match) | 23.33% |
+ | Average | 35.94% |

  ### Use with transformers

@@ -181,4 +179,4 @@ To cite this model:
  month = Sep
  }
  ```
- Contact Hello@writer.com
+ Contact Hello@writer.com
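The diff does not define the Pass@1 (avg-of-64) and Majority@64 columns. Under the usual reading, which is an assumption here, each problem is sampled 64 times. Pass@1 (avg-of-64) is the per-problem fraction of correct samples, averaged over problems. Majority@64 counts a problem as solved when the most frequent final answer among its samples matches the reference. The sketch below uses hypothetical helper names and toy data with 4 samples per problem instead of 64:

```python
from collections import Counter
from typing import Sequence


def pass_at_1_avg(correct: Sequence[Sequence[bool]]) -> float:
    """Mean over problems of the fraction of samples judged correct."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return sum(per_problem) / len(per_problem)


def majority_at_k(answers: Sequence[Sequence[str]], references: Sequence[str]) -> float:
    """Fraction of problems whose most common sampled answer equals the reference."""
    solved = 0
    for samples, ref in zip(answers, references):
        # most_common(1) returns [(answer, count)]; ties go to the answer seen first
        top_answer, _ = Counter(samples).most_common(1)[0]
        solved += top_answer == ref
    return solved / len(references)


# Toy data (hypothetical): 2 problems, 4 sampled final answers each.
answers = [["12", "12", "7", "9"], ["3", "5", "5", "4"]]
references = ["12", "4"]
correct = [[a == r for a in samples] for samples, r in zip(answers, references)]

print(pass_at_1_avg(correct))              # (2/4 + 1/4) / 2 = 0.375
print(majority_at_k(answers, references))  # only problem 1 has a correct majority: 0.5
```

Majority voting can exceed the averaged Pass@1 when the correct answer is the single most common one even though many samples are wrong. That pattern matches the README, where Majority@64 is higher than Pass@1 on every row that reports both.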
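The new `## Benchmark Scores` heading records the sampling parameters temperature 0.6 and top_p 0.95. As a hedged illustration of what those settings do under the standard definitions of temperature scaling and nucleus (top-p) sampling, here is a dependency-free sketch. It is not the evaluation harness behind these numbers, and the function names are hypothetical:

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: Sequence[float], top_p: float) -> Dict[int, float]:
    """Keep the smallest set of most likely tokens whose cumulative probability
    reaches top_p, then renormalize that set so it sums to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(
    logits: Sequence[float],
    temperature: float = 0.6,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token index using temperature scaling followed by top-p filtering."""
    rng = rng or random.Random()
    nucleus = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.0, -5.0]  # toy logits for a 4-token vocabulary
print(sorted(top_p_filter(softmax_with_temperature(logits, 1.0), 0.95)))  # [0, 1, 2]
print(sorted(top_p_filter(softmax_with_temperature(logits, 0.6), 0.95)))  # [0, 1]
print(sample_token(logits, rng=random.Random(0)) in {0, 1})               # True
```

With these toy logits, lowering the temperature from 1.0 to 0.6 sharpens the distribution enough that the 0.95 nucleus shrinks from three tokens to two.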
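The `Average` row in the Pass@1(avg-of-1) table appears to be an unweighted mean of the per-benchmark scores. That reading is an assumption, but it reproduces the removed table's 0.359378 exactly when all 23 of its rows, including `leadboard_musr (acc_norm)`, are averaged. The sketch below recomputes that mean. It also averages the 22 rows that remain after the `leadboard_musr` row is dropped, as in the added table:

```python
# Scores copied from the removed Pass@1(avg-of-1) table in this diff.
OLD_SCORES = {
    "gsm8k (strict-match)": 0.4268,
    "minerva_math(exact_match)": 0.0708,
    "mmlu_pro(exact_match)": 0.2926,
    "hendrycks_math": 0.0016,
    "ifeval (inst_level_loose_acc)": 0.3297,
    "mathqa (acc)": 0.3045,
    "humaneval (pass@1)": 0.0732,
    "BBH (get-answer)(exact_match)": 0.288,
    "mbpp": 0.168,
    "leadboard_musr (acc_norm)": 0.3796,
    "gpqa lighteval gpqa diamond_pass@1:8_samples": 0.3958,
    "AIME24(pass@1)(avg-of-1)": 0.6,
    "AIME25(pass@1)(avg-of-1)": 0.5,
    "Livecodebench-codegen (livecodebench/code_generation_lite v4_v5)": 0.2873,
    "AMC23": 0.925,
    "MATH500": 0.882,
    "Minerva": 0.2941,
    "Olympiadbench (extractive_match)": 0.5733,
    "Codecontests (pass_rate)": 0.2018,
    "Codeforces (pass_rate)": 0.6343,
    "Taco (pass_rate)": 0.3456,
    "APPS (all_levels)": 0.0584,
    "HMMT23 (extractive_match)": 0.2333,
}


def mean(values) -> float:
    """Unweighted arithmetic mean (assumed to be how the Average row was computed)."""
    values = list(values)
    return sum(values) / len(values)


old_avg = mean(OLD_SCORES.values())
# The added table keeps every benchmark except the MUSR row.
new_avg = mean(v for k, v in OLD_SCORES.items() if k != "leadboard_musr (acc_norm)")

print(f"{old_avg:.6f}")  # 0.359378
print(f"{old_avg:.2%}")  # 35.94%
print(f"{new_avg:.2%}")  # 35.85%
```

The 23-row mean matches both the removed `0.359378` and the added `35.94%`. The 22 rows that appear in the added table average to about 35.85%.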