Writer/palmyra-mini-thinking-b

@@ -45,6 +45,24 @@ Beyond mathematics, Palmyra-mini-thinking-b demonstrates strong performance in t
 ## Benchmark Scores
 | Benchmark                                                        |    Score |
 |:-----------------------------------------------------------------|---------:|
 | gsm8k (strict-match)                                             | 0.4268   |
@@ -142,6 +160,13 @@ curl -X POST http://localhost:8000/v1/chat/completions \
 As with any language model, there is a potential for generating biased or inaccurate information. Users should be aware of these limitations and use the model responsibly.
 ### Citation and Related Information

 ## Benchmark Scores
+Pass@1 (avg-of-64)
+| Benchmark | Pass@1 (avg-of-64) | Majority@64 |
+| :-------- | :----------------- | :---------- |
+| AIME24    | 59.43              | 71.67       |
+| AIME25    | 49.69              | 60.00       |
+| gpqa      | 42.01              | 47.22       |
+| hmmt      | 27.86              | 30.00       |
+| hle       | 5.22               | N/A         |
+| mmlu-pro  | 55.49              | 60.60       |
+| math500   | 93.80              | 95.40       |
+| LCB       | 34.51              | N/A         |
+Pass@1 (avg-of-1)
 | Benchmark                                                        |    Score |
 |:-----------------------------------------------------------------|---------:|
 | gsm8k (strict-match)                                             | 0.4268   |
 As with any language model, there is a potential for generating biased or inaccurate information. Users should be aware of these limitations and use the model responsibly.
+### Footnotes
+- Base model: This model builds on NVIDIA's OpenReasoning-Nemotron-1.5B (`https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B`).
+- Evaluation methodology:
+  - Pass@1 (avg-of-1): computed using `lm_eval` and `lighteval`.
+  - Pass@1 (avg-of-64) and Majority@64: computed using `nemoskills`.
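
The two sampled metrics named in the footnotes can be illustrated with a minimal Python sketch. It assumes each problem has a list of already-extracted final answers (64 in the reported setup, 4 here for brevity) and a single reference answer compared by exact string match. The function names, the sample data, and the tie-breaking rule are hypothetical; this is not the `nemoskills` implementation, which may extract and compare answers differently.

```python
from collections import Counter


def pass_at_1_avg(samples, reference):
    """Fraction of sampled answers that match the reference (Pass@1 averaged over k samples)."""
    return sum(s == reference for s in samples) / len(samples)


def majority_at_k(samples, reference):
    """1.0 if the most frequent sampled answer matches the reference, else 0.0.

    Assumption: ties go to the answer that appeared first, which is how
    Counter.most_common orders equal counts.
    """
    most_common_answer, _ = Counter(samples).most_common(1)[0]
    return float(most_common_answer == reference)


def evaluate(problems):
    """Average both metrics over all problems and report them as percentages."""
    n = len(problems)
    pass1 = sum(pass_at_1_avg(p["samples"], p["reference"]) for p in problems) / n
    maj = sum(majority_at_k(p["samples"], p["reference"]) for p in problems) / n
    return {"pass@1": 100 * pass1, "majority@k": 100 * maj}


# Hypothetical toy data: two problems, four sampled answers each.
problems = [
    {"samples": ["42", "42", "41", "42"], "reference": "42"},
    {"samples": ["7", "8", "8", "9"], "reference": "7"},
]

print(evaluate(problems))
```

Majority voting can score above averaged Pass@1 when correct answers are the most common answer on many problems even though individual samples often miss, which is consistent with the Majority@64 column being higher in every row that reports it.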
 ### Citation and Related Information