# Model Card: palmyra-mini-thinking-b

## Introduction
Palmyra-mini-thinking-b is a generative model specialized for complex reasoning and problem solving. It performs particularly well on mathematical and programming challenges, reflecting a robust grasp of abstract concepts and logical structure. These strengths stem from specialized training aimed at tasks that demand deep, multi-step thinking.

## Mathematical Prowess
The model's mathematical abilities are particularly noteworthy. It scores 0.925 on the AMC23 benchmark, indicating a strong grasp of advanced high-school mathematics, and 0.882 on MATH500, demonstrating proficiency across a wide range of mathematical problems. It also holds up in competition-level mathematics, scoring 0.6 on AIME24 (pass@1, avg-of-1) and 0.5733 on OlympiadBench (extractive_match). These results highlight the model's capacity for sophisticated mathematical reasoning, making it a useful tool for both educational and research applications.

## Excellence in Competitive Programming
Beyond mathematics, Palmyra-mini-thinking-b demonstrates strong performance in the competitive programming arena. Its score of 0.6343 on the Codeforces (pass_rate) benchmark underscores its ability to understand complex algorithmic problems and generate correct, efficient code. This capability suggests the model is well-suited for tasks involving code generation, debugging, and algorithmic design, making it a valuable asset for software developers and computer science researchers.
## Benchmark Scores

| Benchmark | Score |
|:-----------------------------------------------------------------|---------:|
| gsm8k (strict-match) | 0.4268 |
| minerva_math (exact_match) | 0.0708 |
| mmlu_pro (exact_match) | 0.2926 |
| hendrycks_math | 0.0016 |
| ifeval (inst_level_loose_acc) | 0.3297 |
| mathqa (acc) | 0.3045 |
| humaneval (pass@1) | 0.0732 |
| BBH (get-answer, exact_match) | 0.288 |
| mbpp | 0.168 |
| leaderboard_musr (acc_norm) | 0.3796 |
| gpqa diamond (lighteval, pass@1:8_samples) | 0.3958 |
| AIME24 (pass@1, avg-of-1) | 0.6 |
| AIME25 (pass@1, avg-of-1) | 0.5 |
| Livecodebench-codegen (livecodebench/code_generation_lite v4_v5) | 0.2873 |
| AMC23 | 0.925 |
| MATH500 | 0.882 |
| Minerva | 0.2941 |
| OlympiadBench (extractive_match) | 0.5733 |
| Codecontests (pass_rate) | 0.2018 |
| Codeforces (pass_rate) | 0.6343 |
| Taco (pass_rate) | 0.3456 |
| APPS (all_levels) | 0.0584 |
| HMMT23 (extractive_match) | 0.2333 |
| Average | 0.359378 |
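
The reported average is the unweighted mean of the 23 individual benchmark scores. As a quick sanity check (not part of any evaluation harness), it can be reproduced with a few lines of Python:

```python
# Benchmark scores from the table above, in table order.
scores = [
    0.4268, 0.0708, 0.2926, 0.0016, 0.3297, 0.3045, 0.0732, 0.288,
    0.168, 0.3796, 0.3958, 0.6, 0.5, 0.2873, 0.925, 0.882, 0.2941,
    0.5733, 0.2018, 0.6343, 0.3456, 0.0584, 0.2333,
]

# Unweighted mean, rounded to the six decimal places reported above.
average = round(sum(scores) / len(scores), 6)
print(average)  # 0.359378
```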
## Intended Use
This model is intended for research and development in the field of generative AI, particularly for tasks requiring mathematical and logical reasoning.
## Limitations
The model's performance has been evaluated on a specific set of benchmarks. Its performance on other tasks or in real-world applications may vary.
## Ethical Considerations
As with any language model, there is a potential for generating biased or inaccurate information. Users should be aware of these limitations and use the model responsibly.