ReasoningModel
1. Introduction
ReasoningModel is optimized for complex reasoning tasks. This checkpoint was selected based on its combined performance on the math reasoning and logical reasoning benchmarks.
Compared to general-purpose models, ReasoningModel demonstrates significantly improved performance on tasks requiring multi-step reasoning, mathematical computation, and logical inference.
2. Evaluation Results
Comprehensive Benchmark Results
| Category | Benchmark | ReasonBase | ReasonPro | ReasoningModel |
|---|---|---|---|---|
| Core Reasoning Tasks | Math Reasoning | 0.510 | 0.535 | 0.550 |
| | Logical Reasoning | 0.789 | 0.801 | 0.819 |
| | Common Sense | 0.716 | 0.702 | 0.736 |
| Language Understanding | Reading Comprehension | 0.671 | 0.685 | 0.700 |
| | Question Answering | 0.582 | 0.599 | 0.607 |
| | Text Classification | 0.803 | 0.811 | 0.828 |
| | Sentiment Analysis | 0.777 | 0.781 | 0.792 |
| Generation Tasks | Code Generation | 0.615 | 0.631 | 0.650 |
| | Creative Writing | 0.588 | 0.579 | 0.610 |
| | Dialogue Generation | 0.621 | 0.635 | 0.644 |
| | Summarization | 0.745 | 0.755 | 0.767 |
| Specialized Capabilities | Translation | 0.782 | 0.799 | 0.804 |
| | Knowledge Retrieval | 0.651 | 0.668 | 0.676 |
| | Instruction Following | 0.733 | 0.749 | 0.758 |
| | Safety Evaluation | 0.718 | 0.701 | 0.739 |
Reasoning Performance Highlight
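ReasoningModel scores highest of the three models compared on both math reasoning (0.550) and logical reasoning (0.819), the two benchmarks used to select this checkpoint. Among the models evaluated here, that makes it the strongest option for reasoning-intensive applications.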
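The model card does not say how the math and logical reasoning scores are combined, so the following is only a minimal sketch that assumes an unweighted mean of the two. The scores are copied from the benchmark table; the names `combined_score` and `rank_models` are hypothetical helpers introduced for illustration, not part of any released tooling.

```python
# Math and logical reasoning scores copied from the benchmark table.
scores = {
    "ReasonBase": {"math_reasoning": 0.510, "logical_reasoning": 0.789},
    "ReasonPro": {"math_reasoning": 0.535, "logical_reasoning": 0.801},
    "ReasoningModel": {"math_reasoning": 0.550, "logical_reasoning": 0.819},
}


def combined_score(model_scores):
    """Unweighted mean of the two reasoning scores.

    ASSUMPTION: the actual combination rule is not documented;
    an equal-weight average is used here purely for illustration.
    """
    return (model_scores["math_reasoning"] + model_scores["logical_reasoning"]) / 2


def rank_models(all_scores):
    """Return model names sorted by combined score, best first."""
    return sorted(
        all_scores,
        key=lambda name: combined_score(all_scores[name]),
        reverse=True,
    )


ranking = rank_models(scores)
for name in ranking:
    print(f"{name}: {combined_score(scores[name]):.4f}")
```

Under this assumed rule the ordering matches the table: ReasoningModel ranks first, followed by ReasonPro and ReasonBase. A weighted combination would give the same order here, since ReasoningModel leads on both benchmarks individually.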
3. License
4. Contact
Open an issue on GitHub.