---
license: apache-2.0
library_name: transformers
---
# ReasoningModel

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <img src="figures/fig1.png" width="60%" alt="ReasoningModel" />
</div>
<hr>

<div align="center" style="line-height: 1;">
  <a href="LICENSE" style="margin: 2px;">
    <img alt="License" src="figures/fig2.png" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## 1. Introduction

ReasoningModel is optimized for complex reasoning tasks. This checkpoint was selected based on its combined performance on the math reasoning and logical reasoning benchmarks.

<p align="center">
  <img width="80%" src="figures/fig3.png">
</p>

Compared to general-purpose models, ReasoningModel is intended to perform better on tasks requiring multi-step reasoning, mathematical computation, and logical inference. Section 2 reports its scores alongside ReasonBase and ReasonPro.

## 2. Evaluation Results

### Comprehensive Benchmark Results

<div align="center">

| | Benchmark | ReasonBase | ReasonPro | ReasoningModel |
|---|---|---|---|---|
| **Core Reasoning Tasks** | Math Reasoning | 0.510 | 0.535 | 0.550 |
| | Logical Reasoning | 0.789 | 0.801 | 0.819 |
| | Common Sense | 0.716 | 0.702 | 0.736 |
| **Language Understanding** | Reading Comprehension | 0.671 | 0.685 | 0.700 |
| | Question Answering | 0.582 | 0.599 | 0.607 |
| | Text Classification | 0.803 | 0.811 | 0.828 |
| | Sentiment Analysis | 0.777 | 0.781 | 0.792 |
| **Generation Tasks** | Code Generation | 0.615 | 0.631 | 0.650 |
| | Creative Writing | 0.588 | 0.579 | 0.610 |
| | Dialogue Generation | 0.621 | 0.635 | 0.644 |
| | Summarization | 0.745 | 0.755 | 0.767 |
| **Specialized Capabilities** | Translation | 0.782 | 0.799 | 0.804 |
| | Knowledge Retrieval | 0.651 | 0.668 | 0.676 |
| | Instruction Following | 0.733 | 0.749 | 0.758 |
| | Safety Evaluation | 0.718 | 0.701 | 0.739 |

</div>

### Reasoning Performance Highlight

ReasoningModel scores 0.550 on Math Reasoning and 0.819 on Logical Reasoning, the highest of the three models in the table above on both benchmarks. Of the three, it is the strongest choice for reasoning-intensive applications.

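As a rough illustration of what a combined reasoning score could look like, the sketch below averages the Math Reasoning and Logical Reasoning scores from the benchmark table and ranks the three models. This card does not say how the two benchmarks are combined, so the unweighted mean is an assumption, and the function names are hypothetical. It also compares the three models in the table, not the candidate checkpoints used during selection.

```python
# Scores copied from the benchmark table in this card.
SCORES = {
    "ReasonBase": {"math_reasoning": 0.510, "logical_reasoning": 0.789},
    "ReasonPro": {"math_reasoning": 0.535, "logical_reasoning": 0.801},
    "ReasoningModel": {"math_reasoning": 0.550, "logical_reasoning": 0.819},
}


def combined_reasoning_score(scores):
    """Unweighted mean of the two reasoning benchmarks.

    Assumption: the card does not specify the combination rule.
    """
    return (scores["math_reasoning"] + scores["logical_reasoning"]) / 2


def rank_by_combined_score(all_scores):
    """Return (model name, combined score) pairs, best first."""
    pairs = [(name, combined_reasoning_score(s)) for name, s in all_scores.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_combined_score(SCORES):
        print(f"{name}: {score:.4f}")
```

A different weighting (for example, favoring math reasoning) would change the scores but not the ordering here, since ReasoningModel leads on both benchmarks.
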
## 3. License
[Apache-2.0 License](LICENSE)

## 4. Contact
Open an issue on GitHub.