---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
license: apache-2.0
tags:
- llama-factory
- full
- generated_from_trainer
- reasoning
- math
- code
- general
model-index:
- name: GRM-7b
  results: []
pipeline_tag: text-generation
new_version: OrionLLM/GRM2-3b
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/685ea8ff7b4139b6845ce395/YF0kEDYMGJhcM3Lbl2EOD.png" alt="logo" width="250">
</p>
|
|
**GRM-7b** is a **general-purpose, reasoning-focused** 7B model fine-tuned to improve **multi-domain reasoning** (math, logic, coding, and broad problem-solving). It is designed to be a strong, practical “daily driver” for **general reasoning tasks** and a solid base for **further fine-tuning**.
|
|
---
|
|
## Key features
|
|
- **Dedicated reasoning behavior** for general tasks (stepwise problem solving, better consistency).
- **Strong 7B-scale model** — practical for local inference and experimentation.
- **Multi-domain mixture**: reasoning + code + math + (some) medical reasoning data.
- **Fine-tune friendly**: intended as a good starting point for your own SFT/GRPO/DPO pipelines.
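A minimal inference sketch with 🤗 Transformers, following the standard chat workflow for Qwen2.5-based checkpoints; the prompt and generation settings below are illustrative, not prescriptive.

```python
# Minimal inference sketch (standard Transformers chat workflow for a
# Qwen2.5-based model); prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionLLM/GRM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)/CPU
)

messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"},
]
# Render the chat into the model's prompt format and open the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```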
|
|
---
|
|
## Benchmarks
|
|
| Model | Data | AIME24 | AIME25 | AMC23 | MATH500 | HMMT 02/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
| | ----------------------------------------------------------------------------------------------- | ----- | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- | |
| | [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B) | ✅ | 30.7 | 22.0 | 72.5 | 82.8 | 15.7 | 26.1 | 11.1 | 14.9 | 38.6 | 45.3 | |
| **[GRM-7b](https://huggingface.co/OrionLLM/GRM-7b)** | ✅ | **69.0** | **53.3** | **93.5** | **90.0** | **42.7** | **51.7** | 31.0 | **32.2** | 53.7 | **72.4** |
| | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌ | 51.3 | 38.0 | 92.0 | 88.0 | 25.0 | 34.5 | 19.9 | 21.1 | 33.2 | 50.4 | |
| | [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B) | ✅ | 57.7 | 39.7 | 87.0 | 88.0 | 25.7 | 30.7 | 30.1 | 29.3 |**58.9**| 68.7 | |
| | [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1) | ✅ | 62.0 | 48.0 |**94.0**| 89.4 | 26.7 | **50.9** | 30.9 |**32.9** | 52.9 | 70.7 | |
| | [AceReason-Nemotron-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B) | ✅ |**71.0**| 50.7 |**93.8**| 89.8 | 33.3 | 44.3 |**32.9** |**30.9** | 52.9 | 64.3 | |