---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
license: apache-2.0
tags:
- llama-factory
- full
- generated_from_trainer
- reasoning
- math
- code
- general
model-index:
- name: GRM-7b
  results: []
pipeline_tag: text-generation
new_version: OrionLLM/GRM2-3b
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/685ea8ff7b4139b6845ce395/YF0kEDYMGJhcM3Lbl2EOD.png" alt="logo" width="250">
</p>
|
|
**GRM-7b** is a **general-purpose, reasoning-focused** 7B model fine-tuned to improve **multi-domain reasoning** (math, logic, coding, and broad problem-solving). It is designed to be a strong, practical “daily driver” for **general reasoning tasks** and a solid base for **further fine-tuning**.
|
|
---
|
|
## Key features
|
|
- **Dedicated reasoning behavior** for general tasks (stepwise problem solving, better consistency).
- **Strong 7B-scale model** — practical for local inference and experimentation.
- **Multi-domain mixture**: reasoning + code + math + (some) medical reasoning data.
- **Fine-tune friendly**: intended as a good starting point for your own SFT/GRPO/DPO pipelines.
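A minimal inference sketch with 🤗 Transformers, following the standard chat workflow for Qwen2.5-based checkpoints; the prompt and generation settings below are illustrative, not prescriptive.

```python
# Minimal inference sketch (standard Transformers chat workflow for a
# Qwen2.5-based model); prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionLLM/GRM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)/CPU
)

messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"},
]
# Render the chat into the model's prompt format and open the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```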
|
|
---
|
|
## Benchmarks
|
|
| Model | Data | AIME24 | AIME25 | AMC23 | MATH500 | HMMT 02/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
| | ----------------------------------------------------------------------------------------------- | ----- | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- | |
| | [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B) | ✅ | 30.7 | 22.0 | 72.5 | 82.8 | 15.7 | 26.1 | 11.1 | 14.9 | 38.6 | 45.3 | |
| **[GRM-7b](https://huggingface.co/OrionLLM/GRM-7b)** | ✅ | **69.0** | **53.3** | **93.5** | **90.0** | **42.7** | **51.7** | 31.0 | **32.2** | 53.7 | **72.4** |
| | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌ | 51.3 | 38.0 | 92.0 | 88.0 | 25.0 | 34.5 | 19.9 | 21.1 | 33.2 | 50.4 | |
| | [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B) | ✅ | 57.7 | 39.7 | 87.0 | 88.0 | 25.7 | 30.7 | 30.1 | 29.3 |**58.9**| 68.7 | |
| | [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1) | ✅ | 62.0 | 48.0 |**94.0**| 89.4 | 26.7 | **50.9** | 30.9 |**32.9** | 52.9 | 70.7 | |
| | [AceReason-Nemotron-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B) | ✅ |**71.0**| 50.7 |**93.8**| 89.8 | 33.3 | 44.3 |**32.9** |**30.9** | 52.9 | 64.3 | |