---
license: apache-2.0
tags:
- reasoning
- chat
- coding
- math
- science
- agent
- tools
base_model:
- OrionLLM/GRM2-3b
---
# GRM2
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/685ea8ff7b4139b6845ce395/YF0kEDYMGJhcM3Lbl2EOD.png" alt="logo" width="100">
</p>
<p align="center">
<a href="https://huggingface.co/OrionLLM/GRM2-3b/">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20HF-OrionLLM%2FGRM2--3b-green" alt="Hugging Face">
</a>
<a href="https://huggingface.co/OrionLLM/GRM2-3b-GGUF">
<img src="https://img.shields.io/badge/Quantizations-GRM2--3b--GGUF-blue" alt="Quantizations">
</a>
<a href="https://huggingface.co/OrionLLM">
<img src="https://img.shields.io/badge/Research-OrionLLM-purple" alt="Research">
</a>
<a href="https://huggingface.co/spaces/DedeProGames/GRM2-Chat">
<img src="https://img.shields.io/badge/Chat%20with%20the%20model-HuggingfaceSpace-5B3DF5?logo=chatbot&logoColor=white" alt="Chat with the model on HuggingfaceSpace">
</a>
<a href="">
<img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
</a>
</p>
## 1. Introduction
GRM2 is a **3B-parameter language model designed for general-purpose, reasoning-focused tasks**, with a strong emphasis on **multi-domain reasoning** across code, mathematics, science, and complex knowledge tasks. It is optimized for handling **long chains of thought**, enabling more structured, accurate, and reliable reasoning on difficult problems.
Despite its compact size, the model achieves **strong benchmark performance**, making it an efficient choice for users who want a balance of reasoning quality, versatility, and deployability.
## 2. Key Capabilities
- **Deep Reasoning at Speed:** GRM2 delivers high performance on reasoning-heavy and complex tasks, and can compete with, and in some cases surpass, much larger 30B-class models.
- **A Robust Engine for Coding & Agents:** Despite having only 3B parameters, GRM2 generates large, consistent code outputs, making it well suited to agentic workflows running on personal devices.
- **Accessible Local Deployment:** GRM2 is optimized for local inference across a wide range of hardware, bringing strong reasoning performance to consumer devices.
- **Efficient Long Context:** The model supports a cost-efficient **256K context window**, enabling long, coherent chains of reasoning over extended documents and conversations.
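For local deployment, a minimal inference sketch using the 🤗 `transformers` chat pipeline is shown below. This assumes the model ships a standard chat template and works with the generic `text-generation` pipeline; the prompt and generation settings are illustrative, so adjust dtype, device, and `max_new_tokens` for your hardware.

```python
# Minimal local-inference sketch for GRM2 (assumes the standard
# transformers chat API; generation settings are illustrative).
from transformers import pipeline

# Chat-style input: a list of role/content messages.
messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."},
]

if __name__ == "__main__":
    # device_map="auto" places the model on GPU if available, else CPU.
    pipe = pipeline(
        "text-generation",
        model="OrionLLM/GRM2-3b",
        device_map="auto",
    )
    out = pipe(messages, max_new_tokens=512)
    # The last message in the returned conversation is the model's reply.
    print(out[0]["generated_text"][-1]["content"])
```

GGUF quantizations (linked in the badges above) can be used instead with llama.cpp-compatible runtimes when a lighter-weight setup is preferred.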
## 3. Performance
GRM2 delivers performance comparable to much larger models while remaining open, small, and efficient.
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/685ea8ff7b4139b6845ce395/c3m3C_XipLqjwjbcvsNWy.png" alt="benchmark comparison" width="1000">
</p>
### Detailed Benchmarks
| Model | LiveCodeBench v6 | HMMT Nov 25 | GPQA / GPQA Diamond | MultiChallenge | AIME 2026 | xBench-DeepSearch-2510 | BFCL-V4 |
|---|---:|---:|---:|---:|---:|---:|---:|
| OrionLLM/GRM2-3b | **76.9** | **77.92** | **83.8** | **52.21** | **87.40** | **39.0** | 56.5 |
| Qwen/Qwen3-32B | 55.7 | 57.08 | 68.4 | 38.72 | 75.83 | 8.0 | 47.90 |
| OpenAI/o3-mini | 76.4 | N/A | 79.7 | 39.89 | 86.5 | N/A | **65.12** |