INSAIT-Institute
/

OPC-R1-8B

Safetensors

English

qwen3

Model card Files Files and versions

xet

Community

ivovpetrov01 commited on Jun 21, 2025

Commit

6b3e5d2

verified ·

1 Parent(s): f89fc92

Update README.md

Browse files

Files changed (1) hide show

README.md +120 -1

README.md CHANGED Viewed

@@ -6,6 +6,125 @@ base_model:
 - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
 ---
 # OPC-R1-8B
-We present **OPC-RQ-8B**, an open source model for judging LLM-generated proofs, trained on the data from the Open Proof Corpus (OPC)

 - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
 ---
+<div align="center">
+  <img src="https://github.com/eth-sri/matharena/blob/main/images/matharena_icon.png?raw=true" width="20%" alt="MathArena" />
+</div>
+<hr>
+<div align="center" style="line-height: 1;">
+  <a href="https://proofcorpus.ai/" target="_blank" style="margin: 2px;">
+    <img alt="Homepage" src="https://img.shields.io/badge/Homepage-OPC-ffc107?color=00c107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://github.com/insait-institute/open-proof-corpus/" target="_blank" style="margin: 2px;">
+    <img alt="GitHub" src="https://img.shields.io/badge/GitHub-OPC-ffc107?logo=github&color=c10000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://huggingface.co/collections/INSAIT-Institute/open-proof-corpus-6856d1279a7b6f12b83dc886" target="_blank" style="margin: 2px;">
+    <img alt="OPC HuggingFace" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-OPC-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://github.com/insait-institute/open-proof-corpus/blob/master/LICENSE.md" style="margin: 2px;">
+    <img alt="License" src="https://img.shields.io/badge/license-Apache%20License%202.0-blue" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+</div>
 # OPC-R1-8B
+We introduce the Open Proof Corpus (OPC)—the world’s first large-scale, open-source dataset of human-verified solutions to advanced mathematics problems. With over 5,000 solutions spanning 1,000+ challenging problems, the OPC is specifically designed for broad applicability and downstream usage in proof generation research and is the first to include a substantial number of correct, LLM-generated solutions to problems from prestigious mathematics competitions such as the USAMO and IMO.
+Leverage OPC to tackle pressing research questions in automated proof generation:
+(1) How do natural language and formal proof generation compare?
+(2) How often do models that produce correct final answers truly reason their way to valid proofs?
+(3) By what margin do best-of-n selection methods improveme proof quality?
+WBuilding on this breakthrough resource, we present **OPC-R1-8B** - an open-source model for proof correctness judging that matches state-of-the-art performance. OPC-R1-8B ouperforms the majority of leading closed-source models, reaching an impressive 88.1% accuracy on verifying LLM-generated proofs.
+## Introduction
+**OPC-R1-8B** is a specialized, open-source model fine-tuned for proof evaluation. We finetuned *DeepSeek R1 Distill Qwen3 8B (05/28)* using GRPO reward signals from judgment accuracy, using the "LLM-as-judge" prompt detailed in our paper.
+## Evaluation results
+To benchmark LLMs as judges for proofs, we took 293 solutions from the OPC, containing human-labeled ground truths. **OPC-R1-8B** achieves a notable 88.1% maj@5 accuracy, matching the top-performing **Gemini 2.5 Pro**, and outperforming all other closed- and open-source alternatives by a clear margin. Notably, OPC-R1-8B outpaces its base model by a full *17%*, highlighting the downstream impact of the high-quality data in the OPC
+| **Judge**          | **pass@1** | **maj@5** | **Cost (USD)** |
+|--------------------|:----------:|:---------:|:--------------:|
+| Human              | 90.4       | -         | N/A            |
+| Gemini 2.5 Pro     | 85.4       | 88.1      | 135.47         |
+| **OPC-R1-8B**      | 83.8       | 88.1      | N/A            |
+| o4-mini            | 83.8       | 85.3      | 29.57          |
+| o3                 | 83.1       | 84.3      | 93.3           |
+| Gemini 2.5 Flash   | 82.7       | 86.0      | 86.95          |
+| Qwen3 235B-A22B    | 81.8       | 84.6      | 3.79           |
+| DeepSeek R1 (05/28)| 80.9       | 82.6      | 27.35          |
+| Qwen3 30B-A3B     | 74.0       | 75.4      | N/A            |
+| DeepSeek R1 Distill 8B| 70.7       | 71.3      | N/A            |
+| Claude 4.0 Sonnet  | 70.6       | 75.0      | 28.21          |
+| Qwen3-8B           | 64.4       | 63.6      | N/A            |
+| GPT-4.1            | 61.4       | 60.8      | 20.33          |
+## Usage
+You can run our model following the example below:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "INSAIT-Institute/OPC-R1-8B"
+# load the tokenizer and the model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+# prepare the model input
+problem = "Compute the number of real solutions of the equation $x^2 + 2x + 2 = 0$."
+solution = "The equation can be factored in as $x^2 + 2x + 2 = (x+1)^2 + 1$. Because $(x+1)^2 \\geq 0$, then $x^2 + 2x + 2 \\geq 1 > 0$. Therefore, there are \\boxed{0} real solutions."
+prompt = "<substitute the evaluation prompt template from the paper>"
+messages = [
+    {"role": "user", "content": prompt.format(problem=problem, solution=solution)}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# conduct text completion
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=32768
+)
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
+print("content:", content)
+```
+You can also run the model using the inference pipeline in our <a href="https://github.com/insait-institute/open-proof-corpus" style="margin: 2px;">GitHub repository</a>.
+## Dataset
+We have open-sourced the OPC dataset, where we answer many open questions regarding the LLMs’ proof capabilities.
+<div align="center">
+|            **Dataset**            |                          **Download**                         |
+| :-----------------------------: | :----------------------------------------------------------: |
+|  OPC  | [🤗 HuggingFace](https://huggingface.co/datasets/INSAIT-Institute/OPC) |
+</div>
+## License
+**OPC-R1-8B**  is released under the permissive Apache 2.0 license. Please review and respect the license when using the model.
+## Citation
+TODO