Improve model card with metadata and additional usage information
This PR enhances the model card for Goedel-Prover-V2 by:
* **Adding `pipeline_tag: text-generation` to the metadata**: This ensures the model is correctly categorized on the Hugging Face Hub, improving its discoverability and enabling the proper inference widget for text generation tasks.
* **Adding `library_name: transformers` to the metadata**: This indicates that the model is compatible with the `transformers` library, ensuring users see the correct "how to use" code snippets.
* **Replacing HTML badges with cleaner Markdown badges**: The existing inline HTML for badges has been replaced with standard Markdown badge syntax from the project's GitHub README, improving the model card's structure and rendering. The GitHub code link, project website, and arXiv paper link are all correctly included.
* **Including "Environment Setup" and "Batch Inference and Self-correction" sections**: These sections, present in the project's GitHub README but missing from the current model card, provide crucial information for setting up the Lean 4 environment and performing batch inference, significantly improving the user experience.
* **Removing extraneous "File information"**: As per guidelines, internal file information has been removed from the public model card.
These updates aim to make the Goedel-Prover-V2 model card more informative, user-friendly, and compliant with Hugging Face Hub best practices.
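For reference, a sketch of the front matter this PR produces, assembled from the changes described above (the `base_model` entry is carried over from the existing card):

```yaml
---
license: apache-2.0
base_model:
- Qwen/Qwen3-8B
library_name: transformers
pipeline_tag: text-generation
---
```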
````diff
@@ -1,90 +1,34 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-8B
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <div align="center">
 <h1> <a href="http://blog.goedel-prover.com"> <strong>Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date</strong></a></h1>
 </div>
 
-<head>
-<style>
-.badge {
-  display: inline-flex;         /* inline flex for pill layout */
-  align-items: center;          /* center icon & text vertically */
-  padding: 0.4rem 0.4rem;       /* larger badge size */
-  background: #333;
-  color: #fff;                  /* white text */
-  text-decoration: none;
-  font-family: sans-serif;
-  font-size: 0.8rem;            /* larger font */
-  border-radius: 999em;         /* full pill shape */
-  transition: background 0.15s; /* smooth hover */
-}
-.badge:hover {
-  background: #444;
-}
-.badge svg,
-.badge .emoji {
-  width: 1.7em;                 /* scale icons with badge text */
-  height: 1.7em;
-  margin-right: 0.5em;
-  fill: currentColor;           /* inherit badge color */
-}
-</style>
-</head>
-<body>
-<div class="badges">
-  <a href="https://arxiv.org/abs/2508.03613"
-     class="badge"
-     target="_blank"
-     rel="noopener">
-    <!-- use an emoji or inline-SVG for the PDF icon -->
-    <span class="emoji"><h1>📄</h1></span>
-    arXiv
-  </a>
-  <a href="http://blog.goedel-prover.com"
-     class="badge" target="_blank" rel="noopener">
-    <span class="emoji"><h1>🌐</h1></span>Website</a><a href="https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B"
-     class="badge" target="_blank" rel="noopener">
-    <span class="emoji"><h1>🤗</h1></span>HuggingFace</a>
-  <a href="https://github.com/Goedel-LM/Goedel-Prover-V2" class="badge" target="_blank" rel="noopener">
-    <svg viewBox="0 0 64 64" xmlns="http://www.w3.org/2000/svg">
-      <path d="M32.029,8.345c-13.27,0-24.029,10.759-24.029,24.033c0,10.617 6.885,19.624
-        16.435,22.803c1.202,0.22 1.64-0.522 1.64-1.16c0-0.569-0.02-2.081-0.032-4.086
-        c-6.685,1.452-8.095-3.222-8.095-3.222c-1.093-2.775-2.669-3.514-2.669-3.514
-        c-2.182-1.492,0.165-1.462,0.165-1.462c2.412,0.171 3.681,2.477 3.681,2.477
-        c2.144,3.672 5.625,2.611 6.994,1.997c0.219-1.553 0.838-2.612 1.526-3.213
-        c-5.336-0.606-10.947-2.669-10.947-11.877c0-2.623 0.937-4.769 2.474-6.449
-        c-0.247-0.608-1.072-3.051 0.235-6.36c0,0 2.018-0.646 6.609,2.464c1.917-0.533
-        3.973-0.8 6.016-0.809c2.041,0.009 4.097,0.276 6.017,0.809c4.588-3.11
-        6.602-2.464 6.602-2.464c1.311,3.309 0.486,5.752 0.239,6.36c1.54,1.68
-        2.471,3.826 2.471,6.449c0,9.232-5.62,11.263-10.974,11.858c0.864,0.742
-        1.632,2.208 1.632,4.451c0,3.212-0.029,5.804-0.029,6.591c0,0.644
-        0.432,1.392 1.652,1.157c9.542-3.185 16.421-12.186 16.421-22.8c0-13.274
-        -10.76-24.033-24.034-24.033"/>
-    </svg>
-    Code
-  </a>
-</div>
-</body>
+<div align="center">
+
+[](http://blog.goedel-prover.com)
+[](https://github.com/Goedel-LM/Goedel-Prover-V2)
+[](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B)
+[](https://arxiv.org/abs/2508.03613)
+[](https://opensource.org/licenses/Apache-2.0)
+
+</div>
 
 ## 1. Introduction
 
-We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1)
+We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) **Scaffolded data synthesis**: we generate synthetic proof tasks of increasing difficulty to progressively train the model, enabling it to master increasingly complex theorems; (2) **Verifier-guided self-correction**: the model learns to iteratively revise its own proofs by leveraging feedback from Lean's compiler, closely mimicking how humans refine their work; (3) **Model averaging**: we combine multiple model checkpoints to improve robustness and overall performance.
 
-Our small model, Goedel-Prover-V2-8B, reaches 83.0% on the MiniF2F test set at Pass@32, matching the performance of the prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller in model size.
+Our small model, Goedel-Prover-V2-8B, reaches 83.0% on the MiniF2F test set at Pass@32, matching the performance of the prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller in model size. Our flagship model, Goedel-Prover-V2-32B, achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming the prior SOTA DeepSeek-Prover-V2-671B and the concurrent work Kimina-Prover-72B by a large margin. Additionally, our flagship model with self-correction solves 64 problems on PutnamBench at Pass@64, securing first place on the leaderboard and surpassing DeepSeek-Prover-V2-671B's record of 47 problems at Pass@1024.
 
 ## 2. Benchmark Performance
 
+**Self-correction mode**: our model improves proof quality by first generating an initial candidate and then using Lean compiler feedback to iteratively revise it. We perform two rounds of self-correction, which remains computationally efficient: the total output length (including the initial proof and two revisions) increases only modestly, from the standard 32K to 40K tokens.
 
 <style>
@@ -231,11 +175,63 @@ We release our Goedel-Prover-V2 models and the new MathOlympiadBench benchmark t
 
 </div>
 
+**MathOlympiadBench** (Math Olympiad Bench) comprises human-verified formalizations of Olympiad-level mathematical competition problems, sourced from [Compfiles](https://dwrensha.github.io/compfiles/imo.html) and the [IMOSLLean4 repository](https://github.com/mortarsanjaya/IMOSLLean4). MathOlympiadBench contains 360 problems, including 158 IMO problems from 1959 to 2024, 131 IMO shortlist problems covering 2006 to 2023, 68 regional mathematical olympiad problems, and 3 additional mathematical puzzles.
 
 This model is being released to aid other open-source projects, including those geared towards the upcoming IMO competition. A full paper with all details will be released in the coming weeks.
 
-## 5.
+## 5. Environment Setup
+
+We follow [DeepSeek-Prover-V1.5](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5), which uses Lean 4 version 4.9 and the corresponding Mathlib. Please refer to the following instructions to set up the environment.
+
+### Requirements
+
+* Supported platform: Linux
+* Python 3.10
+
+### Installation
+
+1. **Install Lean 4**
+
+   Follow the instructions on the [Lean 4 installation page](https://leanprover.github.io/lean4/doc/quickstart.html) to set up Lean 4.
+
+2. **Clone the repository**
+
+   ```sh
+   git clone --recurse-submodules https://github.com/Goedel-LM/Goedel-Prover-V2.git
+   cd Goedel-Prover-V2
+   ```
+
+3. **Install required packages**
+
+   ```sh
+   conda env create -f goedelv2.yml
+   ```
+
+   If you encounter an error while installing packages (e.g. flash-attn), run the following:
+
+   ```sh
+   conda activate goedelv2
+   pip install torch==2.6.0
+   conda env update --file goedelv2.yml
+   ```
+
+4. **Build Mathlib4**
+
+   ```sh
+   cd mathlib4
+   lake build
+   ```
+
+5. **Test the Lean 4 and Mathlib4 installation**
+
+   ```sh
+   cd ..
+   python lean_compiler/repl_scheduler.py
+   ```
+
+   If there is any error, reinstall Lean 4 and rebuild Mathlib4.
+
+   If you have already installed Lean and Mathlib for other projects and want to reuse that installation, note that you might need to modify `DEFAULT_LAKE_PATH` and `DEFAULT_LEAN_WORKSPACE` in `lean_compiler/repl_scheduler.py`.
+
+## 6. Quick Start
 You can directly use [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.
 
 ```python
@@ -284,9 +280,13 @@ print(tokenizer.batch_decode(outputs))
 print(time.time() - start)
 ```
 
+## 7. Batch Inference and Self-correction
+
+```sh
+bash scripts/pipeline.sh
+```
 
-###
+## 8. Citation
 ```bibtex
 @article{lin2025goedelproverv2,
   title={Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction},
````