Update README.md
Browse files
README.md
CHANGED
|
@@ -50,7 +50,7 @@ We introduce Goedel-Prover, an open-source language model that achieves state-of
|
|
| 50 |
<img width="100%" src="performance.png">
|
| 51 |
</p>
|
| 52 |
|
| 53 |
-
**Caption:** The Pass@N metric indicates that we generate N proofs for a single problem; if any one of these N proofs successfully solves the problem, it is considered solved. (Left): The performance of Pass@32 for full proof generation on miniF2F. Due to limited compute, we compare with DeepSeek-Prover-v1.5 on the Pass@32 metric. (Middle): This sub-figure presents a comparison of Goedel-Prover
|
| 54 |
|
| 55 |
|
| 56 |
|
|
@@ -119,6 +119,7 @@ We are also releasing 29,7K proofs of the problems in Lean-workbook found by our
|
|
| 119 |
| **Datasets** | **Download** |
|
| 120 |
| :-----------------------------: | :----------------------------------------------------------: |
|
| 121 |
| Lean-workbook-proofs | [🤗 HuggingFace](https://huggingface.co/datasets/Goedel-LM/Lean-workbook-proofs) |
|
|
|
|
| 122 |
</div>
|
| 123 |
|
| 124 |
## 4. Citation
|
|
|
|
| 50 |
<img width="100%" src="performance.png">
|
| 51 |
</p>
|
| 52 |
|
| 53 |
+
**Caption:** The Pass@N metric indicates that we generate N proofs for a single problem; if any one of these N proofs successfully solves the problem, it is considered solved. (Left): The performance of Pass@32 for full proof generation on miniF2F. Due to limited compute, we compare with DeepSeek-Prover-v1.5 on the Pass@32 metric. (Middle): This sub-figure presents a comparison of Goedel-Prover and Deepseek-Prover-v1.5 in terms of miniF2F performance across different inference budgets, ranging from Pass@32, 64, 128, ..., 4 \* 6400, to 16 \* 6400. The performance numbers of Deepseek-Prover-v1.5 are directly taken from Table 1 of Xin et al. (2024b). Due to computational resource constraints, we tested Goedel-Prover-SFT only up to Pass@4 × 6400. Notably, Goedel-Prover-SFT's Pass@256 already exceeds the Pass@16 \* 6400 performance of Deepseek-Prover-v1.5-RL (without inference time tree search). (Right): The number of problems solved in Lean-workbook by Goedel-Prover-SFT compared to prior works. InternLM2.5-Step-Prover and InternLM-Math-Plus collectively solve and open-source 15.7K samples, while we solve and open-source 29.7K samples.
|
| 54 |
|
| 55 |
|
| 56 |
|
|
|
|
| 119 |
| **Datasets** | **Download** |
|
| 120 |
| :-----------------------------: | :----------------------------------------------------------: |
|
| 121 |
| Lean-workbook-proofs | [🤗 HuggingFace](https://huggingface.co/datasets/Goedel-LM/Lean-workbook-proofs) |
|
| 122 |
+
| Goedel-Pset-v1 | [🤗 HuggingFace](https://huggingface.co/datasets/Goedel-LM/Goedel-Pset-v1) |
|
| 123 |
</div>
|
| 124 |
|
| 125 |
## 4. Citation
|