Goedel-LM
/

Goedel-Prover-SFT

Safetensors

llama

Model card Files Files and versions

xet

Community

linyongver commited on Apr 18, 2025

Commit

1006b97

verified ·

1 Parent(s): db537be

Update README.md

Browse files

Files changed (1) hide show

README.md +2 -1

README.md CHANGED Viewed

@@ -50,7 +50,7 @@ We introduce Goedel-Prover, an open-source language model that achieves state-of
   <img width="100%" src="performance.png">
 </p>
-**Caption:** The Pass@N metric indicates that we generate N proofs for a single problem; if any one of these N proofs successfully solves the problem, it is considered solved. (Left): The performance of Pass@32 for full proof generation on miniF2F. Due to limited compute, we compare with DeepSeek-Prover-v1.5 on the Pass@32 metric. (Middle): This sub-figure presents a comparison of Goedel-Prover-SFT and Deepseek-Prover-v1.5 in terms of miniF2F performance across different inference budgets, ranging from Pass@32, 64, 128, ..., 4 \* 6400, to 16 \* 6400. The performance numbers of Deepseek-Prover-v1.5 are directly taken from Table 1 of Xin et al. (2024b). Due to computational resource constraints, we tested Goedel-Prover-SFT only up to Pass@4 × 6400. Notably, Goedel-Prover-SFT's Pass@256 already exceeds the Pass@16 \* 6400 performance of Deepseek-Prover-v1.5-RL (without inference time tree search). (Right): The number of problems solved in Lean-workbook by Goedel-Prover-SFT compared to prior works. InternLM2.5-Step-Prover and InternLM-Math-Plus collectively solve and open-source 15.7K samples, while we solve and open-source 29.7K samples.
@@ -119,6 +119,7 @@ We are also releasing 29,7K proofs of the problems in Lean-workbook found by our
 |            **Datasets**            |                          **Download**                         |
 | :-----------------------------: | :----------------------------------------------------------: |
 |   Lean-workbook-proofs   | [🤗 HuggingFace](https://huggingface.co/datasets/Goedel-LM/Lean-workbook-proofs) |
 </div>
 ## 4. Citation

   <img width="100%" src="performance.png">
 </p>
+**Caption:** The Pass@N metric indicates that we generate N proofs for a single problem; if any one of these N proofs successfully solves the problem, it is considered solved. (Left): The performance of Pass@32 for full proof generation on miniF2F. Due to limited compute, we compare with DeepSeek-Prover-v1.5 on the Pass@32 metric. (Middle): This sub-figure presents a comparison of Goedel-Prover and Deepseek-Prover-v1.5 in terms of miniF2F performance across different inference budgets, ranging from Pass@32, 64, 128, ..., 4 \* 6400, to 16 \* 6400. The performance numbers of Deepseek-Prover-v1.5 are directly taken from Table 1 of Xin et al. (2024b). Due to computational resource constraints, we tested Goedel-Prover-SFT only up to Pass@4 × 6400. Notably, Goedel-Prover-SFT's Pass@256 already exceeds the Pass@16 \* 6400 performance of Deepseek-Prover-v1.5-RL (without inference time tree search). (Right): The number of problems solved in Lean-workbook by Goedel-Prover-SFT compared to prior works. InternLM2.5-Step-Prover and InternLM-Math-Plus collectively solve and open-source 15.7K samples, while we solve and open-source 29.7K samples.
 |            **Datasets**            |                          **Download**                         |
 | :-----------------------------: | :----------------------------------------------------------: |
 |   Lean-workbook-proofs   | [🤗 HuggingFace](https://huggingface.co/datasets/Goedel-LM/Lean-workbook-proofs) |
+|   Goedel-Pset-v1   | [🤗 HuggingFace](https://huggingface.co/datasets/Goedel-LM/Goedel-Pset-v1) |
 </div>
 ## 4. Citation