Files changed (1) hide show
  1. README.md +35 -23
README.md CHANGED
@@ -1,23 +1,35 @@
1
- ---
2
- license: apache-2.0
3
- datasets:
4
- - SJTULean/LeanStatement_SFT
5
- language:
6
- - en
7
- result:
8
- dataset:
9
- type: Formalizer Lean
10
- name: miniF2F
11
- metrics:
12
- - name: pass@1
13
- type: pass@1
14
- value: 0.9406
15
- verified: false
16
- base_model:
17
- - Qwen/Qwen2.5-7B-Instruct
18
- ---
19
- # LeanFormalizer_SFT
20
-
21
- <!-- Provide a quick summary of what the model is/does. -->
22
-
23
- Based on Qwen2.5-7b and trained on our LeanStatement_SFT dataset, our model achieves **state-of-the-art** performance in formal mathematics verification, with a **pass@1 compilation success rate** of 94.1% (459/488) on MiniF2F and 75.4% (282/374) on ProofNet benchmarks as of December 2024(while our PPO model achieves 95.3% on MiniF2F and 76.0% on Proofnet). These results demonstrate the effectiveness of scaling laws in mathematical formalization using the Lean theorem prover.
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - SJTULean/LeanStatement_SFT
5
+ language:
6
+ - zho
7
+ - eng
8
+ - fra
9
+ - spa
10
+ - por
11
+ - deu
12
+ - ita
13
+ - rus
14
+ - jpn
15
+ - kor
16
+ - vie
17
+ - tha
18
+ - ara
19
+ result:
20
+ dataset:
21
+ type: Formalizer Lean
22
+ name: miniF2F
23
+ metrics:
24
+ - name: pass@1
25
+ type: pass@1
26
+ value: 0.9406
27
+ verified: false
28
+ base_model:
29
+ - Qwen/Qwen2.5-7B-Instruct
30
+ ---
31
+ # LeanFormalizer_SFT
32
+
33
+ <!-- Provide a quick summary of what the model is/does. -->
34
+
35
+ Based on Qwen2.5-7b and trained on our LeanStatement_SFT dataset, our model achieves **state-of-the-art** performance in formal mathematics verification, with a **pass@1 compilation success rate** of 94.1% (459/488) on MiniF2F and 75.4% (282/374) on ProofNet benchmarks as of December 2024(while our PPO model achieves 95.3% on MiniF2F and 76.0% on Proofnet). These results demonstrate the effectiveness of scaling laws in mathematical formalization using the Lean theorem prover.