Update README.md
Browse files
README.md
CHANGED
|
@@ -33,9 +33,25 @@ This repository contains the latest tactic generator model checkpoint from BFS-P
|
|
| 33 |
- Autoformalized NuminaMath-CoT dataset
|
| 34 |
|
| 35 |
## Performance
|
|
|
|
| 36 |
|
| 37 |
-
|
| 38 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 39 |
|
| 40 |
## Usage
|
| 41 |
|
|
|
|
| 33 |
- Autoformalized NuminaMath-CoT dataset
|
| 34 |
|
| 35 |
## Performance
|
| 36 |
+
BFS-Prover achieves state-of-the-art performance on the MiniF2F test benchmark. Here's a detailed comparison:
|
| 37 |
|
| 38 |
+
### MiniF2F Test Benchmark Results
|
| 39 |
+
|
| 40 |
+
| Prover System | Search Method | Critic Model | Tactic Budget | Score |
|
| 41 |
+
|---------------|---------------|--------------|---------------|--------|
|
| 42 |
+
| BFS-Prover (Accumulative) | BFS | No | - | **72.95%** |
|
| 43 |
+
| BFS-Prover (This Work) | BFS | No | 2048×2×600 | **70.83% ± 0.89%** |
|
| 44 |
+
| HunyuanProver | BFS | Yes | 600×8×400 | 68.4% |
|
| 45 |
+
| InternLM2.5-StepProver | BFS | Yes | 256×32×600 | 65.9% |
|
| 46 |
+
| DeepSeek-Prover-V1.5* | MCTS | No | 32×16×400 | 63.5% |
|
| 47 |
+
|
| 48 |
+
*Note: DeepSeek-Prover-V1.5 uses whole-proof generation method; tactic budget decomposed for comparison.
|
| 49 |
+
|
| 50 |
+
### Key Advantages
|
| 51 |
+
- Achieves better performance without requiring a critic model
|
| 52 |
+
- Uses simpler search method (BFS) compared to MCTS
|
| 53 |
+
- Shows strong scaling with increased search passes
|
| 54 |
+
- Benefits from DPO training using compiler feedback
|
| 55 |
|
| 56 |
## Usage
|
| 57 |
|