zeyu-zheng commited on
Commit
f86aa4d
Β·
verified Β·
1 Parent(s): d8a57f6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +46 -23
README.md CHANGED
@@ -9,34 +9,59 @@ tags:
9
  ---
10
 
11
  <div align="center">
12
- <h1 style="font-size: 2.0em;">πŸš€ BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
13
- <div style="display: flex; justify-content: center; gap: 8px; flex-wrap: wrap;">
14
- <a href="https://arxiv.org/abs/2509.06493"><img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg" alt="arXiv"></a>
15
- <a href="https://choosealicense.com/licenses/apache-2.0/"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License: Apache 2.0"></a>
16
- <a href="https://github.com/leanprover-community/mathlib4"><img src="https://img.shields.io/badge/Lean-4-orange" alt="Lean 4"></a>
 
 
 
 
 
 
 
 
 
 
 
 
17
  </div>
18
- <h2>State-of-the-art tactic generation model in Lean4</h2>
19
  </div>
20
- This repository contains the latest tactic generator model checkpoint from BFS-Prover-V2, a state-of-the-art step-level theorem proving system in Lean4. While the full BFS-Prover-V2 system integrates multiple components for scalable theorem proving, we are releasing the core tactic generation model here. Given a proof state in Lean4, the model generates a tactic that transforms the current proof state into a new state, progressively working towards completing the proof.
 
21
 
22
- **πŸ“„ Paper: [Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers](https://arxiv.org/abs/2509.06493)**
23
 
 
24
 
25
- ## ✨ Model Details
26
 
27
- - Base Model: Qwen2.5-32B
 
 
 
 
28
  - Training Approach: Multi-stage expert iteration with best-first tree search
29
  - Training Data Sources:
30
  - Mathlib (via LeanDojo)
31
  - Lean-Github repositories
32
  - Autoformalized NuminaMath datasets
 
33
 
34
- ## πŸ“ˆ Performance
35
- BFS-Prover-V2-32B achieves 95.08% on the miniF2F test, when integrated with the planner-based multi-agent tree search system, which significantly outperforms all previous step-provers.
36
- Additionally, the model demonstrates strong generalization to undergraduate-level mathematics, independently attaining 41.4% on the ProofNet test without a planner.
37
 
38
- ## βš™οΈ Usage
39
- - The model expects Lean4 tactic states in the format `"{state}:::"`
 
 
 
 
 
 
 
40
  - `:::` serves as a special indicator to signal the model to generate a tactic for the given state.
41
  - The model will echo back the input state followed by the generated tactic.
42
 
@@ -68,15 +93,13 @@ outputs = model.generate(**inputs)
68
  tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1]
69
  print(tactic)
70
 
71
- # Generated tactic: "nlinarith [sq_nonneg (c - a), sq_nonneg (b - a), sq_nonneg (c - b), hβ‚€.1, hβ‚€.2.1, hβ‚€.2.2, h₁.le, hβ‚‚.le, h₃.le]"
72
  ```
73
 
74
- ## πŸ“š Citation
75
-
76
- If you use this model in your research, please cite our paper:
77
 
78
  ```bibtex
79
- @article{xin2025bfsproverv2,
80
  title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
81
  author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
82
  journal={arXiv preprint arXiv:2509.06493},
@@ -84,11 +107,11 @@ If you use this model in your research, please cite our paper:
84
  }
85
  ```
86
 
87
- ## πŸ“„ License
88
 
89
- https://choosealicense.com/licenses/apache-2.0/
90
 
91
- ## πŸ“§ Contact
92
 
93
  For questions and feedback about the tactic generator model, please contact:
94
  - Ran Xin (ran.xin@bytedance.com)
 
9
  ---
10
 
11
  <div align="center">
12
+ <h1 style="font-size: 2.0em;">BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
13
+ <div align="center" style="line-height: 1;">
14
+ <a href="https://bfs-prover.github.io/V2/">
15
+ <img src="https://img.shields.io/badge/Homepage-BFS--Prover--V2-78DED4?style=flat-square&labelColor=2E5AA8">
16
+ </a>
17
+ <a href="https://arxiv.org/abs/2509.06493">
18
+ <img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg?style=flat-square&labelColor=2E5AA8">
19
+ </a>
20
+ <a href="https://huggingface.co/collections/ByteDance-Seed/bfs-prover-68db961a5fdf9de045440230">
21
+ <img src="https://img.shields.io/badge/GitHub-BFS--Prover--V2-808080?&style=flat-square&labelColor=2E5AA8">
22
+ </a>
23
+ <a href="https://github.com/cmu-l3/llmlean">
24
+ <img src="https://img.shields.io/badge/Integration-LLMLean-black?style=flat-square&labelColor=2E5AA8">
25
+ </a>
26
+ <a href="https://www.apache.org/licenses/LICENSE-2.0.txt">
27
+ <img src="https://img.shields.io/badge/License-Apache%202.0-purple.svg?style=flat-square&labelColor=2E5AA8">
28
+ </a>
29
  </div>
 
30
  </div>
31
+
32
+ ## Introduction
33
 
34
+ We introduce **BFS-Prover-V2**, the state-of-the-art open-source step-level theorem proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. BFS-Prover-V2 introduces novel solutions to overcome these limitations through:
35
 
36
+ 1. **Training-time scaling**: A novel multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining to surmount the performance plateaus that typically curtail long-term post training
37
 
38
+ 2. **Inference-time scaling**: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time
39
 
40
+ **BFS-Prover-V2** achieves 95.08\% and 41.4\% on the miniF2F and ProofNet test sets respectively, setting a new state-of-the-art for step-level provers.
41
+
42
+ This repo contains the **BFS-Prover-V2-7B** model, with the following features:
43
+
44
+ - Base Model: Qwen2.5-Math-7B
45
  - Training Approach: Multi-stage expert iteration with best-first tree search
46
  - Training Data Sources:
47
  - Mathlib (via LeanDojo)
48
  - Lean-Github repositories
49
  - Autoformalized NuminaMath datasets
50
+ - Goedel-Pset
51
 
52
+ ## Benchmark Performance
53
+
54
+ <div align="center">
55
 
56
+ | Model | miniF2F-test | miniF2F-valid | ProofNet-test |
57
+ |:------|:------------:|:-------------:|:-------------:|
58
+ | πŸ‘‰ **BFS-Prover-V2-7B** | 82.4% | - | - |
59
+ | BFS-Prover-V2-32B | 86.1% | 85.5% | 41.4% |
60
+ | BFS-Prover-V2-32B w/ Planner | 95.08% | 95.5% | - |
61
+ </div>
62
+
63
+ ## Usage
64
+ - The model expects input in the format `"{state}:::"` where {state} is a Lean4 tactic state.
65
  - `:::` serves as a special indicator to signal the model to generate a tactic for the given state.
66
  - The model will echo back the input state followed by the generated tactic.
67
 
 
93
  tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1]
94
  print(tactic)
95
 
96
+ # Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]"
97
  ```
98
 
99
+ ## Citation
 
 
100
 
101
  ```bibtex
102
+ @article{xin2025scaling,
103
  title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
104
  author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
105
  journal={arXiv preprint arXiv:2509.06493},
 
107
  }
108
  ```
109
 
110
+ ## License
111
 
112
+ This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).
113
 
114
+ ## Contact
115
 
116
  For questions and feedback about the tactic generator model, please contact:
117
  - Ran Xin (ran.xin@bytedance.com)