ByteDance-Seed/BFS-Prover-V2-7B

@@ -9,34 +9,59 @@ tags:
 ---
 <div align="center">
-  <h1 style="font-size: 2.0em;">🚀 BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
-  <div style="display: flex; justify-content: center; gap: 8px; flex-wrap: wrap;">
-    <a href="https://arxiv.org/abs/2509.06493"><img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg" alt="arXiv"></a>
-    <a href="https://choosealicense.com/licenses/apache-2.0/"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License: Apache 2.0"></a>
-    <a href="https://github.com/leanprover-community/mathlib4"><img src="https://img.shields.io/badge/Lean-4-orange" alt="Lean 4"></a>
   </div>
-  <h2>State-of-the-art tactic generation model in Lean4</h2>
 </div>
-This repository contains the latest tactic generator model checkpoint from BFS-Prover-V2, a state-of-the-art step-level theorem proving system in Lean4. While the full BFS-Prover-V2 system integrates multiple components for scalable theorem proving, we are releasing the core tactic generation model here. Given a proof state in Lean4, the model generates a tactic that transforms the current proof state into a new state, progressively working towards completing the proof.
-**📄 Paper: [Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers](https://arxiv.org/abs/2509.06493)**
-## ✨ Model Details
-- Base Model: Qwen2.5-32B
 - Training Approach: Multi-stage expert iteration with best-first tree search
 - Training Data Sources:
   - Mathlib (via LeanDojo)
   - Lean-Github repositories
   - Autoformalized NuminaMath datasets
-## 📈 Performance
-BFS-Prover-V2-32B achieves 95.08% on the miniF2F test, when integrated with the planner-based multi-agent tree search system, which significantly outperforms all previous step-provers.
-Additionally, the model demonstrates strong generalization to undergraduate-level mathematics, independently attaining 41.4% on the ProofNet test without a planner.
-## ⚙️ Usage
-- The model expects Lean4 tactic states in the format `"{state}:::"`
 - `:::` serves as a special indicator to signal the model to generate a tactic for the given state.
 - The model will echo back the input state followed by the generated tactic.
@@ -68,15 +93,13 @@ outputs = model.generate(**inputs)
 tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1]
 print(tactic)
-# Generated tactic: "nlinarith [sq_nonneg (c - a), sq_nonneg (b - a), sq_nonneg (c - b), h₀.1, h₀.2.1, h₀.2.2, h₁.le, h₂.le, h₃.le]"
 ```
-## 📚 Citation
-If you use this model in your research, please cite our paper:
 ```bibtex
-@article{xin2025bfsproverv2,
   title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
   author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
   journal={arXiv preprint arXiv:2509.06493},
@@ -84,11 +107,11 @@ If you use this model in your research, please cite our paper:
 }
 ```
-## 📄 License
-https://choosealicense.com/licenses/apache-2.0/
-## 📧 Contact
 For questions and feedback about the tactic generator model, please contact:
 - Ran Xin (ran.xin@bytedance.com)

 ---
 <div align="center">
+  <h1 style="font-size: 2.0em;">BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
+  <div align="center" style="line-height: 1;">
+    <a href="https://bfs-prover.github.io/V2/">
+      <img src="https://img.shields.io/badge/Homepage-BFS--Prover--V2-78DED4?style=flat-square&labelColor=2E5AA8">
+    </a>
+    <a href="https://arxiv.org/abs/2509.06493">
+      <img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg?style=flat-square&labelColor=2E5AA8">
+    </a>
+    <a href="https://huggingface.co/collections/ByteDance-Seed/bfs-prover-68db961a5fdf9de045440230">
+      <img src="https://img.shields.io/badge/Hugging%20Face-BFS--Prover--V2-808080?style=flat-square&labelColor=2E5AA8">
+    </a>
+    <a href="https://github.com/cmu-l3/llmlean">
+      <img src="https://img.shields.io/badge/Integration-LLMLean-black?style=flat-square&labelColor=2E5AA8">
+    </a>
+    <a href="https://www.apache.org/licenses/LICENSE-2.0.txt">
+      <img src="https://img.shields.io/badge/License-Apache%202.0-purple.svg?style=flat-square&labelColor=2E5AA8">
+    </a>
   </div>
 </div>
+## Introduction
+We introduce **BFS-Prover-V2**, a state-of-the-art open-source step-level theorem proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. It tackles these challenges through:
+1. **Training-time scaling**: A novel multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining to overcome the performance plateaus that typically curtail long-term post-training
+2. **Inference-time scaling**: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time
+**BFS-Prover-V2** achieves 95.08% on the miniF2F test set (32B model with the planner-based multi-agent tree search) and 41.4% on the ProofNet test set (32B model without a planner), setting a new state of the art for step-level provers.
+This repo contains the **BFS-Prover-V2-7B** model, with the following features:
+- Base Model: Qwen2.5-Math-7B
 - Training Approach: Multi-stage expert iteration with best-first tree search
 - Training Data Sources:
   - Mathlib (via LeanDojo)
   - Lean-Github repositories
   - Autoformalized NuminaMath datasets
+  - Goedel-Pset
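The "best-first tree search" named in the training approach can be illustrated with a generic sketch. This is not the BFS-Prover-V2 implementation: the `expand` and `is_proved` callbacks, the cumulative log-probability priority, and the toy integer "proof states" are all assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# Hypothetical aliases for this sketch; not part of the released code.
State = Hashable
Tactic = str


def best_first_search(
    root: State,
    expand: Callable[[State], Iterable[Tuple[Tactic, float, State]]],
    is_proved: Callable[[State], bool],
    max_expansions: int = 100,
) -> Optional[List[Tactic]]:
    """Return a tactic sequence reaching a proved state, or None."""
    # Heap entries: (negated cumulative log-prob, tie-breaker, state, tactics so far).
    # heapq is a min-heap, so the most probable path is popped first.
    frontier = [(0.0, 0, root, [])]
    seen = {root}
    counter = 1
    for _step in range(max_expansions):
        if not frontier:
            return None
        neg_score, _tie, state, path = heapq.heappop(frontier)
        if is_proved(state):
            return path
        for tactic, logprob, child in expand(state):
            if child in seen:
                continue  # assumed de-duplication of already-queued states
            seen.add(child)
            heapq.heappush(
                frontier, (neg_score - logprob, counter, child, path + [tactic])
            )
            counter += 1
    return None


def toy_expand(n: int):
    """Toy stand-in for a tactic generator: integers play the role of states."""
    if n % 2 == 0:
        yield ("halve", math.log(0.6), n // 2)
    yield ("dec", math.log(0.4), n - 1)


proof = best_first_search(6, toy_expand, lambda n: n == 0)
print(proof)
```

In a real prover, `expand` would call the tactic generator on a Lean4 proof state and run the candidate tactics through Lean; here it is replaced by arithmetic so the search loop can be run on its own.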
+## Benchmark Performance
+<div align="center">
+| Model | miniF2F-test | miniF2F-valid | ProofNet-test |
+|:------|:------------:|:-------------:|:-------------:|
+| 👉 **BFS-Prover-V2-7B** | 82.4% | - | - |
+| BFS-Prover-V2-32B | 86.1% | 85.5% | 41.4% |
+| BFS-Prover-V2-32B w/ Planner | 95.08% | 95.5% | - |
+</div>
+## Usage
+- The model expects input in the format `"{state}:::"`, where `{state}` is a Lean4 tactic state.
 - `:::` serves as a special indicator to signal the model to generate a tactic for the given state.
 - The model will echo back the input state followed by the generated tactic.
 tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1]
 print(tactic)
+# Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]"
 ```
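The prompt format and echo behaviour described in the usage notes can be exercised without loading the model. The helpers below are a minimal sketch: `build_prompt` and `extract_tactic` are hypothetical names, and the decoded string is simulated rather than produced by `model.generate`.

```python
SEP = ":::"  # separator described in the usage notes


def build_prompt(state: str) -> str:
    """Append the separator so the model knows to emit a tactic."""
    return f"{state}{SEP}"


def extract_tactic(decoded: str) -> str:
    """Return the text after the first separator in the echoed output."""
    _, found, tactic = decoded.partition(SEP)
    if not found:
        raise ValueError("separator ':::' not found in model output")
    # Stripping whitespace is a choice of this sketch, not documented behaviour.
    return tactic.strip()


state = "a b : ℝ\n⊢ a + b = b + a"
prompt = build_prompt(state)
# Simulated decoder output: the model echoes the prompt, then the tactic.
decoded = prompt + "ring"
print(extract_tactic(decoded))
```

Unlike `split(sep)[1]`, `partition` keeps everything after the first separator and lets the helper fail loudly when the separator is missing.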
+## Citation
 ```bibtex
+@article{xin2025scaling,
   title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
   author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
   journal={arXiv preprint arXiv:2509.06493},
 }
 ```
+## License
+This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).
+## Contact
 For questions and feedback about the tactic generator model, please contact:
 - Ran Xin (ran.xin@bytedance.com)