---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-32B
pipeline_tag: text-generation
tags:
- lean4
- step-prover
---
<div align="center">
<h1 style="font-size: 2.0em;">BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
<div align="center" style="line-height: 1;">
<a href="https://bfs-prover.github.io/V2/">
<img src="https://img.shields.io/badge/Homepage-BFS--Prover--V2-78DED4?style=flat-square&labelColor=2E5AA8">
</a>
<a href="https://arxiv.org/abs/2509.06493">
<img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg?style=flat-square&labelColor=2E5AA8">
</a>
<a href="https://github.com/ByteDance-Seed/BFS-Prover-V2">
<img src="https://img.shields.io/badge/GitHub-BFS--Prover--V2-808080?&style=flat-square&labelColor=2E5AA8">
</a>
<a href="https://github.com/cmu-l3/llmlean">
<img src="https://img.shields.io/badge/Integration-LLMLean-black?style=flat-square&labelColor=2E5AA8">
</a>
<a href="https://www.apache.org/licenses/LICENSE-2.0.txt">
<img src="https://img.shields.io/badge/License-Apache%202.0-purple.svg?style=flat-square&labelColor=2E5AA8">
</a>
</div>
</div>
## Introduction
We introduce **BFS-Prover-V2**, a state-of-the-art open-source step-level theorem-proving system for Lean4, designed to address the dual challenges of scaling neural theorem proving at both training time and inference time:
1. **Training-time scaling**: a multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining, which surmounts the performance plateaus that typically curtail long-run post-training.
2. **Inference-time scaling**: a planner-enhanced multi-agent tree search system for hierarchical reasoning that scales proving performance at inference time.

**BFS-Prover-V2** achieves 95.08% on the miniF2F test set and 41.4% on the ProofNet test set, setting a new state of the art for step-level provers.
This repo contains the **BFS-Prover-V2-32B** model, with the following features:
- Base Model: Qwen2.5-32B
- Training Approach: Multi-stage expert iteration with best-first tree search
- Training Data Sources:
- Mathlib (via LeanDojo)
- Lean-Github repositories
- Autoformalized NuminaMath datasets
- Goedel-Pset
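The expert-iteration recipe above can be sketched as a loop that alternates proof search with retraining on filtered tactic data. This is only an interpretation of the description in this card, not the authors' released training code; `run_search`, `score`, and `retrain` are hypothetical placeholders supplied by the caller:

```python
# Conceptual sketch of multi-stage expert iteration with tactic-level data
# filtering (an interpretation of the description above, not the authors'
# training code). All callables are hypothetical placeholders.
def expert_iteration(model, problems, run_search, score, retrain,
                     stages=3, threshold=0.5):
    """Alternate between proof search and retraining on filtered tactics."""
    dataset = []
    for _ in range(stages):
        for problem in problems:
            proof = run_search(model, problem)   # best-first tree search
            if proof is None:
                continue                         # unsolved: contributes no data
            # tactic-level filtering: keep only (state, tactic) pairs whose
            # quality score clears the threshold
            dataset.extend((state, tactic) for state, tactic in proof
                           if score(tactic) >= threshold)
        model = retrain(model, dataset)          # periodic retraining
    return model
```

In this framing, the adaptive part of the filtering would correspond to varying `threshold` (or the scoring function) across stages as the model improves.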
## Benchmark Performance
<div align="center">
| Model | miniF2F-test | miniF2F-valid | ProofNet-test |
|:------|:------------:|:-------------:|:-------------:|
| BFS-Prover-V2-7B | 82.4% | - | - |
| 👉 **BFS-Prover-V2-32B** | 86.1% | 85.5% | 41.4% |
| 👉 **BFS-Prover-V2-32B** w/ Planner | 95.08% | 95.5% | - |
</div>
## Usage
- The model expects input in the format `{state}:::`, where `{state}` is a Lean4 tactic state.
- `:::` is a separator that signals the model to generate a tactic for the given state.
- The model echoes the input state and then appends the generated tactic.
```python
# Example: load the tactic generator and produce a tactic for a Lean4 state
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance-Seed/BFS-Prover-V2-32B",
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # shard the 32B model across available devices
)
tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/BFS-Prover-V2-32B")

# imo_1964_p2 from miniF2F
state = """a b c : ℝ
h₀ : 0 < a ∧ 0 < b ∧ 0 < c
h₁ : c < a + b
h₂ : b < a + c
h₃ : a < b + c
⊢ a ^ 2 * (b + c - a) + b ^ 2 * (c + a - b) + c ^ 2 * (a + b - c) ≤ 3 * a * b * c"""

# Tactic generation: the model echoes the state, then emits the tactic after ":::"
sep = ":::"
prompt = state + sep
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep, 1)[1]
print(tactic)
# Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]"
```
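As a step-prover, the model is meant to be called inside a tree search that expands the most promising proof state first. The loop below is a minimal best-first search sketch, not the authors' search system: `generate_tactics` (model call returning `(tactic, logprob)` candidates) and `apply_tactic` (the Lean4 interaction layer, returning the next state, `"no goals"` on success, or `None` on failure) are hypothetical stand-ins:

```python
# Minimal best-first proof search sketch (an illustration, not the authors'
# implementation). `generate_tactics` and `apply_tactic` are hypothetical
# stand-ins for the model call and the Lean4 interaction layer.
import heapq
import itertools

def best_first_search(init_state, generate_tactics, apply_tactic, budget=100):
    """Expand the highest-scoring proof state first until a goal is closed."""
    counter = itertools.count()  # tie-breaker so states are never compared
    # frontier entries: (negated cumulative log-prob, id, state, tactic path)
    frontier = [(0.0, next(counter), init_state, [])]
    for _ in range(budget):
        if not frontier:
            break  # search space exhausted
        neg_score, _, state, path = heapq.heappop(frontier)
        for tactic, logprob in generate_tactics(state):
            result = apply_tactic(state, tactic)
            if result is None:
                continue  # tactic failed in Lean
            if result == "no goals":
                return path + [tactic]  # proof complete
            # score nodes by cumulative log-probability of their tactic path
            heapq.heappush(frontier, (neg_score - logprob, next(counter),
                                      result, path + [tactic]))
    return None  # budget exhausted without closing the goal
```

In practice each expansion would sample several tactic candidates from the model and the returned path would be checked end-to-end in Lean4.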
## Citation
```bibtex
@article{xin2025scaling,
title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
journal={arXiv preprint arXiv:2509.06493},
year={2025}
}
```
## License
This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).
## Contact
For questions and feedback about the tactic generator model, please contact:
- Ran Xin (ran.xin@bytedance.com)
- Zeyu Zheng (zeyuzhen@andrew.cmu.edu)