---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-32B
pipeline_tag: text-generation
tags:
- lean4
- step-prover
---

<div align="center">
  <h1 style="font-size: 2.0em;">BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
  <div align="center" style="line-height: 1;">
    <a href="https://bfs-prover.github.io/V2/">
      <img src="https://img.shields.io/badge/Homepage-BFS--Prover--V2-78DED4?style=flat-square&labelColor=2E5AA8">
    </a>
    <a href="https://arxiv.org/abs/2509.06493">
      <img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg?style=flat-square&labelColor=2E5AA8">
    </a>
    <a href="https://github.com/ByteDance-Seed/BFS-Prover-V2">
      <img src="https://img.shields.io/badge/GitHub-BFS--Prover--V2-808080?&style=flat-square&labelColor=2E5AA8">
    </a>
    <a href="https://github.com/cmu-l3/llmlean">
      <img src="https://img.shields.io/badge/Integration-LLMLean-black?style=flat-square&labelColor=2E5AA8">
    </a>
    <a href="https://www.apache.org/licenses/LICENSE-2.0.txt">
      <img src="https://img.shields.io/badge/License-Apache%202.0-purple.svg?style=flat-square&labelColor=2E5AA8">
    </a>
  </div>
</div>
  
## Introduction

We introduce **BFS-Prover-V2**, a state-of-the-art open-source step-level theorem-proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. It addresses these challenges through:

1. **Training-time scaling**: a multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining, designed to surmount the performance plateaus that typically curtail long-term post-training.

2. **Inference-time scaling**: a planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time.

**BFS-Prover-V2** achieves 95.08% on the miniF2F test set and 41.4% on the ProofNet test set, setting a new state of the art for step-level provers.

This repo contains the **BFS-Prover-V2-32B** model, with the following features:

- Base Model: Qwen2.5-32B
- Training Approach: Multi-stage expert iteration with best-first tree search
- Training Data Sources:
  - Mathlib (via LeanDojo)
  - Lean-Github repositories
  - Autoformalized NuminaMath datasets
  - Goedel-Pset

## Benchmark Performance

<div align="center">

| Model | miniF2F-test | miniF2F-valid | ProofNet-test |
|:------|:------------:|:-------------:|:-------------:|
| BFS-Prover-V2-7B | 82.4% | - | - |
| 👉 **BFS-Prover-V2-32B** | 86.1% | 85.5% | 41.4% |
| 👉 **BFS-Prover-V2-32B** w/ Planner | 95.08% | 95.5% | - |
</div>

## Usage
- The model expects input in the format `{state}:::`, where `{state}` is a Lean4 tactic state.
- `:::` is a special separator that signals the model to generate a tactic for the given state.
- The model echoes back the input state followed by the generated tactic.


```python
# Example code for loading and using the tactic generator model
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance-Seed/BFS-Prover-V2-32B",
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires `accelerate`; shards the 32B model across devices
)
tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/BFS-Prover-V2-32B")

# imo_1964_p2 from miniF2F
state = """a b c : ℝ
h₀ : 0 < a ∧ 0 < b ∧ 0 < c
h₁ : c < a + b
h₂ : b < a + c
h₃ : a < b + c
⊢ a ^ 2 * (b + c - a) + b ^ 2 * (c + a - b) + c ^ 2 * (a + b - c) ≤ 3 * a * b * c"""

# Tactic generation: the model echoes the state, then emits the tactic after ":::"
sep = ":::"
prompt = state + sep
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1]
print(tactic)

# Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]"
```
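In practice the tactic generator is meant to drive a step-level best-first tree search, as in the training approach above. The following is a minimal, hypothetical sketch of such a driver loop: `propose_tactics` would wrap sampling from the model on `{state}:::` prompts (returning candidate tactics with log-probability scores), and `apply_tactic` would call a Lean 4 interaction tool such as LeanDojo to check each candidate. The toy stand-ins below exist only to make the control flow concrete; they are not part of the released system.

```python
import heapq

def best_first_search(root, propose_tactics, apply_tactic, max_expansions=100):
    """Repeatedly expand the most promising open state (lowest cumulative cost)."""
    frontier = [(0.0, root, [])]              # (cost, state, tactic trace)
    while frontier and max_expansions > 0:
        cost, state, trace = heapq.heappop(frontier)
        max_expansions -= 1
        for tactic, logprob in propose_tactics(state):
            result = apply_tactic(state, tactic)
            if result == "done":              # goal closed: return the proof
                return trace + [tactic]
            if result is not None:            # tactic succeeded; enqueue new state
                heapq.heappush(frontier, (cost - logprob, result, trace + [tactic]))
    return None                               # budget exhausted without a proof

# Toy stand-ins: states are integers, and "dec" counts a goal down to zero.
def toy_propose(state):
    return [("dec", -0.1), ("noop", -2.0)]

def toy_apply(state, tactic):
    if tactic == "dec":
        return "done" if state == 1 else state - 1
    return None                               # failing tactic

print(best_first_search(3, toy_propose, toy_apply))  # ['dec', 'dec', 'dec']
```

In the real system the cost would come from model log-probabilities over sampled tactics, and failed Lean applications simply prune that branch, exactly as the failing `noop` candidate does here.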

## Citation

```bibtex
@article{xin2025scaling,
  title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
  author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
  journal={arXiv preprint arXiv:2509.06493},
  year={2025}
}
```

## License

This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).

## Contact

For questions and feedback about the tactic generator model, please contact:
- Ran Xin (ran.xin@bytedance.com)
- Zeyu Zheng (zeyuzhen@andrew.cmu.edu)