---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-7B
pipeline_tag: text-generation
tags:
- lean4
- step-prover
---

<div align="center">
<h1 style="font-size: 2.0em;">BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers</h1>
<div align="center" style="line-height: 1;">
  <a href="https://bfs-prover.github.io/V2/">
    <img src="https://img.shields.io/badge/Homepage-BFS--Prover--V2-78DED4?style=flat-square&labelColor=2E5AA8">
  </a>
  <a href="https://arxiv.org/abs/2509.06493">
    <img src="https://img.shields.io/badge/arXiv-2509.06493-b31b1b.svg?style=flat-square&labelColor=2E5AA8">
  </a>
  <a href="https://huggingface.co/collections/ByteDance-Seed/bfs-prover-68db961a5fdf9de045440230">
    <img src="https://img.shields.io/badge/Collection-BFS--Prover--V2-808080?&style=flat-square&labelColor=2E5AA8">
  </a>
  <a href="https://github.com/cmu-l3/llmlean">
    <img src="https://img.shields.io/badge/Integration-LLMLean-black?style=flat-square&labelColor=2E5AA8">
  </a>
  <a href="https://www.apache.org/licenses/LICENSE-2.0.txt">
    <img src="https://img.shields.io/badge/License-Apache%202.0-purple.svg?style=flat-square&labelColor=2E5AA8">
  </a>
</div>
</div>

## Introduction

We introduce **BFS-Prover-V2**, a state-of-the-art open-source step-level theorem-proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. BFS-Prover-V2 tackles these challenges through:

1. **Training-time scaling**: A multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining, designed to surmount the performance plateaus that typically curtail long-term post-training.

2. **Inference-time scaling**: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time.

**BFS-Prover-V2** achieves 95.08% on the miniF2F test set and 41.4% on the ProofNet test set, setting a new state of the art for step-level provers.

This repo contains the **BFS-Prover-V2-7B** model, with the following features:

- Base Model: Qwen2.5-Math-7B
- Training Approach: Multi-stage expert iteration with best-first tree search
- Training Data Sources:
  - Mathlib (via LeanDojo)
  - Lean-Github repositories
  - Autoformalized NuminaMath datasets
  - Goedel-Pset

## Benchmark Performance

<div align="center">

| Model | miniF2F-test | miniF2F-valid | ProofNet-test |
|:------|:------------:|:-------------:|:-------------:|
| 👉 **BFS-Prover-V2-7B** | 82.4% | - | - |
| BFS-Prover-V2-32B | 86.1% | 85.5% | 41.4% |
| BFS-Prover-V2-32B w/ Planner | 95.08% | 95.5% | - |

</div>

## Usage

- The model expects input in the format `"{state}:::"`, where `{state}` is a Lean4 tactic state.
- The `:::` separator signals the model to generate a tactic for the given state.
- The model echoes back the input state followed by the generated tactic, so the tactic is the text after `:::` in the decoded output.
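The prompt convention above can be captured in two small helpers (hypothetical names, shown only to illustrate the `:::` format and the echo-parsing step; they are not part of the released code):

```python
SEP = ":::"

def make_prompt(state: str) -> str:
    """Append the ::: separator that cues the model to emit a tactic."""
    return state + SEP

def extract_tactic(decoded: str) -> str:
    """The model echoes the input state, so the tactic is everything after :::."""
    return decoded.split(SEP, 1)[1].strip()

# Simulated round trip: the model would return the state, the separator, then a tactic
state = "⊢ 1 + 1 = 2"
echoed_output = make_prompt(state) + " norm_num"
print(extract_tactic(echoed_output))  # prints: norm_num
```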
```python
# Load the tactic generator model and tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ByteDance-Seed/BFS-Prover-V2-7B")
tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/BFS-Prover-V2-7B")

# imo_1964_p2 from miniF2F
state = """a b c : ℝ
h₀ : 0 < a ∧ 0 < b ∧ 0 < c
h₁ : c < a + b
h₂ : b < a + c
h₃ : a < b + c
⊢ a ^ 2 * (b + c - a) + b ^ 2 * (c + a - b) + c ^ 2 * (a + b - c) ≤ 3 * a * b * c"""

# Tactic generation: append the ::: separator to the tactic state
sep = ":::"
prompt = state + sep
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

# The model echoes the state, so keep only the text after the separator
tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep, 1)[1].strip()
print(tactic)

# Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]"
```
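For context, a step-prover like this is typically driven by a best-first search loop over tactic states. The sketch below is a toy illustration only, not the released search code: in the real system, `score` would come from the model's tactic log-probabilities and `expand` would apply generated tactics in Lean4, both mocked here with a trivial numeric "goal".

```python
import heapq
from itertools import count

def best_first_search(root, expand, score, is_proved, budget=100):
    """Repeatedly expand the highest-scoring open state until one is proved."""
    tie = count()  # tie-breaker so heapq never compares states directly
    frontier = [(-score(root), next(tie), root, [])]
    while frontier and budget > 0:
        _, _, state, path = heapq.heappop(frontier)
        if is_proved(state):
            return path  # sequence of tactics that closed the goal
        budget -= 1
        for tactic, child in expand(state):
            heapq.heappush(frontier, (-score(child), next(tie), child, path + [tactic]))
    return None  # search budget exhausted

# Toy stand-in: "prove" a number by reducing it to 0, one or two units per step
expand = lambda n: [("dec1", n - 1), ("dec2", n - 2)] if n > 0 else []
path = best_first_search(3, expand, score=lambda n: -n, is_proved=lambda n: n == 0)
print(path)  # prints: ['dec2', 'dec1']
```

The priority queue is what makes the search "best-first": rather than exploring depth- or breadth-first, it always continues from the state the scoring function considers most promising, which is how the model's confidence steers the proof search.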

## Citation

```bibtex
@article{xin2025scaling,
  title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
  author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
  journal={arXiv preprint arXiv:2509.06493},
  year={2025}
}
```

## License

This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).

## Contact

For questions and feedback about the tactic generator model, please contact:
- Ran Xin (ran.xin@bytedance.com)
- Zeyu Zheng (zeyuzhen@andrew.cmu.edu)