metadata
license: apache-2.0
datasets:
- purewhite42/CoPA_Dataset
- purewhite42/formal_math500
- purewhite42/minif2f_solving
language:
- en
base_model: Qwen/Qwen2.5-Math-7B
pipeline_tag: text-generation
library_name: transformers
tags:
- formal-mathematics
- lean4
- solution-step-prediction
[NeurIPS'25] Bootstrapping Hierarchical Autoregressive Formal Reasoner with Chain-of-Proxy-Autoformalization
Qi Liu, Xinhao Zheng, Renqiu Xia, Qinxiang Cao, Junchi Yan* (* indicates corresponding author)
School of Computer Science & School of Artificial Intelligence, Shanghai Jiao Tong University
Shanghai Innovation Institute
Please refer to the 📺GitHub repo and 📃Paper for more details.
📈 Performance
| Cycle | Method | FormalMath500 | MiniF2F-Solving | ||
|---|---|---|---|---|---|
| Solved% ↑ | Budget ↓ | Solved % ↑ | Budget ↓ | ||
| 1 | BFS (This Model) | 9.52% ± 0.57% | 28139 ± 104 | 9.64% ± 1.66% | 27712 ± 339 |
| 1 | WG | 18.78% ± 0.22% | 18456 ± 92 | 24.95% ± 1.09% | 18853 ± 454 |
| 1 | WG $(K_W=16)$ | 21.78% ± 0.12% | 35391 ± 153 | 28.83% ± 0.77% | 35895 ± 487 |
| 1 | AR (This Model) | 34.39% ± 0.78% | 20860 ± 233 | 44.41% ± 0.34% | 17814 ± 140 |
| 1 | H-BFS | 13.32% ± 1.47% | 27777 ± 360 | 12.25% ± 1.47% | 27479 ± 203 |
| 1 | H-WG | 36.77% ± 0.86% | 17039 ± 461 | 47.30% ± 1.01% | 16303 ± 388 |
| 1 | H-WG $(K_W=16)$ | 40.04% ± 0.50% | 32134 ± 226 | 51.08% ± 0.44% | 30357 ± 212 |
| 1 | HAR | 43.65% ± 1.35% | 18432 ± 344 | 55.50% ± 0.46% | 15069 ± 143 |
| 2 | HAR | 44.09% ± 0.54% | 18273 ± 116 | 56.58% ± 0.92% | 14754 ± 269 |
⚙️ Usage
We recommend using vLLM to serve the model locally.
This model is SFTed for the following tasks with 🤗purewhite42/CoPA_Dataset.
- Next-Solution-Step Prediction (
next_solution_step_prediction): Input an informal problem and a current solution state; the model outputs the next solution step.
📋 Example
Next-Solution-Step Prediction
System Prompt
You are a Lean 4 expert.
User Prompt
Given a natural language math problem and the current solution state, please generate the next solution step.
Please use comments to plan and reason in natural language and deductive reasoning to derive the answer.
Assume `Mathlib` is imported.
# Informal Problem
"""
Given $2\sin (π-x)+1=0$, find the value of $\cos 2x$.
"""
# Current Solution State
```lean4
case h.mp
answer x : ℝ
h_eq : 2 * Real.sin (Real.pi - x) + 1 = 0
h_answer : Real.cos (2 * x) = answer
h_given_eq : 2 * Real.sin (Real.pi - x) + 1 = 0
⊢ ?w
```
Response
# Next Solution Step
```lean4
-- Using the cofunction identity $\sin(π-x)=\sin(x)$, we have: $2\sin(x)+1=0$
have h_cofunction : Real.sin (Real.pi - x) = Real.sin x := by {
simp [Real.sin_pi_sub]
}
```
📚 Citation
If you find our work useful in your research, please cite
@inproceedings{
liu2025bootstrapping,
title={Bootstrapping Hierarchical Autoregressive Formal Reasoner with Chain-of-Proxy-Autoformalization},
author={Liu, Qi and Zheng, Xinhao and Xia, Renqiu and Cao, Qinxiang and Yan, Junchi},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=2Xn8h68mP3}
}
©️ License
This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.
✉️ Contact
Feel free to discuss the paper/data/code with us through issues/emails!
- Qi Liu: purewhite@sjtu.edu.cn