APRIL-Goedel-8B: Lean Proof Repair with Explanations

This model is a LoRA finetune of Goedel-Prover-V2-8B on the APRIL dataset for Lean 4 proof repair with natural-language explanations. Given an erroneous Lean proof and compiler feedback, the model produces a diagnosis explaining the failure, a fix suggestion, and a corrected proof.

Model Details

  • Base model: Goedel-Prover-V2-8B
  • Method: Supervised finetuning with LoRA (rank 32, α = 64)
  • Training data: APRIL — 260K paired erroneous/correct Lean proofs with compiler diagnostics and LLM-generated explanations
  • Task: Joint proof repair + natural-language explanation generation
  • Lean version: 4.22.0-rc4
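
The LoRA setup above (rank 32, α = 64) adds a scaled low-rank term ΔW = (α/r)·BA to each frozen weight, with B initialized to zero so training starts from the base model unchanged. A minimal NumPy sketch of that update (the hidden dimension here is illustrative, not the model's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 32, 64            # illustrative hidden dim; LoRA rank and alpha from above

W = rng.standard_normal((d, d))     # frozen base weight
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))                # B starts at zero, so the adapter starts as a no-op

delta_W = (alpha / r) * (B @ A)     # scaled low-rank update, rank <= r
W_effective = W + delta_W

assert np.allclose(W_effective, W)  # before training, output equals the base model
```

Only A and B are trained; the merged weight W + (α/r)·BA is what the released checkpoint effectively computes.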

Results

Single-shot proof repair accuracy (pass@1) on the APRIL test set (1,835 examples), evaluated by Lean compilation:

| Error Type | This Model (w/ exp) | This Model (w/o exp) | Base Goedel-8B | Goedel-32B |
| --- | --- | --- | --- | --- |
| Full | 34.6% | 36.7% | 15.5% | 26.8% |
| Tactic | 41.7% | 48.5% | 19.6% | 34.2% |
| Line | 18.5% | 25.5% | 20.0% | 28.5% |
| Theorem | 36.8% | 37.5% | 12.7% | 23.0% |
| Multi-Line | 20.8% | 24.3% | 19.4% | 32.6% |
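
Here pass@1 is simply the fraction of test examples whose single generated repair compiles. As a sketch (the compile count below is inferred from the reported percentage, so approximate):

```python
def pass_at_1(compiled):
    """Fraction of examples whose single repair attempt compiled (list of bools)."""
    return sum(compiled) / len(compiled)

# 34.6% of the 1,835-example full test set corresponds to ~635 compiling repairs
results = [True] * 635 + [False] * (1835 - 635)
print(f"{pass_at_1(results):.1%}")  # -> 34.6%
```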

Model & Dataset Download

| Resource | Description | Link |
| --- | --- | --- |
| APRIL Dataset | 260K Lean proof-repair tuples with compiler diagnostics and explanations | uw-math-ai/APRIL |
| gAPRIL-w-exp | Goedel-8B finetuned on APRIL with joint explanation supervision | uw-math-ai/gAPRIL-w-exp |
| gAPRIL-wo-exp | Goedel-8B finetuned on APRIL for repair only (no explanations) | uw-math-ai/gAPRIL-wo-exp |

Joint explanation training reduces pass@1 by roughly 2 percentage points relative to repair-only training (34.6% vs. 36.7% on the full test set), but produces human-interpretable diagnostics that support downstream use (e.g., augmenting other models). See the paper for details on explanation utility.

Usage

The model expects a chat-formatted prompt containing the erroneous proof, the goal state, the line at which the error occurred, and the compiler error message. The assistant response contains an explanation, a fix suggestion, and the corrected proof in a ```lean``` code block.

System: You are a Lean 4 programmer diagnosing a single failing proof.

User:

Explain the error, suggest a fix, and provide the corrected proof based on the context:

Incorrect Proof: <erroneous proof>
State: <goal state before error from InfoView>
Line at error: <error-occurred line of code>
Lean error: <error messages from InfoView>

Assistant (model output):

Explanation: <explanation of error cause>
Fix: <code manipulation fix suggestion>
Corrected Proof: <corrected proof>
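
The three labeled sections can be split out of the response with a small regex helper. This helper is hypothetical (not part of the released code) and assumes the `Explanation:` / `Fix:` / `Corrected Proof:` labels appear exactly as shown above:

```python
import re

def split_response(text):
    """Split a model response into its three labeled sections, or return None."""
    pattern = r"Explanation:\s*(.*?)\s*Fix:\s*(.*?)\s*Corrected Proof:\s*(.*)"
    m = re.search(pattern, text, re.DOTALL)
    if m is None:
        return None
    explanation, fix, proof = m.groups()
    return {"explanation": explanation, "fix": fix, "proof": proof}

demo = (
    "Explanation: refine' leaves an unsolved metavariable.\n"
    "Fix: supply 47 explicitly.\n"
    "Corrected Proof: ```lean ... ```"
)
parts = split_response(demo)
print(parts["fix"])  # -> supply 47 explicitly.
```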

Example Inference

import re

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(42)


def extract_proof_from_text(output):
    lean_codes = re.findall(r"```lean\s*(.*?)\s*```", output, re.DOTALL)
    if not lean_codes:
        lean_codes = re.findall(r"```lean4\s*(.*?)\s*```", output, re.DOTALL)
    words = ["by", ":="]

    # Prefer the last fenced block that looks like a complete proof.
    for lean_code in reversed(lean_codes):
        if all(word in lean_code for word in words):
            return lean_code
    return None


model_id = "uw-math-ai/gAPRIL-w-exp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)

system_prompt = (
    "You are a Lean 4 programmer diagnosing a single failing proof. "
    "Assume you only see the incorrect proof text, the infoview state"
    " near the failure, and Lean's error message."
)

# Context information for the incorrect proof
incorrect_proof = """
theorem lean_problem : IsLeast {x : ℕ | x > 0 ∧ (7 * x) % 100 = 29} 47 := by
  constructor
  · constructor
    · norm_num
    · norm_num
  · intro x ⟨hx_pos, hx_cong⟩
    by_contra h
    push_neg at h
    obtain ⟨h_le, h_ne⟩ := lt_iff_le_and_ne.mp h
    have h_lt := h_le
    revert x hx_pos hx_cong h_lt
    refine' Nat.le_induction _ _ 47 _ <;> intros x hx_lt hx_pos hx_cong
    · rfl
    · have : x < 47 := by omega
      interval_cases x
      all_goals try { norm_num at hx_cong; norm_num }
""".strip()

infoview_state = (
    "case right.intro.refine'_4 ⊢ ∀ (n : ℕ), sorry ≤ n → "
    "(∀ ⦃x : ℕ⦄, x > 0 → 7 * x % 100 = 29 → x < n → x ≤ n → x ≠ n → x ≤ n → False) → "
    "∀ ⦃x : ℕ⦄, x > 0 → 7 * x % 100 = 29 → x < n + 1 → x ≤ n + 1 → x ≠ n + 1 → x ≤ n + 1 → False"
)

line_at_error = "refine' Nat.le_induction _ _ 47 _ <;> intros x hx_lt hx_pos hx_cong"

error_message = (
    "tactic 'introN' failed, insufficient number of binders\n"
    "case right.intro.refine'_1\n⊢ ℕ"
)

user_prompt = f"""
**Instruction:** Explain why this proof fails, then explain how to fix it at a high level \
(no code). Finally, provide the full corrected Lean 4 theorem/proof in a single ```lean``` code block.

**Context:**

Incorrect proof:
```lean
{incorrect_proof}
```

Infoview state:
{infoview_state}

Line at error:
{line_at_error}

Lean error:
{error_message}
""".strip()


chat = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=8192)
decoded_outputs = tokenizer.batch_decode(outputs)
print(f"model_output: {decoded_outputs[0]}")
proof = extract_proof_from_text(decoded_outputs[0])
print(f"extracted_proof: {proof}")

Citation

@article{wang2026repairlean,
  title  = {Learning to Repair Lean Proofs from Compiler Feedback},
  author = {Wang, Evan and Chess, Simon and Lee, Daniel and Ge, Siyuan and Mallavarapu, Ajit and Ilin, Vasily},
  journal= {arXiv preprint arXiv:2602.02990},
  year   = {2026},
  doi    = {10.48550/arXiv.2602.02990},
  url    = {https://arxiv.org/abs/2602.02990}
}

Acknowledgements

Work by the Math AI Lab, University of Washington. Supported by UW eScience School, UW IT (AWS credits), UW Department of Applied Mathematics (GPU access), and Nebius (LLM cloud credits).
