Qwen3-4B Lean Prover — Experiment 1

Training

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Training data: Goedel-LM/Lean-workbook-proofs (29,750 verified proof pairs)
  • Fine-tuning method: LoRA (r=16, alpha=32, target: q/k/v/o_proj)
  • Epochs: 3
  • Effective batch size: 2 per device × 3 GPUs × 3 gradient-accumulation steps = 18
  • Learning rate: 2e-4 (cosine schedule, 5% warmup)
  • Max sequence length: 1024
  • Hardware: 3× RTX 5090 (32 GB each, RunPod)
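The hyperparameters above can be expressed as a `peft` adapter config; this is a sketch from the listed values, and the dropout and task-type settings are assumptions not stated in the card:

```python
from peft import LoraConfig

# LoRA settings from the training list above.
# lora_dropout is an assumption; the card does not state it.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,       # assumed, not from the card
    task_type="CAUSAL_LM",
)
```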

Results on miniF2F test (244 problems, pass@32)

Model                           Solved    pass@1    pass@8    pass@32
Baseline (no fine-tuning)       0/244     0.0%      0.0%      0.0%
This checkpoint (29.7K SFT)     22/244    0.86%     4.47%     9.02%
Goedel-Prover-SFT (paper)       142/244   —         —         57.6%
DeepSeek-Prover-V2-7B (paper)   —         —         —         73.4%
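The pass@k columns are typically computed per problem with the standard unbiased estimator (n samples drawn, c of them verified correct), then averaged over the 244 problems. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer incorrect samples than k: some draw must contain a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: a problem with 2 correct proofs among 32 samples.
print(pass_at_k(32, 2, 1))   # -> 0.0625
```

Averaging `pass_at_k(32, c_i, k)` over all problems gives the table's pass@1 and pass@8 figures; pass@32 with n = 32 reduces to the plain solved fraction.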

Reproducing
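A minimal inference sketch for loading the adapter on top of the base model. The prompt template here is an assumption (the exact training format is not stated in the card), and `yotsubian/qwen` is the adapter repository shown on this page:

```python
# Hypothetical prompt format -- an assumption, not the confirmed training template.
def build_prompt(theorem: str) -> str:
    """Wrap a Lean 4 theorem statement in a simple completion instruction."""
    return (
        "Complete the following Lean 4 proof:\n\n"
        f"```lean4\n{theorem}\n```\n"
    )

if __name__ == "__main__":
    # Heavy imports kept here so build_prompt stays importable without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = "Qwen/Qwen3-4B-Instruct-2507"
    adapter = "yotsubian/qwen"  # this repository

    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA weights

    theorem = "theorem add_comm' (a b : Nat) : a + b = b + a := by"
    inputs = tokenizer(build_prompt(theorem), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=1.0)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Candidate proofs should then be checked with a Lean 4 verifier, as in the pass@32 evaluation above; the model's output is not trusted as-is.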
