Lean Finder Logo

Lean Finder: Semantic Search For Mathlib That Understands User Intents

Website GitHub arXiv License

Introduction

We present Lean Finder, a semantic search engine for Lean and mathlib that understands and aligns with the intents of mathematicians, published at ICLR 2026. Progress in formal theorem proving is often hindered by the difficulty of locating relevant theorems and the steep learning curve of the Lean 4 language. Existing Lean search engines rely primarily on informalizations (natural language translations of formal statements), while largely overlooking the mismatch with real-world user queries.

Lean Finder proposes a user-centered semantic search tailored to the needs of mathematicians. Our approach:

  1. User-centered query synthesis: Analyzes and clusters the semantics of public Lean discussions, then fine-tunes text embeddings on synthesized queries that emulate user intents across diverse intent clusters.
  2. Diverse input modalities: Supports informalized statements, synthetic user queries, augmented proof states, and formal statements. Model was trained with over 1.4M query-code pairs in total.
  3. Preference alignment: Further aligns Lean Finder with mathematicians' preferences via DPO using human and LLM feedback collected from a deployed web service.

Performance

  • 81.6% Top-3 preference rate in a user study with real Lean users.
  • 30%+ relative improvement (Recall@1) over previous search engines and GPT-4o on informalized statement queries.
  • 16%+ relative improvement on proof state queries.
  • Compatible with LLM-based theorem provers as a plug-and-play retrieval backbone.

Model Versions

The model in this repository's main revision is a newly optimized version that is significantly stronger than the model described in the paper. The model corresponding to the paper release is available in the old-model revision. For details on the differences and how to access each version, see the GitHub repository.

Usage

Try the web service at leanfinder.github.io.

For code, dataset, and instructions for hosting the model locally, see the GitHub repository.

Citation

@article{lu2025lean,
  title={Lean finder: Semantic search for mathlib that understands user intents},
  author={Lu, Jialin and Emond, Kye and Yang, Kaiyu and Chaudhuri, Swarat and Sun, Weiran and Chen, Wuyang},
  journal={arXiv preprint arXiv:2510.15940},
  year={2025}
}
Downloads last month
118
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Space using delta-lab-ai/lean-finder 1

Paper for delta-lab-ai/lean-finder