Instructions to use delta-lab-ai/lean-finder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use delta-lab-ai/lean-finder with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="delta-lab-ai/lean-finder") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("delta-lab-ai/lean-finder") model = AutoModelForCausalLM.from_pretrained("delta-lab-ai/lean-finder") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use delta-lab-ai/lean-finder with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "delta-lab-ai/lean-finder" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "delta-lab-ai/lean-finder", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/delta-lab-ai/lean-finder
- SGLang
How to use delta-lab-ai/lean-finder with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "delta-lab-ai/lean-finder" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "delta-lab-ai/lean-finder", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "delta-lab-ai/lean-finder" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "delta-lab-ai/lean-finder", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use delta-lab-ai/lean-finder with Docker Model Runner:
docker model run hf.co/delta-lab-ai/lean-finder
Lean Finder: Semantic Search For Mathlib That Understands User Intents
Introduction
We present Lean Finder, a semantic search engine for Lean and mathlib that understands and aligns with the intents of mathematicians, published at ICLR 2026. Progress in formal theorem proving is often hindered by the difficulty of locating relevant theorems and the steep learning curve of the Lean 4 language. Existing Lean search engines rely primarily on informalizations (natural language translations of formal statements), while largely overlooking the mismatch with real-world user queries.
Lean Finder proposes a user-centered semantic search tailored to the needs of mathematicians. Our approach:
- User-centered query synthesis: Analyzes and clusters the semantics of public Lean discussions, then fine-tunes text embeddings on synthesized queries that emulate user intents across diverse intent clusters.
- Diverse input modalities: Supports informalized statements, synthetic user queries, augmented proof states, and formal statements. Model was trained with over 1.4M query-code pairs in total.
- Preference alignment: Further aligns Lean Finder with mathematicians' preferences via DPO using human and LLM feedback collected from a deployed web service.
Performance
- 81.6% Top-3 preference rate in a user study with real Lean users.
- 30%+ relative improvement (Recall@1) over previous search engines and GPT-4o on informalized statement queries.
- 16%+ relative improvement on proof state queries.
- Compatible with LLM-based theorem provers as a plug-and-play retrieval backbone.
Model Versions
The model in this repository's main revision is a newly optimized version that is significantly stronger than the model described in the paper. The model corresponding to the paper release is available in the old-model revision. For details on the differences and how to access each version, see the GitHub repository.
Usage
Try the web service at leanfinder.github.io.
For code, dataset, and instructions for hosting the model locally, see the GitHub repository.
Citation
@article{lu2025lean,
title={Lean finder: Semantic search for mathlib that understands user intents},
author={Lu, Jialin and Emond, Kye and Yang, Kaiyu and Chaudhuri, Swarat and Sun, Weiran and Chen, Wuyang},
journal={arXiv preprint arXiv:2510.15940},
year={2025}
}
- Downloads last month
- 118