--- pipeline_tag: text-ranking library_name: sentence-transformers license: mit --- # Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever ## Model Description This model is the first version designed for **premise retrieval** in **Lean**, based on the **state representation** of Lean. The model follows the architecture described in the paper: [Assisting Mathematical Formalization with A Learning-based Premise Retriever](https://arxiv.org/abs/2501.13959) The model implementation and code are available at: [GitHub Repository](https://github.com/ruc-ai4math/Premise-Retrieval) [Try our model](https://premise-search.com) ## Usage You can use this model with the `sentence-transformers` library to embed queries and premises and then calculate their similarity for retrieval. ```python from sentence_transformers import SentenceTransformer, util import torch # Load the pretrained model model = SentenceTransformer('ruc-ai4math/Lean_State_Search_Random') # Example Lean proof state (query) and a list of premises query = " (n : \u2115), n + 0 = n " premises = [ " (n : \u2115) n + 0 = n ", " (n m : \u2115) n + m = m + n ", " (n : \u2115) n = n ", "lemma add_zero (n : \u2115) : n + 0 = n := by sorry" # An actual Lean lemma ] # Encode the query and premises into embeddings query_embedding = model.encode(query, convert_to_tensor=True) premise_embeddings = model.encode(premises, convert_to_tensor=True) # Calculate cosine similarity between the query and all premises cosine_scores = util.cos_sim(query_embedding, premise_embeddings) # Print the scores for each premise print("Similarity scores:") for i, score in enumerate(cosine_scores[0]): print(f" - Premise {i+1}: '{premises[i]}', Score: {score.item():.4f}") # Find the index of the premise with the highest similarity score best_match_idx = torch.argmax(cosine_scores).item() print(f" Best matching premise: '{premises[best_match_idx]}'") ``` ## Citation If you use this model, please cite the following paper: ``` @misc{tao2025assistingmathematicalformalizationlearningbased, title={Assisting Mathematical Formalization with A Learning-based Premise Retriever}, author={Yicheng Tao and Haotian Liu and Shanwen Wang and Hongteng Xu}, year={2025}, eprint={2501.13959}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2501.13959}, } ```