Spaces:
Running
Running
Add VeriDeepResearch: verified math research chatbot
Browse filesAgentic system that answers mathematical questions with Lean 4 verified proofs:
- Kimi K2.5 orchestrator via TokenFactory with function calling
- TheoremSearch (arXiv) + LeanExplore (Mathlib) for research
- Axle for fast Lean 4.28.0 verification
- Aristotle for automated theorem proving (fallback)
- Gradio ChatInterface on HuggingFace Spaces with LaTeX support
- $20/query cost limit, no sign-up required
Tested on 11 queries: 9/9 math questions verified, 2/2 non-math rejected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.gitignore
ADDED
|
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
|
|
|
| 1 |
+
__pycache__/
|
| 2 |
+
.env
|
CLAUDE.md
ADDED
|
File without changes
|
LINKS.md
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Useful Links
|
| 2 |
+
- Aristotle: https://aristotle.harmonic.fun/dashboard/docs/overview
|
| 3 |
+
- Clawristotle (just for reference): https://github.com/Vilin97/Clawristotle
|
| 4 |
+
- TheoremSearch: https://theoremsearch.com/
|
| 5 |
+
- LeanExplore: https://www.leanexplore.com/
|
| 6 |
+
- Kimi K2.5 on Token Factory: https://tokenfactory.nebius.com/models?modals=endpoint-details&model-id=moonshotai/Kimi-K2.5
|
| 7 |
+
- Axle: https://axle.axiommath.ai/
|
| 8 |
+
- HF space: https://huggingface.co/spaces/Vilin97/VeriDeepResearch
|
LOG.md
ADDED
|
@@ -0,0 +1,43 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Test Log
|
| 2 |
+
|
| 3 |
+
All tests run on https://vilin97-verideepresearch.hf.space on 2026-03-18.
|
| 4 |
+
|
| 5 |
+
## Simple / LLM-tricky questions
|
| 6 |
+
|
| 7 |
+
| # | Question | Time | Verified | Notes |
|
| 8 |
+
|---|----------|------|----------|-------|
|
| 9 |
+
| 1 | "Show that for all natural numbers n, n + 0 = n." | 16s | Yes | Proved by `rfl` (definitional equality). |
|
| 10 |
+
| 2 | "Prove that the sum of two even numbers is even." | 19s | Yes | Used `Even.add` from Mathlib + explicit proof. |
|
| 11 |
+
| 3 | "Which is larger, 9.9 or 9.11? Prove it." | 4s | Yes | Correctly answered 9.9 > 9.11. Proved with `norm_num`. Classic LLM failure point — verified correct. |
|
| 12 |
+
| 4 | "Prove that 1/3 + 1/3 + 1/3 = 1." | 16s | Yes | Proved over ℚ with `norm_num`. |
|
| 13 |
+
| 5 | "Prove that 1 = 2." | 2s | N/A | Correctly refused: "This statement is false in standard mathematics." No Lean code generated. |
|
| 14 |
+
|
| 15 |
+
## Graduate / Prelim level
|
| 16 |
+
|
| 17 |
+
| # | Question | Time | Verified | Notes |
|
| 18 |
+
|---|----------|------|----------|-------|
|
| 19 |
+
| 6 | "Prove that the composition of two injective functions is injective." | ~20s | Yes | Explicit proof with `intro`, `have`, `exact`. |
|
| 20 |
+
| 7 | "Prove Cantor's theorem: for any set S, there is no surjection from S to its power set P(S)." | 16s | Yes | Both a Mathlib one-liner (`Function.cantor_surjective`) and an explicit diagonal argument proof. |
|
| 21 |
+
| 8 | "Prove that every finite integral domain is a field." | 16s | Yes | Used `Finite.isDomain_to_isField` from Mathlib. |
|
| 22 |
+
|
| 23 |
+
## Research level
|
| 24 |
+
|
| 25 |
+
| # | Question | Time | Verified | Notes |
|
| 26 |
+
|---|----------|------|----------|-------|
|
| 27 |
+
| 9 | "Prove that a continuous bijection from a compact space to a Hausdorff space is a homeomorphism." | 66s | Yes | Used `isHomeomorph_iff_continuous_bijective` from Mathlib. Included detailed proof sketch explaining closed map argument. |
|
| 28 |
+
|
| 29 |
+
## Non-math rejection
|
| 30 |
+
|
| 31 |
+
| # | Question | Time | Result |
|
| 32 |
+
|---|----------|------|--------|
|
| 33 |
+
| 10 | "What is the best Italian restaurant in New York?" | 2s | Rejected: "I can only answer questions related to mathematics." |
|
| 34 |
+
| 11 | "What is the weather today?" | 2s | Rejected: "I'm a mathematical research assistant specialized in formalizing and verifying mathematical proofs." |
|
| 35 |
+
|
| 36 |
+
## Summary
|
| 37 |
+
|
| 38 |
+
- **9/9 mathematical questions answered correctly** (8 verified, 1 correctly identified as false)
|
| 39 |
+
- **2/2 non-math questions rejected**
|
| 40 |
+
- **Median response time: 16 seconds** for verified proofs
|
| 41 |
+
- **All Lean code verified** by the Axle proof checker (Lean 4.28.0 + Mathlib)
|
| 42 |
+
- The system correctly handles the classic "9.9 vs 9.11" trick that trips up many LLMs
|
| 43 |
+
- Fast path (Kimi K2.5 + Axle) suffices for most questions; Aristotle available as fallback for harder problems
|
PROMPT.md
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
I want to build VeriDeepResearch. It's like Deep Research but verified, using Lean. The end-product is a website that anyone can go to, type in a mathematical question and get an answer that is verified in Lean. The answer itself needs to be in natural language, but there should be an expandable section with the Lean code (a single file). The tool should refuse to answer any non-mathematical questions, because they do not have anything to verify. There should be no sign-up. The limit per query is $20 in TokenFactory costs.
|
README.md
ADDED
|
@@ -0,0 +1,25 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: VeriDeepResearch
|
| 3 |
+
emoji: "\U0001F9E0"
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
+
sdk: gradio
|
| 7 |
+
sdk_version: 6.5.1
|
| 8 |
+
app_file: app.py
|
| 9 |
+
pinned: false
|
| 10 |
+
license: apache-2.0
|
| 11 |
+
short_description: Verified Deep Research powered by Lean 4
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
# VeriDeepResearch
|
| 15 |
+
|
| 16 |
+
Verified Deep Research is an agentic system capable of performing Deep Research in a verified way: all math used in the research is formalized in Lean, completely eliminating mathematical mistakes.
|
| 17 |
+
|
| 18 |
+
Implementation (all tools are web-based):
|
| 19 |
+
- Kimi K2.5 on TokenFactory to orchestrate
|
| 20 |
+
- TheoremSearch API to find mathematical theorems on arXiv
|
| 21 |
+
- Aristotle to formalize in Lean
|
| 22 |
+
- Lean Explore to search mathlib
|
| 23 |
+
- Axle to check Lean correctness (on Lean v4.24)
|
| 24 |
+
- chatbot interface on the web (single query only)
|
| 25 |
+
- HuggingFace for hosting
|
agent.py
ADDED
|
@@ -0,0 +1,354 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import asyncio
|
| 4 |
+
import json
|
| 5 |
+
from openai import AsyncOpenAI
|
| 6 |
+
from config import (
|
| 7 |
+
TOKEN_FACTORY_API_KEY,
|
| 8 |
+
TOKEN_FACTORY_BASE_URL,
|
| 9 |
+
KIMI_MODEL,
|
| 10 |
+
INPUT_COST_PER_TOKEN,
|
| 11 |
+
OUTPUT_COST_PER_TOKEN,
|
| 12 |
+
MAX_COST_PER_QUERY,
|
| 13 |
+
MAX_AGENT_ITERATIONS,
|
| 14 |
+
ARISTOTLE_POLL_INTERVAL,
|
| 15 |
+
ARISTOTLE_MAX_POLLS,
|
| 16 |
+
)
|
| 17 |
+
from tools import (
|
| 18 |
+
search_theorems,
|
| 19 |
+
search_lean_library,
|
| 20 |
+
check_lean_code,
|
| 21 |
+
submit_to_aristotle,
|
| 22 |
+
check_aristotle_status,
|
| 23 |
+
get_aristotle_result,
|
| 24 |
+
TOOL_DEFINITIONS,
|
| 25 |
+
)
|
| 26 |
+
|
| 27 |
+
SYSTEM_PROMPT = """\
|
| 28 |
+
You are VeriDeepResearch, a mathematical research assistant that produces VERIFIED answers using Lean 4 and Mathlib.
|
| 29 |
+
|
| 30 |
+
## Rules
|
| 31 |
+
- REFUSE non-mathematical questions. Call final_answer with a polite refusal, empty lean_code, and verified=false.
|
| 32 |
+
- ALWAYS start Lean code with `import Mathlib`.
|
| 33 |
+
- Use Lean 4.28.0 syntax with full Mathlib.
|
| 34 |
+
- Use LaTeX notation in your natural language answers: $...$ for inline math, $$...$$ for display math.
|
| 35 |
+
|
| 36 |
+
## Workflow
|
| 37 |
+
|
| 38 |
+
### Phase 1: Research
|
| 39 |
+
1. Search for relevant theorems with **search_theorems**.
|
| 40 |
+
2. Search Mathlib for relevant declarations with **search_lean_library**.
|
| 41 |
+
|
| 42 |
+
### Phase 2: Fast attempt (try this first)
|
| 43 |
+
Using your research, write Lean 4 code and verify with **check_lean_code** (Axle — takes seconds).
|
| 44 |
+
- If verified: great, call **final_answer** immediately.
|
| 45 |
+
- If errors: try to fix and re-check (up to 3 attempts).
|
| 46 |
+
|
| 47 |
+
### Phase 3: Aristotle (if fast attempt fails or problem is complex)
|
| 48 |
+
If Axle verification fails after a few attempts, use Aristotle:
|
| 49 |
+
1. **Decompose** the problem into the main result + 3-5 sub-lemmas.
|
| 50 |
+
2. **Submit ALL** to Aristotle (call submit_to_aristotle for each):
|
| 51 |
+
- The main result as a natural language prompt
|
| 52 |
+
- Each sub-lemma as a separate prompt
|
| 53 |
+
- Optionally, Lean code with sorry: "Fill in the sorries:\\n```lean\\nimport Mathlib\\ntheorem foo : ... := by sorry\\n```"
|
| 54 |
+
3. **Continue working** — do NOT call wait_for_aristotle immediately. Instead:
|
| 55 |
+
- Search for more theorems and Mathlib declarations
|
| 56 |
+
- Try different Lean formulations
|
| 57 |
+
- Periodically call **check_aristotle_status** to see progress
|
| 58 |
+
4. **Collect results** when ready: use **wait_for_aristotle** or **get_aristotle_result** (if status is COMPLETE).
|
| 59 |
+
5. **Verify** Aristotle's output with check_lean_code.
|
| 60 |
+
6. **Assemble** the best verified pieces into a single file.
|
| 61 |
+
|
| 62 |
+
### Phase 4: Final answer
|
| 63 |
+
Call **final_answer** with:
|
| 64 |
+
- Clear natural language explanation (with LaTeX math)
|
| 65 |
+
- Complete verified Lean 4 code (single file, starts with `import Mathlib`)
|
| 66 |
+
- Whether verification succeeded
|
| 67 |
+
|
| 68 |
+
## Key principles
|
| 69 |
+
- Try the FAST PATH first (your own code + Axle).
|
| 70 |
+
- Use Aristotle for anything that doesn't verify quickly.
|
| 71 |
+
- Submit MULTIPLE Aristotle jobs with different approaches.
|
| 72 |
+
- Don't sit idle while Aristotle works — keep researching.
|
| 73 |
+
- Always verify with check_lean_code before calling final_answer.
|
| 74 |
+
"""
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
class CostTracker:
|
| 78 |
+
def __init__(self, max_cost: float = MAX_COST_PER_QUERY):
|
| 79 |
+
self.total_cost = 0.0
|
| 80 |
+
self.max_cost = max_cost
|
| 81 |
+
self.total_input_tokens = 0
|
| 82 |
+
self.total_output_tokens = 0
|
| 83 |
+
|
| 84 |
+
def add(self, usage) -> float:
|
| 85 |
+
input_t = usage.prompt_tokens or 0
|
| 86 |
+
output_t = usage.completion_tokens or 0
|
| 87 |
+
self.total_input_tokens += input_t
|
| 88 |
+
self.total_output_tokens += output_t
|
| 89 |
+
cost = input_t * INPUT_COST_PER_TOKEN + output_t * OUTPUT_COST_PER_TOKEN
|
| 90 |
+
self.total_cost += cost
|
| 91 |
+
return cost
|
| 92 |
+
|
| 93 |
+
@property
|
| 94 |
+
def over_budget(self) -> bool:
|
| 95 |
+
return self.total_cost >= self.max_cost
|
| 96 |
+
|
| 97 |
+
def summary(self) -> str:
|
| 98 |
+
return f"${self.total_cost:.4f} ({self.total_input_tokens} in / {self.total_output_tokens} out)"
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
TERMINAL_STATUSES = {
|
| 102 |
+
"COMPLETE", "COMPLETE_WITH_ERRORS", "FAILED",
|
| 103 |
+
"CANCELED", "OUT_OF_BUDGET",
|
| 104 |
+
}
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
async def run_agent(question: str):
|
| 108 |
+
"""Run the VeriDeepResearch agent.
|
| 109 |
+
|
| 110 |
+
Yields (display_text, final_result_or_none) tuples.
|
| 111 |
+
"""
|
| 112 |
+
client = AsyncOpenAI(
|
| 113 |
+
base_url=TOKEN_FACTORY_BASE_URL,
|
| 114 |
+
api_key=TOKEN_FACTORY_API_KEY,
|
| 115 |
+
)
|
| 116 |
+
cost_tracker = CostTracker()
|
| 117 |
+
status_log: list[str] = []
|
| 118 |
+
|
| 119 |
+
def add_status(msg: str):
|
| 120 |
+
status_log.append(msg)
|
| 121 |
+
|
| 122 |
+
def render_status():
|
| 123 |
+
return "\n".join(f"- {s}" for s in status_log)
|
| 124 |
+
|
| 125 |
+
messages: list[dict] = [
|
| 126 |
+
{"role": "system", "content": SYSTEM_PROMPT},
|
| 127 |
+
{"role": "user", "content": question},
|
| 128 |
+
]
|
| 129 |
+
|
| 130 |
+
add_status("Analyzing your question...")
|
| 131 |
+
yield render_status(), None
|
| 132 |
+
|
| 133 |
+
for _iteration in range(MAX_AGENT_ITERATIONS):
|
| 134 |
+
if cost_tracker.over_budget:
|
| 135 |
+
add_status(f"Budget limit reached ({cost_tracker.summary()}).")
|
| 136 |
+
yield render_status(), None
|
| 137 |
+
break
|
| 138 |
+
|
| 139 |
+
try:
|
| 140 |
+
response = await client.chat.completions.create(
|
| 141 |
+
model=KIMI_MODEL,
|
| 142 |
+
messages=messages,
|
| 143 |
+
tools=TOOL_DEFINITIONS,
|
| 144 |
+
temperature=0.6,
|
| 145 |
+
max_tokens=16384,
|
| 146 |
+
)
|
| 147 |
+
except Exception as e:
|
| 148 |
+
add_status(f"LLM error: {e}")
|
| 149 |
+
yield render_status(), None
|
| 150 |
+
return
|
| 151 |
+
|
| 152 |
+
if response.usage:
|
| 153 |
+
cost_tracker.add(response.usage)
|
| 154 |
+
|
| 155 |
+
choice = response.choices[0]
|
| 156 |
+
assistant_msg = choice.message
|
| 157 |
+
|
| 158 |
+
# Append assistant message to history
|
| 159 |
+
msg_dict: dict = {"role": "assistant", "content": assistant_msg.content or ""}
|
| 160 |
+
if assistant_msg.tool_calls:
|
| 161 |
+
msg_dict["tool_calls"] = [
|
| 162 |
+
{
|
| 163 |
+
"id": tc.id,
|
| 164 |
+
"type": "function",
|
| 165 |
+
"function": {"name": tc.function.name, "arguments": tc.function.arguments},
|
| 166 |
+
}
|
| 167 |
+
for tc in assistant_msg.tool_calls
|
| 168 |
+
]
|
| 169 |
+
messages.append(msg_dict)
|
| 170 |
+
|
| 171 |
+
if not assistant_msg.tool_calls:
|
| 172 |
+
add_status("Thinking...")
|
| 173 |
+
yield render_status(), None
|
| 174 |
+
continue
|
| 175 |
+
|
| 176 |
+
for tool_call in assistant_msg.tool_calls:
|
| 177 |
+
fn_name = tool_call.function.name
|
| 178 |
+
try:
|
| 179 |
+
fn_args = json.loads(tool_call.function.arguments)
|
| 180 |
+
except json.JSONDecodeError:
|
| 181 |
+
fn_args = {}
|
| 182 |
+
|
| 183 |
+
result = await _handle_tool_call(
|
| 184 |
+
fn_name, fn_args, add_status, render_status,
|
| 185 |
+
# Pass the yield-like callback for streaming status updates
|
| 186 |
+
)
|
| 187 |
+
|
| 188 |
+
# final_answer terminates the loop
|
| 189 |
+
if fn_name == "final_answer":
|
| 190 |
+
add_status("Research complete!")
|
| 191 |
+
yield render_status(), fn_args
|
| 192 |
+
return
|
| 193 |
+
|
| 194 |
+
yield render_status(), None
|
| 195 |
+
|
| 196 |
+
# For wait_for_aristotle, we need special handling with polling
|
| 197 |
+
if fn_name == "wait_for_aristotle":
|
| 198 |
+
project_id = fn_args.get("project_id", "")
|
| 199 |
+
short_id = project_id[:8]
|
| 200 |
+
add_status(f"Waiting for Aristotle proof ({short_id}...)...")
|
| 201 |
+
yield render_status(), None
|
| 202 |
+
|
| 203 |
+
poll_result = None
|
| 204 |
+
for poll_idx in range(ARISTOTLE_MAX_POLLS):
|
| 205 |
+
await asyncio.sleep(ARISTOTLE_POLL_INTERVAL)
|
| 206 |
+
info = await check_aristotle_status(project_id)
|
| 207 |
+
|
| 208 |
+
if "error" in info:
|
| 209 |
+
poll_result = json.dumps(info)
|
| 210 |
+
add_status(f"Aristotle [{short_id}] error: {info['error']}")
|
| 211 |
+
yield render_status(), None
|
| 212 |
+
break
|
| 213 |
+
|
| 214 |
+
status = info.get("status", "UNKNOWN")
|
| 215 |
+
pct = info.get("percent_complete")
|
| 216 |
+
elapsed = (poll_idx + 1) * ARISTOTLE_POLL_INTERVAL
|
| 217 |
+
pct_str = f" ({pct}%)" if pct is not None else ""
|
| 218 |
+
add_status(f"Aristotle [{short_id}]: {status}{pct_str} ({elapsed}s)")
|
| 219 |
+
yield render_status(), None
|
| 220 |
+
|
| 221 |
+
if status in TERMINAL_STATUSES:
|
| 222 |
+
if status in ("COMPLETE", "COMPLETE_WITH_ERRORS"):
|
| 223 |
+
add_status(f"Aristotle [{short_id}]: downloading result...")
|
| 224 |
+
yield render_status(), None
|
| 225 |
+
poll_result = await get_aristotle_result(project_id)
|
| 226 |
+
else:
|
| 227 |
+
poll_result = f"Aristotle project finished with status: {status}"
|
| 228 |
+
break
|
| 229 |
+
|
| 230 |
+
if poll_result is None:
|
| 231 |
+
poll_result = f"Aristotle [{short_id}] timed out after {ARISTOTLE_MAX_POLLS * ARISTOTLE_POLL_INTERVAL}s"
|
| 232 |
+
add_status(poll_result)
|
| 233 |
+
yield render_status(), None
|
| 234 |
+
|
| 235 |
+
messages.append({
|
| 236 |
+
"role": "tool",
|
| 237 |
+
"tool_call_id": tool_call.id,
|
| 238 |
+
"content": poll_result,
|
| 239 |
+
})
|
| 240 |
+
else:
|
| 241 |
+
# Regular tool — result was already computed
|
| 242 |
+
messages.append({
|
| 243 |
+
"role": "tool",
|
| 244 |
+
"tool_call_id": tool_call.id,
|
| 245 |
+
"content": result,
|
| 246 |
+
})
|
| 247 |
+
|
| 248 |
+
add_status(f"Reached max iterations. Cost: {cost_tracker.summary()}")
|
| 249 |
+
yield render_status(), None
|
| 250 |
+
|
| 251 |
+
|
| 252 |
+
async def _handle_tool_call(fn_name: str, fn_args: dict,
|
| 253 |
+
add_status, render_status) -> str:
|
| 254 |
+
"""Execute a non-blocking tool call. Returns the result string.
|
| 255 |
+
|
| 256 |
+
wait_for_aristotle is handled separately in the main loop (needs polling).
|
| 257 |
+
"""
|
| 258 |
+
if fn_name == "final_answer":
|
| 259 |
+
return json.dumps(fn_args)
|
| 260 |
+
|
| 261 |
+
if fn_name == "search_theorems":
|
| 262 |
+
query = fn_args.get("query", "")
|
| 263 |
+
add_status(f'Searching for theorems: "{query}"...')
|
| 264 |
+
return await search_theorems(query)
|
| 265 |
+
|
| 266 |
+
if fn_name == "search_lean_library":
|
| 267 |
+
query = fn_args.get("query", "")
|
| 268 |
+
add_status(f'Searching Mathlib: "{query}"...')
|
| 269 |
+
return await search_lean_library(query)
|
| 270 |
+
|
| 271 |
+
if fn_name == "check_lean_code":
|
| 272 |
+
add_status("Verifying Lean code with Axle...")
|
| 273 |
+
result = await check_lean_code(fn_args.get("code", ""))
|
| 274 |
+
try:
|
| 275 |
+
parsed = json.loads(result)
|
| 276 |
+
if parsed.get("okay"):
|
| 277 |
+
add_status("Lean code verified successfully!")
|
| 278 |
+
else:
|
| 279 |
+
n = len(parsed.get("errors", []))
|
| 280 |
+
add_status(f"Lean code has {n} error(s)")
|
| 281 |
+
except json.JSONDecodeError:
|
| 282 |
+
pass
|
| 283 |
+
return result
|
| 284 |
+
|
| 285 |
+
if fn_name == "submit_to_aristotle":
|
| 286 |
+
prompt = fn_args.get("prompt", "")
|
| 287 |
+
short_prompt = prompt[:80].replace("\n", " ")
|
| 288 |
+
add_status(f'Submitting to Aristotle: "{short_prompt}"...')
|
| 289 |
+
result = await submit_to_aristotle(prompt)
|
| 290 |
+
try:
|
| 291 |
+
parsed = json.loads(result)
|
| 292 |
+
pid = parsed.get("project_id", "")
|
| 293 |
+
if pid:
|
| 294 |
+
add_status(f"Aristotle job submitted: {pid[:8]}...")
|
| 295 |
+
elif "error" in parsed:
|
| 296 |
+
add_status(f"Aristotle error: {parsed['error']}")
|
| 297 |
+
except json.JSONDecodeError:
|
| 298 |
+
pass
|
| 299 |
+
return result
|
| 300 |
+
|
| 301 |
+
if fn_name == "check_aristotle_status":
|
| 302 |
+
project_id = fn_args.get("project_id", "")
|
| 303 |
+
info = await check_aristotle_status(project_id)
|
| 304 |
+
status = info.get("status", "UNKNOWN")
|
| 305 |
+
pct = info.get("percent_complete")
|
| 306 |
+
pct_str = f" ({pct}%)" if pct is not None else ""
|
| 307 |
+
add_status(f"Aristotle [{project_id[:8]}]: {status}{pct_str}")
|
| 308 |
+
return json.dumps(info)
|
| 309 |
+
|
| 310 |
+
if fn_name == "get_aristotle_result":
|
| 311 |
+
project_id = fn_args.get("project_id", "")
|
| 312 |
+
add_status(f"Downloading Aristotle result [{project_id[:8]}]...")
|
| 313 |
+
return await get_aristotle_result(project_id)
|
| 314 |
+
|
| 315 |
+
if fn_name == "wait_for_aristotle":
|
| 316 |
+
# Handled specially in the main loop; return placeholder here
|
| 317 |
+
return ""
|
| 318 |
+
|
| 319 |
+
return json.dumps({"error": f"Unknown tool: {fn_name}"})
|
| 320 |
+
|
| 321 |
+
|
| 322 |
+
def format_final_response(status_text: str, result: dict | None) -> str:
|
| 323 |
+
"""Format the final response with natural language answer and expandable Lean code."""
|
| 324 |
+
if result is None:
|
| 325 |
+
return (
|
| 326 |
+
status_text
|
| 327 |
+
+ "\n\nI was unable to complete the research. "
|
| 328 |
+
"Please try rephrasing your question or asking something simpler."
|
| 329 |
+
)
|
| 330 |
+
|
| 331 |
+
answer = result.get("answer", "No answer provided.")
|
| 332 |
+
lean_code = result.get("lean_code", "")
|
| 333 |
+
verified = result.get("verified", False)
|
| 334 |
+
|
| 335 |
+
parts = []
|
| 336 |
+
|
| 337 |
+
# Status log (collapsed)
|
| 338 |
+
parts.append(
|
| 339 |
+
f"<details>\n<summary>Research log</summary>\n\n{status_text}\n\n</details>\n"
|
| 340 |
+
)
|
| 341 |
+
|
| 342 |
+
# Main answer
|
| 343 |
+
parts.append(answer)
|
| 344 |
+
|
| 345 |
+
# Lean code section
|
| 346 |
+
if lean_code.strip():
|
| 347 |
+
badge = "Verified" if verified else "Unverified"
|
| 348 |
+
icon = "✅" if verified else "⚠️"
|
| 349 |
+
parts.append(
|
| 350 |
+
f"\n\n<details open>\n<summary>{icon} Lean 4 Code ({badge})</summary>"
|
| 351 |
+
f"\n\n```lean4\n{lean_code}\n```\n\n</details>"
|
| 352 |
+
)
|
| 353 |
+
|
| 354 |
+
return "\n\n".join(parts)
|
app.py
ADDED
|
@@ -0,0 +1,59 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import traceback
|
| 2 |
+
import gradio as gr
|
| 3 |
+
from agent import run_agent, format_final_response
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
EXAMPLES = [
|
| 7 |
+
"Prove that the sum of two even numbers is even.",
|
| 8 |
+
"Is the square root of 2 irrational?",
|
| 9 |
+
"Prove that there are infinitely many prime numbers.",
|
| 10 |
+
"Show that for all natural numbers n, n + 0 = n.",
|
| 11 |
+
"Prove that the composition of two injective functions is injective.",
|
| 12 |
+
]
|
| 13 |
+
|
| 14 |
+
LATEX_DELIMITERS = [
|
| 15 |
+
{"left": "$$", "right": "$$", "display": True},
|
| 16 |
+
{"left": "$", "right": "$", "display": False},
|
| 17 |
+
]
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
async def respond(message, history):
|
| 21 |
+
"""Handle a user message. Streams progress, then delivers final answer."""
|
| 22 |
+
if not message or not message.strip():
|
| 23 |
+
yield "Please enter a mathematical question."
|
| 24 |
+
return
|
| 25 |
+
|
| 26 |
+
try:
|
| 27 |
+
status_text = ""
|
| 28 |
+
final_result = None
|
| 29 |
+
|
| 30 |
+
async for status, result in run_agent(message):
|
| 31 |
+
status_text = status
|
| 32 |
+
if result is not None:
|
| 33 |
+
final_result = result
|
| 34 |
+
yield status_text
|
| 35 |
+
|
| 36 |
+
yield format_final_response(status_text, final_result)
|
| 37 |
+
except Exception as e:
|
| 38 |
+
tb = traceback.format_exc()
|
| 39 |
+
yield f"An error occurred:\n```\n{tb}\n```"
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
demo = gr.ChatInterface(
|
| 43 |
+
fn=respond,
|
| 44 |
+
title="VeriDeepResearch",
|
| 45 |
+
description=(
|
| 46 |
+
"Ask a mathematical question and get a verified answer. "
|
| 47 |
+
"The answer is formalized in Lean 4 and checked by a proof verifier. "
|
| 48 |
+
"Expand the Lean code section to see the formal proof."
|
| 49 |
+
),
|
| 50 |
+
examples=EXAMPLES,
|
| 51 |
+
chatbot=gr.Chatbot(
|
| 52 |
+
height=600,
|
| 53 |
+
latex_delimiters=LATEX_DELIMITERS,
|
| 54 |
+
),
|
| 55 |
+
)
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
if __name__ == "__main__":
|
| 59 |
+
demo.launch()
|
config.py
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
|
| 3 |
+
# TokenFactory (Kimi K2.5)
|
| 4 |
+
TOKEN_FACTORY_API_KEY = os.getenv("TOKEN_FACTORY_API_KEY", "")
|
| 5 |
+
TOKEN_FACTORY_BASE_URL = "https://api.tokenfactory.nebius.com/v1/"
|
| 6 |
+
KIMI_MODEL = "moonshotai/Kimi-K2.5"
|
| 7 |
+
|
| 8 |
+
# Pricing per token (Kimi K2 pricing from TokenFactory)
|
| 9 |
+
INPUT_COST_PER_TOKEN = 0.50 / 1_000_000 # $0.50 per 1M input tokens
|
| 10 |
+
OUTPUT_COST_PER_TOKEN = 2.40 / 1_000_000 # $2.40 per 1M output tokens
|
| 11 |
+
MAX_COST_PER_QUERY = 20.0
|
| 12 |
+
|
| 13 |
+
# Axle (Lean verification)
|
| 14 |
+
AXLE_API_KEY = os.getenv("AXLE_API_KEY", "")
|
| 15 |
+
AXLE_BASE_URL = "https://axle.axiommath.ai/api/v1"
|
| 16 |
+
LEAN_ENVIRONMENT = "lean-4.28.0"
|
| 17 |
+
|
| 18 |
+
# Aristotle (Lean formalization & proving)
|
| 19 |
+
ARISTOTLE_API_KEY = os.getenv("ARISTOTLE_API_KEY", "")
|
| 20 |
+
ARISTOTLE_POLL_INTERVAL = 30 # seconds between status checks
|
| 21 |
+
ARISTOTLE_MAX_POLLS = 40 # 40 * 30s = 20 minutes max wait
|
| 22 |
+
|
| 23 |
+
# LeanExplore (Mathlib search)
|
| 24 |
+
LEAN_EXPLORE_API_KEY = os.getenv("LEAN_EXPLORE_API_KEY", "")
|
| 25 |
+
LEAN_EXPLORE_BASE_URL = "https://www.leanexplore.com/api"
|
| 26 |
+
|
| 27 |
+
# TheoremSearch (arXiv theorem search)
|
| 28 |
+
THEOREM_SEARCH_BASE_URL = "https://api.theoremsearch.com"
|
| 29 |
+
|
| 30 |
+
# Agent settings
|
| 31 |
+
MAX_AGENT_ITERATIONS = 50
|
requirements.txt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
openai>=1.0
|
| 2 |
+
httpx>=0.24
|
| 3 |
+
aristotlelib
|
tools.py
ADDED
|
@@ -0,0 +1,371 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import json
|
| 4 |
+
import os
|
| 5 |
+
import tarfile
|
| 6 |
+
import tempfile
|
| 7 |
+
import httpx
|
| 8 |
+
from config import (
|
| 9 |
+
THEOREM_SEARCH_BASE_URL,
|
| 10 |
+
LEAN_EXPLORE_BASE_URL,
|
| 11 |
+
AXLE_API_KEY,
|
| 12 |
+
AXLE_BASE_URL,
|
| 13 |
+
LEAN_ENVIRONMENT,
|
| 14 |
+
ARISTOTLE_API_KEY,
|
| 15 |
+
)
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
# ---------------------------------------------------------------------------
|
| 19 |
+
# TheoremSearch
|
| 20 |
+
# ---------------------------------------------------------------------------
|
| 21 |
+
|
| 22 |
+
async def search_theorems(query: str, n_results: int = 5) -> str:
|
| 23 |
+
try:
|
| 24 |
+
async with httpx.AsyncClient(timeout=30) as client:
|
| 25 |
+
response = await client.post(
|
| 26 |
+
f"{THEOREM_SEARCH_BASE_URL}/search",
|
| 27 |
+
json={"query": query, "n_results": n_results},
|
| 28 |
+
)
|
| 29 |
+
if response.status_code != 200:
|
| 30 |
+
return json.dumps({"error": f"TheoremSearch unavailable (HTTP {response.status_code})"})
|
| 31 |
+
data = response.json()
|
| 32 |
+
results = []
|
| 33 |
+
for thm in data.get("theorems", []):
|
| 34 |
+
results.append({
|
| 35 |
+
"name": thm.get("name", ""),
|
| 36 |
+
"body": thm.get("body", "")[:500],
|
| 37 |
+
"slogan": thm.get("slogan", ""),
|
| 38 |
+
"paper_title": thm.get("paper", {}).get("title", ""),
|
| 39 |
+
"link": thm.get("link", ""),
|
| 40 |
+
})
|
| 41 |
+
return json.dumps(results, indent=2)
|
| 42 |
+
except Exception as e:
|
| 43 |
+
return json.dumps({"error": f"TheoremSearch error: {e}"})
|
| 44 |
+
|
| 45 |
+
|
| 46 |
+
# ---------------------------------------------------------------------------
|
| 47 |
+
# LeanExplore
|
| 48 |
+
# ---------------------------------------------------------------------------
|
| 49 |
+
|
| 50 |
+
async def search_lean_library(query: str) -> str:
|
| 51 |
+
try:
|
| 52 |
+
async with httpx.AsyncClient(timeout=30) as client:
|
| 53 |
+
response = await client.get(
|
| 54 |
+
f"{LEAN_EXPLORE_BASE_URL}/search",
|
| 55 |
+
params={"q": query},
|
| 56 |
+
)
|
| 57 |
+
if response.status_code != 200:
|
| 58 |
+
return json.dumps({"error": f"LeanExplore unavailable (HTTP {response.status_code})"})
|
| 59 |
+
data = response.json()
|
| 60 |
+
results = []
|
| 61 |
+
for decl in data.get("results", [])[:10]:
|
| 62 |
+
results.append({
|
| 63 |
+
"name": decl.get("name", ""),
|
| 64 |
+
"module": decl.get("module", ""),
|
| 65 |
+
"source_text": decl.get("source_text", "")[:400],
|
| 66 |
+
"informalization": decl.get("informalization", "")[:300],
|
| 67 |
+
})
|
| 68 |
+
return json.dumps(results, indent=2)
|
| 69 |
+
except Exception as e:
|
| 70 |
+
return json.dumps({"error": f"LeanExplore error: {e}"})
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
# ---------------------------------------------------------------------------
|
| 74 |
+
# Axle (Lean verification)
|
| 75 |
+
# ---------------------------------------------------------------------------
|
| 76 |
+
|
| 77 |
+
async def check_lean_code(code: str) -> str:
|
| 78 |
+
try:
|
| 79 |
+
async with httpx.AsyncClient(timeout=180) as client:
|
| 80 |
+
response = await client.post(
|
| 81 |
+
f"{AXLE_BASE_URL}/check",
|
| 82 |
+
headers={
|
| 83 |
+
"Authorization": f"Bearer {AXLE_API_KEY}",
|
| 84 |
+
"Content-Type": "application/json",
|
| 85 |
+
},
|
| 86 |
+
json={
|
| 87 |
+
"content": code,
|
| 88 |
+
"environment": LEAN_ENVIRONMENT,
|
| 89 |
+
"timeout_seconds": 120,
|
| 90 |
+
},
|
| 91 |
+
)
|
| 92 |
+
if response.status_code != 200:
|
| 93 |
+
return json.dumps({"error": f"Axle HTTP {response.status_code}: {response.text[:500]}"})
|
| 94 |
+
data = response.json()
|
| 95 |
+
if "user_error" in data:
|
| 96 |
+
return json.dumps({"error": data["user_error"]})
|
| 97 |
+
return json.dumps({
|
| 98 |
+
"okay": data.get("okay", False),
|
| 99 |
+
"errors": data.get("lean_messages", {}).get("errors", []),
|
| 100 |
+
"warnings": data.get("lean_messages", {}).get("warnings", []),
|
| 101 |
+
"tool_errors": data.get("tool_messages", {}).get("errors", []),
|
| 102 |
+
"tool_infos": data.get("tool_messages", {}).get("infos", []),
|
| 103 |
+
}, indent=2)
|
| 104 |
+
except Exception as e:
|
| 105 |
+
return json.dumps({"error": f"Axle error: {e}"})
|
| 106 |
+
|
| 107 |
+
|
| 108 |
+
# ---------------------------------------------------------------------------
|
| 109 |
+
# Aristotle (Lean formalization & proving)
|
| 110 |
+
# ---------------------------------------------------------------------------
|
| 111 |
+
|
| 112 |
+
async def submit_to_aristotle(prompt: str) -> str:
|
| 113 |
+
"""Submit a prompt to Aristotle. Returns project info JSON."""
|
| 114 |
+
import aristotlelib
|
| 115 |
+
from aristotlelib import Project
|
| 116 |
+
|
| 117 |
+
aristotlelib.set_api_key(ARISTOTLE_API_KEY)
|
| 118 |
+
try:
|
| 119 |
+
project = await Project.create(prompt=prompt)
|
| 120 |
+
return json.dumps({
|
| 121 |
+
"project_id": project.project_id,
|
| 122 |
+
"status": project.status.value,
|
| 123 |
+
})
|
| 124 |
+
except Exception as e:
|
| 125 |
+
return json.dumps({"error": f"Aristotle submit error: {e}"})
|
| 126 |
+
|
| 127 |
+
|
| 128 |
+
async def check_aristotle_status(project_id: str) -> dict:
|
| 129 |
+
"""Check the status of an Aristotle project. Returns a dict (non-blocking)."""
|
| 130 |
+
import aristotlelib
|
| 131 |
+
from aristotlelib import Project
|
| 132 |
+
|
| 133 |
+
aristotlelib.set_api_key(ARISTOTLE_API_KEY)
|
| 134 |
+
try:
|
| 135 |
+
project = await Project.from_id(project_id)
|
| 136 |
+
return {
|
| 137 |
+
"project_id": project.project_id,
|
| 138 |
+
"status": project.status.value,
|
| 139 |
+
"percent_complete": project.percent_complete,
|
| 140 |
+
}
|
| 141 |
+
except Exception as e:
|
| 142 |
+
return {"error": str(e)}
|
| 143 |
+
|
| 144 |
+
|
| 145 |
+
async def get_aristotle_result(project_id: str) -> str:
|
| 146 |
+
"""Download and return the Lean code from a completed Aristotle project."""
|
| 147 |
+
import aristotlelib
|
| 148 |
+
from aristotlelib import Project
|
| 149 |
+
|
| 150 |
+
aristotlelib.set_api_key(ARISTOTLE_API_KEY)
|
| 151 |
+
try:
|
| 152 |
+
project = await Project.from_id(project_id)
|
| 153 |
+
with tempfile.TemporaryDirectory() as tmpdir:
|
| 154 |
+
result_path = await project.get_solution(
|
| 155 |
+
destination=os.path.join(tmpdir, "result")
|
| 156 |
+
)
|
| 157 |
+
result_path = str(result_path)
|
| 158 |
+
|
| 159 |
+
if result_path.endswith(".tar.gz") or result_path.endswith(".tgz"):
|
| 160 |
+
extract_dir = os.path.join(tmpdir, "extracted")
|
| 161 |
+
os.makedirs(extract_dir, exist_ok=True)
|
| 162 |
+
with tarfile.open(result_path) as tar:
|
| 163 |
+
tar.extractall(extract_dir)
|
| 164 |
+
|
| 165 |
+
lean_contents = []
|
| 166 |
+
for root, _dirs, files in os.walk(extract_dir):
|
| 167 |
+
for f in sorted(files):
|
| 168 |
+
if f.endswith(".lean"):
|
| 169 |
+
with open(os.path.join(root, f)) as fh:
|
| 170 |
+
lean_contents.append(fh.read())
|
| 171 |
+
|
| 172 |
+
if not lean_contents:
|
| 173 |
+
return "No Lean files found in Aristotle result."
|
| 174 |
+
return "\n\n".join(lean_contents)
|
| 175 |
+
else:
|
| 176 |
+
with open(result_path) as f:
|
| 177 |
+
return f.read()
|
| 178 |
+
except Exception as e:
|
| 179 |
+
return f"Error downloading Aristotle result: {e}"
|
| 180 |
+
|
| 181 |
+
|
| 182 |
+
# ---------------------------------------------------------------------------
|
| 183 |
+
# Tool definitions for OpenAI function calling
|
| 184 |
+
# ---------------------------------------------------------------------------
|
| 185 |
+
|
| 186 |
+
TOOL_DEFINITIONS = [
|
| 187 |
+
{
|
| 188 |
+
"type": "function",
|
| 189 |
+
"function": {
|
| 190 |
+
"name": "search_theorems",
|
| 191 |
+
"description": (
|
| 192 |
+
"Search for mathematical theorems on arXiv using TheoremSearch. "
|
| 193 |
+
"Returns theorem statements, slogans, and paper references."
|
| 194 |
+
),
|
| 195 |
+
"parameters": {
|
| 196 |
+
"type": "object",
|
| 197 |
+
"properties": {
|
| 198 |
+
"query": {
|
| 199 |
+
"type": "string",
|
| 200 |
+
"description": "Natural language search query about the mathematical topic",
|
| 201 |
+
}
|
| 202 |
+
},
|
| 203 |
+
"required": ["query"],
|
| 204 |
+
},
|
| 205 |
+
},
|
| 206 |
+
},
|
| 207 |
+
{
|
| 208 |
+
"type": "function",
|
| 209 |
+
"function": {
|
| 210 |
+
"name": "search_lean_library",
|
| 211 |
+
"description": (
|
| 212 |
+
"Search Mathlib (Lean 4 math library) for relevant declarations, "
|
| 213 |
+
"theorems, and definitions. Use to find existing Lean names and API."
|
| 214 |
+
),
|
| 215 |
+
"parameters": {
|
| 216 |
+
"type": "object",
|
| 217 |
+
"properties": {
|
| 218 |
+
"query": {
|
| 219 |
+
"type": "string",
|
| 220 |
+
"description": (
|
| 221 |
+
"Search query - a Lean name like 'Nat.add_comm' "
|
| 222 |
+
"or natural language like 'sum of even numbers is even'"
|
| 223 |
+
),
|
| 224 |
+
}
|
| 225 |
+
},
|
| 226 |
+
"required": ["query"],
|
| 227 |
+
},
|
| 228 |
+
},
|
| 229 |
+
},
|
| 230 |
+
{
|
| 231 |
+
"type": "function",
|
| 232 |
+
"function": {
|
| 233 |
+
"name": "check_lean_code",
|
| 234 |
+
"description": (
|
| 235 |
+
"Quickly check if Lean 4 code compiles using the Axle verifier (seconds). "
|
| 236 |
+
"Code MUST start with 'import Mathlib'. Use for fast verification."
|
| 237 |
+
),
|
| 238 |
+
"parameters": {
|
| 239 |
+
"type": "object",
|
| 240 |
+
"properties": {
|
| 241 |
+
"code": {
|
| 242 |
+
"type": "string",
|
| 243 |
+
"description": "Complete Lean 4 code. Must start with 'import Mathlib'.",
|
| 244 |
+
}
|
| 245 |
+
},
|
| 246 |
+
"required": ["code"],
|
| 247 |
+
},
|
| 248 |
+
},
|
| 249 |
+
},
|
| 250 |
+
{
|
| 251 |
+
"type": "function",
|
| 252 |
+
"function": {
|
| 253 |
+
"name": "submit_to_aristotle",
|
| 254 |
+
"description": (
|
| 255 |
+
"Submit a proof request to Aristotle (automated theorem prover). "
|
| 256 |
+
"Aristotle proves graduate/research-level math in Lean 4. "
|
| 257 |
+
"Returns a project_id immediately — use check_aristotle_status or "
|
| 258 |
+
"wait_for_aristotle to get results later.\n\n"
|
| 259 |
+
"SUBMIT MULTIPLE JOBS with different formulations:\n"
|
| 260 |
+
"- The main result as a natural language prompt\n"
|
| 261 |
+
"- Sub-lemmas that the result decomposes into\n"
|
| 262 |
+
"- Lean code with sorry: 'Fill in the sorries:\\n```lean\\n...\\n```'\n"
|
| 263 |
+
"- Different proof strategies\n\n"
|
| 264 |
+
"After submitting, continue researching — do NOT wait immediately."
|
| 265 |
+
),
|
| 266 |
+
"parameters": {
|
| 267 |
+
"type": "object",
|
| 268 |
+
"properties": {
|
| 269 |
+
"prompt": {
|
| 270 |
+
"type": "string",
|
| 271 |
+
"description": (
|
| 272 |
+
"Instructions for Aristotle. Can be natural language, "
|
| 273 |
+
"or include Lean code with sorry placeholders."
|
| 274 |
+
),
|
| 275 |
+
}
|
| 276 |
+
},
|
| 277 |
+
"required": ["prompt"],
|
| 278 |
+
},
|
| 279 |
+
},
|
| 280 |
+
},
|
| 281 |
+
{
|
| 282 |
+
"type": "function",
|
| 283 |
+
"function": {
|
| 284 |
+
"name": "check_aristotle_status",
|
| 285 |
+
"description": (
|
| 286 |
+
"Non-blocking check of an Aristotle project's status. "
|
| 287 |
+
"Returns status and percent_complete. Use this to poll while "
|
| 288 |
+
"doing other work."
|
| 289 |
+
),
|
| 290 |
+
"parameters": {
|
| 291 |
+
"type": "object",
|
| 292 |
+
"properties": {
|
| 293 |
+
"project_id": {
|
| 294 |
+
"type": "string",
|
| 295 |
+
"description": "The project_id from submit_to_aristotle.",
|
| 296 |
+
}
|
| 297 |
+
},
|
| 298 |
+
"required": ["project_id"],
|
| 299 |
+
},
|
| 300 |
+
},
|
| 301 |
+
},
|
| 302 |
+
{
|
| 303 |
+
"type": "function",
|
| 304 |
+
"function": {
|
| 305 |
+
"name": "wait_for_aristotle",
|
| 306 |
+
"description": (
|
| 307 |
+
"Wait for an Aristotle project to complete (polls every 30s, up to 20 min). "
|
| 308 |
+
"Returns the Lean code produced. Only call this when you're ready to "
|
| 309 |
+
"collect results — do other work first."
|
| 310 |
+
),
|
| 311 |
+
"parameters": {
|
| 312 |
+
"type": "object",
|
| 313 |
+
"properties": {
|
| 314 |
+
"project_id": {
|
| 315 |
+
"type": "string",
|
| 316 |
+
"description": "The project_id from submit_to_aristotle.",
|
| 317 |
+
}
|
| 318 |
+
},
|
| 319 |
+
"required": ["project_id"],
|
| 320 |
+
},
|
| 321 |
+
},
|
| 322 |
+
},
|
| 323 |
+
{
|
| 324 |
+
"type": "function",
|
| 325 |
+
"function": {
|
| 326 |
+
"name": "get_aristotle_result",
|
| 327 |
+
"description": (
|
| 328 |
+
"Download the Lean code from a COMPLETED Aristotle project. "
|
| 329 |
+
"Only call after check_aristotle_status shows COMPLETE."
|
| 330 |
+
),
|
| 331 |
+
"parameters": {
|
| 332 |
+
"type": "object",
|
| 333 |
+
"properties": {
|
| 334 |
+
"project_id": {
|
| 335 |
+
"type": "string",
|
| 336 |
+
"description": "The project_id of a completed Aristotle project.",
|
| 337 |
+
}
|
| 338 |
+
},
|
| 339 |
+
"required": ["project_id"],
|
| 340 |
+
},
|
| 341 |
+
},
|
| 342 |
+
},
|
| 343 |
+
{
|
| 344 |
+
"type": "function",
|
| 345 |
+
"function": {
|
| 346 |
+
"name": "final_answer",
|
| 347 |
+
"description": (
|
| 348 |
+
"Submit the final answer to the user. Call this when you have "
|
| 349 |
+
"a complete natural language answer and Lean code."
|
| 350 |
+
),
|
| 351 |
+
"parameters": {
|
| 352 |
+
"type": "object",
|
| 353 |
+
"properties": {
|
| 354 |
+
"answer": {
|
| 355 |
+
"type": "string",
|
| 356 |
+
"description": "Clear natural language explanation. Use LaTeX ($...$ and $$...$$) for math.",
|
| 357 |
+
},
|
| 358 |
+
"lean_code": {
|
| 359 |
+
"type": "string",
|
| 360 |
+
"description": "Complete Lean 4 code (should start with 'import Mathlib').",
|
| 361 |
+
},
|
| 362 |
+
"verified": {
|
| 363 |
+
"type": "boolean",
|
| 364 |
+
"description": "Whether the Lean code was verified successfully by Axle.",
|
| 365 |
+
},
|
| 366 |
+
},
|
| 367 |
+
"required": ["answer", "lean_code", "verified"],
|
| 368 |
+
},
|
| 369 |
+
},
|
| 370 |
+
},
|
| 371 |
+
]
|