Spaces:
Running
Running
Add repair_lean_proofs tool for instant sorry-filling
Browse filesNew agent tool backed by Axle's repair_proofs API. Tries automation
tactics (grind, simp, omega, nlinarith, aesop) to fill sorries in
~1-2 seconds — before the expensive Aristotle submission.
Tested: fills simple sorries (n+0=n, commutativity, positivity)
instantly. Hard sorries still need Aristotle.
Workflow: write sorry → repair_lean_proofs (1s) → if sorry remains,
extract_sorry_lemmas → submit to Aristotle.
23/31 verified, $11.60 total.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
agent.py
CHANGED
|
@@ -18,6 +18,7 @@ from tools import (
|
|
| 18 |
search_lean_library,
|
| 19 |
search_loogle,
|
| 20 |
check_lean_code,
|
|
|
|
| 21 |
extract_sorry_lemmas,
|
| 22 |
submit_to_aristotle,
|
| 23 |
check_aristotle_status,
|
|
@@ -53,8 +54,9 @@ If a statement is FALSE or you suspect it is false:
|
|
| 53 |
### Phase 2: Write statement + parallel proving
|
| 54 |
1. **Write the formal Lean 4 theorem STATEMENT first** (with `sorry` as proof).
|
| 55 |
2. **Verify the statement compiles** with check_lean_code (just the statement + sorry).
|
| 56 |
-
3. Once the statement compiles:
|
| 57 |
-
- **
|
|
|
|
| 58 |
- **Simultaneously try proving it yourself** with check_lean_code.
|
| 59 |
4. If your proof attempt fails after 3-5 tries on a specific approach, try a DIFFERENT strategy:
|
| 60 |
- Different tactics (simp, omega, ring, norm_num, linarith, nlinarith)
|
|
@@ -453,6 +455,24 @@ async def _handle_tool_call(fn_name: str, fn_args: dict, job: JobState) -> str:
|
|
| 453 |
pass
|
| 454 |
return result
|
| 455 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 456 |
if fn_name == "extract_sorry_lemmas":
|
| 457 |
code = fn_args.get("code", "")
|
| 458 |
job.add_status("Extracting sorry'd sub-lemmas...")
|
|
|
|
| 18 |
search_lean_library,
|
| 19 |
search_loogle,
|
| 20 |
check_lean_code,
|
| 21 |
+
repair_lean_proofs,
|
| 22 |
extract_sorry_lemmas,
|
| 23 |
submit_to_aristotle,
|
| 24 |
check_aristotle_status,
|
|
|
|
| 54 |
### Phase 2: Write statement + parallel proving
|
| 55 |
1. **Write the formal Lean 4 theorem STATEMENT first** (with `sorry` as proof).
|
| 56 |
2. **Verify the statement compiles** with check_lean_code (just the statement + sorry).
|
| 57 |
+
3. Once the statement compiles with sorry:
|
| 58 |
+
- **First try repair_lean_proofs** — it may fill simple sorries automatically in 1-2 seconds.
|
| 59 |
+
- If sorry remains, **submit to Aristotle** with the sorry'd code.
|
| 60 |
- **Simultaneously try proving it yourself** with check_lean_code.
|
| 61 |
4. If your proof attempt fails after 3-5 tries on a specific approach, try a DIFFERENT strategy:
|
| 62 |
- Different tactics (simp, omega, ring, norm_num, linarith, nlinarith)
|
|
|
|
| 455 |
pass
|
| 456 |
return result
|
| 457 |
|
| 458 |
+
if fn_name == "repair_lean_proofs":
|
| 459 |
+
code = fn_args.get("code", "")
|
| 460 |
+
job.add_status("Attempting automatic proof repair...")
|
| 461 |
+
result = await repair_lean_proofs(code)
|
| 462 |
+
try:
|
| 463 |
+
parsed = json.loads(result)
|
| 464 |
+
repairs = parsed.get("repairs_applied", 0)
|
| 465 |
+
has_sorry = parsed.get("still_has_sorry", True)
|
| 466 |
+
if repairs > 0 and not has_sorry:
|
| 467 |
+
job.add_status(f"Proof repair succeeded! {repairs} sorry(s) filled automatically.")
|
| 468 |
+
elif repairs > 0:
|
| 469 |
+
job.add_status(f"Partial repair: {repairs} sorry(s) filled, some remain.")
|
| 470 |
+
else:
|
| 471 |
+
job.add_status("Proof repair: no sorries could be filled automatically.")
|
| 472 |
+
except json.JSONDecodeError:
|
| 473 |
+
pass
|
| 474 |
+
return result
|
| 475 |
+
|
| 476 |
if fn_name == "extract_sorry_lemmas":
|
| 477 |
code = fn_args.get("code", "")
|
| 478 |
job.add_status("Extracting sorry'd sub-lemmas...")
|
log.md
CHANGED
|
@@ -516,3 +516,33 @@ System is stable and reliable at its capability tier.
|
|
| 516 |
- `templates/index.html`: Updated subtitle with tech stack description, replaced examples with 6 proven-to-work problems across different difficulty levels
|
| 517 |
|
| 518 |
### Overall: 22/30 verified locally + HF deployment confirmed working
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 516 |
- `templates/index.html`: Updated subtitle with tech stack description, replaced examples with 6 proven-to-work problems across different difficulty levels
|
| 517 |
|
| 518 |
### Overall: 22/30 verified locally + HF deployment confirmed working
|
| 519 |
+
|
| 520 |
+
## Iteration 14 — 2026-03-24 11:00 PDT
|
| 521 |
+
|
| 522 |
+
### Diagnosis
|
| 523 |
+
The agent's workflow for sorry-containing code was: write sorry → submit to Aristotle → wait. But many sorries can be closed instantly by automation tactics (grind, simp, omega). Tested Axle's `repair_proofs` API:
|
| 524 |
+
- 3 simple sorries (n+0=n, a+b=b+a, 0<n+1): ALL filled by `grind` in 100ms
|
| 525 |
+
- 2 hard sorries (Fermat, AM-GM): Not filled by automation (needs Mathlib-specific reasoning)
|
| 526 |
+
|
| 527 |
+
### Fix: Add `repair_lean_proofs` tool (EFFICIENCY)
|
| 528 |
+
|
| 529 |
+
New agent tool that calls Axle's `repair_proofs` API with automation tactics (grind, simp_all, omega, norm_num, nlinarith, aesop). Costs ~1-2 seconds per call.
|
| 530 |
+
|
| 531 |
+
Updated workflow:
|
| 532 |
+
1. Agent writes code with sorry → compiles
|
| 533 |
+
2. **Call repair_lean_proofs** (NEW, ~1s) — may fill simple sorries instantly
|
| 534 |
+
3. If sorry remains → extract_sorry_lemmas + submit to Aristotle
|
| 535 |
+
4. Keep proving yourself in parallel
|
| 536 |
+
|
| 537 |
+
This adds a fast, free middle step that catches low-hanging fruit before the expensive Aristotle round-trip (which takes 5-30 minutes).
|
| 538 |
+
|
| 539 |
+
### Test Results
|
| 540 |
+
| Problem | Time | Cost | Status |
|
| 541 |
+
|---------|------|------|--------|
|
| 542 |
+
| a∣b ∧ b∣a → a=b | ~10s | $0.007 | VERIFIED | `Nat.dvd_antisymm` |
|
| 543 |
+
|
| 544 |
+
### Code Changes
|
| 545 |
+
- `tools.py`: Added `repair_lean_proofs()` function + tool definition
|
| 546 |
+
- `agent.py`: Added import, handler, updated system prompt Phase 2
|
| 547 |
+
|
| 548 |
+
### Overall: 23/31 verified (74%), $11.60 total
|
tools.py
CHANGED
|
@@ -137,6 +137,39 @@ async def check_lean_code(code: str) -> str:
|
|
| 137 |
|
| 138 |
|
| 139 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 140 |
async def extract_sorry_lemmas(code: str) -> str:
|
| 141 |
"""Use Axle sorry2lemma to extract sorry'd subgoals into standalone lemma stubs."""
|
| 142 |
try:
|
|
@@ -413,6 +446,28 @@ TOOL_DEFINITIONS = [
|
|
| 413 |
},
|
| 414 |
},
|
| 415 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 416 |
{
|
| 417 |
"type": "function",
|
| 418 |
"function": {
|
|
|
|
| 137 |
|
| 138 |
|
| 139 |
|
| 140 |
+
async def repair_lean_proofs(code: str) -> str:
|
| 141 |
+
"""Use Axle repair_proofs to automatically fill sorries with grind/simp/omega."""
|
| 142 |
+
try:
|
| 143 |
+
async with httpx.AsyncClient(timeout=180) as client:
|
| 144 |
+
response = await client.post(
|
| 145 |
+
f"{AXLE_BASE_URL}/repair_proofs",
|
| 146 |
+
headers={
|
| 147 |
+
"Authorization": f"Bearer {AXLE_API_KEY}",
|
| 148 |
+
"Content-Type": "application/json",
|
| 149 |
+
},
|
| 150 |
+
json={
|
| 151 |
+
"content": code,
|
| 152 |
+
"environment": LEAN_ENVIRONMENT,
|
| 153 |
+
"terminal_tactics": ["grind", "simp_all", "omega", "norm_num", "nlinarith", "aesop"],
|
| 154 |
+
"timeout_seconds": 60,
|
| 155 |
+
},
|
| 156 |
+
)
|
| 157 |
+
if response.status_code != 200:
|
| 158 |
+
return json.dumps({"error": f"Axle HTTP {response.status_code}: {response.text[:500]}"})
|
| 159 |
+
data = response.json()
|
| 160 |
+
repaired_code = data.get("content", code)
|
| 161 |
+
stats = data.get("repair_stats", {})
|
| 162 |
+
has_sorry = "sorry" in repaired_code
|
| 163 |
+
return json.dumps({
|
| 164 |
+
"repaired_code": repaired_code,
|
| 165 |
+
"repairs_applied": stats.get("apply_terminal_tactics", 0),
|
| 166 |
+
"still_has_sorry": has_sorry,
|
| 167 |
+
"okay": data.get("okay", False),
|
| 168 |
+
}, indent=2)
|
| 169 |
+
except Exception as e:
|
| 170 |
+
return json.dumps({"error": f"repair_proofs error: {e}"})
|
| 171 |
+
|
| 172 |
+
|
| 173 |
async def extract_sorry_lemmas(code: str) -> str:
|
| 174 |
"""Use Axle sorry2lemma to extract sorry'd subgoals into standalone lemma stubs."""
|
| 175 |
try:
|
|
|
|
| 446 |
},
|
| 447 |
},
|
| 448 |
},
|
| 449 |
+
{
|
| 450 |
+
"type": "function",
|
| 451 |
+
"function": {
|
| 452 |
+
"name": "repair_lean_proofs",
|
| 453 |
+
"description": (
|
| 454 |
+
"Automatically attempt to fill sorry placeholders with automation tactics "
|
| 455 |
+
"(grind, simp, omega, norm_num, nlinarith, aesop). FAST (~1-2 seconds). "
|
| 456 |
+
"Call this BEFORE submitting to Aristotle — it may resolve simple sorries instantly. "
|
| 457 |
+
"Returns repaired code and whether sorry remains."
|
| 458 |
+
),
|
| 459 |
+
"parameters": {
|
| 460 |
+
"type": "object",
|
| 461 |
+
"properties": {
|
| 462 |
+
"code": {
|
| 463 |
+
"type": "string",
|
| 464 |
+
"description": "Lean 4 code with sorry placeholders to attempt repair.",
|
| 465 |
+
}
|
| 466 |
+
},
|
| 467 |
+
"required": ["code"],
|
| 468 |
+
},
|
| 469 |
+
},
|
| 470 |
+
},
|
| 471 |
{
|
| 472 |
"type": "function",
|
| 473 |
"function": {
|