Vilin97/VeriDeepResearch

Vilin97 and Claude Opus 4.6 (1M context) committed about 10 hours ago

Commit f3ef14f (1 parent: 0253f58)

Add repair_lean_proofs tool for instant sorry-filling

New agent tool backed by Axle's repair_proofs API. Tries automation
tactics (grind, simp, omega, nlinarith, aesop) to fill sorries in
~1-2 seconds — before the expensive Aristotle submission.

Tested: fills simple sorries (n+0=n, commutativity, positivity)
instantly. Hard sorries still need Aristotle.
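As an illustration of what "simple sorries" means here, the three goals named in the test could be stated as below, assuming natural-number variables. The theorem names are hypothetical, and the proofs use `omega` (one of the tactics the tool passes to Axle) rather than reproducing the actual repaired output; per the log, it was `grind` that closed these goals.

```lean
-- Hypothetical restatements of the three simple goals (assumes Nat variables).
-- Before repair each proof was `sorry`; a single terminal tactic closes it.
theorem add_zero_example (n : Nat) : n + 0 = n := by omega

theorem add_comm_example (a b : Nat) : a + b = b + a := by omega

theorem pos_succ_example (n : Nat) : 0 < n + 1 := by omega
```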

Workflow: write sorry → repair_lean_proofs (1s) → if sorry remains,
extract_sorry_lemmas → submit to Aristotle.
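The workflow above is a tiered dispatch: cheap automation first, and the slow prover only for whatever is left. Here is a minimal sketch. It assumes the tools are injected as async callables with the simplified signatures shown (the real tools exchange JSON strings). It also assumes each extracted lemma is submitted as its own Aristotle job. `tiered_prove` and the `stub_*` functions are hypothetical names, and the sketch omits the agent's parallel self-proving branch.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signatures for the agent's tools. The real tools exchange JSON
# strings; here they are simplified to plain Python values.
Repair = Callable[[str], Awaitable[tuple[str, bool]]]  # -> (code, still_has_sorry)
Extract = Callable[[str], Awaitable[list[str]]]        # -> standalone lemma stubs
Submit = Callable[[str], Awaitable[str]]               # -> prover job id


async def tiered_prove(code: str, repair: Repair, extract: Extract, submit: Submit) -> dict:
    """Run the cheap repair step first; escalate only if a sorry survives."""
    repaired, still_has_sorry = await repair(code)
    if not still_has_sorry:
        return {"stage": "repair", "code": repaired, "jobs": []}
    # Assumption: each extracted lemma is submitted as its own prover job.
    lemmas = await extract(repaired)
    jobs = await asyncio.gather(*(submit(lemma) for lemma in lemmas))
    return {"stage": "aristotle", "code": repaired, "jobs": list(jobs)}


# --- Demo with stub tools (no network) ---
async def stub_repair_all(code: str) -> tuple[str, bool]:
    return code.replace("sorry", "by grind"), False


async def stub_repair_none(code: str) -> tuple[str, bool]:
    return code, True


async def stub_extract(code: str) -> list[str]:
    return ["lemma h1 : True := sorry", "lemma h2 : True := sorry"]


async def stub_submit(lemma: str) -> str:
    return "job:" + lemma.split()[1]


SOURCE = "theorem t : 1 = 1 := sorry"
easy = asyncio.run(tiered_prove(SOURCE, stub_repair_all, stub_extract, stub_submit))
hard = asyncio.run(tiered_prove(SOURCE, stub_repair_none, stub_extract, stub_submit))
print(easy["stage"], hard["stage"], hard["jobs"])  # repair aristotle ['job:h1', 'job:h2']
```

Injecting the tools keeps the escalation logic testable without network access. The stubs stand in for Axle and Aristotle.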

23/31 verified, $11.60 total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (3)

agent.py +22 -2
log.md +30 -0
tools.py +55 -0

agent.py CHANGED

@@ -18,6 +18,7 @@ from tools import (
     search_lean_library,
     search_loogle,
     check_lean_code,
+    repair_lean_proofs,
     extract_sorry_lemmas,
     submit_to_aristotle,
     check_aristotle_status,
@@ -53,8 +54,9 @@ If a statement is FALSE or you suspect it is false:
 ### Phase 2: Write statement + parallel proving
 1. **Write the formal Lean 4 theorem STATEMENT first** (with `sorry` as proof).
 2. **Verify the statement compiles** with check_lean_code (just the statement + sorry).
-3. Once the statement compiles:
-   - **Submit to Aristotle immediately** with the sorry'd code. Don't wait to try yourself first.
+3. Once the statement compiles with sorry:
+   - **First try repair_lean_proofs** — it may fill simple sorries automatically in 1-2 seconds.
+   - If sorry remains, **submit to Aristotle** with the sorry'd code.
    - **Simultaneously try proving it yourself** with check_lean_code.
 4. If your proof attempt fails after 3-5 tries on a specific approach, try a DIFFERENT strategy:
    - Different tactics (simp, omega, ring, norm_num, linarith, nlinarith)
@@ -453,6 +455,24 @@ async def _handle_tool_call(fn_name: str, fn_args: dict, job: JobState) -> str:
             pass
         return result
+    if fn_name == "repair_lean_proofs":
+        code = fn_args.get("code", "")
+        job.add_status("Attempting automatic proof repair...")
+        result = await repair_lean_proofs(code)
+        try:
+            parsed = json.loads(result)
+            repairs = parsed.get("repairs_applied", 0)
+            has_sorry = parsed.get("still_has_sorry", True)
+            if repairs > 0 and not has_sorry:
+                job.add_status(f"Proof repair succeeded! {repairs} sorry(s) filled automatically.")
+            elif repairs > 0:
+                job.add_status(f"Partial repair: {repairs} sorry(s) filled, some remain.")
+            else:
+                job.add_status("Proof repair: no sorries could be filled automatically.")
+        except json.JSONDecodeError:
+            pass
+        return result
     if fn_name == "extract_sorry_lemmas":
         code = fn_args.get("code", "")
         job.add_status("Extracting sorry'd sub-lemmas...")
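The new handler branch maps the tool's JSON reply to one of three status lines. The sketch below factors that logic into a pure function (hypothetical name `repair_status_message`) so it can be tested without a `JobState`. It adds one branch that the committed handler lacks. On failure, `repair_lean_proofs` returns `{"error": ...}`, and the committed code reports that reply as "no sorries could be filled". This sketch surfaces the error instead.

```python
import json
from typing import Optional


def repair_status_message(result: str) -> Optional[str]:
    """Map the JSON returned by repair_lean_proofs to a one-line status.

    Mirrors the three branches of the committed handler, plus an extra
    branch (an addition in this sketch) for the {"error": ...} payload.
    Returns None for non-JSON or non-object input, matching the handler's
    silent `pass` on JSONDecodeError.
    """
    try:
        parsed = json.loads(result)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    if "error" in parsed:
        return f"Proof repair failed: {parsed['error']}"
    repairs = parsed.get("repairs_applied", 0)
    has_sorry = parsed.get("still_has_sorry", True)
    if repairs > 0 and not has_sorry:
        return f"Proof repair succeeded! {repairs} sorry(s) filled automatically."
    if repairs > 0:
        return f"Partial repair: {repairs} sorry(s) filled, some remain."
    return "Proof repair: no sorries could be filled automatically."
```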

log.md CHANGED

@@ -516,3 +516,33 @@ System is stable and reliable at its capability tier.
 - `templates/index.html`: Updated subtitle with tech stack description, replaced examples with 6 proven-to-work problems across different difficulty levels
 ### Overall: 22/30 verified locally + HF deployment confirmed working

+## Iteration 14 — 2026-03-24 11:00 PDT
+### Diagnosis
+The agent's workflow for sorry-containing code was: write sorry → submit to Aristotle → wait. But many sorries can be closed instantly by automation tactics (grind, simp, omega). Tested Axle's `repair_proofs` API:
+- 3 simple sorries (n+0=n, a+b=b+a, 0<n+1): ALL filled by `grind` in 100ms
+- 2 hard sorries (Fermat, AM-GM): Not filled by automation (needs Mathlib-specific reasoning)
+### Fix: Add `repair_lean_proofs` tool (EFFICIENCY)
+New agent tool that calls Axle's `repair_proofs` API with automation tactics (grind, simp_all, omega, norm_num, nlinarith, aesop). Takes ~1-2 seconds per call.
+Updated workflow:
+1. Agent writes code with sorry → compiles
+2. **Call repair_lean_proofs** (NEW, ~1s) — may fill simple sorries instantly
+3. If sorry remains → extract_sorry_lemmas + submit to Aristotle
+4. Keep proving yourself in parallel
+This adds a fast, free middle step that catches low-hanging fruit before the expensive Aristotle round-trip (which takes 5-30 minutes).
+### Test Results
+| Problem | Time | Cost | Status | Key lemma |
+|---------|------|------|--------|-----------|
+| a∣b ∧ b∣a → a=b | ~10s | $0.007 | VERIFIED | `Nat.dvd_antisymm` |
+### Code Changes
+- `tools.py`: Added `repair_lean_proofs()` function + tool definition
+- `agent.py`: Added import, handler, updated system prompt Phase 2
+### Overall: 23/31 verified (74%), $11.60 total

tools.py CHANGED

@@ -137,6 +137,39 @@ async def check_lean_code(code: str) -> str:
+async def repair_lean_proofs(code: str) -> str:
+    """Use Axle repair_proofs to automatically fill sorries with grind/simp/omega."""
+    try:
+        async with httpx.AsyncClient(timeout=180) as client:
+            response = await client.post(
+                f"{AXLE_BASE_URL}/repair_proofs",
+                headers={
+                    "Authorization": f"Bearer {AXLE_API_KEY}",
+                    "Content-Type": "application/json",
+                },
+                json={
+                    "content": code,
+                    "environment": LEAN_ENVIRONMENT,
+                    "terminal_tactics": ["grind", "simp_all", "omega", "norm_num", "nlinarith", "aesop"],
+                    "timeout_seconds": 60,
+                },
+            )
+            if response.status_code != 200:
+                return json.dumps({"error": f"Axle HTTP {response.status_code}: {response.text[:500]}"})
+            data = response.json()
+            repaired_code = data.get("content", code)
+            stats = data.get("repair_stats", {})
+            has_sorry = "sorry" in repaired_code
+            return json.dumps({
+                "repaired_code": repaired_code,
+                "repairs_applied": stats.get("apply_terminal_tactics", 0),
+                "still_has_sorry": has_sorry,
+                "okay": data.get("okay", False),
+            }, indent=2)
+    except Exception as e:
+        return json.dumps({"error": f"repair_proofs error: {e}"})
 async def extract_sorry_lemmas(code: str) -> str:
     """Use Axle sorry2lemma to extract sorry'd subgoals into standalone lemma stubs."""
     try:
@@ -413,6 +446,28 @@ TOOL_DEFINITIONS = [
             },
         },
     },
+    {
+        "type": "function",
+        "function": {
+            "name": "repair_lean_proofs",
+            "description": (
+                "Automatically attempt to fill sorry placeholders with automation tactics "
+                "(grind, simp, omega, norm_num, nlinarith, aesop). FAST (~1-2 seconds). "
+                "Call this BEFORE submitting to Aristotle — it may resolve simple sorries instantly. "
+                "Returns repaired code and whether sorry remains."
+            ),
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "code": {
+                        "type": "string",
+                        "description": "Lean 4 code with sorry placeholders to attempt repair.",
+                    }
+                },
+                "required": ["code"],
+            },
+        },
+    },
     {
         "type": "function",
         "function": {
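`repair_lean_proofs` does two things: it builds the request body for Axle's `/repair_proofs` endpoint, and it condenses the reply. The network-free sketch below separates those steps so each can be checked in isolation. The request and response field names are copied from the diff. The function names are hypothetical, and nothing else about Axle's API is assumed. The environment string is supplied by the caller; the committed code uses a `LEAN_ENVIRONMENT` constant whose value is not shown.

The sketch makes one deliberate change. Instead of the substring test `"sorry" in repaired_code`, it matches `sorry` as a whole word, so identifiers like `sorry_free` are not counted. A mention inside a comment still is.

```python
import json
import re

# Tactic list copied from the committed repair_lean_proofs call.
TERMINAL_TACTICS = ["grind", "simp_all", "omega", "norm_num", "nlinarith", "aesop"]

# Heuristic: match `sorry` as a whole word so identifiers such as
# `sorry_free` or `sorryAx` are not counted. Comments that mention the
# word still match, so this remains an approximation.
SORRY_RE = re.compile(r"\bsorry\b")


def build_repair_payload(code: str, environment: str, timeout_seconds: int = 60) -> dict:
    """Request body in the shape the committed tool sends to /repair_proofs."""
    return {
        "content": code,
        "environment": environment,
        "terminal_tactics": list(TERMINAL_TACTICS),
        "timeout_seconds": timeout_seconds,
    }


def summarize_repair_response(data: dict, original_code: str) -> str:
    """Condense a parsed /repair_proofs reply into the tool's JSON result."""
    repaired_code = data.get("content", original_code)
    # `or {}` also tolerates an explicit null for repair_stats.
    stats = data.get("repair_stats") or {}
    return json.dumps({
        "repaired_code": repaired_code,
        "repairs_applied": stats.get("apply_terminal_tactics", 0),
        "still_has_sorry": bool(SORRY_RE.search(repaired_code)),
        "okay": data.get("okay", False),
    }, indent=2)
```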