Spaces:
Running
Running
| title: Q.E.D | |
| emoji: π¬ | |
| colorFrom: green | |
| colorTo: gray | |
| sdk: gradio | |
| app_file: front.py | |
| pinned: true | |
| license: mit | |
| short_description: LLM-guided formal verification, kernel-certified proofs | |
| tags: | |
| - track:wood | |
| - sponsor:modal | |
| - achievement:offbrand | |
| - achievement:best-demo | |
| - gradio | |
| - lean4 | |
| - formal-verification | |
| - agents | |
| - build-small-hackathon | |
| - llama.cpp | |
| # β’ Q.E.D | |
| An LLM-guided formal verification agent. You give it a theorem statement in Lean 4; it finds a proof that **Lean's kernel certifies as formally correct**. | |
| Unlike a chatbot saying "yes, that's true," Lean's kernel is a proof-checker that either accepts or rejects every logical step against its axioms. No hallucinations. No approximations. The result is machine-checked mathematics. | |
| ## Demo | |
| πΉ **[DEMO VIDEO: https://huggingface.co/spaces/build-small-hackathon/QED/blob/main/Demo.mkv]** | |
| π¦ **[SOCIAL POST: https://www.reddit.com/r/LocalLLaMA/comments/1u6xo0t/hf_hackathon_submission/]** | |
| ## What it does | |
| The agent runs a propose β verify β learn loop, using a 27B LLM (well under the 32B limit) to propose Lean 4 proof tactics, Lean's kernel to verify each one, and the kernel's error messages fed back verbatim into the next prompt: | |
| ``` | |
| theorem | |
| β propose 3 tactic candidates (27B LLM via Modal) | |
| β verify each in live Lean 4 REPL (Modal container) | |
| β kernel rejects? β error text fed back to LLM | |
| β kernel accepts? β advance proof state | |
| β repeat until complete or stuck | |
| ``` | |
| Key behaviours: | |
| - **Fallback layer** β deterministic tactics (`rfl`, `norm_num`, `simp`, `omega`, `contradiction`, `assumption`) are tried at each step before the LLM is called. | |
| - **Stuck-state detector** β if the same proof state recurs 3 times, or 3 consecutive steps all fail, the agent concludes "not provable as stated" and returns a clean verdict instead of looping indefinitely. This correctly identifies false theorems. | |
| - **No cache on demo runs** β every run executes the full live loop so the proposeβverifyβlearn steps are always visible. | |
| **Model**: one model, Qwen3-27B quantized (Q4_K_M GGUF) served via llama.cpp. 27B < 32B. β | |
| ## Best Use of Modal | |
| Two Modal deployments power the app: | |
| | App | What it runs | | |
| |---|---| | |
| | `lean-proof-agent` | FastAPI app with a persistent Lean 4 REPL. `min_containers=1` keeps it warm so there is zero cold-start delay. Runs the full agent loop and orchestrates the LLM calls. | | |
| | `llama-server` | llama.cpp HTTP server running **Qwen3-27B (Q4_K_M)** on a Modal GPU. Receives the current proof state as context, returns tactic candidates. | | |
| Why Modal specifically: | |
| - Lean 4 requires a **persistent REPL process** with the full toolchain installed and pre-warmed β not something you can spin up per-request. Modal containers hold that state across calls. | |
| - `min_containers=1` means the Lean server is **always alive**, which is critical for a live demo with ~30-second proof runs. | |
| - The LLM needs a GPU. Modal's on-demand GPU allocation means no dedicated hardware to maintain. | |
| - The two apps are **independently scaled**: REPL is CPU-bound, LLM is GPU-bound. | |
| ## Off Brand β custom UI | |
| The frontend is not stock Gradio. Every proof run produces a **live-rendered SVG proof tree** built from the agent's search trace: goal-state nodes connected by tactic edges, failed branches in red, the accepted path in green, a terminal QED node. It runs in the browser with zero JS dependencies β generated server-side and injected as HTML. | |
| The overall aesthetic is a dark mathematical terminal: JetBrains Mono, GitHub-dark palette (`#0d1117` background, `#2ea043` green accent), styled to match the proof tree's colour scheme. | |
| ## Best Agent β agent loop design | |
| The proof search is a genuine multi-step agentic loop with external tool use and error-driven self-correction: | |
| - **Propose**: LLM generates 3 tactic candidates given the current proof state and the previous kernel error (if any) | |
| - **Verify**: Lean's formal kernel checks each candidate β this is ground truth, not a heuristic | |
| - **Learn**: the exact kernel error message is injected back into the next LLM prompt | |
| - **Decide**: best partial-progress result is selected; all-failed steps increment a failure counter | |
| - **Conclude**: stuck-state detector fires a deliberate "not provable" verdict rather than hitting the step limit | |
| The agent correctly proves true theorems and correctly identifies false ones β both happen live on every run. | |
| ## Try it | |
| Load any example theorem and click **β’ Prove**. The step walkthrough shows every LLM candidate, which ones the kernel rejected, the error fed back, and the tactic that advanced the proof. | |
| Each completed proof includes a pre-filled link to [live.lean-lang.org](https://live.lean-lang.org/) so you can paste it into Lean's web kernel and see "Goals accomplished!" yourself. | |