Spaces:

build-small-hackathon
/

QED

Running

App Files Files Community

QED / README.md

Novian

Update README.md

cc99070 verified 18 days ago

preview code

Raw

History Blame Contribute Delete

4.86 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

metadata

title: Q.E.D
emoji: 🔬
colorFrom: green
colorTo: gray
sdk: gradio
app_file: front.py
pinned: true
license: mit
short_description: LLM-guided formal verification, kernel-certified proofs
tags:
  - track:wood
  - sponsor:modal
  - achievement:offbrand
  - achievement:best-demo
  - gradio
  - lean4
  - formal-verification
  - agents
  - build-small-hackathon
  - llama.cpp

⊢ Q.E.D

An LLM-guided formal verification agent. You give it a theorem statement in Lean 4; it finds a proof that Lean's kernel certifies as formally correct.

Unlike a chatbot saying "yes, that's true," Lean's kernel is a proof-checker that either accepts or rejects every logical step against its axioms. No hallucinations. No approximations. The result is machine-checked mathematics.

Demo

📹 [DEMO VIDEO: https://huggingface.co/spaces/build-small-hackathon/QED/blob/main/Demo.mkv]

🐦 [SOCIAL POST: https://www.reddit.com/r/LocalLLaMA/comments/1u6xo0t/hf_hackathon_submission/]

What it does

The agent runs a propose → verify → learn loop, using a 27B LLM (well under the 32B limit) to propose Lean 4 proof tactics, Lean's kernel to verify each one, and the kernel's error messages fed back verbatim into the next prompt:

theorem
  → propose 3 tactic candidates (27B LLM via Modal)
  → verify each in live Lean 4 REPL (Modal container)
  → kernel rejects? → error text fed back to LLM
  → kernel accepts? → advance proof state
  → repeat until complete or stuck

Key behaviours:

Fallback layer — deterministic tactics (rfl, norm_num, simp, omega, contradiction, assumption) are tried at each step before the LLM is called.
Stuck-state detector — if the same proof state recurs 3 times, or 3 consecutive steps all fail, the agent concludes "not provable as stated" and returns a clean verdict instead of looping indefinitely. This correctly identifies false theorems.
No cache on demo runs — every run executes the full live loop so the propose→verify→learn steps are always visible.

Model: one model, Qwen3-27B quantized (Q4_K_M GGUF) served via llama.cpp. 27B < 32B. ✓

Best Use of Modal

Two Modal deployments power the app:

App	What it runs
`lean-proof-agent`	FastAPI app with a persistent Lean 4 REPL. `min_containers=1` keeps it warm so there is zero cold-start delay. Runs the full agent loop and orchestrates the LLM calls.
`llama-server`	llama.cpp HTTP server running Qwen3-27B (Q4_K_M) on a Modal GPU. Receives the current proof state as context, returns tactic candidates.

Why Modal specifically:

Lean 4 requires a persistent REPL process with the full toolchain installed and pre-warmed — not something you can spin up per-request. Modal containers hold that state across calls.
min_containers=1 means the Lean server is always alive, which is critical for a live demo with ~30-second proof runs.
The LLM needs a GPU. Modal's on-demand GPU allocation means no dedicated hardware to maintain.
The two apps are independently scaled: REPL is CPU-bound, LLM is GPU-bound.

Off Brand — custom UI

The frontend is not stock Gradio. Every proof run produces a live-rendered SVG proof tree built from the agent's search trace: goal-state nodes connected by tactic edges, failed branches in red, the accepted path in green, a terminal QED node. It runs in the browser with zero JS dependencies — generated server-side and injected as HTML.

The overall aesthetic is a dark mathematical terminal: JetBrains Mono, GitHub-dark palette (#0d1117 background, #2ea043 green accent), styled to match the proof tree's colour scheme.

Best Agent — agent loop design

The proof search is a genuine multi-step agentic loop with external tool use and error-driven self-correction:

Propose: LLM generates 3 tactic candidates given the current proof state and the previous kernel error (if any)
Verify: Lean's formal kernel checks each candidate — this is ground truth, not a heuristic
Learn: the exact kernel error message is injected back into the next LLM prompt
Decide: best partial-progress result is selected; all-failed steps increment a failure counter
Conclude: stuck-state detector fires a deliberate "not provable" verdict rather than hitting the step limit

The agent correctly proves true theorems and correctly identifies false ones — both happen live on every run.

Try it

Load any example theorem and click ⊢ Prove. The step walkthrough shows every LLM candidate, which ones the kernel rejected, the error fed back, and the tactic that advanced the proof.

Each completed proof includes a pre-filled link to live.lean-lang.org so you can paste it into Lean's web kernel and see "Goals accomplished!" yourself.