Michael Rabinovich Cursor committed on May 26

Commit

c040324

1 Parent(s): 64a522a

step 3: gradio leaderboard scaffold

Replace the static placeholder index with a Gradio app: Leaderboard tab
backed by a hand-crafted results.jsonl, Submit tab (UI-only stub, real
eval wiring lands in step 6), About tab. Org slugs parameterised via
HF_ORG / HF_SUBMISSIONS_REPO / HF_DATA_REPO env vars so the future move
to the science org is a secret change, not a code change.

Files:
- app.py: gradio 6.14 app, three tabs
- requirements.txt: gradio, pandas, huggingface_hub, datasets
- results.jsonl: two dev seed rows
- README.md: sdk: static -> sdk: gradio
- legacy/: archive of the prototype nist comparison HTMLs
- .gitignore: __pycache__, .venv, .gradio

Schema: validity_rate (float [0,1]) instead of valid (bool); notes kept
in the row but hidden from the visible table. All copy uses "AI-driven"
rather than "LLM-driven".

Co-authored-by: Cursor <cursoragent@cursor.com>
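
The schema note above replaces a boolean `valid` with a float `validity_rate` in [0, 1]. The commit does not say how that rate is computed, so the following is only a sketch under one plausible assumption: validity is judged per fixture and the rate is the fraction of fixtures judged valid. The function and its empty-input convention are hypothetical.

```python
from __future__ import annotations


def validity_rate(valid_flags: list[bool]) -> float:
    """Fraction of fixtures marked valid, as a float in [0, 1].

    Assumption: validity is decided per fixture; the commit does not
    specify this.
    """
    if not valid_flags:
        return 0.0  # assumed convention for a submission with no fixtures
    return sum(valid_flags) / len(valid_flags)


# Hypothetical two-fixture submission: one valid output, one invalid.
print(validity_rate([True, False]))  # 0.5
```

A float keeps partial success visible in the table, which a single boolean per submission cannot express.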

Files changed (9)

.gitignore +6 -0
README.md +18 -6
app.py +180 -0
index.html +0 -14
nist_comparison_3d.html → legacy/nist_comparison_3d.html +0 -0
nist_hf_comparison_3d.html → legacy/nist_hf_comparison_3d.html +0 -0
requirements.txt +4 -0
results.jsonl +2 -0
style.css +0 -28

.gitignore ADDED

	@@ -0,0 +1,6 @@

+__pycache__/
+*.pyc
+.venv/
+venv/
+.env
+.gradio/

README.md CHANGED

@@ -1,11 +1,23 @@
 ---
-title: AI4Engineering
-emoji: 🔥
+title: CADGenBench Leaderboard
+emoji: 🔧
 colorFrom: indigo
 colorTo: pink
-sdk: static
-pinned: false
-short_description: A space to explore AI tooling for Engineering
+sdk: gradio
+sdk_version: 6.14.0
+app_file: app.py
+pinned: true
+short_description: Leaderboard for AI-driven CAD generation
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# CADGenBench Leaderboard
+A benchmark for AI-driven CAD generation: how well can a model turn a
+textual or visual description of a mechanical part into a valid,
+geometrically correct 3D model?
+**Status:** in active development under `michaelr27/AI4Engineering`. Will
+move to `science/cadgenbench-leaderboard` before going public. Reference
+shape inspired by [`adyen/DABstep`](https://huggingface.co/spaces/adyen/DABstep).
+The previous static-prototype HTML artifacts live in [`legacy/`](legacy/).

app.py ADDED

	@@ -0,0 +1,180 @@

+"""CADGenBench Leaderboard Space.
+Step 3 prototype: a hand-crafted ``results.jsonl`` drives the leaderboard
+table, and the Submit tab is a UI-only stub. The read path (Step 5) will
+swap the JSONL for ``datasets.load_dataset(HF_SUBMISSIONS_REPO, 'results')``
+and the write path (Step 6) will run ``cadgenbench evaluate`` and push a
+result row back to the submissions dataset via ``HfApi``.
+"""
+from __future__ import annotations
+import json
+import os
+from pathlib import Path
+import gradio as gr
+import pandas as pd
+HF_ORG = os.getenv("HF_ORG", "michaelr27")
+HF_SUBMISSIONS_REPO = os.getenv(
+    "HF_SUBMISSIONS_REPO", f"{HF_ORG}/cadgenbench-submissions"
+)
+HF_DATA_REPO = os.getenv("HF_DATA_REPO", f"{HF_ORG}/cadgenbench-data")
+LOCAL_RESULTS_PATH = Path(__file__).parent / "results.jsonl"
+LEADERBOARD_COLS = [
+    "model",
+    "submitter_name",
+    "aggregate_score",
+    "validity_rate",
+    "submitted_at",
+    "cadgenbench_version",
+]
+def load_leaderboard() -> pd.DataFrame:
+    if not LOCAL_RESULTS_PATH.exists():
+        return pd.DataFrame(columns=LEADERBOARD_COLS)
+    rows = [
+        json.loads(line)
+        for line in LOCAL_RESULTS_PATH.read_text().splitlines()
+        if line.strip()
+    ]
+    if not rows:
+        return pd.DataFrame(columns=LEADERBOARD_COLS)
+    df = pd.DataFrame(rows)
+    cols = [c for c in LEADERBOARD_COLS if c in df.columns]
+    return (
+        df[cols]
+        .sort_values("aggregate_score", ascending=False, na_position="last")
+        .reset_index(drop=True)
+    )
+def handle_submit(
+    zip_file,
+    model: str,
+    submitter: str,
+    agent_url: str,
+    notes: str,
+    agree: bool,
+) -> str:
+    if zip_file is None:
+        return "**Error:** please attach a submission zip."
+    if not model.strip():
+        return "**Error:** please fill in the Model identifier."
+    if not submitter.strip():
+        return "**Error:** please fill in your Submitter name."
+    if not agree:
+        return "**Error:** you must agree to publish before submitting."
+    name = Path(zip_file.name).name
+    return (
+        f"Received `{name}` for model `{model}` by `{submitter}`.\n\n"
+        f"_Evaluation is not wired yet (Step 6 of the build plan). Once it "
+        f"is, this submission will run the CPU eval inline and append a row "
+        f"to `{HF_SUBMISSIONS_REPO}`._"
+    )
+ABOUT_MD = f"""## About
+**CADGenBench** evaluates AI-driven CAD generation: how well a model can
+turn a description of a mechanical part into a valid, geometrically
+correct 3D model.
+- Reference baseline: an iterative AI agent that writes build123d Python.
+- Submission flow: upload a zip of per-fixture STEP files; the Space runs
+  the CPU eval and appends a row to the submissions dataset.
+- Datasets: fixtures (inputs + ground truth) live in `{HF_DATA_REPO}`;
+  submissions and computed results live in `{HF_SUBMISSIONS_REPO}`.
+### Status
+This Space is in **active development** under `{HF_ORG}/AI4Engineering` and
+will move to `science/cadgenbench-leaderboard` before going public. See
+`space-setup/` in the source tree for the full build plan.
+"""
+with gr.Blocks(title="CADGenBench Leaderboard") as app:
+    gr.Markdown(
+        "# CADGenBench Leaderboard\n"
+        "_Benchmarking AI-driven CAD generation._"
+    )
+    with gr.Tab("Leaderboard"):
+        df_view = gr.Dataframe(
+            value=load_leaderboard(),
+            interactive=False,
+            wrap=True,
+            label="Results (sorted by aggregate CAD score)",
+        )
+        refresh_btn = gr.Button("Refresh", size="sm")
+        refresh_btn.click(fn=load_leaderboard, outputs=df_view)
+    with gr.Tab("Submit"):
+        gr.Markdown(
+            f"""
+**Submission format.** A single zip with:
+- one folder per fixture in `{HF_DATA_REPO}`, each containing `output.step`;
+- a top-level `meta.json`:
+```json
+{{
+  "submitter_name": "your name or team",
+  "model": "anthropic/claude-sonnet-4-6",
+  "agent_url": "https://github.com/...   (optional)",
+  "notes": "free text, optional, max 500 chars, single line, plain text",
+  "agree_to_publish": true
+}}
+```
+**Notes field.** Plain text only (no markdown / HTML). Capped at 500 chars
+and stripped to a single line. Shown in the per-submission detail view,
+not in the main leaderboard table.
+The Space runs the CPU eval inline and appends a row to
+`{HF_SUBMISSIONS_REPO}`. You can fill the fields below to override
+`meta.json` for a quick test.
+"""
+        )
+        zip_in = gr.File(label="Submission ZIP", file_types=[".zip"])
+        with gr.Row():
+            model_in = gr.Textbox(
+                label="Model identifier",
+                placeholder="e.g. anthropic/claude-sonnet-4-6",
+            )
+            submitter_in = gr.Textbox(label="Submitter name")
+        with gr.Row():
+            agent_url_in = gr.Textbox(
+                label="Agent / paper URL (optional)",
+                placeholder="https://github.com/...",
+            )
+            notes_in = gr.Textbox(label="Notes (optional)")
+        agree_in = gr.Checkbox(
+            label="I agree to publish this result on the public leaderboard."
+        )
+        submit_btn = gr.Button("Submit", variant="primary")
+        submit_out = gr.Markdown()
+        submit_btn.click(
+            fn=handle_submit,
+            inputs=[
+                zip_in,
+                model_in,
+                submitter_in,
+                agent_url_in,
+                notes_in,
+                agree_in,
+            ],
+            outputs=submit_out,
+        )
+    with gr.Tab("About"):
+        gr.Markdown(ABOUT_MD)
+if __name__ == "__main__":
+    app.launch(theme=gr.themes.Soft())
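
The Submit tab text in `app.py` describes a zip layout (a top-level `meta.json` plus one folder per fixture containing `output.step`) and a notes rule (single line, at most 500 characters), but the stub handler does not check either yet. Below is a minimal sketch of what such checks could look like using only the standard library. The helper names `check_submission_zip` and `clean_notes` are hypothetical and not part of this commit, and the fixture IDs are borrowed from the seed rows in `results.jsonl`.

```python
from __future__ import annotations

import io
import json
import zipfile


def check_submission_zip(data: bytes, fixture_ids: list[str]) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems: list[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "meta.json" not in names:
            problems.append("missing top-level meta.json")
        else:
            try:
                meta = json.loads(zf.read("meta.json"))
            except ValueError:  # covers invalid JSON and undecodable bytes
                problems.append("meta.json is not valid JSON")
            else:
                if not isinstance(meta, dict) or meta.get("agree_to_publish") is not True:
                    problems.append("agree_to_publish must be true")
        for fixture_id in fixture_ids:
            if f"{fixture_id}/output.step" not in names:
                problems.append(f"missing {fixture_id}/output.step")
    return problems


def clean_notes(notes: str, limit: int = 500) -> str:
    """Collapse all whitespace runs to single spaces, then cap the length."""
    return " ".join(notes.split())[:limit]


# Build a small in-memory zip that lacks the second fixture's output.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("meta.json", json.dumps({"agree_to_publish": True}))
    zf.writestr("jig-01-single-hole-plate/output.step", "placeholder")

problems = check_submission_zip(
    buf.getvalue(),
    ["jig-01-single-hole-plate", "jig-02-4hole-pattern-plate"],
)
print(problems)  # ['missing jig-02-4hole-pattern-plate/output.step']
```

This sketch expects entries at the archive root; zips whose contents are wrapped in an extra top-level directory would be reported as incomplete. `clean_notes` enforces only the single-line and length rules, not the "no markdown / HTML" rule.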

index.html DELETED

@@ -1,14 +0,0 @@
-<!doctype html>
-<html>
-	<head>
-		<meta charset="utf-8" />
-		<meta name="viewport" content="width=device-width" />
-		<title>My static Space</title>
-		<link rel="stylesheet" href="style.css" />
-	</head>
-	<body>
-		<div id="app"></div>
-		<hr> Hello </hr>
-	</body>
-</html>

nist_comparison_3d.html → legacy/nist_comparison_3d.html RENAMED

File without changes

nist_hf_comparison_3d.html → legacy/nist_hf_comparison_3d.html RENAMED

File without changes

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+gradio==6.14.0
+pandas>=2.0
+huggingface_hub>=0.27.0
+datasets>=3.0

results.jsonl ADDED

	@@ -0,0 +1,2 @@


+{"submission_id": "demo-001", "submitter_name": "Reference (dev seed)", "model": "anthropic/claude-sonnet-4-6", "agent_url": "https://github.com/MichaelRabinovich/LeForge", "notes": "seed row for UI dev", "submitted_at": "2026-05-26T08:00:00Z", "cadgenbench_version": "0.0.0-dev", "cadgenbench_data_revision": "stub", "validity_rate": 1.0, "aggregate_score": 0.42, "per_fixture_scores": {"jig-01-single-hole-plate": 0.85, "jig-02-4hole-pattern-plate": 0.31}, "submission_blob_url": null}
+{"submission_id": "demo-002", "submitter_name": "Reference (dev seed)", "model": "openai/gpt-5.3", "agent_url": null, "notes": "second seed row", "submitted_at": "2026-05-26T08:30:00Z", "cadgenbench_version": "0.0.0-dev", "cadgenbench_data_revision": "stub", "validity_rate": 0.5, "aggregate_score": 0.18, "per_fixture_scores": {"jig-01-single-hole-plate": 0.36, "jig-02-4hole-pattern-plate": 0.0}, "submission_blob_url": null}
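
Because `results.jsonl` is hand-crafted at this step, a small per-row check can catch schema slips before they reach the table. The sketch below treats the six visible leaderboard columns as required and enforces the `validity_rate` range from the commit message; treating those columns as mandatory is an assumption, and `check_row` and the sample row are hypothetical.

```python
from __future__ import annotations

import json

# Assumption: the columns shown in the leaderboard must be present in every row.
REQUIRED_KEYS = {
    "model",
    "submitter_name",
    "aggregate_score",
    "validity_rate",
    "submitted_at",
    "cadgenbench_version",
}


def check_row(line: str) -> list[str]:
    """Return a list of schema problems for one JSONL line (a JSON object)."""
    row = json.loads(line)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(row))]
    if "validity_rate" in row:
        rate = row["validity_rate"]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(rate, (int, float)) and not isinstance(rate, bool)
        if not is_number or not 0.0 <= rate <= 1.0:
            problems.append("validity_rate must be a number in [0, 1]")
    return problems


# Hypothetical row that still uses a boolean where the new schema wants a float.
sample = json.dumps(
    {
        "model": "example/model",
        "submitter_name": "example",
        "aggregate_score": 0.1,
        "validity_rate": True,
        "submitted_at": "2026-05-26T09:00:00Z",
        "cadgenbench_version": "0.0.0-dev",
    }
)
print(check_row(sample))  # ['validity_rate must be a number in [0, 1]']
```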

style.css DELETED

@@ -1,28 +0,0 @@
-body {
-	padding: 2rem;
-	font-family: -apple-system, BlinkMacSystemFont, "Arial", sans-serif;
-}
-h1 {
-	font-size: 16px;
-	margin-top: 0;
-}
-p {
-	color: rgb(107, 114, 128);
-	font-size: 15px;
-	margin-bottom: 10px;
-	margin-top: 5px;
-}
-.card {
-	max-width: 620px;
-	margin: 0 auto;
-	padding: 16px;
-	border: 1px solid lightgray;
-	border-radius: 16px;
-}
-.card p:last-child {
-	margin-bottom: 0;
-}