app+leaderboard: link submission_name to a Space-side report proxy
Polish follow-up #2 on C6. Three user-visible changes plus the
plumbing required to make them work.
1. Drop the `report` and `zip` columns from the rendered table.
`submission_name` becomes the report's deep link (typical HF
leaderboard pattern: the system / model name doubles as the link
target; cf. Open LLM Leaderboard, DABstep, BigCodeBench). Modern
completed rows get `[<name>](/reports/<sid>.html)`; pending,
failed, and legacy seed rows fall back to plain text so there's
no broken click. The hidden `submission_blob_url` and the
computed `report_url` ride along in the DataFrame and surface in
the row-click detail panel only.
2. Rename the agent-URL column from `agent` to
`model details (optional)`. Missing cells render as italic
`_None_` so the optionality is explicit instead of a blank.
The old "agent" header read too narrowly: the schema lets
`agent_url` point at code, a paper, or a blog post.
3. Add a Space-side proxy for the per-submission HTML report at
`/reports/{submission_id}.html`. HF Hub's /resolve/ serves
dataset HTML as `Content-Type: text/plain` for security, so a
direct dataset link renders source rather than the rendered
report. The proxy fetches the file via `hf_hub_download` (with
an in-memory LRU cache, 128 entries) and re-streams it with
`Content-Type: text/html`. Reports actually render.
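One property of the cache in item 3 is worth spelling out: `functools.lru_cache` memoizes whatever the wrapped function returns, including a `None` that stands for a failed fetch. A transient Hub error for a given id would therefore keep producing the 404 until the entry is evicted or the process restarts. The sketch below shows one possible alternative that caches successful fetches only. It is an illustration of the trade-off, not the code in this commit; `SuccessOnlyLRU` and its `fetch` argument are hypothetical names, with `fetch` standing in for the `hf_hub_download` call.

```python
from collections import OrderedDict
from typing import Callable, Optional


class SuccessOnlyLRU:
    """Small LRU cache that stores successful results only (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]], maxsize: int = 128):
        self._fetch = fetch          # stand-in for the Hub download
        self._maxsize = maxsize
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key in self._entries:
            # Cache hit: mark as most recently used and return.
            self._entries.move_to_end(key)
            return self._entries[key]
        try:
            value = self._fetch(key)
        except Exception:
            # Any fetch failure maps to None, but is NOT remembered.
            value = None
        if value is not None:
            self._entries[key] = value
            if len(self._entries) > self._maxsize:
                # Evict the least recently used entry.
                self._entries.popitem(last=False)
        return value
```

With this shape a failed lookup is retried on the next request, at the cost of hitting the Hub again for ids that genuinely have no report.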
Plumbing:
- app.py flips its launch shape from `with gr.Blocks() as app /
app.launch()` to building a parent FastAPI app, registering the
proxy route, and mounting Gradio under "/" via
`gr.mount_gradio_app`. Routes are matched in registration order,
and the proxy route is registered before the mount, so
`/reports/<sid>.html` never gets shadowed by Gradio's catch-all.
`uvicorn.run` reads the same GRADIO_SERVER_NAME /
GRADIO_SERVER_PORT env vars the old launch did, so the
Dockerfile's CMD doesn't need to change. (uvicorn is already a
transitive dependency of gradio, so no new requirements line.)
- The relative URL `/reports/<sid>.html` in the markdown link works
on any host (localhost during dev, *.hf.space in prod) without
the leaderboard reader needing to know its own URL.
- `_report_relative_url` keeps the existing modern-pipeline gate
(status == "completed" AND submission_sha256 non-null); the
three baseline seed rows still don't get a link target.
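The host-independence claim in the second bullet can be checked with the standard URL resolution rules. The sketch below resolves the root-relative link against two page URLs; both host names are made up for illustration, and `sub-a` is the id used in the test fixtures.

```python
from urllib.parse import urljoin

REPORT_LINK = "/reports/sub-a.html"  # root-relative link emitted in the table cell


def resolve(page_url: str, link: str = REPORT_LINK) -> str:
    """Resolve a link the way a browser would, relative to the page showing it."""
    return urljoin(page_url, link)


# Hypothetical hosts: a local dev server and a hosted Space.
local = resolve("http://localhost:7860/")
hosted = resolve("https://example-space.hf.space/?tab=leaderboard")

print(local)   # http://localhost:7860/reports/sub-a.html
print(hosted)  # https://example-space.hf.space/reports/sub-a.html
```

Because the link starts with `/`, only the scheme and host of the current page are reused, which is why the leaderboard never needs to know its own base URL.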
Tests (15/15 green):
- test_leaderboard: existing assertions updated for the
submission_name link wrapping; new test_submission_name_links_to_
report_when_available + test_legacy_row_submission_name_is_
plain_text + test_model_details_column_renders.
- New tests/test_proxy.py: serve_report behavior with monkeypatched
_fetch_report_html (success / missing / Hub failure) + the
route-registration regression guard.
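A note on the route-registration guard: a membership check on the route list would still pass if the proxy route were registered after the Gradio mount, because the route is present either way. An order-aware variant might look like the sketch below. It is not part of this commit; the helper names are hypothetical, and it assumes the mounted sub-app shows up in the route list with an empty `path` (the "/" mount with its trailing slash stripped), which should be confirmed against the installed Starlette version.

```python
from types import SimpleNamespace


def index_of_path(routes, path):
    """Index of the first route whose ``path`` attribute equals ``path``, else -1."""
    for i, route in enumerate(routes):
        if getattr(route, "path", None) == path:
            return i
    return -1


def proxy_precedes_mount(routes, proxy_path, mount_path=""):
    """True when the proxy route exists and sits ahead of the mount (if any)."""
    proxy_at = index_of_path(routes, proxy_path)
    mount_at = index_of_path(routes, mount_path)
    if proxy_at == -1:
        return False
    return mount_at == -1 or proxy_at < mount_at


# Stand-in route objects; a real test would pass `app.app.routes` instead.
PROXY = "/reports/{submission_id}.html"
good = [SimpleNamespace(path=PROXY), SimpleNamespace(path="")]
bad = list(reversed(good))

print(proxy_precedes_mount(good, PROXY))  # True
print(proxy_precedes_mount(bad, PROXY))   # False
```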
Verification loop change (per user request): every UI/route change
gets booted via subprocess locally and probed with HTTP before push.
For this commit:
- Boot app.py on a free port.
- GET /reports/<known-modern-sid>.html -> 200, text/html, body
starts with <!DOCTYPE html, length ~1.4MB.
- GET /reports/<bogus-sid>.html -> 404.
Both PASS. The Space rebuild is now a final sanity check, not the
verify loop.
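The boot-and-probe loop described above can be sketched with the standard library alone. This is a minimal illustration rather than the script actually used: the helper names are hypothetical, and the only details taken from this commit are the `GRADIO_SERVER_NAME` / `GRADIO_SERVER_PORT` env vars and the expected 200 / `text/html` / 404 results.

```python
import os
import socket
import subprocess
import sys
import time
import urllib.error
import urllib.request

# Local probes should not be routed through any configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def free_port() -> int:
    """Ask the OS for an unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def boot(port: int, script: str = "app.py") -> subprocess.Popen:
    """Start the app in a child process, bound to the chosen port."""
    env = dict(os.environ, GRADIO_SERVER_NAME="127.0.0.1", GRADIO_SERVER_PORT=str(port))
    return subprocess.Popen([sys.executable, script], env=env)


def probe(url: str, timeout: float = 10.0):
    """GET ``url`` and return (status, content type, first bytes of the body)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", ""), resp.read(64)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what the check needs.
        return err.code, err.headers.get("Content-Type", ""), err.read(64)


def wait_until_up(url: str, deadline_s: float = 60.0) -> bool:
    """Poll until the server answers any HTTP response, or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            probe(url, timeout=2.0)
            return True
        except (urllib.error.URLError, OSError):
            time.sleep(0.5)
    return False


def check_report_routes(base_url: str, known_sid: str, bogus_sid: str) -> bool:
    """Known id -> 200 + text/html + doctype; bogus id -> 404."""
    status, ctype, head = probe(f"{base_url}/reports/{known_sid}.html")
    known_ok = (
        status == 200
        and ctype.startswith("text/html")
        and head.lstrip().lower().startswith(b"<!doctype html")
    )
    bogus_status, _, _ = probe(f"{base_url}/reports/{bogus_sid}.html")
    return known_ok and bogus_status == 404
```

A run would call `boot(free_port())`, wait with `wait_until_up`, pass a known modern submission id and a bogus one to `check_report_routes`, and terminate the child process in a `finally` block.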
- app.py +87 -14
- leaderboard.py +82 -63
- tests/test_leaderboard.py +58 -48
- tests/test_proxy.py +60 -0
|
@@ -1,17 +1,28 @@
|
|
| 1 |
-
"""CADGenBench Leaderboard Space - Gradio UI
|
| 2 |
|
| 3 |
Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
|
| 4 |
-
:mod:`submit`. Both are wired into the Gradio
|
| 5 |
"""
|
| 6 |
from __future__ import annotations
|
| 7 |
|
| 8 |
import logging
|
|
|
|
| 9 |
import re
|
|
|
|
|
|
|
| 10 |
|
| 11 |
import gradio as gr
|
| 12 |
-
from gradio_leaderboard import Leaderboard
|
| 13 |
-
|
| 14 |
import pandas as pd
|
| 15 |
|
| 16 |
from leaderboard import (
|
| 17 |
HF_DATA_REPO,
|
|
@@ -23,6 +34,8 @@ from leaderboard import (
|
|
| 23 |
)
|
| 24 |
from submit import handle_submit
|
| 25 |
|
|
|
|
|
|
|
| 26 |
# Surface module-level logger.info / logger.warning / logger.exception
|
| 27 |
# calls from leaderboard.py + submit.py in the Space's runtime logs.
|
| 28 |
# Otherwise they go nowhere and any refresh / worker pathology is
|
|
@@ -109,20 +122,65 @@ def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
|
|
| 109 |
lines.append(f"- **Submitted**: {_fmt_timestamp(row['submitted_at'])}")
|
| 110 |
if _has(row.get("notes")):
|
| 111 |
lines.append(f"- **Notes**: {row['notes']}")
|
| 112 |
-
#
|
| 113 |
-
#
|
| 114 |
-
# `
|
| 115 |
-
lines.append(
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
|
|
|
|
|
|
|
|
|
| 120 |
if row.get("status") == "failed" and _has(row.get("failure_reason")):
|
| 121 |
lines.append(f"- **Failure reason**: {row['failure_reason']}")
|
| 122 |
return "\n".join(lines)
|
| 123 |
|
| 124 |
|
| 125 |
-
|
| 126 |
gr.Markdown(
|
| 127 |
"# CADGenBench Leaderboard\n"
|
| 128 |
"_Benchmarking AI-driven CAD generation._"
|
|
@@ -228,5 +286,20 @@ to publish the resulting row on the public leaderboard.
|
|
| 228 |
)
|
| 229 |
|
| 230 |
|
| 231 |
if __name__ == "__main__":
|
| 232 |
-
|
|
|
|
|
|
|
|
|
| 1 |
+
"""CADGenBench Leaderboard Space - Gradio UI + report-proxy mount.
|
| 2 |
|
| 3 |
Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
|
| 4 |
+
:mod:`submit`. Both are wired into the Gradio Blocks below. The
|
| 5 |
+
Gradio app is mounted under a FastAPI parent so the custom
|
| 6 |
+
``/reports/{submission_id}.html`` route can re-serve dataset HTML
|
| 7 |
+
with ``Content-Type: text/html`` (HF Hub's ``/resolve/`` serves it
|
| 8 |
+
as ``text/plain`` by policy, which makes the browser show source
|
| 9 |
+
rather than render).
|
| 10 |
"""
|
| 11 |
from __future__ import annotations
|
| 12 |
|
| 13 |
import logging
|
| 14 |
+
import os
|
| 15 |
import re
|
| 16 |
+
from functools import lru_cache
|
| 17 |
+
from pathlib import Path
|
| 18 |
|
| 19 |
import gradio as gr
|
|
|
|
|
|
|
| 20 |
import pandas as pd
|
| 21 |
+
import uvicorn
|
| 22 |
+
from fastapi import FastAPI
|
| 23 |
+
from fastapi.responses import HTMLResponse, Response
|
| 24 |
+
from gradio_leaderboard import Leaderboard
|
| 25 |
+
from huggingface_hub import hf_hub_download
|
| 26 |
|
| 27 |
from leaderboard import (
|
| 28 |
HF_DATA_REPO,
|
|
|
|
| 34 |
)
|
| 35 |
from submit import handle_submit
|
| 36 |
|
| 37 |
+
logger = logging.getLogger(__name__)
|
| 38 |
+
|
| 39 |
# Surface module-level logger.info / logger.warning / logger.exception
|
| 40 |
# calls from leaderboard.py + submit.py in the Space's runtime logs.
|
| 41 |
# Otherwise they go nowhere and any refresh / worker pathology is
|
|
|
|
| 122 |
lines.append(f"- **Submitted**: {_fmt_timestamp(row['submitted_at'])}")
|
| 123 |
if _has(row.get("notes")):
|
| 124 |
lines.append(f"- **Notes**: {row['notes']}")
|
| 125 |
+
# `model details (optional)` carries the markdown link (or
|
| 126 |
+
# _None_ when missing); the hidden `submission_blob_url` /
|
| 127 |
+
# `report_url` columns are raw URLs we wrap into named links here.
|
| 128 |
+
lines.append(
|
| 129 |
+
f"- **Model details (optional)**: "
|
| 130 |
+
f"{row.get('model details (optional)') or '_None_'}"
|
| 131 |
+
)
|
| 132 |
+
if _has(row.get("submission_blob_url")):
|
| 133 |
+
lines.append(f"- **Submission ZIP**: [download]({row['submission_blob_url']})")
|
| 134 |
+
if _has(row.get("report_url")):
|
| 135 |
+
lines.append(f"- **Report**: [open]({row['report_url']})")
|
| 136 |
if row.get("status") == "failed" and _has(row.get("failure_reason")):
|
| 137 |
lines.append(f"- **Failure reason**: {row['failure_reason']}")
|
| 138 |
return "\n".join(lines)
|
| 139 |
|
| 140 |
|
| 141 |
+
@lru_cache(maxsize=128)
|
| 142 |
+
def _fetch_report_html(submission_id: str) -> bytes | None:
|
| 143 |
+
"""Pull ``reports/<id>.html`` off the submissions dataset.
|
| 144 |
+
|
| 145 |
+
Cached in-process so repeat clicks on the same row don't hit
|
| 146 |
+
the Hub. Returns ``None`` on any failure so the caller can
|
| 147 |
+
serve a clean 404 rather than leaking a stack trace.
|
| 148 |
+
"""
|
| 149 |
+
try:
|
| 150 |
+
local_path = hf_hub_download(
|
| 151 |
+
repo_id=HF_SUBMISSIONS_REPO,
|
| 152 |
+
filename=f"reports/{submission_id}.html",
|
| 153 |
+
repo_type="dataset",
|
| 154 |
+
)
|
| 155 |
+
return Path(local_path).read_bytes()
|
| 156 |
+
except Exception as e: # noqa: BLE001 - any Hub failure -> 404
|
| 157 |
+
logger.warning(
|
| 158 |
+
"Failed to fetch report for %s (%s: %s)",
|
| 159 |
+
submission_id, type(e).__name__, e,
|
| 160 |
+
)
|
| 161 |
+
return None
|
| 162 |
+
|
| 163 |
+
|
| 164 |
+
def serve_report(submission_id: str) -> Response:
|
| 165 |
+
"""Proxy a per-submission HTML report through the Space.
|
| 166 |
+
|
| 167 |
+
HF Hub serves dataset HTML under ``/resolve/`` with
|
| 168 |
+
``Content-Type: text/plain`` (security: dataset files can't host
|
| 169 |
+
live HTML), so a direct dataset link shows source instead of
|
| 170 |
+
rendering. This route lives on the Space (which can legitimately
|
| 171 |
+
serve text/html) and re-streams the file's bytes with the right
|
| 172 |
+
content-type.
|
| 173 |
+
"""
|
| 174 |
+
content = _fetch_report_html(submission_id)
|
| 175 |
+
if content is None:
|
| 176 |
+
return HTMLResponse(
|
| 177 |
+
content="<h1>Report not found</h1>",
|
| 178 |
+
status_code=404,
|
| 179 |
+
)
|
| 180 |
+
return Response(content=content, media_type="text/html; charset=utf-8")
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
with gr.Blocks(title="CADGenBench Leaderboard", theme=gr.themes.Soft()) as blocks:
|
| 184 |
gr.Markdown(
|
| 185 |
"# CADGenBench Leaderboard\n"
|
| 186 |
"_Benchmarking AI-driven CAD generation._"
|
|
|
|
| 286 |
)
|
| 287 |
|
| 288 |
|
| 289 |
+
# Mount Gradio under a FastAPI parent so the custom proxy route
|
| 290 |
+
# above lives at the same origin as the UI. Direct routes on `app`
|
| 291 |
+
# get checked before the Gradio sub-app, so `/reports/<sid>.html`
|
| 292 |
+
# never gets shadowed.
|
| 293 |
+
app = FastAPI()
|
| 294 |
+
app.add_api_route(
|
| 295 |
+
"/reports/{submission_id}.html",
|
| 296 |
+
serve_report,
|
| 297 |
+
methods=["GET"],
|
| 298 |
+
)
|
| 299 |
+
app = gr.mount_gradio_app(app, blocks, path="/")
|
| 300 |
+
|
| 301 |
+
|
| 302 |
if __name__ == "__main__":
|
| 303 |
+
host = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
|
| 304 |
+
port = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
|
| 305 |
+
uvicorn.run(app, host=host, port=port)
|
|
@@ -34,10 +34,13 @@ HUB_FETCH_TIMEOUT_SECONDS = 30
|
|
| 34 |
# Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
|
| 35 |
# can read them out without a separate state-cache or re-fetch.
|
| 36 |
# Display column names (these are what the rendered table headers
|
| 37 |
-
# read). The
|
| 38 |
-
#
|
| 39 |
-
#
|
| 40 |
-
#
|
|
|
|
|
|
|
|
|
|
| 41 |
LEADERBOARD_COLS = [
|
| 42 |
"status",
|
| 43 |
"submission_name",
|
|
@@ -46,13 +49,13 @@ LEADERBOARD_COLS = [
|
|
| 46 |
"validity_rate",
|
| 47 |
"submitted_at",
|
| 48 |
"cadgenbench_version",
|
| 49 |
-
"
|
| 50 |
-
"zip",
|
| 51 |
-
"report",
|
| 52 |
# Detail-panel-only (hidden via `hide_columns` on the widget):
|
| 53 |
"submission_id",
|
| 54 |
"notes",
|
| 55 |
"failure_reason",
|
|
|
|
|
|
|
| 56 |
]
|
| 57 |
|
| 58 |
# Validated table additionally exposes `validation_method`; on the
|
|
@@ -67,25 +70,33 @@ VALIDATED_LEADERBOARD_COLS = [
|
|
| 67 |
"validation_method",
|
| 68 |
"submitted_at",
|
| 69 |
"cadgenbench_version",
|
| 70 |
-
"
|
| 71 |
-
"zip",
|
| 72 |
-
"report",
|
| 73 |
"submission_id",
|
| 74 |
"notes",
|
| 75 |
"failure_reason",
|
|
|
|
|
|
|
| 76 |
]
|
| 77 |
|
| 78 |
# Columns to hide from rendering on both tables. These ride in the
|
| 79 |
# DataFrame so the row-click detail panel can populate from them; the
|
| 80 |
# widget hides them from view.
|
| 81 |
-
LEADERBOARD_HIDE_COLUMNS = [
|
| 82 |
|
| 83 |
-
# Per-column gradio_leaderboard datatypes.
|
| 84 |
-
#
|
| 85 |
-
#
|
| 86 |
-
#
|
| 87 |
-
#
|
| 88 |
-
|
|
|
|
|
|
|
| 89 |
|
| 90 |
|
| 91 |
def _datatypes_for(columns: list[str]) -> list[str]:
|
|
@@ -198,13 +209,13 @@ def _shorten_url_for_display(url: str) -> str:
|
|
| 198 |
|
| 199 |
|
| 200 |
def _agent_url_md(url) -> str:
|
| 201 |
-
"""Render the `
|
| 202 |
|
| 203 |
Uses a shortened version of the URL itself as the link text:
|
| 204 |
`agent_url` is a free-form "URL pointing at the agent code or
|
| 205 |
-
paper" per the schema, so the URL itself carries the only
|
| 206 |
-
|
| 207 |
-
|
| 208 |
just wasn't filled, rather than a blank.
|
| 209 |
"""
|
| 210 |
if _is_empty(url):
|
|
@@ -212,39 +223,41 @@ def _agent_url_md(url) -> str:
|
|
| 212 |
return f"[{_shorten_url_for_display(str(url))}]({url})"
|
| 213 |
|
| 214 |
|
| 215 |
-
def
|
| 216 |
-
"""
|
| 217 |
-
|
| 218 |
-
Link text stays "zip" (the URL points at our own infrastructure
|
| 219 |
-
and adds no extra information for the reader).
|
| 220 |
-
"""
|
| 221 |
-
if _is_empty(url):
|
| 222 |
-
return ""
|
| 223 |
-
return f"[zip]({url})"
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
def _report_url_md(submission_id, status, submission_sha256) -> str:
|
| 227 |
-
"""Build the report URL, only when the report file is known to exist.
|
| 228 |
|
| 229 |
-
`reports/<id>.html`
|
| 230 |
-
|
| 231 |
-
(the
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
|
| 236 |
-
`
|
| 237 |
-
|
| 238 |
-
|
|
|
|
|
|
|
| 239 |
"""
|
| 240 |
if status != "completed" or _is_empty(submission_id):
|
| 241 |
return ""
|
| 242 |
if _is_empty(submission_sha256):
|
| 243 |
return ""
|
| 244 |
-
return
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
|
| 249 |
|
| 250 |
def load_leaderboard_split() -> tuple[pd.DataFrame, pd.DataFrame]:
|
|
@@ -296,25 +309,24 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
|
|
| 296 |
if df.empty:
|
| 297 |
return pd.DataFrame(columns=columns)
|
| 298 |
df = df.copy()
|
| 299 |
-
# Compute
|
| 300 |
-
#
|
| 301 |
-
#
|
|
|
|
| 302 |
if "submission_id" in df.columns and "status" in df.columns:
|
| 303 |
-
df["
|
| 304 |
-
lambda r:
|
| 305 |
r["submission_id"],
|
| 306 |
r["status"],
|
| 307 |
r.get("submission_sha256"),
|
| 308 |
),
|
| 309 |
axis=1,
|
| 310 |
)
|
| 311 |
-
# Schema
|
| 312 |
-
#
|
| 313 |
-
# results.jsonl.
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
"submission_blob_url": "zip",
|
| 317 |
-
})
|
| 318 |
# Make sure every declared column exists (legacy rows can be
|
| 319 |
# missing optional fields). Detail-panel reads expect the
|
| 320 |
# column-set to be stable regardless of which source rows had
|
|
@@ -335,8 +347,15 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
|
|
| 335 |
out["aggregate_score"] = out.apply(
|
| 336 |
lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
|
| 337 |
)
|
| 338 |
-
if "
|
| 339 |
-
out["
|
| 340 |
-
|
| 341 |
-
|
| 342 |
return out
|
|
|
|
| 34 |
# Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
|
| 35 |
# can read them out without a separate state-cache or re-fetch.
|
| 36 |
# Display column names (these are what the rendered table headers
|
| 37 |
+
# read). The visible set is intentionally compact; the detail-panel
|
| 38 |
+
# columns at the end of the list ride along in the DataFrame so the
|
| 39 |
+
# row-click handler can populate from them without a separate state
|
| 40 |
+
# cache. `submission_name` is the primary link (markdown-wrapped at
|
| 41 |
+
# projection time to point at the report when one exists), so we
|
| 42 |
+
# don't need a separate `report` column. `submission_blob_url` lives
|
| 43 |
+
# in the detail panel only.
|
| 44 |
LEADERBOARD_COLS = [
|
| 45 |
"status",
|
| 46 |
"submission_name",
|
|
|
|
| 49 |
"validity_rate",
|
| 50 |
"submitted_at",
|
| 51 |
"cadgenbench_version",
|
| 52 |
+
"model details (optional)",
|
|
|
|
|
|
|
| 53 |
# Detail-panel-only (hidden via `hide_columns` on the widget):
|
| 54 |
"submission_id",
|
| 55 |
"notes",
|
| 56 |
"failure_reason",
|
| 57 |
+
"submission_blob_url",
|
| 58 |
+
"report_url",
|
| 59 |
]
|
| 60 |
|
| 61 |
# Validated table additionally exposes `validation_method`; on the
|
|
|
|
| 70 |
"validation_method",
|
| 71 |
"submitted_at",
|
| 72 |
"cadgenbench_version",
|
| 73 |
+
"model details (optional)",
|
|
|
|
|
|
|
| 74 |
"submission_id",
|
| 75 |
"notes",
|
| 76 |
"failure_reason",
|
| 77 |
+
"submission_blob_url",
|
| 78 |
+
"report_url",
|
| 79 |
]
|
| 80 |
|
| 81 |
# Columns to hide from rendering on both tables. These ride in the
|
| 82 |
# DataFrame so the row-click detail panel can populate from them; the
|
| 83 |
# widget hides them from view.
|
| 84 |
+
LEADERBOARD_HIDE_COLUMNS = [
|
| 85 |
+
"submission_id",
|
| 86 |
+
"notes",
|
| 87 |
+
"failure_reason",
|
| 88 |
+
"submission_blob_url",
|
| 89 |
+
"report_url",
|
| 90 |
+
]
|
| 91 |
|
| 92 |
+
# Per-column gradio_leaderboard datatypes. `submission_name` and
|
| 93 |
+
# `model details (optional)` render their pre-formatted markdown
|
| 94 |
+
# (the submission_name cell links to the report when one exists; the
|
| 95 |
+
# model-details cell is the agent URL or _None_); everything else is
|
| 96 |
+
# plain string (numeric cells get pending / failed status tags
|
| 97 |
+
# applied by _fmt_pct / _fmt_score so they're string-shaped by the
|
| 98 |
+
# time the widget sees them).
|
| 99 |
+
_LINK_COLUMNS = frozenset({"submission_name", "model details (optional)"})
|
| 100 |
|
| 101 |
|
| 102 |
def _datatypes_for(columns: list[str]) -> list[str]:
|
|
|
|
| 209 |
|
| 210 |
|
| 211 |
def _agent_url_md(url) -> str:
|
| 212 |
+
"""Render the `model details (optional)` cell as a markdown link.
|
| 213 |
|
| 214 |
Uses a shortened version of the URL itself as the link text:
|
| 215 |
`agent_url` is a free-form "URL pointing at the agent code or
|
| 216 |
+
paper" per the schema, so the URL itself carries the only honest
|
| 217 |
+
hint about what's behind the click. Missing cells render as
|
| 218 |
+
italic ``_None_`` so a reader sees the field is optional and
|
| 219 |
just wasn't filled, rather than a blank.
|
| 220 |
"""
|
| 221 |
if _is_empty(url):
|
|
|
|
| 223 |
return f"[{_shorten_url_for_display(str(url))}]({url})"
|
| 224 |
|
| 225 |
|
| 226 |
+
def _report_relative_url(submission_id, status, submission_sha256) -> str:
|
| 227 |
+
"""Relative URL to the Space's report-proxy route, or empty.
|
| 228 |
|
| 229 |
+
The Space exposes ``/reports/<id>.html`` which fetches the file
|
| 230 |
+
from the submissions dataset and re-serves it as ``text/html``
|
| 231 |
+
(the dataset's ``/resolve/`` path returns it as ``text/plain``,
|
| 232 |
+
which renders as source). Returning a relative URL means the
|
| 233 |
+
same string works whether the Space is running locally on a
|
| 234 |
+
random port or on huggingface.co.
|
| 235 |
|
| 236 |
+
`reports/<id>.html` only exists for completed rows from the
|
| 237 |
+
modern submit pipeline; legacy seed rows (pre-pipeline,
|
| 238 |
+
``submission_sha256`` null) never had a report uploaded, so this
|
| 239 |
+
function returns empty for them and the caller leaves the
|
| 240 |
+
submission_name cell as plain text rather than a broken link.
|
| 241 |
"""
|
| 242 |
if status != "completed" or _is_empty(submission_id):
|
| 243 |
return ""
|
| 244 |
if _is_empty(submission_sha256):
|
| 245 |
return ""
|
| 246 |
+
return f"/reports/{submission_id}.html"
|
| 247 |
+
|
| 248 |
+
|
| 249 |
+
def _submission_name_md(name, report_url) -> str:
|
| 250 |
+
"""Render `submission_name` as a markdown link when a report exists.
|
| 251 |
+
|
| 252 |
+
The submission name itself is the link target (typical HF
|
| 253 |
+
leaderboard pattern: model / system name doubles as the
|
| 254 |
+
deep-link). Plain text when no report is available.
|
| 255 |
+
"""
|
| 256 |
+
if _is_empty(name):
|
| 257 |
+
return "(unnamed submission)"
|
| 258 |
+
if _is_empty(report_url):
|
| 259 |
+
return str(name)
|
| 260 |
+
return f"[{name}]({report_url})"
|
| 261 |
|
| 262 |
|
| 263 |
def load_leaderboard_split() -> tuple[pd.DataFrame, pd.DataFrame]:
|
|
|
|
| 309 |
if df.empty:
|
| 310 |
return pd.DataFrame(columns=columns)
|
| 311 |
df = df.copy()
|
| 312 |
+
# Compute the relative report URL before projection drops the
|
| 313 |
+
# source columns it depends on. Lives as a hidden column so the
|
| 314 |
+
# detail panel can show "Report: link" and `submission_name` can
|
| 315 |
+
# use it as the link target.
|
| 316 |
if "submission_id" in df.columns and "status" in df.columns:
|
| 317 |
+
df["report_url"] = df.apply(
|
| 318 |
+
lambda r: _report_relative_url(
|
| 319 |
r["submission_id"],
|
| 320 |
r["status"],
|
| 321 |
r.get("submission_sha256"),
|
| 322 |
),
|
| 323 |
axis=1,
|
| 324 |
)
|
| 325 |
+
# Schema field name -> display column header. Keep the rename
|
| 326 |
+
# narrow: only the agent URL gets a friendlier header; the schema
|
| 327 |
+
# field stays `agent_url` in results.jsonl. `submission_blob_url`
|
| 328 |
+
# stays under that name (hidden, detail-panel-only).
|
| 329 |
+
df = df.rename(columns={"agent_url": "model details (optional)"})
|
|
|
|
|
|
|
| 330 |
# Make sure every declared column exists (legacy rows can be
|
| 331 |
# missing optional fields). Detail-panel reads expect the
|
| 332 |
# column-set to be stable regardless of which source rows had
|
|
|
|
| 347 |
out["aggregate_score"] = out.apply(
|
| 348 |
lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
|
| 349 |
)
|
| 350 |
+
if "submission_name" in out.columns:
|
| 351 |
+
out["submission_name"] = out.apply(
|
| 352 |
+
lambda r: _submission_name_md(
|
| 353 |
+
r["submission_name"], r.get("report_url"),
|
| 354 |
+
),
|
| 355 |
+
axis=1,
|
| 356 |
+
)
|
| 357 |
+
if "model details (optional)" in out.columns:
|
| 358 |
+
out["model details (optional)"] = out["model details (optional)"].apply(
|
| 359 |
+
_agent_url_md
|
| 360 |
+
)
|
| 361 |
return out
|
|
@@ -86,7 +86,11 @@ def test_split_shape(monkeypatch):
|
|
| 86 |
|
| 87 |
|
| 88 |
def test_legacy_row_defaults_to_unvalidated(monkeypatch):
|
| 89 |
-
"""(b) Legacy row with no `validation_status` field lands unvalidated.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 90 |
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 91 |
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 92 |
assert "Gamma baseline" in set(unvalidated["submission_name"].tolist())
|
|
@@ -97,7 +101,8 @@ def test_field_passthrough(monkeypatch):
|
|
| 97 |
"""(c) Non-formatted metadata fields and validated-only columns pass through.
|
| 98 |
|
| 99 |
`aggregate_score` and `validity_rate` get status-aware string
|
| 100 |
-
formatting
|
|
|
|
| 101 |
untransformed (``submitter_name``, ``submitted_at``,
|
| 102 |
``cadgenbench_version``) plus the validated-table-only
|
| 103 |
``validation_method``.
|
|
@@ -106,15 +111,15 @@ def test_field_passthrough(monkeypatch):
|
|
| 106 |
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 107 |
|
| 108 |
assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
|
| 109 |
-
|
| 110 |
-
|
|
|
|
| 111 |
assert beta["submitted_at"] == "2026-05-02T10:00:00Z"
|
| 112 |
assert beta["cadgenbench_version"] == "0.1.0"
|
| 113 |
|
| 114 |
assert list(validated.columns) == leaderboard.VALIDATED_LEADERBOARD_COLS
|
| 115 |
-
alpha = validated[validated["
|
| 116 |
assert alpha["validation_method"] == "code"
|
| 117 |
-
assert alpha["submitter_name"] == "team-alpha"
|
| 118 |
|
| 119 |
|
| 120 |
def test_empty_input_returns_two_empty_frames(monkeypatch):
|
|
@@ -128,56 +133,60 @@ def test_empty_input_returns_two_empty_frames(monkeypatch):
|
|
| 128 |
assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
|
| 129 |
|
| 130 |
|
| 131 |
-
def
|
| 132 |
-
"""`
|
| 133 |
-
|
| 134 |
-
Link cells are ``[label](url)`` strings so the Leaderboard widget
|
| 135 |
-
under ``datatype="markdown"`` produces clickable anchors. The
|
| 136 |
-
``agent`` cell uses the URL itself (scheme stripped) as link
|
| 137 |
-
text so a reader can tell what's behind the click; ``zip`` and
|
| 138 |
-
``report`` use the short fixed labels because they always point
|
| 139 |
-
at our own infrastructure.
|
| 140 |
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
"""
|
| 145 |
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 146 |
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 147 |
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
)
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
assert
|
| 156 |
-
assert "reports/sub-a.html" in alpha["report"]
|
| 157 |
-
|
| 158 |
-
beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
|
| 159 |
-
assert beta["agent"] == "_None_"
|
| 160 |
-
assert beta["zip"].startswith("[zip](")
|
| 161 |
-
# Beta has submission_sha256, so the report link is emitted.
|
| 162 |
-
assert beta["report"].startswith("[report](")
|
| 163 |
|
| 164 |
|
| 165 |
-
def
|
| 166 |
-
"""Rows without
|
| 167 |
|
| 168 |
-
`
|
| 169 |
-
|
| 170 |
-
a report uploaded, so the /resolve/ URL would 404. Gate on
|
| 171 |
-
``submission_sha256`` (the schema's "modern pipeline" sentinel)
|
| 172 |
-
so the leaderboard doesn't render a broken link.
|
| 173 |
"""
|
| 174 |
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 175 |
_, unvalidated = leaderboard.load_leaderboard_split()
|
| 176 |
gamma = unvalidated[unvalidated["submission_name"] == "Gamma baseline"].iloc[0]
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
assert
|
| 180 |
-
|
| 181 |
|
| 182 |
|
| 183 |
def test_datatypes_align_with_columns():
|
|
@@ -195,11 +204,12 @@ def test_datatypes_align_with_columns():
|
|
| 195 |
len(leaderboard.VALIDATED_LEADERBOARD_DATATYPES)
|
| 196 |
== len(leaderboard.VALIDATED_LEADERBOARD_COLS)
|
| 197 |
)
|
| 198 |
-
#
|
|
|
|
| 199 |
for col, dt in zip(
|
| 200 |
leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
|
| 201 |
):
|
| 202 |
-
if col in
|
| 203 |
-
assert dt == "markdown"
|
| 204 |
else:
|
| 205 |
-
assert dt == "str"
|
|
|
|
| 86 |
|
| 87 |
|
| 88 |
def test_legacy_row_defaults_to_unvalidated(monkeypatch):
|
| 89 |
+
"""(b) Legacy row with no `validation_status` field lands unvalidated.
|
| 90 |
+
|
| 91 |
+
Legacy rows keep plain-text submission_name (no report exists),
|
| 92 |
+
so identity check is straightforward equality.
|
| 93 |
+
"""
|
| 94 |
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 95 |
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 96 |
assert "Gamma baseline" in set(unvalidated["submission_name"].tolist())
|
|
|
|
| 101 |
"""(c) Non-formatted metadata fields and validated-only columns pass through.
|
| 102 |
|
| 103 |
`aggregate_score` and `validity_rate` get status-aware string
|
| 104 |
+
formatting; `submission_name` gets wrapped into a markdown link
|
| 105 |
+
on modern rows. Passthrough is checked on fields that survive
|
| 106 |
untransformed (``submitter_name``, ``submitted_at``,
|
| 107 |
``cadgenbench_version``) plus the validated-table-only
|
| 108 |
``validation_method``.
|
|
|
|
| 111 |
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 112 |
|
| 113 |
assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
|
| 114 |
+
# Use submitter_name to identify Beta (its submission_name is now
|
| 115 |
+
# wrapped as a markdown link).
|
| 116 |
+
beta = unvalidated[unvalidated["submitter_name"] == "team-beta"].iloc[0]
|
| 117 |
assert beta["submitted_at"] == "2026-05-02T10:00:00Z"
|
| 118 |
assert beta["cadgenbench_version"] == "0.1.0"
|
| 119 |
|
| 120 |
assert list(validated.columns) == leaderboard.VALIDATED_LEADERBOARD_COLS
|
| 121 |
+
alpha = validated[validated["submitter_name"] == "team-alpha"].iloc[0]
|
| 122 |
assert alpha["validation_method"] == "code"
|
|
|
|
| 123 |
|
| 124 |
|
| 125 |
def test_empty_input_returns_two_empty_frames(monkeypatch):
|
|
|
|
| 133 |
assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
|
| 134 |
|
| 135 |
|
| 136 |
+
def test_submission_name_links_to_report_when_available(monkeypatch):
|
| 137 |
+
"""`submission_name` cell is a markdown link to the proxy when a report exists.
|
| 138 |
|
| 139 |
+
Modern rows (status=="completed" + non-null submission_sha256)
|
| 140 |
+
get ``[<name>](/reports/<sid>.html)``; the relative URL targets
|
| 141 |
+
the Space's report proxy.
|
| 142 |
"""
|
| 143 |
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 144 |
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 145 |
|
| 146 |
+
# Alpha is a modern completed row -> linked name.
|
| 147 |
+
alpha = validated.iloc[0]
|
| 148 |
+
assert alpha["submission_name"] == "[Alpha Agent v1](/reports/sub-a.html)"
|
| 149 |
+
# Beta likewise (also completed, has sha256).
|
| 150 |
+
beta = unvalidated[
|
| 151 |
+
unvalidated["submission_name"].str.contains("Beta Agent v2", regex=False)
|
| 152 |
+
].iloc[0]
|
| 153 |
+
assert beta["submission_name"] == "[Beta Agent v2](/reports/sub-b.html)"
|
| 154 |
|
| 155 |
|
| 156 |
+
def test_legacy_row_submission_name_is_plain_text(monkeypatch):
|
| 157 |
+
"""Rows without submission_sha256 (legacy seeds) keep plain-text names.
|
| 158 |
|
| 159 |
+
No `reports/<id>.html` exists on the dataset for those rows, so
|
| 160 |
+
we don't wrap the name in a link.
|
|
|
|
|
|
|
|
|
|
| 161 |
"""
|
| 162 |
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 163 |
_, unvalidated = leaderboard.load_leaderboard_split()
|
| 164 |
gamma = unvalidated[unvalidated["submission_name"] == "Gamma baseline"].iloc[0]
|
| 165 |
+
# Plain text: no `[...](...)` wrapping.
|
| 166 |
+
assert "[" not in gamma["submission_name"]
|
| 167 |
+
assert "](" not in gamma["submission_name"]
|
| 168 |
+
# And the hidden report_url column is empty for the same reason.
|
| 169 |
+
assert gamma["report_url"] == ""
|
| 170 |
+
|
| 171 |
+
|
| 172 |
+
def test_model_details_column_renders(monkeypatch):
|
| 173 |
+
"""`model details (optional)` cell carries the agent URL or `_None_`.
|
| 174 |
+
|
| 175 |
+
Cell uses the shortened URL as link text (honest about what's
|
| 176 |
+
behind the click). Missing agent_url -> italic placeholder.
|
| 177 |
+
"""
|
| 178 |
+
monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
|
| 179 |
+
validated, unvalidated = leaderboard.load_leaderboard_split()
|
| 180 |
+
|
| 181 |
+
alpha = validated.iloc[0]
|
| 182 |
+
assert alpha["model details (optional)"] == (
|
| 183 |
+
"[github.com/example/alpha-agent](https://github.com/example/alpha-agent)"
|
| 184 |
+
)
|
| 185 |
+
|
| 186 |
+
beta = unvalidated[
|
| 187 |
+
unvalidated["submission_name"].str.contains("Beta Agent v2", regex=False)
|
| 188 |
+
].iloc[0]
|
| 189 |
+
assert beta["model details (optional)"] == "_None_"
|
| 190 |
|
| 191 |
|
| 192 |
def test_datatypes_align_with_columns():
|
|
|
|
| 204 |
len(leaderboard.VALIDATED_LEADERBOARD_DATATYPES)
|
| 205 |
== len(leaderboard.VALIDATED_LEADERBOARD_COLS)
|
| 206 |
)
|
| 207 |
+
# Markdown cells: the two link columns. Everything else is str.
|
| 208 |
+
markdown_cols = {"submission_name", "model details (optional)"}
|
| 209 |
for col, dt in zip(
|
| 210 |
leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
|
| 211 |
):
|
| 212 |
+
if col in markdown_cols:
|
| 213 |
+
assert dt == "markdown", f"{col} should be markdown"
|
| 214 |
else:
|
| 215 |
+
assert dt == "str", f"{col} should be str"
|
|
@@ -0,0 +1,60 @@
|
|
| 1 |
+
"""Unit tests for the per-submission report proxy route.
|
| 2 |
+
|
| 3 |
+
The Space exposes ``/reports/{submission_id}.html`` which fetches the
|
| 4 |
+
file from the submissions dataset and re-serves it as
|
| 5 |
+
``Content-Type: text/html`` (the dataset's ``/resolve/`` returns it
|
| 6 |
+
as ``text/plain``, which makes the browser show source). Tests cover
|
| 7 |
+
the handler's response shape with the fetch monkeypatched out, so
|
| 8 |
+
the suite has zero network I/O.
|
| 9 |
+
"""
|
| 10 |
+
from __future__ import annotations
|
| 11 |
+
|
| 12 |
+
import app
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
def test_serve_report_returns_html_when_file_exists(monkeypatch):
|
| 16 |
+
"""Successful fetch -> 200 + text/html + body passthrough."""
|
| 17 |
+
monkeypatch.setattr(
|
| 18 |
+
app,
|
| 19 |
+
"_fetch_report_html",
|
| 20 |
+
lambda sid: b"<!DOCTYPE html><html><body>ok</body></html>",
|
| 21 |
+
)
|
| 22 |
+
resp = app.serve_report("sub-test")
|
| 23 |
+
assert resp.status_code == 200
|
| 24 |
+
assert resp.media_type.startswith("text/html")
|
| 25 |
+
assert resp.body == b"<!DOCTYPE html><html><body>ok</body></html>"
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
def test_serve_report_returns_404_when_file_missing(monkeypatch):
|
| 29 |
+
"""``_fetch_report_html`` returning None -> 404 with a small html body."""
|
| 30 |
+
monkeypatch.setattr(app, "_fetch_report_html", lambda sid: None)
|
| 31 |
+
resp = app.serve_report("sub-missing")
|
| 32 |
+
assert resp.status_code == 404
|
| 33 |
+
# 404 body is still HTML so the browser renders the message.
|
| 34 |
+
assert "Report not found" in resp.body.decode("utf-8")
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
def test_fetch_report_html_returns_none_on_hub_failure(monkeypatch):
|
| 38 |
+
"""A Hub-side exception is caught and surfaced as None.
|
| 39 |
+
|
| 40 |
+
The serve handler relies on this to keep a transient Hub blip
|
| 41 |
+
from leaking a stack trace into the Space's HTTP response.
|
| 42 |
+
"""
|
| 43 |
+
def boom(*a, **kw):
|
| 44 |
+
raise RuntimeError("simulated Hub failure")
|
| 45 |
+
|
| 46 |
+
monkeypatch.setattr(app, "hf_hub_download", boom)
|
| 47 |
+
# The lru_cache on _fetch_report_html caches by arg; use a unique
|
| 48 |
+
# id per test so prior runs don't shortcut this one.
|
| 49 |
+
assert app._fetch_report_html("sub-failure-probe-unique-1") is None
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
def test_proxy_route_is_registered():
|
| 53 |
+
"""The mounted FastAPI app exposes ``/reports/{submission_id}.html`` as GET.
|
| 54 |
+
|
| 55 |
+
Catches the regression where the ``add_api_route`` call moves
|
| 56 |
+
below ``mount_gradio_app`` (which would still register the route
|
| 57 |
+
but make this regression silent until someone tries to hit it).
|
| 58 |
+
"""
|
| 59 |
+
routes = [getattr(r, "path", None) for r in app.app.routes]
|
| 60 |
+
assert "/reports/{submission_id}.html" in routes
|