Michael Rabinovich committed on
Commit 77edebf · 1 Parent(s): 1a8f331

app+leaderboard: link submission_name to a Space-side report proxy

Polish follow-up #2 on C6. Three user-visible changes plus the
plumbing required to make them work.

1. Drop the `report` and `zip` columns from the rendered table.
`submission_name` becomes the report's deep link (typical HF
leaderboard pattern: the system / model name doubles as the link
target; cf. Open LLM Leaderboard, DABstep, BigCodeBench). Modern
completed rows get `[<name>](/reports/<sid>.html)`; pending,
failed, and legacy seed rows fall back to plain text so there's
no broken click. The hidden `submission_blob_url` and the
computed `report_url` ride along in the DataFrame and surface in
the row-click detail panel only.

2. Rename the agent-URL column from `agent` to
`model details (optional)`. Missing cells render as italic
`_None_` so the optionality is explicit instead of a blank.
"Agent" was confusing; the schema lets `agent_url` be code OR
paper OR a blog post and "agent" reads narrow.

3. Add a Space-side proxy for the per-submission HTML report at
`/reports/{submission_id}.html`. HF Hub's /resolve/ serves
dataset HTML as `Content-Type: text/plain` for security, so a
direct dataset link renders source rather than the rendered
report. The proxy fetches the file via `hf_hub_download` (with
an in-memory LRU cache, 128 entries) and re-streams it with
`Content-Type: text/html`. Reports actually render.
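
The link-or-plain-text rule from item 1 can be sketched in a few lines. This is a simplified illustration, not the code in the diff: `report_url_for` and `name_cell` are hypothetical stand-ins for `_report_relative_url` and `_submission_name_md`, and plain truthiness checks stand in for the repo's `_is_empty` helper.

```python
from typing import Optional


def report_url_for(submission_id: Optional[str], status: str,
                   sha256: Optional[str]) -> str:
    """Return the relative proxy URL, or "" when no report can exist."""
    # Gate: only completed rows from the modern pipeline (sha256 set).
    if status != "completed" or not submission_id or not sha256:
        return ""
    return f"/reports/{submission_id}.html"


def name_cell(name: str, report_url: str) -> str:
    """Wrap the name in a markdown link only when a report URL is present."""
    return f"[{name}]({report_url})" if report_url else name


# (name, submission_id, status, sha256) - illustrative rows only.
rows = [
    ("Alpha Agent v1", "sub-a", "completed", "abc123"),  # modern, completed
    ("Pending run", "sub-p", "pending", "def456"),       # not finished yet
    ("Gamma baseline", "sub-g", "completed", None),      # legacy seed row
]
cells = [name_cell(n, report_url_for(sid, st, sha)) for n, sid, st, sha in rows]
```

Only the first row comes out as a link; the pending and legacy rows stay plain text, which is the "no broken click" behavior described above.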

Plumbing:
- app.py flips its launch shape from `with gr.Blocks() as app /
app.launch()` to building a parent FastAPI app, registering the
proxy route, and mounting Gradio under "/" via
`gr.mount_gradio_app`. Direct FastAPI routes are evaluated before
the mounted sub-app, so `/reports/<sid>.html` never gets shadowed
by Gradio's catch-all. `uvicorn.run` reads the same
GRADIO_SERVER_NAME / GRADIO_SERVER_PORT env vars the old launch
did, so the Dockerfile's CMD doesn't need to change. (uvicorn is
already a gradio transitive dep, no new requirements line.)
- The relative URL `/reports/<sid>.html` in the markdown link works
on any host (localhost during dev, *.hf.space in prod) without
the leaderboard reader needing to know its own URL.
- `_report_relative_url` keeps the existing modern-pipeline gate
(status == "completed" AND submission_sha256 non-null); the
three baseline seed rows still don't get a link target.
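
Why the relative URL is host-independent can be checked with the standard library's URL resolution, which follows the same rules a browser applies to a link target. The two base URLs below are assumptions for illustration (`example-space.hf.space` is a made-up host).

```python
from urllib.parse import urljoin

link = "/reports/sub-a.html"  # what the markdown cell carries

# Local dev server on an arbitrary port (assumed address).
local = urljoin("http://127.0.0.1:7860/", link)

# Hosted Space; the host name and query string here are hypothetical.
hosted = urljoin("https://example-space.hf.space/?view=table", link)
```

In both cases the leading `/` replaces the path and query of the page URL while keeping its scheme and host, so the leaderboard code never has to know where it is deployed.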

Tests (15/15 green):
- test_leaderboard: existing assertions updated for the
submission_name link wrapping; new test_submission_name_links_to_
report_when_available + test_legacy_row_submission_name_is_
plain_text + test_model_details_column_renders.
- New tests/test_proxy.py: serve_report behavior with monkeypatched
_fetch_report_html (success / missing / Hub failure) + the
route-registration regression guard.
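
One property of the cached fetch is worth keeping in mind when reading the Hub-failure test: `functools.lru_cache` memoizes every return value, including `None`. The sketch below demonstrates this with a simulated fetcher; `fake_hub_read` and `fetch_cached` are hypothetical names, and nothing here touches the real Hub.

```python
from functools import lru_cache
from typing import Optional

calls = {"n": 0}  # counts how often the simulated Hub is actually hit


def fake_hub_read(submission_id: str) -> bytes:
    """Stand-in for the Hub download; fails on the first call only."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated transient Hub failure")
    return b"<!DOCTYPE html><html></html>"


@lru_cache(maxsize=128)
def fetch_cached(submission_id: str) -> Optional[bytes]:
    """Same shape as the cached fetch: any failure becomes None."""
    try:
        return fake_hub_read(submission_id)
    except Exception:
        return None


first = fetch_cached("sub-x")   # failure -> None, and None is now cached
second = fetch_cached("sub-x")  # served from the cache, Hub not retried
fetch_cached.cache_clear()      # drops every entry, including the None
third = fetch_cached("sub-x")   # Hub is hit again and succeeds
```

Under this pattern a failed lookup keeps returning 404 for that id until the entry is evicted, the cache is cleared, or the process restarts. It also explains why a test would pick a unique id per case: a previously cached result for the same argument would short-circuit the patched download.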

Verification loop change (per user request): every UI/route change
gets booted via subprocess locally and probed with HTTP before push.
For this commit:
- Boot app.py on a free port.
- GET /reports/<known-modern-sid>.html -> 200, text/html, body
starts with <!DOCTYPE html, length ~1.4MB.
- GET /reports/<bogus-sid>.html -> 404.
Both PASS. The Space rebuild is now a final sanity check, not the
verify loop.
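
The probe step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script used for the commit: a small in-thread stub server (`StubHandler`, hypothetical) stands in for the booted `app.py` subprocess, and `probe` is a made-up helper name.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

REPORT = b"<!DOCTYPE html><html><body>stub report</body></html>"


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the Space: one known report, 404 for anything else."""

    def do_GET(self) -> None:
        found = self.path == "/reports/known-sid.html"
        body = REPORT if found else b"<h1>Report not found</h1>"
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # keep probe output quiet
        pass


# Bypass any configured HTTP proxy so loopback requests stay local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(base_url: str, sid: str) -> Tuple[int, str, bytes]:
    """GET the report URL; return (status, content type, first body bytes)."""
    url = f"{base_url}/reports/{sid}.html"
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", ""), resp.read(15)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the response.
        return err.code, err.headers.get("Content-Type", ""), err.read(15)


server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

ok = probe(base, "known-sid")
missing = probe(base, "bogus-sid")
server.shutdown()
```

Against the real app, the same `probe` calls would point at the port the subprocess was booted on, with a known modern submission id in place of `known-sid`.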

Files changed (4)
  1. app.py +87 -14
  2. leaderboard.py +82 -63
  3. tests/test_leaderboard.py +58 -48
  4. tests/test_proxy.py +60 -0
app.py CHANGED
@@ -1,17 +1,28 @@
-"""CADGenBench Leaderboard Space - Gradio UI assembly.
+"""CADGenBench Leaderboard Space - Gradio UI + report-proxy mount.
 
 Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
-:mod:`submit`. Both are wired into the Gradio app below.
+:mod:`submit`. Both are wired into the Gradio Blocks below. The
+Gradio app is mounted under a FastAPI parent so the custom
+``/reports/{submission_id}.html`` route can re-serve dataset HTML
+with ``Content-Type: text/html`` (HF Hub's ``/resolve/`` serves it
+as ``text/plain`` by policy, which makes the browser show source
+rather than render).
 """
 from __future__ import annotations
 
 import logging
+import os
 import re
+from functools import lru_cache
+from pathlib import Path
 
 import gradio as gr
-from gradio_leaderboard import Leaderboard
-
 import pandas as pd
+import uvicorn
+from fastapi import FastAPI
+from fastapi.responses import HTMLResponse, Response
+from gradio_leaderboard import Leaderboard
+from huggingface_hub import hf_hub_download
 
 from leaderboard import (
     HF_DATA_REPO,
@@ -23,6 +34,8 @@ from leaderboard import (
 )
 from submit import handle_submit
 
+logger = logging.getLogger(__name__)
+
 # Surface module-level logger.info / logger.warning / logger.exception
 # calls from leaderboard.py + submit.py in the Space's runtime logs.
 # Otherwise they go nowhere and any refresh / worker pathology is
@@ -109,20 +122,65 @@ def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
     lines.append(f"- **Submitted**: {_fmt_timestamp(row['submitted_at'])}")
     if _has(row.get("notes")):
         lines.append(f"- **Notes**: {row['notes']}")
-    # Display columns from leaderboard.py's _project_and_format:
-    # `agent` carries the markdown link (or "_None_" when missing);
-    # `zip` and `report` are empty strings when not applicable.
-    lines.append(f"- **Agent**: {row.get('agent') or '_None_'}")
-    if _has(row.get("zip")):
-        lines.append(f"- **Submission**: {row['zip']}")
-    if _has(row.get("report")):
-        lines.append(f"- **Report**: {row['report']}")
+    # `model details (optional)` carries the markdown link (or
+    # _None_ when missing); the hidden `submission_blob_url` /
+    # `report_url` columns are raw URLs we wrap into named links here.
+    lines.append(
+        f"- **Model details (optional)**: "
+        f"{row.get('model details (optional)') or '_None_'}"
+    )
+    if _has(row.get("submission_blob_url")):
+        lines.append(f"- **Submission ZIP**: [download]({row['submission_blob_url']})")
+    if _has(row.get("report_url")):
+        lines.append(f"- **Report**: [open]({row['report_url']})")
     if row.get("status") == "failed" and _has(row.get("failure_reason")):
         lines.append(f"- **Failure reason**: {row['failure_reason']}")
     return "\n".join(lines)
 
 
-with gr.Blocks(title="CADGenBench Leaderboard", theme=gr.themes.Soft()) as app:
+@lru_cache(maxsize=128)
+def _fetch_report_html(submission_id: str) -> bytes | None:
+    """Pull ``reports/<id>.html`` off the submissions dataset.
+
+    Cached in-process so repeat clicks on the same row don't hit
+    the Hub. Returns ``None`` on any failure so the caller can
+    serve a clean 404 rather than leaking a stack trace.
+    """
+    try:
+        local_path = hf_hub_download(
+            repo_id=HF_SUBMISSIONS_REPO,
+            filename=f"reports/{submission_id}.html",
+            repo_type="dataset",
+        )
+        return Path(local_path).read_bytes()
+    except Exception as e:  # noqa: BLE001 - any Hub failure -> 404
+        logger.warning(
+            "Failed to fetch report for %s (%s: %s)",
+            submission_id, type(e).__name__, e,
+        )
+        return None
+
+
+def serve_report(submission_id: str) -> Response:
+    """Proxy a per-submission HTML report through the Space.
+
+    HF Hub serves dataset HTML under ``/resolve/`` with
+    ``Content-Type: text/plain`` (security: dataset files can't host
+    live HTML), so a direct dataset link shows source instead of
+    rendering. This route lives on the Space (which can legitimately
+    serve text/html) and re-streams the file's bytes with the right
+    content-type.
+    """
+    content = _fetch_report_html(submission_id)
+    if content is None:
+        return HTMLResponse(
+            content="<h1>Report not found</h1>",
+            status_code=404,
+        )
+    return Response(content=content, media_type="text/html; charset=utf-8")
+
+
+with gr.Blocks(title="CADGenBench Leaderboard", theme=gr.themes.Soft()) as blocks:
     gr.Markdown(
         "# CADGenBench Leaderboard\n"
         "_Benchmarking AI-driven CAD generation._"
@@ -228,5 +286,20 @@ to publish the resulting row on the public leaderboard.
     )
 
 
+# Mount Gradio under a FastAPI parent so the custom proxy route
+# above lives at the same origin as the UI. Direct routes on `app`
+# get checked before the Gradio sub-app, so `/reports/<sid>.html`
+# never gets shadowed.
+app = FastAPI()
+app.add_api_route(
+    "/reports/{submission_id}.html",
+    serve_report,
+    methods=["GET"],
+)
+app = gr.mount_gradio_app(app, blocks, path="/")
+
+
 if __name__ == "__main__":
-    app.launch()
+    host = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
+    port = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
+    uvicorn.run(app, host=host, port=port)
leaderboard.py CHANGED
@@ -34,10 +34,13 @@ HUB_FETCH_TIMEOUT_SECONDS = 30
 # Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
 # can read them out without a separate state-cache or re-fetch.
 # Display column names (these are what the rendered table headers
-# read). The schema fields `agent_url` and `submission_blob_url` get
-# renamed at projection time to `agent` / `zip` so the rendered
-# headers stay short and uniform; the `report` column is computed
-# fresh from `submission_id`.
+# read). The visible set is intentionally compact; the detail-panel
+# columns at the end of the list ride along in the DataFrame so the
+# row-click handler can populate from them without a separate state
+# cache. `submission_name` is the primary link (markdown-wrapped at
+# projection time to point at the report when one exists), so we
+# don't need a separate `report` column. `submission_blob_url` lives
+# in the detail panel only.
 LEADERBOARD_COLS = [
     "status",
     "submission_name",
@@ -46,13 +49,13 @@ LEADERBOARD_COLS = [
     "validity_rate",
     "submitted_at",
     "cadgenbench_version",
-    "agent",
-    "zip",
-    "report",
+    "model details (optional)",
     # Detail-panel-only (hidden via `hide_columns` on the widget):
     "submission_id",
     "notes",
     "failure_reason",
+    "submission_blob_url",
+    "report_url",
 ]
 
 # Validated table additionally exposes `validation_method`; on the
@@ -67,25 +70,33 @@ VALIDATED_LEADERBOARD_COLS = [
     "validation_method",
     "submitted_at",
     "cadgenbench_version",
-    "agent",
-    "zip",
-    "report",
+    "model details (optional)",
     "submission_id",
     "notes",
     "failure_reason",
+    "submission_blob_url",
+    "report_url",
 ]
 
 # Columns to hide from rendering on both tables. These ride in the
 # DataFrame so the row-click detail panel can populate from them; the
 # widget hides them from view.
-LEADERBOARD_HIDE_COLUMNS = ["submission_id", "notes", "failure_reason"]
+LEADERBOARD_HIDE_COLUMNS = [
+    "submission_id",
+    "notes",
+    "failure_reason",
+    "submission_blob_url",
+    "report_url",
+]
 
-# Per-column gradio_leaderboard datatypes. Link columns render their
-# pre-formatted markdown; everything else is plain string (numeric
-# cells get pending / failed status tags applied by _fmt_pct /
-# _fmt_score so they're string-shaped by the time the widget sees
-# them).
-_LINK_COLUMNS = frozenset({"agent", "zip", "report"})
+# Per-column gradio_leaderboard datatypes. `submission_name` and
+# `model details (optional)` render their pre-formatted markdown
+# (the submission_name cell links to the report when one exists; the
+# model-details cell is the agent URL or _None_); everything else is
+# plain string (numeric cells get pending / failed status tags
+# applied by _fmt_pct / _fmt_score so they're string-shaped by the
+# time the widget sees them).
+_LINK_COLUMNS = frozenset({"submission_name", "model details (optional)"})
 
 
 def _datatypes_for(columns: list[str]) -> list[str]:
@@ -198,13 +209,13 @@ def _shorten_url_for_display(url: str) -> str:
 
 
 def _agent_url_md(url) -> str:
-    """Render the `agent` cell as a markdown link.
+    """Render the `model details (optional)` cell as a markdown link.
 
     Uses a shortened version of the URL itself as the link text:
     `agent_url` is a free-form "URL pointing at the agent code or
-    paper" per the schema, so the URL itself carries the only
-    honest hint about what's behind the click. Missing cells render
-    as italic `_None_` so a reader sees the field is optional and
+    paper" per the schema, so the URL itself carries the only honest
+    hint about what's behind the click. Missing cells render as
+    italic ``_None_`` so a reader sees the field is optional and
     just wasn't filled, rather than a blank.
     """
     if _is_empty(url):
@@ -212,39 +223,41 @@ def _agent_url_md(url) -> str:
     return f"[{_shorten_url_for_display(str(url))}]({url})"
 
 
-def _submission_blob_md(url) -> str:
-    """Render the `zip` cell as a markdown link.
-
-    Link text stays "zip" (the URL points at our own infrastructure
-    and adds no extra information for the reader).
-    """
-    if _is_empty(url):
-        return ""
-    return f"[zip]({url})"
-
-
-def _report_url_md(submission_id, status, submission_sha256) -> str:
-    """Build the report URL, only when the report file is known to exist.
+def _report_relative_url(submission_id, status, submission_sha256) -> str:
+    """Relative URL to the Space's report-proxy route, or empty.
 
-    `reports/<id>.html` is uploaded by the post-eval worker in the
-    modern submit pipeline. Legacy rows that pre-date that pipeline
-    (the three baseline seed rows; identifiable by ``submission_sha256``
-    being null, per the schema's compatibility note) never had a
-    report uploaded, so the link would 404. Gate on
-    ``submission_sha256`` to keep the rendered link honest.
+    The Space exposes ``/reports/<id>.html`` which fetches the file
+    from the submissions dataset and re-serves it as ``text/html``
+    (the dataset's ``/resolve/`` path returns it as ``text/plain``,
+    which renders as source). Returning a relative URL means the
+    same string works whether the Space is running locally on a
+    random port or on huggingface.co.
 
-    ``/resolve/main/`` (matching the convention used by the submit
-    handler for ``submission_blob_url``) serves the file with its
-    content type so the browser renders the HTML report directly.
+    `reports/<id>.html` only exists for completed rows from the
+    modern submit pipeline; legacy seed rows (pre-pipeline,
+    ``submission_sha256`` null) never had a report uploaded, so this
+    function returns empty for them and the caller leaves the
+    submission_name cell as plain text rather than a broken link.
     """
     if status != "completed" or _is_empty(submission_id):
         return ""
     if _is_empty(submission_sha256):
         return ""
-    return (
-        f"[report](https://huggingface.co/datasets/{HF_SUBMISSIONS_REPO}"
-        f"/resolve/main/reports/{submission_id}.html)"
-    )
+    return f"/reports/{submission_id}.html"
+
+
+def _submission_name_md(name, report_url) -> str:
+    """Render `submission_name` as a markdown link when a report exists.
+
+    The submission name itself is the link target (typical HF
+    leaderboard pattern: model / system name doubles as the
+    deep-link). Plain text when no report is available.
+    """
+    if _is_empty(name):
+        return "(unnamed submission)"
+    if _is_empty(report_url):
+        return str(name)
+    return f"[{name}]({report_url})"
 
 
 def load_leaderboard_split() -> tuple[pd.DataFrame, pd.DataFrame]:
@@ -296,25 +309,24 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
     if df.empty:
         return pd.DataFrame(columns=columns)
     df = df.copy()
-    # Compute `report` (not stored on the row) before projection drops
-    # the source columns it needs. Path is derived from `submission_id`
-    # so a layout change doesn't require a results.jsonl rewrite.
+    # Compute the relative report URL before projection drops the
+    # source columns it depends on. Lives as a hidden column so the
+    # detail panel can show "Report: link" and `submission_name` can
+    # use it as the link target.
     if "submission_id" in df.columns and "status" in df.columns:
-        df["report"] = df.apply(
-            lambda r: _report_url_md(
+        df["report_url"] = df.apply(
+            lambda r: _report_relative_url(
                 r["submission_id"],
                 r["status"],
                 r.get("submission_sha256"),
             ),
             axis=1,
         )
-    # Schema-field names -> display-column names. Keeps the rendered
-    # headers short and uniform without renaming anything in
-    # results.jsonl.
-    df = df.rename(columns={
-        "agent_url": "agent",
-        "submission_blob_url": "zip",
-    })
+    # Schema field name -> display column header. Keep the rename
+    # narrow: only the agent URL gets a friendlier header; the schema
+    # field stays `agent_url` in results.jsonl. `submission_blob_url`
+    # stays under that name (hidden, detail-panel-only).
+    df = df.rename(columns={"agent_url": "model details (optional)"})
     # Make sure every declared column exists (legacy rows can be
     # missing optional fields). Detail-panel reads expect the
     # column-set to be stable regardless of which source rows had
@@ -335,8 +347,15 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
         out["aggregate_score"] = out.apply(
             lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
         )
-    if "agent" in out.columns:
-        out["agent"] = out["agent"].apply(_agent_url_md)
-    if "zip" in out.columns:
-        out["zip"] = out["zip"].apply(_submission_blob_md)
+    if "submission_name" in out.columns:
+        out["submission_name"] = out.apply(
+            lambda r: _submission_name_md(
+                r["submission_name"], r.get("report_url"),
+            ),
+            axis=1,
+        )
+    if "model details (optional)" in out.columns:
+        out["model details (optional)"] = out["model details (optional)"].apply(
+            _agent_url_md
+        )
     return out
tests/test_leaderboard.py CHANGED
@@ -86,7 +86,11 @@ def test_split_shape(monkeypatch):
 
 
 def test_legacy_row_defaults_to_unvalidated(monkeypatch):
-    """(b) Legacy row with no `validation_status` field lands unvalidated."""
+    """(b) Legacy row with no `validation_status` field lands unvalidated.
+
+    Legacy rows keep plain-text submission_name (no report exists),
+    so identity check is straightforward equality.
+    """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
     assert "Gamma baseline" in set(unvalidated["submission_name"].tolist())
@@ -97,7 +101,8 @@ def test_field_passthrough(monkeypatch):
     """(c) Non-formatted metadata fields and validated-only columns pass through.
 
     `aggregate_score` and `validity_rate` get status-aware string
-    formatting, so passthrough is checked on fields that survive
+    formatting; `submission_name` gets wrapped into a markdown link
+    on modern rows. Passthrough is checked on fields that survive
     untransformed (``submitter_name``, ``submitted_at``,
     ``cadgenbench_version``) plus the validated-table-only
     ``validation_method``.
@@ -106,15 +111,15 @@ def test_field_passthrough(monkeypatch):
     validated, unvalidated = leaderboard.load_leaderboard_split()
 
     assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
-    beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
-    assert beta["submitter_name"] == "team-beta"
+    # Use submitter_name to identify Beta (its submission_name is now
+    # wrapped as a markdown link).
+    beta = unvalidated[unvalidated["submitter_name"] == "team-beta"].iloc[0]
     assert beta["submitted_at"] == "2026-05-02T10:00:00Z"
     assert beta["cadgenbench_version"] == "0.1.0"
 
     assert list(validated.columns) == leaderboard.VALIDATED_LEADERBOARD_COLS
-    alpha = validated[validated["submission_name"] == "Alpha Agent v1"].iloc[0]
+    alpha = validated[validated["submitter_name"] == "team-alpha"].iloc[0]
     assert alpha["validation_method"] == "code"
-    assert alpha["submitter_name"] == "team-alpha"
 
 
 def test_empty_input_returns_two_empty_frames(monkeypatch):
@@ -128,56 +133,60 @@ def test_empty_input_returns_two_empty_frames(monkeypatch):
     assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
 
 
-def test_link_columns_render_as_markdown(monkeypatch):
-    """`agent` / `zip` / `report` columns render as markdown links.
-
-    Link cells are ``[label](url)`` strings so the Leaderboard widget
-    under ``datatype="markdown"`` produces clickable anchors. The
-    ``agent`` cell uses the URL itself (scheme stripped) as link
-    text so a reader can tell what's behind the click; ``zip`` and
-    ``report`` use the short fixed labels because they always point
-    at our own infrastructure.
+def test_submission_name_links_to_report_when_available(monkeypatch):
+    """`submission_name` cell is a markdown link to the proxy when a report exists.
 
-    Missing ``agent_url`` renders as ``_None_`` (italic placeholder
-    so the reader sees the field exists but wasn't filled), not a
-    blank cell.
+    Modern rows (status=="completed" + non-null submission_sha256)
+    get ``[<name>](/reports/<sid>.html)``; the relative URL targets
+    the Space's report proxy.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
 
-    alpha = validated[validated["submission_name"] == "Alpha Agent v1"].iloc[0]
-    assert alpha["agent"] == (
-        "[github.com/example/alpha-agent](https://github.com/example/alpha-agent)"
-    )
-    assert alpha["zip"] == (
-        "[zip](https://huggingface.co/datasets/test/sub-a.zip)"
-    )
-    assert alpha["report"].startswith("[report](")
-    assert "reports/sub-a.html" in alpha["report"]
-
-    beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
-    assert beta["agent"] == "_None_"
-    assert beta["zip"].startswith("[zip](")
-    # Beta has submission_sha256, so the report link is emitted.
-    assert beta["report"].startswith("[report](")
+    # Alpha is a modern completed row -> linked name.
+    alpha = validated.iloc[0]
+    assert alpha["submission_name"] == "[Alpha Agent v1](/reports/sub-a.html)"
+    # Beta likewise (also completed, has sha256).
+    beta = unvalidated[
+        unvalidated["submission_name"].str.contains("Beta Agent v2", regex=False)
+    ].iloc[0]
+    assert beta["submission_name"] == "[Beta Agent v2](/reports/sub-b.html)"
 
 
-def test_legacy_row_omits_report_link(monkeypatch):
-    """Rows without ``submission_sha256`` (legacy seed rows) drop the report link.
+def test_legacy_row_submission_name_is_plain_text(monkeypatch):
+    """Rows without submission_sha256 (legacy seeds) keep plain-text names.
 
-    ``reports/<id>.html`` is only uploaded by the modern submit
-    pipeline; legacy seed rows that pre-date that pipeline never had
-    a report uploaded, so the /resolve/ URL would 404. Gate on
-    ``submission_sha256`` (the schema's "modern pipeline" sentinel)
-    so the leaderboard doesn't render a broken link.
+    No `reports/<id>.html` exists on the dataset for those rows, so
+    we don't wrap the name in a link.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     _, unvalidated = leaderboard.load_leaderboard_split()
     gamma = unvalidated[unvalidated["submission_name"] == "Gamma baseline"].iloc[0]
-    assert gamma["report"] == ""
-    # Sanity: agent + zip still render normally on the legacy row.
-    assert gamma["agent"].startswith("[github.com/example/gamma-baseline](")
-    assert gamma["zip"].startswith("[zip](")
+    # Plain text: no `[...](...)` wrapping.
+    assert "[" not in gamma["submission_name"]
+    assert "](" not in gamma["submission_name"]
+    # And the hidden report_url column is empty for the same reason.
+    assert gamma["report_url"] == ""
+
+
+def test_model_details_column_renders(monkeypatch):
+    """`model details (optional)` cell carries the agent URL or `_None_`.
+
+    Cell uses the shortened URL as link text (honest about what's
+    behind the click). Missing agent_url -> italic placeholder.
+    """
+    monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
+    validated, unvalidated = leaderboard.load_leaderboard_split()
+
+    alpha = validated.iloc[0]
+    assert alpha["model details (optional)"] == (
+        "[github.com/example/alpha-agent](https://github.com/example/alpha-agent)"
+    )
+
+    beta = unvalidated[
+        unvalidated["submission_name"].str.contains("Beta Agent v2", regex=False)
+    ].iloc[0]
+    assert beta["model details (optional)"] == "_None_"
 
 
 def test_datatypes_align_with_columns():
@@ -195,11 +204,12 @@ def test_datatypes_align_with_columns():
         len(leaderboard.VALIDATED_LEADERBOARD_DATATYPES)
         == len(leaderboard.VALIDATED_LEADERBOARD_COLS)
     )
-    # Link columns are markdown, everything else is str.
+    # Markdown cells: the two link columns. Everything else is str.
+    markdown_cols = {"submission_name", "model details (optional)"}
     for col, dt in zip(
         leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
     ):
-        if col in ("agent", "zip", "report"):
-            assert dt == "markdown"
+        if col in markdown_cols:
+            assert dt == "markdown", f"{col} should be markdown"
         else:
-            assert dt == "str"
+            assert dt == "str", f"{col} should be str"
tests/test_proxy.py ADDED
@@ -0,0 +1,60 @@
+"""Unit tests for the per-submission report proxy route.
+
+The Space exposes ``/reports/{submission_id}.html`` which fetches the
+file from the submissions dataset and re-serves it as
+``Content-Type: text/html`` (the dataset's ``/resolve/`` returns it
+as ``text/plain``, which makes the browser show source). Tests cover
+the handler's response shape with the fetch monkeypatched out, so
+the suite has zero network I/O.
+"""
+from __future__ import annotations
+
+import app
+
+
+def test_serve_report_returns_html_when_file_exists(monkeypatch):
+    """Successful fetch -> 200 + text/html + body passthrough."""
+    monkeypatch.setattr(
+        app,
+        "_fetch_report_html",
+        lambda sid: b"<!DOCTYPE html><html><body>ok</body></html>",
+    )
+    resp = app.serve_report("sub-test")
+    assert resp.status_code == 200
+    assert resp.media_type.startswith("text/html")
+    assert resp.body == b"<!DOCTYPE html><html><body>ok</body></html>"
+
+
+def test_serve_report_returns_404_when_file_missing(monkeypatch):
+    """``_fetch_report_html`` returning None -> 404 with a small html body."""
+    monkeypatch.setattr(app, "_fetch_report_html", lambda sid: None)
+    resp = app.serve_report("sub-missing")
+    assert resp.status_code == 404
+    # 404 body is still HTML so the browser renders the message.
+    assert "Report not found" in resp.body.decode("utf-8")
+
+
+def test_fetch_report_html_returns_none_on_hub_failure(monkeypatch):
+    """A Hub-side exception is caught and surfaced as None.
+
+    The serve handler relies on this to keep a transient Hub blip
+    from leaking a stack trace into the Space's HTTP response.
+    """
+    def boom(*a, **kw):
+        raise RuntimeError("simulated Hub failure")
+
+    monkeypatch.setattr(app, "hf_hub_download", boom)
+    # The lru_cache on _fetch_report_html caches by arg; use a unique
+    # id per test so prior runs don't shortcut this one.
+    assert app._fetch_report_html("sub-failure-probe-unique-1") is None
+
+
+def test_proxy_route_is_registered():
+    """The mounted FastAPI app exposes ``/reports/{submission_id}.html`` as GET.
+
+    Catches the regression where the ``add_api_route`` call moves
+    below ``mount_gradio_app`` (which would still register the route
+    but make this regression silent until someone tries to hit it).
+    """
+    routes = [getattr(r, "path", None) for r in app.app.routes]
+    assert "/reports/{submission_id}.html" in routes