Spaces:

HuggingAI4Engineering
/

cadgenbench-leaderboard

Running

Michael Rabinovich commited on 9 days ago

Commit

77edebf

1 Parent(s): 1a8f331

app+leaderboard: link submission_name to a Space-side report proxy

Polish follow-up #2 on C6. Three user-visible changes plus the
plumbing required to make them work.

1. Drop the `report` and `zip` columns from the rendered table.
`submission_name` becomes the report's deep link (typical HF
leaderboard pattern: the system / model name doubles as the link
target; cf. Open LLM Leaderboard, DABstep, BigCodeBench). Modern
completed rows get `[<name>](/reports/<sid>.html)`; pending,
failed, and legacy seed rows fall back to plain text so there's
no broken click. The hidden `submission_blob_url` and the
computed `report_url` ride along in the DataFrame and surface in
the row-click detail panel only.

2. Rename the agent-URL column from `agent` to
`model details (optional)`. Missing cells render as italic
`_None_` so the optionality is explicit instead of a blank.
"Agent" was confusing; the schema lets `agent_url` be code OR
paper OR a blog post and "agent" reads narrow.

3. Add a Space-side proxy for the per-submission HTML report at
`/reports/{submission_id}.html`. HF Hub's /resolve/ serves
dataset HTML as `Content-Type: text/plain` for security, so a
direct dataset link renders source rather than the rendered
report. The proxy fetches the file via `hf_hub_download` (with
an in-memory LRU cache, 128 entries) and re-streams it with
`Content-Type: text/html`. Reports actually render.

Plumbing:
- app.py flips its launch shape from `with gr.Blocks() as app /
app.launch()` to building a parent FastAPI app, registering the
proxy route, and mounting Gradio under "/" via
`gr.mount_gradio_app`. Direct FastAPI routes are evaluated before
the mounted sub-app, so `/reports/<sid>.html` never gets shadowed
by Gradio's catch-all. `uvicorn.run` reads the same
GRADIO_SERVER_NAME / GRADIO_SERVER_PORT env vars the old launch
did, so the Dockerfile's CMD doesn't need to change. (uvicorn is
already a gradio transitive dep, no new requirements line.)
- The relative URL `/reports/<sid>.html` in the markdown link works
on any host (localhost during dev, *.hf.space in prod) without
the leaderboard reader needing to know its own URL.
- `_report_relative_url` keeps the existing modern-pipeline gate
(status == "completed" AND submission_sha256 non-null); the
three baseline seed rows still don't get a link target.

Tests (15/15 green):
- test_leaderboard: existing assertions updated for the
submission_name link wrapping; new test_submission_name_links_to_
report_when_available + test_legacy_row_submission_name_is_
plain_text + test_model_details_column_renders.
- New tests/test_proxy.py: serve_report behavior with monkeypatched
_fetch_report_html (success / missing / Hub failure) + the
route-registration regression guard.

Verification loop change (per user request): every UI/route change
gets booted via subprocess locally and probed with HTTP before push.
For this commit:
- Boot app.py on a free port.
- GET /reports/<known-modern-sid>.html -> 200, text/html, body
starts with <!DOCTYPE html, length ~1.4MB.
- GET /reports/<bogus-sid>.html -> 404.
Both PASS. The Space rebuild is now a final sanity check, not the
verify loop.

Files changed (4) hide show

app.py +87 -14
leaderboard.py +82 -63
tests/test_leaderboard.py +58 -48
tests/test_proxy.py +60 -0

app.py CHANGED Viewed

@@ -1,17 +1,28 @@
-"""CADGenBench Leaderboard Space - Gradio UI assembly.
 Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
-:mod:`submit`. Both are wired into the Gradio app below.
 """
 from __future__ import annotations
 import logging
 import re
 import gradio as gr
-from gradio_leaderboard import Leaderboard
 import pandas as pd
 from leaderboard import (
     HF_DATA_REPO,
@@ -23,6 +34,8 @@ from leaderboard import (
 )
 from submit import handle_submit
 # Surface module-level logger.info / logger.warning / logger.exception
 # calls from leaderboard.py + submit.py in the Space's runtime logs.
 # Otherwise they go nowhere and any refresh / worker pathology is
@@ -109,20 +122,65 @@ def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
         lines.append(f"- **Submitted**: {_fmt_timestamp(row['submitted_at'])}")
     if _has(row.get("notes")):
         lines.append(f"- **Notes**: {row['notes']}")
-    # Display columns from leaderboard.py's _project_and_format:
-    # `agent` carries the markdown link (or "_None_" when missing);
-    # `zip` and `report` are empty strings when not applicable.
-    lines.append(f"- **Agent**: {row.get('agent') or '_None_'}")
-    if _has(row.get("zip")):
-        lines.append(f"- **Submission**: {row['zip']}")
-    if _has(row.get("report")):
-        lines.append(f"- **Report**: {row['report']}")
     if row.get("status") == "failed" and _has(row.get("failure_reason")):
         lines.append(f"- **Failure reason**: {row['failure_reason']}")
     return "\n".join(lines)
-with gr.Blocks(title="CADGenBench Leaderboard", theme=gr.themes.Soft()) as app:
     gr.Markdown(
         "# CADGenBench Leaderboard\n"
         "_Benchmarking AI-driven CAD generation._"
@@ -228,5 +286,20 @@ to publish the resulting row on the public leaderboard.
     )
 if __name__ == "__main__":
-    app.launch()

+"""CADGenBench Leaderboard Space - Gradio UI + report-proxy mount.
 Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
+:mod:`submit`. Both are wired into the Gradio Blocks below. The
+Gradio app is mounted under a FastAPI parent so the custom
+``/reports/{submission_id}.html`` route can re-serve dataset HTML
+with ``Content-Type: text/html`` (HF Hub's ``/resolve/`` serves it
+as ``text/plain`` by policy, which makes the browser show source
+rather than render).
 """
 from __future__ import annotations
 import logging
+import os
 import re
+from functools import lru_cache
+from pathlib import Path
 import gradio as gr
 import pandas as pd
+import uvicorn
+from fastapi import FastAPI
+from fastapi.responses import HTMLResponse, Response
+from gradio_leaderboard import Leaderboard
+from huggingface_hub import hf_hub_download
 from leaderboard import (
     HF_DATA_REPO,
 )
 from submit import handle_submit
+logger = logging.getLogger(__name__)
 # Surface module-level logger.info / logger.warning / logger.exception
 # calls from leaderboard.py + submit.py in the Space's runtime logs.
 # Otherwise they go nowhere and any refresh / worker pathology is
         lines.append(f"- **Submitted**: {_fmt_timestamp(row['submitted_at'])}")
     if _has(row.get("notes")):
         lines.append(f"- **Notes**: {row['notes']}")
+    # `model details (optional)` carries the markdown link (or
+    # _None_ when missing); the hidden `submission_blob_url` /
+    # `report_url` columns are raw URLs we wrap into named links here.
+    lines.append(
+        f"- **Model details (optional)**: "
+        f"{row.get('model details (optional)') or '_None_'}"
+    )
+    if _has(row.get("submission_blob_url")):
+        lines.append(f"- **Submission ZIP**: [download]({row['submission_blob_url']})")
+    if _has(row.get("report_url")):
+        lines.append(f"- **Report**: [open]({row['report_url']})")
     if row.get("status") == "failed" and _has(row.get("failure_reason")):
         lines.append(f"- **Failure reason**: {row['failure_reason']}")
     return "\n".join(lines)
+@lru_cache(maxsize=128)
+def _fetch_report_html(submission_id: str) -> bytes | None:
+    """Pull ``reports/<id>.html`` off the submissions dataset.
+    Cached in-process so repeat clicks on the same row don't hit
+    the Hub. Returns ``None`` on any failure so the caller can
+    serve a clean 404 rather than leaking a stack trace.
+    """
+    try:
+        local_path = hf_hub_download(
+            repo_id=HF_SUBMISSIONS_REPO,
+            filename=f"reports/{submission_id}.html",
+            repo_type="dataset",
+        )
+        return Path(local_path).read_bytes()
+    except Exception as e:  # noqa: BLE001 - any Hub failure -> 404
+        logger.warning(
+            "Failed to fetch report for %s (%s: %s)",
+            submission_id, type(e).__name__, e,
+        )
+        return None
+def serve_report(submission_id: str) -> Response:
+    """Proxy a per-submission HTML report through the Space.
+    HF Hub serves dataset HTML under ``/resolve/`` with
+    ``Content-Type: text/plain`` (security: dataset files can't host
+    live HTML), so a direct dataset link shows source instead of
+    rendering. This route lives on the Space (which can legitimately
+    serve text/html) and re-streams the file's bytes with the right
+    content-type.
+    """
+    content = _fetch_report_html(submission_id)
+    if content is None:
+        return HTMLResponse(
+            content="<h1>Report not found</h1>",
+            status_code=404,
+        )
+    return Response(content=content, media_type="text/html; charset=utf-8")
+with gr.Blocks(title="CADGenBench Leaderboard", theme=gr.themes.Soft()) as blocks:
     gr.Markdown(
         "# CADGenBench Leaderboard\n"
         "_Benchmarking AI-driven CAD generation._"
     )
+# Mount Gradio under a FastAPI parent so the custom proxy route
+# above lives at the same origin as the UI. Direct routes on `app`
+# get checked before the Gradio sub-app, so `/reports/<sid>.html`
+# never gets shadowed.
+app = FastAPI()
+app.add_api_route(
+    "/reports/{submission_id}.html",
+    serve_report,
+    methods=["GET"],
+)
+app = gr.mount_gradio_app(app, blocks, path="/")
 if __name__ == "__main__":
+    host = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
+    port = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
+    uvicorn.run(app, host=host, port=port)

leaderboard.py CHANGED Viewed

@@ -34,10 +34,13 @@ HUB_FETCH_TIMEOUT_SECONDS = 30
 # Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
 # can read them out without a separate state-cache or re-fetch.
 # Display column names (these are what the rendered table headers
-# read). The schema fields `agent_url` and `submission_blob_url` get
-# renamed at projection time to `agent` / `zip` so the rendered
-# headers stay short and uniform; the `report` column is computed
-# fresh from `submission_id`.
 LEADERBOARD_COLS = [
     "status",
     "submission_name",
@@ -46,13 +49,13 @@ LEADERBOARD_COLS = [
     "validity_rate",
     "submitted_at",
     "cadgenbench_version",
-    "agent",
-    "zip",
-    "report",
     # Detail-panel-only (hidden via `hide_columns` on the widget):
     "submission_id",
     "notes",
     "failure_reason",
 ]
 # Validated table additionally exposes `validation_method`; on the
@@ -67,25 +70,33 @@ VALIDATED_LEADERBOARD_COLS = [
     "validation_method",
     "submitted_at",
     "cadgenbench_version",
-    "agent",
-    "zip",
-    "report",
     "submission_id",
     "notes",
     "failure_reason",
 ]
 # Columns to hide from rendering on both tables. These ride in the
 # DataFrame so the row-click detail panel can populate from them; the
 # widget hides them from view.
-LEADERBOARD_HIDE_COLUMNS = ["submission_id", "notes", "failure_reason"]
-# Per-column gradio_leaderboard datatypes. Link columns render their
-# pre-formatted markdown; everything else is plain string (numeric
-# cells get pending / failed status tags applied by _fmt_pct /
-# _fmt_score so they're string-shaped by the time the widget sees
-# them).
-_LINK_COLUMNS = frozenset({"agent", "zip", "report"})
 def _datatypes_for(columns: list[str]) -> list[str]:
@@ -198,13 +209,13 @@ def _shorten_url_for_display(url: str) -> str:
 def _agent_url_md(url) -> str:
-    """Render the `agent` cell as a markdown link.
     Uses a shortened version of the URL itself as the link text:
     `agent_url` is a free-form "URL pointing at the agent code or
-    paper" per the schema, so the URL itself carries the only
-    honest hint about what's behind the click. Missing cells render
-    as italic `_None_` so a reader sees the field is optional and
     just wasn't filled, rather than a blank.
     """
     if _is_empty(url):
@@ -212,39 +223,41 @@ def _agent_url_md(url) -> str:
     return f"[{_shorten_url_for_display(str(url))}]({url})"
-def _submission_blob_md(url) -> str:
-    """Render the `zip` cell as a markdown link.
-    Link text stays "zip" (the URL points at our own infrastructure
-    and adds no extra information for the reader).
-    """
-    if _is_empty(url):
-        return ""
-    return f"[zip]({url})"
-def _report_url_md(submission_id, status, submission_sha256) -> str:
-    """Build the report URL, only when the report file is known to exist.
-    `reports/<id>.html` is uploaded by the post-eval worker in the
-    modern submit pipeline. Legacy rows that pre-date that pipeline
-    (the three baseline seed rows; identifiable by ``submission_sha256``
-    being null, per the schema's compatibility note) never had a
-    report uploaded, so the link would 404. Gate on
-    ``submission_sha256`` to keep the rendered link honest.
-    ``/resolve/main/`` (matching the convention used by the submit
-    handler for ``submission_blob_url``) serves the file with its
-    content type so the browser renders the HTML report directly.
     """
     if status != "completed" or _is_empty(submission_id):
         return ""
     if _is_empty(submission_sha256):
         return ""
-    return (
-        f"[report](https://huggingface.co/datasets/{HF_SUBMISSIONS_REPO}"
-        f"/resolve/main/reports/{submission_id}.html)"
-    )
 def load_leaderboard_split() -> tuple[pd.DataFrame, pd.DataFrame]:
@@ -296,25 +309,24 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
     if df.empty:
         return pd.DataFrame(columns=columns)
     df = df.copy()
-    # Compute `report` (not stored on the row) before projection drops
-    # the source columns it needs. Path is derived from `submission_id`
-    # so a layout change doesn't require a results.jsonl rewrite.
     if "submission_id" in df.columns and "status" in df.columns:
-        df["report"] = df.apply(
-            lambda r: _report_url_md(
                 r["submission_id"],
                 r["status"],
                 r.get("submission_sha256"),
             ),
             axis=1,
         )
-    # Schema-field names -> display-column names. Keeps the rendered
-    # headers short and uniform without renaming anything in
-    # results.jsonl.
-    df = df.rename(columns={
-        "agent_url": "agent",
-        "submission_blob_url": "zip",
-    })
     # Make sure every declared column exists (legacy rows can be
     # missing optional fields). Detail-panel reads expect the
     # column-set to be stable regardless of which source rows had
@@ -335,8 +347,15 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
         out["aggregate_score"] = out.apply(
             lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
         )
-    if "agent" in out.columns:
-        out["agent"] = out["agent"].apply(_agent_url_md)
-    if "zip" in out.columns:
-        out["zip"] = out["zip"].apply(_submission_blob_md)
     return out

 # Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
 # can read them out without a separate state-cache or re-fetch.
 # Display column names (these are what the rendered table headers
+# read). The visible set is intentionally compact; the detail-panel
+# columns at the end of the list ride along in the DataFrame so the
+# row-click handler can populate from them without a separate state
+# cache. `submission_name` is the primary link (markdown-wrapped at
+# projection time to point at the report when one exists), so we
+# don't need a separate `report` column. `submission_blob_url` lives
+# in the detail panel only.
 LEADERBOARD_COLS = [
     "status",
     "submission_name",
     "validity_rate",
     "submitted_at",
     "cadgenbench_version",
+    "model details (optional)",
     # Detail-panel-only (hidden via `hide_columns` on the widget):
     "submission_id",
     "notes",
     "failure_reason",
+    "submission_blob_url",
+    "report_url",
 ]
 # Validated table additionally exposes `validation_method`; on the
     "validation_method",
     "submitted_at",
     "cadgenbench_version",
+    "model details (optional)",
     "submission_id",
     "notes",
     "failure_reason",
+    "submission_blob_url",
+    "report_url",
 ]
 # Columns to hide from rendering on both tables. These ride in the
 # DataFrame so the row-click detail panel can populate from them; the
 # widget hides them from view.
+LEADERBOARD_HIDE_COLUMNS = [
+    "submission_id",
+    "notes",
+    "failure_reason",
+    "submission_blob_url",
+    "report_url",
+]
+# Per-column gradio_leaderboard datatypes. `submission_name` and
+# `model details (optional)` render their pre-formatted markdown
+# (the submission_name cell links to the report when one exists; the
+# model-details cell is the agent URL or _None_); everything else is
+# plain string (numeric cells get pending / failed status tags
+# applied by _fmt_pct / _fmt_score so they're string-shaped by the
+# time the widget sees them).
+_LINK_COLUMNS = frozenset({"submission_name", "model details (optional)"})
 def _datatypes_for(columns: list[str]) -> list[str]:
 def _agent_url_md(url) -> str:
+    """Render the `model details (optional)` cell as a markdown link.
     Uses a shortened version of the URL itself as the link text:
     `agent_url` is a free-form "URL pointing at the agent code or
+    paper" per the schema, so the URL itself carries the only honest
+    hint about what's behind the click. Missing cells render as
+    italic ``_None_`` so a reader sees the field is optional and
     just wasn't filled, rather than a blank.
     """
     if _is_empty(url):
     return f"[{_shorten_url_for_display(str(url))}]({url})"
+def _report_relative_url(submission_id, status, submission_sha256) -> str:
+    """Relative URL to the Space's report-proxy route, or empty.
+    The Space exposes ``/reports/<id>.html`` which fetches the file
+    from the submissions dataset and re-serves it as ``text/html``
+    (the dataset's ``/resolve/`` path returns it as ``text/plain``,
+    which renders as source). Returning a relative URL means the
+    same string works whether the Space is running locally on a
+    random port or on huggingface.co.
+    `reports/<id>.html` only exists for completed rows from the
+    modern submit pipeline; legacy seed rows (pre-pipeline,
+    ``submission_sha256`` null) never had a report uploaded, so this
+    function returns empty for them and the caller leaves the
+    submission_name cell as plain text rather than a broken link.
     """
     if status != "completed" or _is_empty(submission_id):
         return ""
     if _is_empty(submission_sha256):
         return ""
+    return f"/reports/{submission_id}.html"
+def _submission_name_md(name, report_url) -> str:
+    """Render `submission_name` as a markdown link when a report exists.
+    The submission name itself is the link target (typical HF
+    leaderboard pattern: model / system name doubles as the
+    deep-link). Plain text when no report is available.
+    """
+    if _is_empty(name):
+        return "(unnamed submission)"
+    if _is_empty(report_url):
+        return str(name)
+    return f"[{name}]({report_url})"
 def load_leaderboard_split() -> tuple[pd.DataFrame, pd.DataFrame]:
     if df.empty:
         return pd.DataFrame(columns=columns)
     df = df.copy()
+    # Compute the relative report URL before projection drops the
+    # source columns it depends on. Lives as a hidden column so the
+    # detail panel can show "Report: link" and `submission_name` can
+    # use it as the link target.
     if "submission_id" in df.columns and "status" in df.columns:
+        df["report_url"] = df.apply(
+            lambda r: _report_relative_url(
                 r["submission_id"],
                 r["status"],
                 r.get("submission_sha256"),
             ),
             axis=1,
         )
+    # Schema field name -> display column header. Keep the rename
+    # narrow: only the agent URL gets a friendlier header; the schema
+    # field stays `agent_url` in results.jsonl. `submission_blob_url`
+    # stays under that name (hidden, detail-panel-only).
+    df = df.rename(columns={"agent_url": "model details (optional)"})
     # Make sure every declared column exists (legacy rows can be
     # missing optional fields). Detail-panel reads expect the
     # column-set to be stable regardless of which source rows had
         out["aggregate_score"] = out.apply(
             lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
         )
+    if "submission_name" in out.columns:
+        out["submission_name"] = out.apply(
+            lambda r: _submission_name_md(
+                r["submission_name"], r.get("report_url"),
+            ),
+            axis=1,
+        )
+    if "model details (optional)" in out.columns:
+        out["model details (optional)"] = out["model details (optional)"].apply(
+            _agent_url_md
+        )
     return out

tests/test_leaderboard.py CHANGED Viewed

@@ -86,7 +86,11 @@ def test_split_shape(monkeypatch):
 def test_legacy_row_defaults_to_unvalidated(monkeypatch):
-    """(b) Legacy row with no `validation_status` field lands unvalidated."""
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
     assert "Gamma baseline" in set(unvalidated["submission_name"].tolist())
@@ -97,7 +101,8 @@ def test_field_passthrough(monkeypatch):
     """(c) Non-formatted metadata fields and validated-only columns pass through.
     `aggregate_score` and `validity_rate` get status-aware string
-    formatting, so passthrough is checked on fields that survive
     untransformed (``submitter_name``, ``submitted_at``,
     ``cadgenbench_version``) plus the validated-table-only
     ``validation_method``.
@@ -106,15 +111,15 @@ def test_field_passthrough(monkeypatch):
     validated, unvalidated = leaderboard.load_leaderboard_split()
     assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
-    beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
-    assert beta["submitter_name"] == "team-beta"
     assert beta["submitted_at"] == "2026-05-02T10:00:00Z"
     assert beta["cadgenbench_version"] == "0.1.0"
     assert list(validated.columns) == leaderboard.VALIDATED_LEADERBOARD_COLS
-    alpha = validated[validated["submission_name"] == "Alpha Agent v1"].iloc[0]
     assert alpha["validation_method"] == "code"
-    assert alpha["submitter_name"] == "team-alpha"
 def test_empty_input_returns_two_empty_frames(monkeypatch):
@@ -128,56 +133,60 @@ def test_empty_input_returns_two_empty_frames(monkeypatch):
     assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
-def test_link_columns_render_as_markdown(monkeypatch):
-    """`agent` / `zip` / `report` columns render as markdown links.
-    Link cells are ``[label](url)`` strings so the Leaderboard widget
-    under ``datatype="markdown"`` produces clickable anchors. The
-    ``agent`` cell uses the URL itself (scheme stripped) as link
-    text so a reader can tell what's behind the click; ``zip`` and
-    ``report`` use the short fixed labels because they always point
-    at our own infrastructure.
-    Missing ``agent_url`` renders as ``_None_`` (italic placeholder
-    so the reader sees the field exists but wasn't filled), not a
-    blank cell.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
-    alpha = validated[validated["submission_name"] == "Alpha Agent v1"].iloc[0]
-    assert alpha["agent"] == (
-        "[github.com/example/alpha-agent](https://github.com/example/alpha-agent)"
-    )
-    assert alpha["zip"] == (
-        "[zip](https://huggingface.co/datasets/test/sub-a.zip)"
-    )
-    assert alpha["report"].startswith("[report](")
-    assert "reports/sub-a.html" in alpha["report"]
-    beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
-    assert beta["agent"] == "_None_"
-    assert beta["zip"].startswith("[zip](")
-    # Beta has submission_sha256, so the report link is emitted.
-    assert beta["report"].startswith("[report](")
-def test_legacy_row_omits_report_link(monkeypatch):
-    """Rows without ``submission_sha256`` (legacy seed rows) drop the report link.
-    ``reports/<id>.html`` is only uploaded by the modern submit
-    pipeline; legacy seed rows that pre-date that pipeline never had
-    a report uploaded, so the /resolve/ URL would 404. Gate on
-    ``submission_sha256`` (the schema's "modern pipeline" sentinel)
-    so the leaderboard doesn't render a broken link.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     _, unvalidated = leaderboard.load_leaderboard_split()
     gamma = unvalidated[unvalidated["submission_name"] == "Gamma baseline"].iloc[0]
-    assert gamma["report"] == ""
-    # Sanity: agent + zip still render normally on the legacy row.
-    assert gamma["agent"].startswith("[github.com/example/gamma-baseline](")
-    assert gamma["zip"].startswith("[zip](")
 def test_datatypes_align_with_columns():
@@ -195,11 +204,12 @@ def test_datatypes_align_with_columns():
         len(leaderboard.VALIDATED_LEADERBOARD_DATATYPES)
         == len(leaderboard.VALIDATED_LEADERBOARD_COLS)
     )
-    # Link columns are markdown, everything else is str.
     for col, dt in zip(
         leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
     ):
-        if col in ("agent", "zip", "report"):
-            assert dt == "markdown"
         else:
-            assert dt == "str"

 def test_legacy_row_defaults_to_unvalidated(monkeypatch):
+    """(b) Legacy row with no `validation_status` field lands unvalidated.
+    Legacy rows keep plain-text submission_name (no report exists),
+    so identity check is straightforward equality.
+    """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
     assert "Gamma baseline" in set(unvalidated["submission_name"].tolist())
     """(c) Non-formatted metadata fields and validated-only columns pass through.
     `aggregate_score` and `validity_rate` get status-aware string
+    formatting; `submission_name` gets wrapped into a markdown link
+    on modern rows. Passthrough is checked on fields that survive
     untransformed (``submitter_name``, ``submitted_at``,
     ``cadgenbench_version``) plus the validated-table-only
     ``validation_method``.
     validated, unvalidated = leaderboard.load_leaderboard_split()
     assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
+    # Use submitter_name to identify Beta (its submission_name is now
+    # wrapped as a markdown link).
+    beta = unvalidated[unvalidated["submitter_name"] == "team-beta"].iloc[0]
     assert beta["submitted_at"] == "2026-05-02T10:00:00Z"
     assert beta["cadgenbench_version"] == "0.1.0"
     assert list(validated.columns) == leaderboard.VALIDATED_LEADERBOARD_COLS
+    alpha = validated[validated["submitter_name"] == "team-alpha"].iloc[0]
     assert alpha["validation_method"] == "code"
 def test_empty_input_returns_two_empty_frames(monkeypatch):
     assert list(unvalidated.columns) == leaderboard.LEADERBOARD_COLS
+def test_submission_name_links_to_report_when_available(monkeypatch):
+    """`submission_name` cell is a markdown link to the proxy when a report exists.
+    Modern rows (status=="completed" + non-null submission_sha256)
+    get ``[<name>](/reports/<sid>.html)``; the relative URL targets
+    the Space's report proxy.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
+    # Alpha is a modern completed row -> linked name.
+    alpha = validated.iloc[0]
+    assert alpha["submission_name"] == "[Alpha Agent v1](/reports/sub-a.html)"
+    # Beta likewise (also completed, has sha256).
+    beta = unvalidated[
+        unvalidated["submission_name"].str.contains("Beta Agent v2", regex=False)
+    ].iloc[0]
+    assert beta["submission_name"] == "[Beta Agent v2](/reports/sub-b.html)"
+def test_legacy_row_submission_name_is_plain_text(monkeypatch):
+    """Rows without submission_sha256 (legacy seeds) keep plain-text names.
+    No `reports/<id>.html` exists on the dataset for those rows, so
+    we don't wrap the name in a link.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     _, unvalidated = leaderboard.load_leaderboard_split()
     gamma = unvalidated[unvalidated["submission_name"] == "Gamma baseline"].iloc[0]
+    # Plain text: no `[...](...)` wrapping.
+    assert "[" not in gamma["submission_name"]
+    assert "](" not in gamma["submission_name"]
+    # And the hidden report_url column is empty for the same reason.
+    assert gamma["report_url"] == ""
+def test_model_details_column_renders(monkeypatch):
+    """`model details (optional)` cell carries the agent URL or `_None_`.
+    Cell uses the shortened URL as link text (honest about what's
+    behind the click). Missing agent_url -> italic placeholder.
+    """
+    monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
+    validated, unvalidated = leaderboard.load_leaderboard_split()
+    alpha = validated.iloc[0]
+    assert alpha["model details (optional)"] == (
+        "[github.com/example/alpha-agent](https://github.com/example/alpha-agent)"
+    )
+    beta = unvalidated[
+        unvalidated["submission_name"].str.contains("Beta Agent v2", regex=False)
+    ].iloc[0]
+    assert beta["model details (optional)"] == "_None_"
 def test_datatypes_align_with_columns():
         len(leaderboard.VALIDATED_LEADERBOARD_DATATYPES)
         == len(leaderboard.VALIDATED_LEADERBOARD_COLS)
     )
+    # Markdown cells: the two link columns. Everything else is str.
+    markdown_cols = {"submission_name", "model details (optional)"}
     for col, dt in zip(
         leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
     ):
+        if col in markdown_cols:
+            assert dt == "markdown", f"{col} should be markdown"
         else:
+            assert dt == "str", f"{col} should be str"

tests/test_proxy.py ADDED Viewed

	@@ -0,0 +1,60 @@

+"""Unit tests for the per-submission report proxy route.
+The Space exposes ``/reports/{submission_id}.html`` which fetches the
+file from the submissions dataset and re-serves it as
+``Content-Type: text/html`` (the dataset's ``/resolve/`` returns it
+as ``text/plain``, which makes the browser show source). Tests cover
+the handler's response shape with the fetch monkeypatched out, so
+the suite has zero network I/O.
+"""
+from __future__ import annotations
+import app
+def test_serve_report_returns_html_when_file_exists(monkeypatch):
+    """Successful fetch -> 200 + text/html + body passthrough."""
+    monkeypatch.setattr(
+        app,
+        "_fetch_report_html",
+        lambda sid: b"<!DOCTYPE html><html><body>ok</body></html>",
+    )
+    resp = app.serve_report("sub-test")
+    assert resp.status_code == 200
+    assert resp.media_type.startswith("text/html")
+    assert resp.body == b"<!DOCTYPE html><html><body>ok</body></html>"
+def test_serve_report_returns_404_when_file_missing(monkeypatch):
+    """``_fetch_report_html`` returning None -> 404 with a small html body."""
+    monkeypatch.setattr(app, "_fetch_report_html", lambda sid: None)
+    resp = app.serve_report("sub-missing")
+    assert resp.status_code == 404
+    # 404 body is still HTML so the browser renders the message.
+    assert "Report not found" in resp.body.decode("utf-8")
+def test_fetch_report_html_returns_none_on_hub_failure(monkeypatch):
+    """A Hub-side exception is caught and surfaced as None.
+    The serve handler relies on this to keep a transient Hub blip
+    from leaking a stack trace into the Space's HTTP response.
+    """
+    def boom(*a, **kw):
+        raise RuntimeError("simulated Hub failure")
+    monkeypatch.setattr(app, "hf_hub_download", boom)
+    # The lru_cache on _fetch_report_html caches by arg; use a unique
+    # id per test so prior runs don't shortcut this one.
+    assert app._fetch_report_html("sub-failure-probe-unique-1") is None
+def test_proxy_route_is_registered():
+    """The mounted FastAPI app exposes ``/reports/{submission_id}.html`` as GET.
+    Catches the regression where the ``add_api_route`` call moves
+    below ``mount_gradio_app`` (which would still register the route
+    but make this regression silent until someone tries to hit it).
+    """
+    routes = [getattr(r, "path", None) for r in app.app.routes]
+    assert "/reports/{submission_id}.html" in routes