Spaces:

HuggingAI4Engineering
/

cadgenbench-leaderboard

Running

Michael Rabinovich commited on 9 days ago

Commit

1a8f331

1 Parent(s): 3112173

app+leaderboard: detail panel polish (rename link columns, fix broken report links)

Polish follow-up on C6, three fixes after the first detail-panel
commit landed:

1. Rename display columns from snake_case schema names to clean
short headers. `agent_url` -> `agent`, `submission_blob_url` ->
`zip`, `report_url` -> `report`. The rename happens at projection
time via `df.rename`, so the schema in results.jsonl is
unchanged. Headers now read as plain English ("agent", "zip",
"report") rather than a python dict.

2. The `agent` column shows the URL itself (scheme stripped,
truncated past 40 chars) as link text instead of a fixed "code"
label. The schema says agent_url can be code OR paper; the
shortened URL is the only honest hint about what's behind the
click. Missing values render as italic `_None_` rather than a
blank cell, so the optionality is explicit.

3. The `report` link is now gated on the row's `submission_sha256`
being non-null in addition to status == "completed". Legacy seed
rows pre-date the modern submit pipeline (which uploads
reports/<id>.html); the per-row sha256 is the schema's
"modern pipeline" sentinel ("null only on legacy rows that
pre-date this field"). Pre-fix the legacy rows rendered a
/resolve/ link that 404'd, exactly the bug surfaced on the live
Space.

Detail panel (app.py):
- Timestamp formatting: 2026-05-26T12:02:31Z -> 2026-05-26 12:02 UTC.
ISO punctuation isn't useful in a human-readable card; dropping
the T/Z and showing the timezone explicitly is plenty.
- Always shows "Agent: ..." (with `_None_` when missing) so the
optionality is visible in the rendered card.
- Reads the renamed display columns.

Tests:
- _stub_rows() now carries submission_sha256 on rows that should
emit the report link; the legacy stub deliberately leaves it out.
- test_link_columns_render_as_markdown updated for the renamed
columns + shortened-URL agent text + the `_None_` placeholder.
- New test_legacy_row_omits_report_link guards the sentinel gate
so a future refactor that drops it gets caught.

10/10 unit tests green. Live read on the submissions dataset:
legacy baseline rows render `agent` as a shortened github URL,
`zip` as [zip](...), `report` as empty (no submission_sha256 ->
no report file would exist). Modern submission rows pick up a
real [report](...) link to a reports/<id>.html that actually
exists on the dataset.

Files changed (3) hide show

app.py +32 -8
leaderboard.py +74 -27
tests/test_leaderboard.py +49 -19

app.py CHANGED Viewed

@@ -6,6 +6,7 @@ Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
 from __future__ import annotations
 import logging
 import gradio as gr
 from gradio_leaderboard import Leaderboard
@@ -50,6 +51,8 @@ correct 3D model.
 DETAIL_PLACEHOLDER = "_Click a row above for details._"
 def _has(value) -> bool:
     """True for values that should show up in the detail panel."""
@@ -60,6 +63,22 @@ def _has(value) -> bool:
     return str(value).strip() != ""
 def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
     """Build the row-detail markdown for the clicked submission.
@@ -68,7 +87,10 @@ def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
     the visible link cells (already pre-formatted as ``[label](url)``
     by ``leaderboard.py``'s ``_project_and_format``).
     ``failure_reason`` only shows on ``failed`` rows;
-    ``report_url`` is only non-empty for ``completed`` rows.
     """
     if df is None or len(df) == 0 or evt is None or evt.index is None:
         return DETAIL_PLACEHOLDER
@@ -84,15 +106,17 @@ def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
     if _has(row.get("status")):
         lines.append(f"- **Status**: {row['status']}")
     if _has(row.get("submitted_at")):
-        lines.append(f"- **Submitted**: {row['submitted_at']}")
     if _has(row.get("notes")):
         lines.append(f"- **Notes**: {row['notes']}")
-    if _has(row.get("agent_url")):
-        lines.append(f"- **Agent code**: {row['agent_url']}")
-    if _has(row.get("submission_blob_url")):
-        lines.append(f"- **Submission**: {row['submission_blob_url']}")
-    if _has(row.get("report_url")):
-        lines.append(f"- **Report**: {row['report_url']}")
     if row.get("status") == "failed" and _has(row.get("failure_reason")):
         lines.append(f"- **Failure reason**: {row['failure_reason']}")
     return "\n".join(lines)

 from __future__ import annotations
 import logging
+import re
 import gradio as gr
 from gradio_leaderboard import Leaderboard
 DETAIL_PLACEHOLDER = "_Click a row above for details._"
+_ISO_TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}):\d{2}Z$")
 def _has(value) -> bool:
     """True for values that should show up in the detail panel."""
     return str(value).strip() != ""
+def _fmt_timestamp(ts) -> str:
+    """Render an ISO-8601 ``submitted_at`` as ``YYYY-MM-DD HH:MM UTC``.
+    The schema writes timestamps as ``YYYY-MM-DDTHH:MM:SSZ``; the
+    minute-level UTC form is plenty for the detail panel, drops the
+    ``T``/``Z`` punctuation, and renders timezone explicitly so a
+    reader doesn't have to know that "Z" means UTC.
+    """
+    if not _has(ts):
+        return ""
+    m = _ISO_TS_RE.match(str(ts))
+    if m:
+        return f"{m.group(1)} {m.group(2)} UTC"
+    return str(ts)
 def _format_detail(df: pd.DataFrame | None, evt: gr.SelectData) -> str:
     """Build the row-detail markdown for the clicked submission.
     the visible link cells (already pre-formatted as ``[label](url)``
     by ``leaderboard.py``'s ``_project_and_format``).
     ``failure_reason`` only shows on ``failed`` rows;
+    ``report_url`` is only non-empty for completed rows from the
+    modern submit pipeline (leaderboard.py gates on
+    ``submission_sha256`` so legacy rows don't render a broken
+    /resolve/ link).
     """
     if df is None or len(df) == 0 or evt is None or evt.index is None:
         return DETAIL_PLACEHOLDER
     if _has(row.get("status")):
         lines.append(f"- **Status**: {row['status']}")
     if _has(row.get("submitted_at")):
+        lines.append(f"- **Submitted**: {_fmt_timestamp(row['submitted_at'])}")
     if _has(row.get("notes")):
         lines.append(f"- **Notes**: {row['notes']}")
+    # Display columns from leaderboard.py's _project_and_format:
+    # `agent` carries the markdown link (or "_None_" when missing);
+    # `zip` and `report` are empty strings when not applicable.
+    lines.append(f"- **Agent**: {row.get('agent') or '_None_'}")
+    if _has(row.get("zip")):
+        lines.append(f"- **Submission**: {row['zip']}")
+    if _has(row.get("report")):
+        lines.append(f"- **Report**: {row['report']}")
     if row.get("status") == "failed" and _has(row.get("failure_reason")):
         lines.append(f"- **Failure reason**: {row['failure_reason']}")
     return "\n".join(lines)

leaderboard.py CHANGED Viewed

@@ -33,6 +33,11 @@ HUB_FETCH_TIMEOUT_SECONDS = 30
 # by hidden-but-data-present columns the row-click detail panel pulls from.
 # Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
 # can read them out without a separate state-cache or re-fetch.
 LEADERBOARD_COLS = [
     "status",
     "submission_name",
@@ -41,9 +46,9 @@ LEADERBOARD_COLS = [
     "validity_rate",
     "submitted_at",
     "cadgenbench_version",
-    "agent_url",
-    "submission_blob_url",
-    "report_url",
     # Detail-panel-only (hidden via `hide_columns` on the widget):
     "submission_id",
     "notes",
@@ -62,9 +67,9 @@ VALIDATED_LEADERBOARD_COLS = [
     "validation_method",
     "submitted_at",
     "cadgenbench_version",
-    "agent_url",
-    "submission_blob_url",
-    "report_url",
     "submission_id",
     "notes",
     "failure_reason",
@@ -80,7 +85,7 @@ LEADERBOARD_HIDE_COLUMNS = ["submission_id", "notes", "failure_reason"]
 # cells get pending / failed status tags applied by _fmt_pct /
 # _fmt_score so they're string-shaped by the time the widget sees
 # them).
-_LINK_COLUMNS = frozenset({"agent_url", "submission_blob_url", "report_url"})
 def _datatypes_for(columns: list[str]) -> list[str]:
@@ -181,31 +186,61 @@ def _is_empty(v) -> bool:
     return False
 def _agent_url_md(url) -> str:
-    """Render an `agent_url` cell as a markdown link (empty string if absent)."""
     if _is_empty(url):
-        return ""
-    return f"[code]({url})"
 def _submission_blob_md(url) -> str:
-    """Render a `submission_blob_url` cell as a markdown link."""
     if _is_empty(url):
         return ""
     return f"[zip]({url})"
-def _report_url_md(submission_id, status) -> str:
-    """Build the report URL from `submission_id`, only for completed rows.
-    `reports/<id>.html` lives on the submissions dataset alongside the
-    submission zip. ``/resolve/main/`` (matching the convention used
-    by the submit handler for ``submission_blob_url``) serves the
-    file with its content type so the browser renders the HTML
-    report directly. Pending and failed rows have no report yet.
     """
     if status != "completed" or _is_empty(submission_id):
         return ""
     return (
         f"[report](https://huggingface.co/datasets/{HF_SUBMISSIONS_REPO}"
         f"/resolve/main/reports/{submission_id}.html)"
@@ -261,13 +296,25 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
     if df.empty:
         return pd.DataFrame(columns=columns)
     df = df.copy()
-    # Derive `report_url` before projection drops `submission_id`.
-    # Computed (not stored on the row) so a path change doesn't
-    # require a results.jsonl rewrite.
     if "submission_id" in df.columns and "status" in df.columns:
-        df["report_url"] = df.apply(
-            lambda r: _report_url_md(r["submission_id"], r["status"]), axis=1,
         )
     # Make sure every declared column exists (legacy rows can be
     # missing optional fields). Detail-panel reads expect the
     # column-set to be stable regardless of which source rows had
@@ -288,8 +335,8 @@ def _project_and_format(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
         out["aggregate_score"] = out.apply(
             lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
         )
-    if "agent_url" in out.columns:
-        out["agent_url"] = out["agent_url"].apply(_agent_url_md)
-    if "submission_blob_url" in out.columns:
-        out["submission_blob_url"] = out["submission_blob_url"].apply(_submission_blob_md)
     return out

 # by hidden-but-data-present columns the row-click detail panel pulls from.
 # Hidden columns ride along in the DataFrame so `Leaderboard.select(...)`
 # can read them out without a separate state-cache or re-fetch.
+# Display column names (these are what the rendered table headers
+# read). The schema fields `agent_url` and `submission_blob_url` get
+# renamed at projection time to `agent` / `zip` so the rendered
+# headers stay short and uniform; the `report` column is computed
+# fresh from `submission_id`.
 LEADERBOARD_COLS = [
     "status",
     "submission_name",
     "validity_rate",
     "submitted_at",
     "cadgenbench_version",
+    "agent",
+    "zip",
+    "report",
     # Detail-panel-only (hidden via `hide_columns` on the widget):
     "submission_id",
     "notes",
     "validation_method",
     "submitted_at",
     "cadgenbench_version",
+    "agent",
+    "zip",
+    "report",
     "submission_id",
     "notes",
     "failure_reason",
 # cells get pending / failed status tags applied by _fmt_pct /
 # _fmt_score so they're string-shaped by the time the widget sees
 # them).
+_LINK_COLUMNS = frozenset({"agent", "zip", "report"})
 def _datatypes_for(columns: list[str]) -> list[str]:
     return False
+_AGENT_URL_MAX_LINK_TEXT = 40
+def _shorten_url_for_display(url: str) -> str:
+    """Strip scheme + trailing slash; truncate to keep the table cell tidy."""
+    s = url.replace("https://", "").replace("http://", "").rstrip("/")
+    if len(s) > _AGENT_URL_MAX_LINK_TEXT:
+        s = s[: _AGENT_URL_MAX_LINK_TEXT - 1] + "…"
+    return s
 def _agent_url_md(url) -> str:
+    """Render the `agent` cell as a markdown link.
+    Uses a shortened version of the URL itself as the link text:
+    `agent_url` is a free-form "URL pointing at the agent code or
+    paper" per the schema, so the URL itself carries the only
+    honest hint about what's behind the click. Missing cells render
+    as italic `_None_` so a reader sees the field is optional and
+    just wasn't filled, rather than a blank.
+    """
     if _is_empty(url):
+        return "_None_"
+    return f"[{_shorten_url_for_display(str(url))}]({url})"
 def _submission_blob_md(url) -> str:
+    """Render the `zip` cell as a markdown link.
+    Link text stays "zip" (the URL points at our own infrastructure
+    and adds no extra information for the reader).
+    """
     if _is_empty(url):
         return ""
     return f"[zip]({url})"
+def _report_url_md(submission_id, status, submission_sha256) -> str:
+    """Build the report URL, only when the report file is known to exist.
+    `reports/<id>.html` is uploaded by the post-eval worker in the
+    modern submit pipeline. Legacy rows that pre-date that pipeline
+    (the three baseline seed rows; identifiable by ``submission_sha256``
+    being null, per the schema's compatibility note) never had a
+    report uploaded, so the link would 404. Gate on
+    ``submission_sha256`` to keep the rendered link honest.
+    ``/resolve/main/`` (matching the convention used by the submit
+    handler for ``submission_blob_url``) serves the file with its
+    content type so the browser renders the HTML report directly.
     """
     if status != "completed" or _is_empty(submission_id):
         return ""
+    if _is_empty(submission_sha256):
+        return ""
     return (
         f"[report](https://huggingface.co/datasets/{HF_SUBMISSIONS_REPO}"
         f"/resolve/main/reports/{submission_id}.html)"
     if df.empty:
         return pd.DataFrame(columns=columns)
     df = df.copy()
+    # Compute `report` (not stored on the row) before projection drops
+    # the source columns it needs. Path is derived from `submission_id`
+    # so a layout change doesn't require a results.jsonl rewrite.
     if "submission_id" in df.columns and "status" in df.columns:
+        df["report"] = df.apply(
+            lambda r: _report_url_md(
+                r["submission_id"],
+                r["status"],
+                r.get("submission_sha256"),
+            ),
+            axis=1,
         )
+    # Schema-field names -> display-column names. Keeps the rendered
+    # headers short and uniform without renaming anything in
+    # results.jsonl.
+    df = df.rename(columns={
+        "agent_url": "agent",
+        "submission_blob_url": "zip",
+    })
     # Make sure every declared column exists (legacy rows can be
     # missing optional fields). Detail-panel reads expect the
     # column-set to be stable regardless of which source rows had
         out["aggregate_score"] = out.apply(
             lambda r: _fmt_score(r["aggregate_score"], r["status"]), axis=1,
         )
+    if "agent" in out.columns:
+        out["agent"] = out["agent"].apply(_agent_url_md)
+    if "zip" in out.columns:
+        out["zip"] = out["zip"].apply(_submission_blob_md)
     return out

tests/test_leaderboard.py CHANGED Viewed

@@ -34,6 +34,9 @@ def _stub_rows():
             "submission_blob_url": (
                 "https://huggingface.co/datasets/test/sub-a.zip"
             ),
         },
         {
             "submission_id": "sub-b",
@@ -51,9 +54,13 @@ def _stub_rows():
             "submission_blob_url": (
                 "https://huggingface.co/datasets/test/sub-b.zip"
             ),
         },
         # Legacy row: pre-schema-bump shape. No `validation_status` key,
-        # no `status` key. Both should be defaulted by the reader.
         {
             "submission_id": "sub-c-legacy",
             "submitter_name": "team-gamma",
@@ -122,32 +129,55 @@ def test_empty_input_returns_two_empty_frames(monkeypatch):
 def test_link_columns_render_as_markdown(monkeypatch):
-    """agent_url / submission_blob_url / report_url render as markdown links.
-    Covers C5: link cells should be ``[label](url)`` strings (so the
-    Leaderboard widget rendering them under ``datatype="markdown"``
-    produces clickable anchors), null/missing agent_urls are empty,
-    and report_url is built from submission_id but only for
-    ``status == "completed"`` rows.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
     alpha = validated[validated["submission_name"] == "Alpha Agent v1"].iloc[0]
-    assert alpha["agent_url"] == "[code](https://github.com/example/alpha-agent)"
-    assert alpha["submission_blob_url"] == (
         "[zip](https://huggingface.co/datasets/test/sub-a.zip)"
     )
-    # Report URL is computed from submission_id and points at the
-    # submissions dataset's `reports/<id>.html` via /resolve/main/.
-    assert alpha["report_url"].startswith("[report](")
-    assert "reports/sub-a.html" in alpha["report_url"]
-    # Null agent_url renders as empty cell, not a broken anchor.
     beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
-    assert beta["agent_url"] == ""
-    assert beta["submission_blob_url"].startswith("[zip](")
-    assert beta["report_url"].startswith("[report](")
 def test_datatypes_align_with_columns():
@@ -169,7 +199,7 @@ def test_datatypes_align_with_columns():
     for col, dt in zip(
         leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
     ):
-        if col in ("agent_url", "submission_blob_url", "report_url"):
             assert dt == "markdown"
         else:
             assert dt == "str"

             "submission_blob_url": (
                 "https://huggingface.co/datasets/test/sub-a.zip"
             ),
+            # Modern submit pipeline: sha256 is populated, so report
+            # links should be emitted on completed rows.
+            "submission_sha256": "a" * 64,
         },
         {
             "submission_id": "sub-b",
             "submission_blob_url": (
                 "https://huggingface.co/datasets/test/sub-b.zip"
             ),
+            "submission_sha256": "b" * 64,
         },
         # Legacy row: pre-schema-bump shape. No `validation_status` key,
+        # no `status` key, no `submission_sha256`. Both `status` and
+        # `validation_status` should be defaulted by the reader; the
+        # missing sha256 must suppress the report link (the
+        # corresponding reports/<id>.html doesn't exist on the dataset).
         {
             "submission_id": "sub-c-legacy",
             "submitter_name": "team-gamma",
 def test_link_columns_render_as_markdown(monkeypatch):
+    """`agent` / `zip` / `report` columns render as markdown links.
+    Link cells are ``[label](url)`` strings so the Leaderboard widget
+    under ``datatype="markdown"`` produces clickable anchors. The
+    ``agent`` cell uses the URL itself (scheme stripped) as link
+    text so a reader can tell what's behind the click; ``zip`` and
+    ``report`` use the short fixed labels because they always point
+    at our own infrastructure.
+    Missing ``agent_url`` renders as ``_None_`` (italic placeholder
+    so the reader sees the field exists but wasn't filled), not a
+    blank cell.
     """
     monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
     validated, unvalidated = leaderboard.load_leaderboard_split()
     alpha = validated[validated["submission_name"] == "Alpha Agent v1"].iloc[0]
+    assert alpha["agent"] == (
+        "[github.com/example/alpha-agent](https://github.com/example/alpha-agent)"
+    )
+    assert alpha["zip"] == (
         "[zip](https://huggingface.co/datasets/test/sub-a.zip)"
     )
+    assert alpha["report"].startswith("[report](")
+    assert "reports/sub-a.html" in alpha["report"]
     beta = unvalidated[unvalidated["submission_name"] == "Beta Agent v2"].iloc[0]
+    assert beta["agent"] == "_None_"
+    assert beta["zip"].startswith("[zip](")
+    # Beta has submission_sha256, so the report link is emitted.
+    assert beta["report"].startswith("[report](")
+def test_legacy_row_omits_report_link(monkeypatch):
+    """Rows without ``submission_sha256`` (legacy seed rows) drop the report link.
+    ``reports/<id>.html`` is only uploaded by the modern submit
+    pipeline; legacy seed rows that pre-date that pipeline never had
+    a report uploaded, so the /resolve/ URL would 404. Gate on
+    ``submission_sha256`` (the schema's "modern pipeline" sentinel)
+    so the leaderboard doesn't render a broken link.
+    """
+    monkeypatch.setattr(leaderboard, "_load_rows_from_hub", lambda: _stub_rows())
+    _, unvalidated = leaderboard.load_leaderboard_split()
+    gamma = unvalidated[unvalidated["submission_name"] == "Gamma baseline"].iloc[0]
+    assert gamma["report"] == ""
+    # Sanity: agent + zip still render normally on the legacy row.
+    assert gamma["agent"].startswith("[github.com/example/gamma-baseline](")
+    assert gamma["zip"].startswith("[zip](")
 def test_datatypes_align_with_columns():
     for col, dt in zip(
         leaderboard.LEADERBOARD_COLS, leaderboard.LEADERBOARD_DATATYPES
     ):
+        if col in ("agent", "zip", "report"):
             assert dt == "markdown"
         else:
             assert dt == "str"