Michael Rabinovich committed on May 27

Commit

1748c76

1 Parent(s): 0501689

submit: write pending row + upload zip on successful validation

Step 6 (E) chunk 3. Cheap-sync validation (chunk 2) now also commits
to the Hub: on success the handler uploads submissions/<id>.zip and
appends a status=pending row to results.jsonl under a process-wide
threading.Lock. Eval is still stubbed, so the row stays pending
until chunk 4 lands the background worker.

Hub-write ordering (after validation passes):
1. Upload submissions/<id>.zip. Unique path per submission, no
   lock needed (HfApi handles its own commit serialization).
2. Build the pending row: metadata fields + status="pending" +
   failure_reason=null + submission_blob_url + null score fields,
   matching the "Pending" regime in cadgenbench-submissions/schema.md.
3. Acquire _HUB_LOCK, download current results.jsonl (treat
   EntryNotFoundError as empty), append one JSON line, re-upload.
   Lock held only for the RMW cycle (~1-2s), not for eval time;
   concurrent submitters serialise here, not in the worker.
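
The locked read-modify-write in step 3 can be illustrated with a minimal, Hub-free sketch. `FakeHub`, `append_pending_row`, and `submit_concurrently` are hypothetical names introduced here for illustration; the real handler downloads and re-uploads `results.jsonl` through `huggingface_hub` instead of an in-memory string.

```python
import json
import threading


class FakeHub:
    """Hypothetical in-memory stand-in for the Hub-hosted results.jsonl."""

    def __init__(self) -> None:
        self.body = ""

    def download(self) -> str:
        return self.body

    def upload(self, body: str) -> None:
        self.body = body


def append_pending_row(hub: FakeHub, lock: threading.Lock, row: dict) -> None:
    """Read-modify-write one JSON line while holding the lock."""
    with lock:
        existing = hub.download()
        hub.upload(existing + json.dumps(row) + "\n")


def submit_concurrently(n: int) -> list[dict]:
    """Fire n submitter threads at one FakeHub and return the parsed rows."""
    hub, lock = FakeHub(), threading.Lock()
    threads = [
        threading.Thread(
            target=append_pending_row,
            args=(hub, lock, {"submission_id": f"sub-{i}", "status": "pending"}),
        )
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [json.loads(line) for line in hub.body.splitlines()]
```

Because the download and the upload both happen inside the `with lock:` block, no submitter can overwrite a row that another thread appended between its own read and write. The lock only protects threads within one process, which matches the single-process assumption stated in the diff.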

If step 1 fails the submitter sees a clean "Hub upload failed"
rejection. If step 3 fails the zip is left orphaned in submissions/
and the submitter sees a clean "Hub write failed" rejection; an
orphan-zip sweep is a future-chunk concern (covered by the eventual
stuck-pending recovery in chunk 6 plus a manual sweep if needed).
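
The two failure modes above can be sketched as a small control-flow example. This is an illustration under assumptions, not the handler itself: `HubWriteError`, `submit`, the injected `upload_zip` / `append_row` callables, and the message strings are all hypothetical, and the pending row is abridged to three fields.

```python
from typing import Callable


class HubWriteError(Exception):
    """Hypothetical stand-in for the handler's Hub failure exception."""


def submit(
    submission_id: str,
    upload_zip: Callable[[str], str],
    append_row: Callable[[dict], None],
) -> tuple[str, bool]:
    """Return (user-facing message, whether an orphaned zip was left behind)."""
    try:
        blob_url = upload_zip(submission_id)  # step 1: zip upload
    except HubWriteError as e:
        # Nothing reached the Hub, so there is nothing to clean up.
        return f"rejected: Hub upload failed ({e})", False
    row = {  # step 2: pending row (abridged)
        "submission_id": submission_id,
        "status": "pending",
        "submission_blob_url": blob_url,
    }
    try:
        append_row(row)  # step 3: locked append to results.jsonl
    except HubWriteError as e:
        # The zip is already uploaded but has no row pointing at it.
        return f"rejected: Hub write failed ({e})", True
    return f"queued: {submission_id}", False
```

The boolean makes the asymmetry explicit: only a step 3 failure leaves state behind on the Hub, which is why the orphan sweep is needed for that case alone.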

leaderboard.py:
- Add `status` as the first leaderboard column so pending rows are
  visually distinguishable from completed ones.
- Backfill `status` to "completed" in-memory for legacy rows
  written before the schema bump (the three baseline seeds);
  populated rows are completed by definition.
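
A plain-Python analogue of the backfill may make the rule easier to see. The commit does this on a pandas DataFrame; the sketch below applies the same rule to a list of parsed JSONL rows, and `backfill_status` is a hypothetical helper name, not part of the commit.

```python
def backfill_status(rows: list[dict]) -> list[dict]:
    """Return copies of rows where a missing or null status becomes "completed"."""
    fixed_rows = []
    for row in rows:
        fixed = dict(row)  # copy so the caller's rows are not mutated
        if fixed.get("status") is None:
            fixed["status"] = "completed"
        fixed_rows.append(fixed)
    return fixed_rows
```

Rows that already carry a status such as `"pending"` pass through unchanged; only legacy rows with no status (or an explicit null) are retrofitted.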

cadgenbench_version and cadgenbench_data_revision get pinned on the
pending row at submit time so the row is meaningful even if the
worker never runs and chunk 6 ends up flipping it to failed.
data_revision is the short sha from HfApi().dataset_info; it is
cached per process and falls back to "unknown" on Hub flake (it's a
metadata field, not worth failing a submit over).
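
The cache-with-fallback behaviour can be sketched without touching the network. `RevisionCache` and the injected `fetch_sha` callable are hypothetical names for illustration; the commit itself uses a module-level global and calls `dataset_info` on its `HfApi` client. The 12-character truncation mirrors `DATA_REV_SHORT_LEN` in the diff.

```python
from typing import Callable, Optional

SHORT_LEN = 12  # mirrors DATA_REV_SHORT_LEN in the diff


class RevisionCache:
    """Resolve a short sha once, falling back to "unknown" on any error."""

    def __init__(self, fetch_sha: Callable[[], Optional[str]]) -> None:
        self._fetch_sha = fetch_sha
        self._value: Optional[str] = None

    def get(self) -> str:
        if self._value is not None:
            return self._value  # cached: no second lookup
        try:
            self._value = (self._fetch_sha() or "unknown")[:SHORT_LEN]
        except Exception:  # metadata only, never fail the submit
            self._value = "unknown"
        return self._value
```

One consequence worth noting: as in the diff, a failed lookup is cached too, so after a single Hub flake every later row written by that process would carry `"unknown"` until the process restarts.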

HF_DATA_REPO and HF_SUBMISSIONS_REPO are imported from leaderboard.py
for now; pulling them into a shared config module is a worthwhile
refactor once submit.py grows again in chunks 4+.

Files changed (2)

leaderboard.py +8 -0
submit.py +200 -10

leaderboard.py CHANGED

@@ -23,6 +23,7 @@ HF_DATA_REPO = os.getenv("HF_DATA_REPO", f"{HF_ORG}/cadgenbench-data")
 LOCAL_RESULTS_PATH = Path(__file__).parent / "results.jsonl"
 LEADERBOARD_COLS = [
+    "status",
     "submission_name",
     "submitter_name",
     "aggregate_score",
@@ -80,6 +81,13 @@ def load_leaderboard() -> pd.DataFrame:
     if not rows:
         return pd.DataFrame(columns=LEADERBOARD_COLS)
     df = pd.DataFrame(rows)
+    # Backfill `status` for legacy rows written before the schema bump
+    # (the three baseline seed rows). They all have populated score
+    # fields, so "completed" is the correct retrofit.
+    if "status" not in df.columns:
+        df["status"] = "completed"
+    else:
+        df["status"] = df["status"].fillna("completed")
     cols = [c for c in LEADERBOARD_COLS if c in df.columns]
     df = (
         df[cols]

submit.py CHANGED

@@ -1,10 +1,11 @@
 """Submit-tab handler for the CADGenBench leaderboard Space.
-Step 6 (E) chunk 2: the cheap-sync validation pipeline. The handler
-validates the upload and returns a placeholder message describing the
-submission it would have queued. No Hub writes, no eval kick-off, no
-background work. The async write + worker thread land in later chunks
-of Step 6 (E); see ``space-setup/step-6e-async.md``.
+Step 6 (E) chunks 2 + 3: cheap-sync validation pipeline + pending-row
+write + zip upload. The handler validates the upload, uploads the
+zip to ``submissions/<id>.zip``, appends a ``status: pending`` row to
+``results.jsonl`` (under a process-wide lock), and returns
+immediately. No eval and no worker yet, the row stays pending
+forever until later chunks add the background thread.
 Validation gates, in order:
@@ -23,19 +24,40 @@ Validation gates, in order:
    geometry. Per-fixture validity (watertight, manifold, etc) is
    *not* checked here, that's the evaluator's job and contributes to
    the per-fixture score; this gate only rejects "not actually STEP".
+Hub-write ordering (after validation passes):
+1. Upload ``submissions/<id>.zip``. Unique path per submission, no
+   lock needed.
+2. Build pending row (metadata + null scores + ``submission_blob_url``).
+3. Acquire ``_HUB_LOCK``; download current ``results.jsonl`` (or
+   start empty); append the pending row; re-upload.
+If step 1 fails the user sees a clean rejection. If step 3 fails the
+zip is left orphaned in ``submissions/`` and the user sees a clean
+rejection; an orphan-zip sweep is a future-chunk concern.
 """
 from __future__ import annotations
 import json
+import logging
 import re
 import tempfile
+import threading
 import zipfile
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any
+import cadgenbench
 from cadgenbench.common.paths import data_inputs_dir
 from cadgenbench.common.validity import parse_step
+from huggingface_hub import HfApi
+from huggingface_hub.errors import EntryNotFoundError
+from leaderboard import HF_DATA_REPO, HF_SUBMISSIONS_REPO
+logger = logging.getLogger(__name__)
 NOTES_MAX_CHARS = 500
 REQUIRED_META_KEYS: tuple[str, ...] = (
@@ -46,12 +68,31 @@ REQUIRED_META_KEYS: tuple[str, ...] = (
     "agree_to_publish",
 )
 SUBMISSION_ID_SLUG_MAX = 40
+RESULTS_FILENAME = "results.jsonl"
+SUBMISSIONS_DIR = "submissions"
+DATA_REV_SHORT_LEN = 12
+# One HfApi client per process. HF_TOKEN is picked up from the env at
+# construction time and reused for every call.
+_HF_API = HfApi()
+# Process-wide lock guarding the read-modify-write of results.jsonl.
+# The Space is single-process so a threading.Lock is sufficient; held
+# only for the duration of the RMW cycle (~1-2s), not for eval time.
+_HUB_LOCK = threading.Lock()
+# Lazily-resolved cadgenbench-data revision, cached per process.
+_DATA_REVISION: str | None = None
 class _ValidationError(Exception):
     """Internal sentinel that maps to a user-facing rejection message."""
+class _HubWriteError(Exception):
+    """Raised when a Hub upload fails after validation succeeded."""
 def handle_submit(
     zip_file,
     submission_name: str,  # noqa: ARG001 - kept for UI compat; meta.json wins
@@ -93,12 +134,19 @@ def handle_submit(
     submission_id = _mint_submission_id(
         meta["submitter_name"], meta["submission_name"]
     )
+    try:
+        blob_url = _upload_submission_zip(submission_id, zip_path)
+        row = _build_pending_row(submission_id, meta, fixture_names, blob_url)
+        _append_pending_row(row)
+    except _HubWriteError as e:
+        return f"**Submission rejected.** {e}"
     return (
-        f"**Validation OK.** Would queue submission `{submission_id}` "
-        f"(submitter: `{meta['submitter_name']}`, system: "
-        f"`{meta['submission_name']}`, {len(fixture_names)} fixtures).\n\n"
-        f"_Chunk 2 of Step 6 (E): validation only. Hub write + eval "
-        f"kick-off land in the next chunk._"
+        f"**Queued.** Submission `{submission_id}` has been accepted and a "
+        f"`pending` row added to the leaderboard (submitter: "
+        f"`{meta['submitter_name']}`, system: `{meta['submission_name']}`, "
+        f"{len(fixture_names)} fixtures). Evaluation will populate the "
+        f"score columns once the worker lands in a later chunk."
     )
@@ -259,3 +307,145 @@ def _slug(s: str) -> str:
     """Filesystem-safe slug. Lowercase, ``[a-z0-9-]``, collapsed dashes."""
     cleaned = re.sub(r"[^A-Za-z0-9]+", "-", s).strip("-").lower()
     return cleaned[:SUBMISSION_ID_SLUG_MAX] or "unnamed"
+def _upload_submission_zip(submission_id: str, zip_path: Path) -> str:
+    """Upload the submission zip to ``submissions/<id>.zip``.
+    Returns the canonical Hub blob URL on success. Raises
+    :class:`_HubWriteError` with a short user-facing reason on
+    failure.
+    """
+    repo_path = f"{SUBMISSIONS_DIR}/{submission_id}.zip"
+    try:
+        _HF_API.upload_file(
+            path_or_fileobj=str(zip_path),
+            path_in_repo=repo_path,
+            repo_id=HF_SUBMISSIONS_REPO,
+            repo_type="dataset",
+            commit_message=f"add submission zip for {submission_id}",
+        )
+    except Exception as e:  # noqa: BLE001 - Hub API surface is broad
+        logger.exception("Failed to upload submission zip %s", submission_id)
+        raise _HubWriteError(
+            f"Server-side error uploading submission zip "
+            f"({type(e).__name__}: {e}). Please try again later."
+        ) from e
+    return (
+        f"https://huggingface.co/datasets/{HF_SUBMISSIONS_REPO}"
+        f"/resolve/main/{repo_path}"
+    )
+def _build_pending_row(
+    submission_id: str,
+    meta: dict[str, Any],
+    fixture_names: set[str],  # noqa: ARG001 - kept for future per-fixture pre-fill
+    blob_url: str,
+) -> dict[str, Any]:
+    """Construct the JSON row written for a freshly-queued submission.
+    Mirrors the pending regime in ``cadgenbench-submissions/schema.md``:
+    metadata + ``status: pending`` + ``submission_blob_url``; every
+    score-shaped field is ``null`` until the worker flips the row.
+    """
+    return {
+        "submission_id": submission_id,
+        "status": "pending",
+        "failure_reason": None,
+        "submitter_name": meta["submitter_name"],
+        "submission_name": meta["submission_name"],
+        "agent_url": meta["agent_url"],
+        "notes": meta["notes"],
+        "submitted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+        "cadgenbench_version": cadgenbench.__version__,
+        "cadgenbench_data_revision": _resolve_data_revision(),
+        "validity_rate": None,
+        "aggregate_score": None,
+        "score_by_task_type": None,
+        "per_task_scores": None,
+        "per_fixture_scores": None,
+        "per_fixture_breakdown": None,
+        "submission_blob_url": blob_url,
+    }
+def _append_pending_row(row: dict[str, Any]) -> None:
+    """Append a pending row to ``results.jsonl`` on the Hub under the lock.
+    Read-modify-write: download the current file (or treat as empty if
+    it doesn't exist yet), append one line, re-upload. The lock is
+    held only for the duration of this cycle (~1-2s), not for any
+    background eval; concurrent submitters serialise here, not on the
+    eval pipeline.
+    """
+    with _HUB_LOCK:
+        try:
+            existing = _download_results_jsonl()
+        except Exception as e:  # noqa: BLE001 - Hub API surface is broad
+            logger.exception("Failed to download results.jsonl for append")
+            raise _HubWriteError(
+                f"Server-side error reading the submissions table "
+                f"({type(e).__name__}: {e}). Please try again later."
+            ) from e
+        line = json.dumps(row, ensure_ascii=False)
+        new_body = existing + line + "\n" if existing else line + "\n"
+        try:
+            _HF_API.upload_file(
+                path_or_fileobj=new_body.encode("utf-8"),
+                path_in_repo=RESULTS_FILENAME,
+                repo_id=HF_SUBMISSIONS_REPO,
+                repo_type="dataset",
+                commit_message=(
+                    f"add pending row for {row['submission_id']}"
+                ),
+            )
+        except Exception as e:  # noqa: BLE001 - Hub API surface is broad
+            logger.exception(
+                "Failed to upload appended results.jsonl for %s",
+                row["submission_id"],
+            )
+            raise _HubWriteError(
+                f"Server-side error writing the submissions table "
+                f"({type(e).__name__}: {e}). The submission zip was "
+                f"uploaded but the row was not; please try again later."
+            ) from e
+def _download_results_jsonl() -> str:
+    """Fetch the current ``results.jsonl`` body as text, or ``""`` if absent."""
+    from huggingface_hub import hf_hub_download
+    try:
+        path = hf_hub_download(
+            repo_id=HF_SUBMISSIONS_REPO,
+            filename=RESULTS_FILENAME,
+            repo_type="dataset",
+            force_download=True,
+        )
+    except EntryNotFoundError:
+        return ""
+    return Path(path).read_text(encoding="utf-8")
+def _resolve_data_revision() -> str:
+    """Return a short sha for the cadgenbench-data dataset, cached per process.
+    Falls back to ``"unknown"`` on Hub errors so a flaky network can't
+    block a submission over a metadata field.
+    """
+    global _DATA_REVISION
+    if _DATA_REVISION is not None:
+        return _DATA_REVISION
+    try:
+        info = _HF_API.dataset_info(HF_DATA_REPO)
+        _DATA_REVISION = (info.sha or "unknown")[:DATA_REV_SHORT_LEN]
+    except Exception as e:  # noqa: BLE001 - metadata only, don't fail the submit
+        logger.warning(
+            "Failed to resolve cadgenbench-data revision (%s: %s)",
+            type(e).__name__, e,
+        )
+        _DATA_REVISION = "unknown"
+    return _DATA_REVISION
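
One edge case in the `existing + line` concatenation inside `_append_pending_row`: if the downloaded `results.jsonl` does not end in a newline (for example after a hand edit), the new row would be fused onto the last existing line. A possible guard, sketched with a hypothetical helper `append_jsonl_line` that is not part of this commit:

```python
import json
from typing import Any


def append_jsonl_line(existing: str, row: dict[str, Any]) -> str:
    """Return existing JSONL text with row appended as its own line."""
    line = json.dumps(row, ensure_ascii=False)
    # Repair a missing trailing newline so two rows never share a line.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + line + "\n"
```

For an empty or newline-terminated file this produces the same body as the committed expression, so the helper would only change behaviour in the malformed case.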