Michael Rabinovich commited on
Commit
855bb8a
·
1 Parent(s): 5aee3e5

submit: background-thread eval, flip row pending -> completed / failed

Browse files

Step 6 (E) chunk 4. On a successful submit handle_submit now spawns
a daemon thread that runs `cadgenbench evaluate` then `cadgenbench
report single` over the unpacked submission, uploads
reports/<id>.{html,json} to the submissions dataset, reads
run_summary.json, and flips the row pending -> completed under the
existing _HUB_LOCK. Any worker-side exception flips the row to
failed with a short failure_reason (<=200 chars; full traceback
goes to Space logs). Tempdir cleanup always runs in finally.

Pipeline per submission, in the worker:
1. cadgenbench evaluate <run_dir> --workers 8
(subprocess via `python -m cadgenbench.cli`; the eval CLI
already fans out across fixtures with ProcessPoolExecutor,
self-throttles to n_fixtures when smaller).
2. cadgenbench report single <run_dir> -o <tmp>/<id>.html
3. Build reports JSON: run_summary.json + each fixture's
result.json bundled together (mirrors README's description
of reports/<id>.json as the "machine-readable mirror").
4. Upload reports/<id>.html and reports/<id>.json.
5. Under _HUB_LOCK: merge aggregate_score, validity_rate,
score_by_task_type, per_task_scores, per_fixture_scores from
run_summary.json into the existing pending row; set status
to "completed", refresh cadgenbench_data_revision.

Subprocess timeouts: 15 min for eval (eval ceiling is ~5 min on
cpu-upgrade; generous), 2 min for report. Subprocess.run with
check=False so we surface the exit code + stderr tail in the
RuntimeError, not the default CalledProcessError chain. env is
inherited so HF_TOKEN / CADGENBENCH_DATA_REPO /
CADGENBENCH_DATA_GT_REPO reach the cadgenbench subprocess.

Tempdir lifecycle change: handle_submit was using a `with
TemporaryDirectory()` that auto-cleaned at function exit, which
would have nuked the unpacked submission before the worker had
a chance to read it. Switched to manual mkdtemp + try/finally;
ownership transfers to the worker on successful spawn (worker's
own finally does shutil.rmtree). On any pre-spawn rejection
(validation or Hub-write) the handler still cleans up.

Refactor: _append_pending_row (chunk 3) and the new _update_row
(chunk 4) both do a lock-acquire + download + mutate + upload
of results.jsonl. Extracted _hub_rmw_results that takes a
mutate(rows) callable; append is mut.append(row), update is
mut(rows): find by id and r.update(updates). Same lock, same
RMW cycle.

Failure modes covered:
- eval subprocess non-zero / timeout -> row "failed" + reason
- report subprocess non-zero / missing out -> row "failed" + reason
- report JSON build (missing run_summary) -> row "failed" + reason
- Hub upload of reports/<id>.{html,json} -> row "failed" + reason
- Final row-flip itself fails -> row stays "pending";
chunk 6's stuck-pending sweep catches it on next Space boot.

"Queued." UI message updated to include the typical eval wall
clock ("2-5 minutes on this Space's cpu-upgrade tier") so the
user knows roughly when to come back. The pending row's
"evaluating..." cell tag from the earlier polish commit holds
the in-table progress signal until chunk 5 adds the every=10
auto-refresh.

Files changed (1) hide show
  1. submit.py +301 -64
submit.py CHANGED
@@ -1,11 +1,13 @@
1
  """Submit-tab handler for the CADGenBench leaderboard Space.
2
 
3
- Step 6 (E) chunks 2 + 3: cheap-sync validation pipeline + pending-row
4
- write + zip upload. The handler validates the upload, uploads the
5
- zip to ``submissions/<id>.zip``, appends a ``status: pending`` row to
6
- ``results.jsonl`` (under a process-wide lock), and returns
7
- immediately. No eval and no worker yet, the row stays pending
8
- forever until later chunks add the background thread.
 
 
9
 
10
  Validation gates, in order:
11
 
@@ -31,16 +33,37 @@ Hub-write ordering (after validation passes):
31
  2. Build pending row (metadata + null scores + ``submission_blob_url``).
32
  3. Acquire ``_HUB_LOCK``; download current ``results.jsonl`` (or
33
  start empty); append the pending row; re-upload.
 
 
34
 
35
  If step 1 fails the user sees a clean rejection. If step 3 fails the
36
  zip is left orphaned in ``submissions/`` and the user sees a clean
37
  rejection; an orphan-zip sweep is a future-chunk concern.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
38
  """
39
  from __future__ import annotations
40
 
41
  import json
42
  import logging
 
43
  import re
 
 
 
44
  import tempfile
45
  import threading
46
  import zipfile
@@ -69,7 +92,12 @@ REQUIRED_META_KEYS: tuple[str, ...] = (
69
  SUBMISSION_ID_SLUG_MAX = 40
70
  RESULTS_FILENAME = "results.jsonl"
71
  SUBMISSIONS_DIR = "submissions"
 
72
  DATA_REV_SHORT_LEN = 12
 
 
 
 
73
 
74
  # One HfApi client per process. HF_TOKEN is picked up from the env at
75
  # construction time and reused for every call.
@@ -109,33 +137,48 @@ def handle_submit(zip_file) -> str:
109
  return form_err
110
 
111
  zip_path = Path(zip_file.name)
112
- with tempfile.TemporaryDirectory(prefix="cadgenbench-validate-") as tmp:
113
- unpacked = Path(tmp) / "unpacked"
114
- unpacked.mkdir()
 
 
 
 
 
 
115
  try:
116
- _extract_zip(zip_path, unpacked)
117
- meta = _load_and_validate_meta(unpacked)
118
- fixture_names = _validate_fixture_set(unpacked)
119
- _validate_steps_parseable(unpacked, fixture_names)
120
  except _ValidationError as e:
121
  return f"**Submission rejected.** {e}"
122
 
123
- submission_id = _mint_submission_id(
124
- meta["submitter_name"], meta["submission_name"]
125
- )
126
- try:
127
- blob_url = _upload_submission_zip(submission_id, zip_path)
128
- row = _build_pending_row(submission_id, meta, fixture_names, blob_url)
129
- _append_pending_row(row)
130
- except _HubWriteError as e:
131
- return f"**Submission rejected.** {e}"
 
 
 
 
 
 
 
 
132
 
133
  return (
134
- f"**Queued.** Submission `{submission_id}` has been accepted and a "
135
- f"`pending` row added to the leaderboard (submitter: "
136
- f"`{meta['submitter_name']}`, system: `{meta['submission_name']}`, "
137
- f"{len(fixture_names)} fixtures). Evaluation will populate the "
138
- f"score columns once the worker lands in a later chunk."
 
139
  )
140
 
141
 
@@ -358,47 +401,79 @@ def _build_pending_row(
358
 
359
 
360
  def _append_pending_row(row: dict[str, Any]) -> None:
361
- """Append a pending row to ``results.jsonl`` on the Hub under the lock.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
362
 
363
- Read-modify-write: download the current file (or treat as empty if
364
- it doesn't exist yet), append one line, re-upload. The lock is
365
- held only for the duration of this cycle (~1-2s), not for any
366
- background eval; concurrent submitters serialise here, not on the
367
- eval pipeline.
 
 
368
  """
369
- with _HUB_LOCK:
370
- try:
371
- existing = _download_results_jsonl()
372
- except Exception as e: # noqa: BLE001 - Hub API surface is broad
373
- logger.exception("Failed to download results.jsonl for append")
374
- raise _HubWriteError(
375
- f"Server-side error reading the submissions table "
376
- f"({type(e).__name__}: {e}). Please try again later."
377
- ) from e
378
 
379
- line = json.dumps(row, ensure_ascii=False)
380
- new_body = existing + line + "\n" if existing else line + "\n"
 
 
 
 
381
 
382
- try:
383
- _HF_API.upload_file(
384
- path_or_fileobj=new_body.encode("utf-8"),
385
- path_in_repo=RESULTS_FILENAME,
386
- repo_id=HF_SUBMISSIONS_REPO,
387
- repo_type="dataset",
388
- commit_message=(
389
- f"add pending row for {row['submission_id']}"
390
- ),
391
- )
392
- except Exception as e: # noqa: BLE001 - Hub API surface is broad
393
- logger.exception(
394
- "Failed to upload appended results.jsonl for %s",
395
- row["submission_id"],
396
- )
397
- raise _HubWriteError(
398
- f"Server-side error writing the submissions table "
399
- f"({type(e).__name__}: {e}). The submission zip was "
400
- f"uploaded but the row was not; please try again later."
401
- ) from e
 
 
 
 
 
 
 
 
402
 
403
 
404
  def _download_results_jsonl() -> str:
@@ -436,3 +511,165 @@ def _resolve_data_revision() -> str:
436
  )
437
  _DATA_REVISION = "unknown"
438
  return _DATA_REVISION
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  """Submit-tab handler for the CADGenBench leaderboard Space.
2
 
3
+ Step 6 (E) chunks 2 + 3 + 4: cheap-sync validation pipeline + pending-row
4
+ write + zip upload + background-thread eval. The handler validates
5
+ the upload, uploads the zip to ``submissions/<id>.zip``, appends a
6
+ ``status: pending`` row to ``results.jsonl`` (under a process-wide
7
+ lock), spawns a daemon thread to run ``cadgenbench evaluate`` +
8
+ ``cadgenbench report single``, and returns immediately. The worker
9
+ uploads ``reports/<id>.{html,json}`` and flips the row
10
+ ``pending -> completed`` (or ``failed`` with a ``failure_reason``).
11
 
12
  Validation gates, in order:
13
 
 
33
  2. Build pending row (metadata + null scores + ``submission_blob_url``).
34
  3. Acquire ``_HUB_LOCK``; download current ``results.jsonl`` (or
35
  start empty); append the pending row; re-upload.
36
+ 4. Spawn worker thread (daemon, named after submission_id). The
37
+ worker owns the tempdir's lifecycle past this point.
38
 
39
  If step 1 fails the user sees a clean rejection. If step 3 fails the
40
  zip is left orphaned in ``submissions/`` and the user sees a clean
41
  rejection; an orphan-zip sweep is a future-chunk concern.
42
+
43
+ Background worker, per submission:
44
+
45
+ 1. ``cadgenbench evaluate <run_dir>`` (subprocess; runs
46
+ per-fixture eval in parallel via the CLI's ProcessPoolExecutor;
47
+ writes ``run_summary.json`` at the run-dir root).
48
+ 2. ``cadgenbench report single <run_dir> -o <report.html>``
49
+ (subprocess; self-contained HTML with embedded renders).
50
+ 3. Upload ``reports/<id>.html`` + ``reports/<id>.json``. The JSON
51
+ bundles ``run_summary.json`` + every per-fixture ``result.json``.
52
+ 4. Read ``run_summary.json``; under ``_HUB_LOCK`` flip the row's
53
+ ``status`` to ``"completed"`` and merge the score fields.
54
+ 5. On any worker-side exception, flip the row to ``"failed"`` with
55
+ a short ``failure_reason``. Tempdir cleanup runs in ``finally``
56
+ either way.
57
  """
58
  from __future__ import annotations
59
 
60
  import json
61
  import logging
62
+ import os
63
  import re
64
+ import shutil
65
+ import subprocess
66
+ import sys
67
  import tempfile
68
  import threading
69
  import zipfile
 
92
  SUBMISSION_ID_SLUG_MAX = 40
93
  RESULTS_FILENAME = "results.jsonl"
94
  SUBMISSIONS_DIR = "submissions"
95
+ REPORTS_DIR = "reports"
96
  DATA_REV_SHORT_LEN = 12
97
+ FAILURE_REASON_MAX_CHARS = 200
98
+ EVAL_TIMEOUT_SECONDS = 15 * 60
99
+ REPORT_TIMEOUT_SECONDS = 2 * 60
100
+ EVAL_WORKER_COUNT = "8"
101
 
102
  # One HfApi client per process. HF_TOKEN is picked up from the env at
103
  # construction time and reused for every call.
 
137
  return form_err
138
 
139
  zip_path = Path(zip_file.name)
140
+
141
+ # Manual tempdir lifecycle: cleaned up here on any rejection, but
142
+ # ownership passes to the worker on a successful spawn (the worker
143
+ # cleans up in its own finally). TemporaryDirectory's context
144
+ # manager doesn't fit because the dir has to outlive this function.
145
+ tmp = Path(tempfile.mkdtemp(prefix="cadgenbench-submit-"))
146
+ run_dir = tmp / "run"
147
+ run_dir.mkdir()
148
+ try:
149
  try:
150
+ _extract_zip(zip_path, run_dir)
151
+ meta = _load_and_validate_meta(run_dir)
152
+ fixture_names = _validate_fixture_set(run_dir)
153
+ _validate_steps_parseable(run_dir, fixture_names)
154
  except _ValidationError as e:
155
  return f"**Submission rejected.** {e}"
156
 
157
+ submission_id = _mint_submission_id(
158
+ meta["submitter_name"], meta["submission_name"]
159
+ )
160
+ try:
161
+ blob_url = _upload_submission_zip(submission_id, zip_path)
162
+ row = _build_pending_row(
163
+ submission_id, meta, fixture_names, blob_url
164
+ )
165
+ _append_pending_row(row)
166
+ except _HubWriteError as e:
167
+ return f"**Submission rejected.** {e}"
168
+
169
+ _spawn_worker(submission_id, tmp, run_dir)
170
+ tmp = None # ownership transferred; skip cleanup below
171
+ finally:
172
+ if tmp is not None:
173
+ shutil.rmtree(tmp, ignore_errors=True)
174
 
175
  return (
176
+ f"**Queued.** Submission `{submission_id}` has been accepted "
177
+ f"(submitter: `{meta['submitter_name']}`, system: "
178
+ f"`{meta['submission_name']}`, {len(fixture_names)} fixtures). "
179
+ f"Evaluation typically takes 2-5 minutes on this Space's "
180
+ f"`cpu-upgrade` tier; the row flips to `completed` with score "
181
+ f"columns populated when the worker finishes."
182
  )
183
 
184
 
 
401
 
402
 
403
  def _append_pending_row(row: dict[str, Any]) -> None:
404
+ """Append a pending row to ``results.jsonl`` on the Hub under the lock."""
405
+ submission_id = row["submission_id"]
406
+
407
+ def mutate(rows: list[dict[str, Any]]) -> None:
408
+ rows.append(row)
409
+
410
+ try:
411
+ _hub_rmw_results(
412
+ mutate, commit_message=f"add pending row for {submission_id}"
413
+ )
414
+ except Exception as e: # noqa: BLE001 - Hub API surface is broad
415
+ logger.exception(
416
+ "Failed RMW of results.jsonl while appending pending row for %s",
417
+ submission_id,
418
+ )
419
+ raise _HubWriteError(
420
+ f"Server-side error writing the submissions table "
421
+ f"({type(e).__name__}: {e}). The submission zip was uploaded "
422
+ f"but the row was not; please try again later."
423
+ ) from e
424
 
425
+
426
+ def _update_row(submission_id: str, updates: dict[str, Any]) -> None:
427
+ """Find the row for *submission_id* and merge *updates* into it.
428
+
429
+ Raises ``LookupError`` if no row with that id exists (worker fired
430
+ before the pending row was committed, which shouldn't happen, but
431
+ surfaces clearly if it ever does).
432
  """
433
+ def mutate(rows: list[dict[str, Any]]) -> None:
434
+ for r in rows:
435
+ if r.get("submission_id") == submission_id:
436
+ r.update(updates)
437
+ return
438
+ raise LookupError(
439
+ f"No row with submission_id={submission_id!r} in results.jsonl."
440
+ )
 
441
 
442
+ _hub_rmw_results(
443
+ mutate,
444
+ commit_message=(
445
+ f"flip row for {submission_id} -> {updates.get('status', '?')}"
446
+ ),
447
+ )
448
 
449
+
450
+ def _hub_rmw_results(
451
+ mutate, *, commit_message: str,
452
+ ) -> None:
453
+ """Lock + download + mutate + upload of ``results.jsonl``.
454
+
455
+ The lock is held only for the read-modify-write cycle (~1-2s),
456
+ never for eval time. Concurrent submitters serialise here, not
457
+ in the eval pipeline. Treats a missing file as the empty list.
458
+ """
459
+ with _HUB_LOCK:
460
+ existing = _download_results_jsonl()
461
+ rows: list[dict[str, Any]] = [
462
+ json.loads(line) for line in existing.splitlines() if line.strip()
463
+ ]
464
+ mutate(rows)
465
+ new_body = (
466
+ "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n"
467
+ if rows
468
+ else ""
469
+ )
470
+ _HF_API.upload_file(
471
+ path_or_fileobj=new_body.encode("utf-8"),
472
+ path_in_repo=RESULTS_FILENAME,
473
+ repo_id=HF_SUBMISSIONS_REPO,
474
+ repo_type="dataset",
475
+ commit_message=commit_message,
476
+ )
477
 
478
 
479
  def _download_results_jsonl() -> str:
 
511
  )
512
  _DATA_REVISION = "unknown"
513
  return _DATA_REVISION
514
+
515
+
516
+ # ---------------------------------------------------------------------------
517
+ # Background worker (eval + report + row flip)
518
+ # ---------------------------------------------------------------------------
519
+
520
+
521
+ def _spawn_worker(submission_id: str, tmp: Path, run_dir: Path) -> None:
522
+ """Start the eval worker thread. Fire-and-forget; daemon=True so a
523
+ Space restart doesn't block on in-flight workers (chunk 6's
524
+ boot-time sweep flips any rows their workers didn't finish to
525
+ failed).
526
+ """
527
+ t = threading.Thread(
528
+ target=_run_worker,
529
+ args=(submission_id, tmp, run_dir),
530
+ name=f"cgb-worker-{submission_id}",
531
+ daemon=True,
532
+ )
533
+ t.start()
534
+
535
+
536
+ def _run_worker(submission_id: str, tmp: Path, run_dir: Path) -> None:
537
+ """Top-level worker entry: run eval, build + upload reports, flip row.
538
+
539
+ Any exception in the pipeline flips the row to ``failed`` with a
540
+ short ``failure_reason`` (full traceback goes to the Space's
541
+ runtime logs). The tempdir is always cleaned up.
542
+ """
543
+ try:
544
+ try:
545
+ _run_eval(run_dir)
546
+ report_html = tmp / f"{submission_id}.html"
547
+ _run_report(run_dir, report_html)
548
+ report_json = _build_report_json(run_dir)
549
+ _upload_reports(submission_id, report_html, report_json)
550
+ summary = json.loads(
551
+ (run_dir / "run_summary.json").read_text(encoding="utf-8")
552
+ )
553
+ _flip_row_to_completed(submission_id, summary)
554
+ logger.info("Worker completed for %s", submission_id)
555
+ except Exception as e: # noqa: BLE001 - broad on purpose; we map to row state
556
+ logger.exception("Worker failed for %s", submission_id)
557
+ reason = f"{type(e).__name__}: {str(e)}"[:FAILURE_REASON_MAX_CHARS]
558
+ try:
559
+ _flip_row_to_failed(submission_id, reason)
560
+ except Exception:
561
+ # If even the row-flip fails, the row stays pending.
562
+ # Chunk 6's stuck-pending sweep will catch it on the
563
+ # next Space boot.
564
+ logger.exception(
565
+ "Failed to flip row to failed for %s; row stays pending",
566
+ submission_id,
567
+ )
568
+ finally:
569
+ shutil.rmtree(tmp, ignore_errors=True)
570
+
571
+
572
+ def _run_eval(run_dir: Path) -> None:
573
+ """Invoke ``cadgenbench evaluate`` over the run_dir; raise on non-zero."""
574
+ cmd = [
575
+ sys.executable, "-m", "cadgenbench.cli", "evaluate", str(run_dir),
576
+ "--workers", EVAL_WORKER_COUNT,
577
+ ]
578
+ logger.info("Running eval: %s", " ".join(cmd))
579
+ proc = subprocess.run(
580
+ cmd,
581
+ capture_output=True,
582
+ text=True,
583
+ timeout=EVAL_TIMEOUT_SECONDS,
584
+ env=os.environ.copy(),
585
+ check=False,
586
+ )
587
+ if proc.returncode != 0:
588
+ # Surface a short tail of stderr; full output is in Space logs above.
589
+ tail = (proc.stderr or proc.stdout or "")[-500:].strip()
590
+ raise RuntimeError(
591
+ f"cadgenbench evaluate exited {proc.returncode}: {tail}"
592
+ )
593
+
594
+
595
+ def _run_report(run_dir: Path, html_out: Path) -> None:
596
+ """Invoke ``cadgenbench report single`` for the run_dir; raise on non-zero."""
597
+ cmd = [
598
+ sys.executable, "-m", "cadgenbench.cli", "report", "single",
599
+ str(run_dir), "-o", str(html_out),
600
+ ]
601
+ logger.info("Running report: %s", " ".join(cmd))
602
+ proc = subprocess.run(
603
+ cmd,
604
+ capture_output=True,
605
+ text=True,
606
+ timeout=REPORT_TIMEOUT_SECONDS,
607
+ env=os.environ.copy(),
608
+ check=False,
609
+ )
610
+ if proc.returncode != 0 or not html_out.is_file():
611
+ tail = (proc.stderr or proc.stdout or "")[-500:].strip()
612
+ raise RuntimeError(
613
+ f"cadgenbench report single exited {proc.returncode}: {tail}"
614
+ )
615
+
616
+
617
+ def _build_report_json(run_dir: Path) -> dict[str, Any]:
618
+ """Bundle ``run_summary.json`` + every per-fixture ``result.json``."""
619
+ summary_path = run_dir / "run_summary.json"
620
+ if not summary_path.is_file():
621
+ raise RuntimeError(
622
+ f"run_summary.json not produced under {run_dir} (eval issue?)"
623
+ )
624
+ summary = json.loads(summary_path.read_text(encoding="utf-8"))
625
+ per_fixture: dict[str, dict[str, Any]] = {}
626
+ for fixture_dir in sorted(d for d in run_dir.iterdir() if d.is_dir()):
627
+ rp = fixture_dir / "result.json"
628
+ if rp.is_file():
629
+ per_fixture[fixture_dir.name] = json.loads(
630
+ rp.read_text(encoding="utf-8")
631
+ )
632
+ return {"run_summary": summary, "per_fixture_results": per_fixture}
633
+
634
+
635
+ def _upload_reports(
636
+ submission_id: str, html_path: Path, report_json: dict[str, Any],
637
+ ) -> None:
638
+ """Upload ``reports/<id>.html`` and ``reports/<id>.json`` to the Hub."""
639
+ _HF_API.upload_file(
640
+ path_or_fileobj=str(html_path),
641
+ path_in_repo=f"{REPORTS_DIR}/{submission_id}.html",
642
+ repo_id=HF_SUBMISSIONS_REPO,
643
+ repo_type="dataset",
644
+ commit_message=f"add HTML report for {submission_id}",
645
+ )
646
+ _HF_API.upload_file(
647
+ path_or_fileobj=json.dumps(report_json, ensure_ascii=False, indent=2).encode("utf-8"),
648
+ path_in_repo=f"{REPORTS_DIR}/{submission_id}.json",
649
+ repo_id=HF_SUBMISSIONS_REPO,
650
+ repo_type="dataset",
651
+ commit_message=f"add JSON report for {submission_id}",
652
+ )
653
+
654
+
655
+ def _flip_row_to_completed(submission_id: str, summary: dict[str, Any]) -> None:
656
+ """Merge ``run_summary.json`` fields into the pending row."""
657
+ updates: dict[str, Any] = {
658
+ "status": "completed",
659
+ "failure_reason": None,
660
+ "cadgenbench_data_revision": _resolve_data_revision(),
661
+ "aggregate_score": summary.get("aggregate_score"),
662
+ "validity_rate": summary.get("validity_rate"),
663
+ "score_by_task_type": summary.get("score_by_task_type"),
664
+ "per_task_scores": summary.get("per_task_scores"),
665
+ "per_fixture_scores": summary.get("per_fixture_scores"),
666
+ }
667
+ _update_row(submission_id, updates)
668
+
669
+
670
+ def _flip_row_to_failed(submission_id: str, reason: str) -> None:
671
+ """Mark the row as ``failed`` with a short reason; scores stay null."""
672
+ _update_row(
673
+ submission_id,
674
+ {"status": "failed", "failure_reason": reason},
675
+ )