Benchmark Controller FSM (Strict-safe M3 Sweep)
Component and scope
Component: benchmark orchestration path in training/kan_benchmark_suite.py that drives matrix runs, applies toolchain gates, and writes per-cell telemetry.
External dependencies: command-line arguments, toolchain manifest cache, optional training subprocess, filesystem outputs, and user cancellation.
State list (mutually exclusive)
- `IDLE`: No run initialized. Invariants: no active case, no snapshot timer.
- `PRE_FLIGHT`: Manifest/cached toolchain status loaded. Invariants: manifest snapshot hash set.
- `GATE_WAIT`: Running toolchain gate checks for a case. Invariants: pending case id is current.
- `RUN_READY`: All non-strict checks passed and base args are frozen for the run. Invariants: `kernel_profile`, `runtime_backend_plan`, and sweep params are consistent.
- `RUNNING`: One benchmark cell is executing. Invariants: `run_id` and `seed` are assigned; history stream is hot.
- `RUN_COMPLETED`: Current cell finished and result is buffered. Invariants: final metrics exist or failure row written.
- `TEARDOWN`: Persisting run row and cleaning per-run artifacts. Invariants: output files open.
- `ERROR`: Hard gate or runtime failure; may still emit failure row in non-blocking cases.
- `CANCELLED`: User-initiated cancel/unmount; active run is aborted and best-effort persisted.
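As a rough illustration of how the state list and two of its invariants could be encoded, here is a minimal sketch. The `ControllerContext` field names are hypothetical and chosen only to mirror the wording of the invariants; they are not taken from `training/kan_benchmark_suite.py`.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    """Controller states; exactly one is active at any time."""
    IDLE = auto()
    PRE_FLIGHT = auto()
    GATE_WAIT = auto()
    RUN_READY = auto()
    RUNNING = auto()
    RUN_COMPLETED = auto()
    TEARDOWN = auto()
    ERROR = auto()
    CANCELLED = auto()


@dataclass
class ControllerContext:
    # Hypothetical fields mirroring the invariants in the state list.
    active_case: Optional[str] = None
    snapshot_timer_active: bool = False
    run_id: Optional[int] = None
    seed: Optional[int] = None


def invariants_hold(state: State, ctx: ControllerContext) -> bool:
    """Check the invariants for IDLE and RUNNING only."""
    if state is State.IDLE:
        return ctx.active_case is None and not ctx.snapshot_timer_active
    if state is State.RUNNING:
        return ctx.run_id is not None and ctx.seed is not None
    return True  # invariants of the other states are not modeled here
```

Using an `Enum` makes mutual exclusivity structural: a controller holding a single `State` value cannot be in two states at once.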
Events
- `E_INIT`
- `E_CONFIG_PARSED`
- `E_PRE_FLIGHT_OK`
- `E_PRE_FLIGHT_FAIL`
- `E_GATE_CHECK_OK`
- `E_GATE_CHECK_FAIL_WARN`
- `E_GATE_CHECK_FAIL_ERROR`
- `E_CASE_START`
- `E_STEP_DONE`
- `E_RUN_OK`
- `E_RUN_FAIL`
- `E_RETRY`
- `E_CANCEL`
- `E_TIMEOUT`
- `E_UNMOUNT`
- `E_STALE_EVENT(older_run_id)`
- `E_NEW_INPUTS`
Guards
- `G_strict_mode`: strict CoreML mode is active.
- `G_gate_requires_coreml`: case path requires strict CoreML visibility.
- `G_case_requires_ane`: current case runtime plan is ANE/HYBRID.
- `G_retry_budget`: remaining retries > 0.
- `G_fresh`: event run id matches current `run_id`.
- `G_output_ok`: output directory writable.
- `G_cancel_requested`: cancellation flag set.
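The guards are pure predicates over controller state, so they can be sketched as small side-effect-free functions. The `GuardContext` dataclass, its field names, and the plan strings `"ANE"`/`"HYBRID"` below are assumptions made for illustration; the actual representation in the benchmark suite may differ.

```python
import os
from dataclasses import dataclass


@dataclass
class GuardContext:
    # Hypothetical field names; not taken from the benchmark suite itself.
    strict_mode: bool = False
    runtime_backend_plan: str = "CPU"
    retries_left: int = 0
    current_run_id: int = 0
    cancel_requested: bool = False
    output_dir: str = "."


def g_strict_mode(ctx: GuardContext) -> bool:
    return ctx.strict_mode


def g_case_requires_ane(ctx: GuardContext) -> bool:
    # Assumes the plan is stored as a string label.
    return ctx.runtime_backend_plan.upper() in {"ANE", "HYBRID"}


def g_retry_budget(ctx: GuardContext) -> bool:
    return ctx.retries_left > 0


def g_fresh(ctx: GuardContext, event_run_id: int) -> bool:
    return event_run_id == ctx.current_run_id


def g_output_ok(ctx: GuardContext) -> bool:
    # Directory must exist and be writable by the current process.
    return os.path.isdir(ctx.output_dir) and os.access(ctx.output_dir, os.W_OK)


def g_cancel_requested(ctx: GuardContext) -> bool:
    return ctx.cancel_requested
```

Keeping guards free of side effects means they can be evaluated repeatedly, in any order, without changing which transition fires.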
Side effects
- Build environment manifest + cache lookup (`_collect_toolchain_manifest`).
- Evaluate gate (`_evaluate_toolchain_gate`).
- Create per-case output directory.
- Instantiate training args (`_set_args_from_base`) and invoke `run_training`.
- Write per-cell run JSON + summary CSV.
- Emit console warning/error lines.
- On cancel/unmount: clear in-flight worker handles and skip remaining scheduled cases.
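The directory-creation and persistence side effects could look roughly like the sketch below. The helper name `persist_cell`, the file names `run.json` and `summary.csv`, and the assumption that every row has the same keys are all hypothetical; the real suite may lay out its outputs differently.

```python
import csv
import json
import os


def persist_cell(out_dir: str, case_id: str, row: dict) -> str:
    """Write one per-cell JSON file and append the row to a summary CSV.

    Returns the path of the summary CSV. Assumes all rows share the same keys.
    """
    # Per-case output directory.
    case_dir = os.path.join(out_dir, case_id)
    os.makedirs(case_dir, exist_ok=True)

    # Per-cell run JSON.
    with open(os.path.join(case_dir, "run.json"), "w", encoding="utf-8") as fh:
        json.dump(row, fh, indent=2, sort_keys=True)

    # Summary CSV: write the header only when the file is first created.
    summary_path = os.path.join(out_dir, "summary.csv")
    write_header = not os.path.exists(summary_path)
    with open(summary_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=sorted(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return summary_path
```

Appending one row per cell means a cancelled or failed sweep still leaves a usable partial summary on disk.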
Transition table
| state | event | guard | next state | actions |
|---|---|---|---|---|
| `IDLE` | `E_CONFIG_PARSED` | `G_output_ok` | `PRE_FLIGHT` | capture manifest and persist suite manifest |
| `IDLE` | `E_CONFIG_PARSED` | `~G_output_ok` | `ERROR` | fail fast, emit manifest I/O error |
| `PRE_FLIGHT` | `E_PRE_FLIGHT_OK` | True | `GATE_WAIT` | compute suite defaults and base args |
| `PRE_FLIGHT` | `E_PRE_FLIGHT_FAIL` | True | `ERROR` | add gate diagnostics row, continue if warn |
| `GATE_WAIT` | `E_GATE_CHECK_OK` | `~G_gate_requires_coreml OR ~G_strict_mode` | `RUN_READY` | record `toolchain_gate_issues` (empty) |
| `GATE_WAIT` | `E_GATE_CHECK_FAIL_WARN` | `~G_strict_mode` | `RUN_READY` | record issues; mark warning metadata |
| `GATE_WAIT` | `E_GATE_CHECK_FAIL_ERROR` | `G_strict_mode` | `ERROR` | throw/fail row with coreml reason |
| `RUN_READY` | `E_CASE_START` | `G_fresh AND ~G_cancel_requested` | `RUNNING` | set `run_id`, `seed`, case overrides |
| `RUN_READY` | `E_NEW_INPUTS` | `G_fresh` | `RUN_READY` | update next-case policy and rebuild base args |
| `RUNNING` | `E_STEP_DONE` | `G_fresh` | `RUNNING` | append telemetry step from history stream |
| `RUNNING` | `E_RUN_OK` | `G_fresh` | `RUN_COMPLETED` | finalize metrics and compute row-level ratios |
| `RUNNING` | `E_RUN_FAIL` | `G_fresh` | `RUN_COMPLETED` | persist failure row with `toolchain_gate_ok=False` |
| `RUNNING` | `E_RUN_FAIL` | `~G_fresh` | `RUNNING` | drop stale result, retain active run |
| `RUNNING` | `E_TIMEOUT` | `G_retry_budget` | `ERROR` | cancel/retry with backoff policy |
| `RUN_COMPLETED` | `E_CASE_START` | `G_fresh` | `TEARDOWN` | collect manifest + append `run_result` |
| `RUN_COMPLETED` | `E_CANCEL` | `~G_cancel_requested` | `TEARDOWN` | mark incomplete row and break loops |
| `TEARDOWN` | `E_STEP_DONE` | True | `TEARDOWN` | continue writing CSV artifact updates |
| `TEARDOWN` | `E_RUN_OK` | run remaining cases | `GATE_WAIT` | schedule next case |
| `TEARDOWN` | `E_RUN_OK` | ~run remaining cases | `IDLE` | emit final report paths |
| `ERROR` | `E_RETRY` | `G_retry_budget` | `GATE_WAIT` | re-run last case with updated seed/backoff |
| `ERROR` | `E_CANCEL` | True | `CANCELLED` | stop scheduling, persist partial report |
| `CANCELLED` | `E_UNMOUNT` | True | `IDLE` | flush pending writes, close handles |
| any | `E_CANCEL` | `G_cancel_requested` | `CANCELLED` | set abort flag and stop future case launches |
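A transition table like this maps naturally onto a lookup keyed by `(state, event)`, with each entry holding an ordered list of guarded targets. The sketch below encodes only a subset of the rows and omits actions. The `ctx` dictionary keys are hypothetical, and ignoring events that match no row is an assumption, since the table does not say what happens in that case.

```python
from typing import Callable, Dict, List, Tuple

Guard = Callable[[dict], bool]
Row = Tuple[Guard, str]  # (guard predicate, next state)


def always(_ctx: dict) -> bool:
    return True


# Subset of the transition table; ctx keys are hypothetical names.
TRANSITIONS: Dict[Tuple[str, str], List[Row]] = {
    ("IDLE", "E_CONFIG_PARSED"): [
        (lambda c: c["output_ok"], "PRE_FLIGHT"),
        (lambda c: not c["output_ok"], "ERROR"),
    ],
    ("PRE_FLIGHT", "E_PRE_FLIGHT_OK"): [(always, "GATE_WAIT")],
    ("PRE_FLIGHT", "E_PRE_FLIGHT_FAIL"): [(always, "ERROR")],
    ("GATE_WAIT", "E_GATE_CHECK_FAIL_WARN"): [
        (lambda c: not c["strict_mode"], "RUN_READY"),
    ],
    ("GATE_WAIT", "E_GATE_CHECK_FAIL_ERROR"): [
        (lambda c: c["strict_mode"], "ERROR"),
    ],
    ("ERROR", "E_RETRY"): [(lambda c: c["retries_left"] > 0, "GATE_WAIT")],
    ("ERROR", "E_CANCEL"): [(always, "CANCELLED")],
    ("CANCELLED", "E_UNMOUNT"): [(always, "IDLE")],
}


def step(state: str, event: str, ctx: dict) -> str:
    """Return the next state for (state, event) under ctx."""
    # Wildcard row: E_CANCEL with the cancel flag set applies from any state.
    if event == "E_CANCEL" and ctx.get("cancel_requested", False):
        return "CANCELLED"
    for guard, next_state in TRANSITIONS.get((state, event), []):
        if guard(ctx):
            return next_state
    return state  # assumption: events with no matching row are ignored
```

For example, `step("GATE_WAIT", "E_GATE_CHECK_FAIL_ERROR", {"strict_mode": True})` yields `"ERROR"`, matching the strict-mode row.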
Mermaid
stateDiagram-v2
[*] --> IDLE
IDLE --> PRE_FLIGHT : E_CONFIG_PARSED / capture_manifest
PRE_FLIGHT --> GATE_WAIT : E_PRE_FLIGHT_OK
PRE_FLIGHT --> ERROR : E_PRE_FLIGHT_FAIL
GATE_WAIT --> RUN_READY : E_GATE_CHECK_OK
GATE_WAIT --> RUN_READY : E_GATE_CHECK_FAIL_WARN
GATE_WAIT --> ERROR : E_GATE_CHECK_FAIL_ERROR
RUN_READY --> RUNNING : E_CASE_START
RUNNING --> RUNNING : E_STEP_DONE
RUNNING --> RUN_COMPLETED : E_RUN_OK
RUNNING --> RUN_COMPLETED : E_RUN_FAIL
RUN_COMPLETED --> TEARDOWN : E_CASE_START
TEARDOWN --> GATE_WAIT : next_case
TEARDOWN --> IDLE : all_cases_done
ERROR --> GATE_WAIT : E_RETRY
ERROR --> CANCELLED : E_CANCEL
CANCELLED --> IDLE : E_UNMOUNT
RUN_READY --> CANCELLED : E_CANCEL
RUNNING --> CANCELLED : E_CANCEL
RUN_COMPLETED --> CANCELLED : E_CANCEL
IDLE --> [*] : process_end
Race and stale-event handling
- Older in-flight run events are ignored using the `run_id` guard (`G_fresh`).
- If `E_NEW_INPUTS` arrives while `RUNNING`, the latest override is accepted only after the current run enters `TEARDOWN`.
- `E_CANCEL` always has priority over `E_STEP_DONE` and transitions directly to `CANCELLED`.
- On unmount, only the latest active run ID is allowed to persist output; stale completions are dropped.
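The first three rules can be sketched as follows: drop events whose run id is stale, defer new inputs until teardown with the latest override winning, and pull `E_CANCEL` ahead of any queued step events. `RunTracker`, `pop_next`, and the list-based event queue are hypothetical constructs for illustration; the source does not describe how events are actually queued.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, int]  # (event name, run id)


@dataclass
class RunTracker:
    current_run_id: int = 0
    pending_inputs: Optional[dict] = None
    telemetry: List[dict] = field(default_factory=list)

    def on_step_done(self, event_run_id: int, payload: dict) -> bool:
        """Append telemetry only for the current run; drop stale events."""
        if event_run_id != self.current_run_id:
            return False
        self.telemetry.append(payload)
        return True

    def on_new_inputs(self, overrides: dict) -> None:
        """Defer overrides while a run is active; the latest one wins."""
        self.pending_inputs = overrides

    def on_teardown(self) -> Optional[dict]:
        """Hand back the deferred overrides once the run reaches TEARDOWN."""
        applied, self.pending_inputs = self.pending_inputs, None
        return applied


def pop_next(queue: List[Event]) -> Optional[Event]:
    """Pop E_CANCEL first if one is queued, otherwise the oldest event."""
    for i, (name, _run_id) in enumerate(queue):
        if name == "E_CANCEL":
            return queue.pop(i)
    return queue.pop(0) if queue else None
```

Comparing run ids at the point of consumption, rather than when the event is produced, is what lets a late completion from an earlier run be discarded safely.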
Edge-coverage tests
- Start in `IDLE`; strict gate missing-coreml in `error` mode => `GATE_WAIT -> ERROR`.
- Warn mode missing-coreml => `GATE_WAIT -> RUN_READY` with warning metadata.
- `RUNNING` stale result while next run started => stale event dropped, active run continues.
- `E_CANCEL` during `RUNNING` => no additional case launches after current step.
- Retry path from `ERROR` executes when retry budget remains and clears last failed case cache.
- `toolchain_gate_coreml_issue` populated only when gate failure string contains coreml keywords.
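The gate-related cases could be exercised with a sketch like the one below. The helpers `gate_event` and `coreml_issue` are hypothetical, the keyword tuple is an assumption because the actual keyword list is not given, and `gate_event` simplifies the guards by ignoring `G_gate_requires_coreml`.

```python
from typing import List, Optional

# Assumption: the real keyword list is not specified in this document.
COREML_KEYWORDS = ("coreml",)


def gate_event(issues: List[str], strict_mode: bool) -> str:
    """Map gate issues to the event the controller would receive."""
    if not issues:
        return "E_GATE_CHECK_OK"
    return "E_GATE_CHECK_FAIL_ERROR" if strict_mode else "E_GATE_CHECK_FAIL_WARN"


def coreml_issue(issues: List[str]) -> Optional[str]:
    """Return the first issue mentioning a CoreML keyword, else None."""
    for issue in issues:
        if any(keyword in issue.lower() for keyword in COREML_KEYWORDS):
            return issue
    return None


def test_gate_edges() -> None:
    missing = ["CoreML tools not visible"]
    # Strict mode: a failing gate leads to GATE_WAIT -> ERROR.
    assert gate_event(missing, strict_mode=True) == "E_GATE_CHECK_FAIL_ERROR"
    # Warn mode: the same failure leads to GATE_WAIT -> RUN_READY.
    assert gate_event(missing, strict_mode=False) == "E_GATE_CHECK_FAIL_WARN"
    # The coreml field is populated only for coreml-related failures.
    assert coreml_issue(missing) == "CoreML tools not visible"
    assert coreml_issue(["compiler version mismatch"]) is None


test_gate_edges()
```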