# Benchmark Controller FSM (Strict-safe M3 Sweep)

## Component and scope

Component: benchmark orchestration path in `training/kan_benchmark_suite.py` that drives matrix runs, applies toolchain gates, and writes per-cell telemetry.

External dependencies: command-line arguments, toolchain manifest cache, optional training subprocess, filesystem outputs, and user cancellation.

## State list (mutually exclusive)

1. `IDLE`
   - No run initialized.
   - Invariants: no active case, no snapshot timer.
2. `PRE_FLIGHT`
   - Manifest/cached toolchain status loaded.
   - Invariants: manifest snapshot hash set.
3. `GATE_WAIT`
   - Running toolchain gate checks for a case.
   - Invariants: pending case id is current.
4. `RUN_READY`
   - All non-strict checks passed and base args are frozen for run.
   - Invariants: `kernel_profile`, `runtime_backend_plan`, and sweep params are consistent.
5. `RUNNING`
   - One benchmark cell is executing.
   - Invariants: `run_id` and `seed` are assigned; history stream is hot.
6. `RUN_COMPLETED`
   - Current cell finished and result is buffered.
   - Invariants: final metrics exist or failure row written.
7. `TEARDOWN`
   - Persisting run row and cleaning per-run artifacts.
   - Invariants: output files open.
8. `ERROR`
   - Hard gate or runtime failure; may still emit failure row in non-blocking cases.
9. `CANCELLED`
   - User-initiated cancel/unmount; active run is aborted and best-effort persisted.

## Events

- `E_INIT`
- `E_CONFIG_PARSED`
- `E_PRE_FLIGHT_OK`
- `E_PRE_FLIGHT_FAIL`
- `E_GATE_CHECK_OK`
- `E_GATE_CHECK_FAIL_WARN`
- `E_GATE_CHECK_FAIL_ERROR`
- `E_CASE_START`
- `E_STEP_DONE`
- `E_RUN_OK`
- `E_RUN_FAIL`
- `E_RETRY`
- `E_CANCEL`
- `E_TIMEOUT`
- `E_UNMOUNT`
- `E_STALE_EVENT(older_run_id)`
- `E_NEW_INPUTS`

## Guards

- `G_strict_mode`: strict CoreML mode is active.
- `G_gate_requires_coreml`: case path requires strict CoreML visibility.
- `G_case_requires_ane`: current case runtime plan is ANE/HYBRID.
- `G_retry_budget`: remaining retries > 0.
- `G_fresh`: event run id matches current `run_id`.
- `G_output_ok`: output directory writable.
- `G_cancel_requested`: cancellation flag set.

## Side effects

- Build environment manifest + cache lookup (`_collect_toolchain_manifest`).
- Evaluate gate (`_evaluate_toolchain_gate`).
- Create per-case output directory.
- Instantiate training args (`_set_args_from_base`) and invoke `run_training`.
- Write per-cell run JSON + summary CSV.
- Emit console warning/error lines.
- On cancel/unmount: clear in-flight worker handles and skip remaining scheduled cases.

## Transition table

| state | event | guard | next state | actions |
|---|---|---|---|---|
| `IDLE` | `E_CONFIG_PARSED` | `G_output_ok` | `PRE_FLIGHT` | capture manifest and persist suite manifest |
| `IDLE` | `E_CONFIG_PARSED` | `~G_output_ok` | `ERROR` | fail fast, emit manifest I/O error |
| `PRE_FLIGHT` | `E_PRE_FLIGHT_OK` | `True` | `GATE_WAIT` | compute suite defaults and base args |
| `PRE_FLIGHT` | `E_PRE_FLIGHT_FAIL` | `True` | `ERROR` | add gate diagnostics row, continue if warn |
| `GATE_WAIT` | `E_GATE_CHECK_OK` | `~G_gate_requires_coreml OR ~G_strict_mode` | `RUN_READY` | record toolchain_gate_issues (empty) |
| `GATE_WAIT` | `E_GATE_CHECK_FAIL_WARN` | `~G_strict_mode` | `RUN_READY` | record issues; mark warning metadata |
| `GATE_WAIT` | `E_GATE_CHECK_FAIL_ERROR` | `G_strict_mode` | `ERROR` | throw, or emit failure row with `coreml` reason |
| `RUN_READY` | `E_CASE_START` | `G_fresh AND ~G_cancel_requested` | `RUNNING` | set `run_id`, `seed`, case overrides |
| `RUN_READY` | `E_NEW_INPUTS` | `G_fresh` | `RUN_READY` | update next-case policy and rebuild base args |
| `RUNNING` | `E_STEP_DONE` | `G_fresh` | `RUNNING` | append telemetry step from history stream |
| `RUNNING` | `E_RUN_OK` | `G_fresh` | `RUN_COMPLETED` | finalize metrics and compute row-level ratios |
| `RUNNING` | `E_RUN_FAIL` | `G_fresh` | `RUN_COMPLETED` | persist failure row with `toolchain_gate_ok=False` |
| `RUNNING` | `E_RUN_FAIL` | `~G_fresh` | `RUNNING` | drop stale result, retain active run |
| `RUNNING` | `E_TIMEOUT` | `G_retry_budget` | `ERROR` | cancel/retry with backoff policy |
| `RUN_COMPLETED` | `E_CASE_START` | `G_fresh` | `TEARDOWN` | collect manifest + append run_result |
| `RUN_COMPLETED` | `E_CANCEL` | `~G_cancel_requested` | `TEARDOWN` | mark incomplete row and break loops |
| `TEARDOWN` | `E_STEP_DONE` | `True` | `TEARDOWN` | continue writing CSV artifact updates |
| `TEARDOWN` | `E_RUN_OK` | cases remain | `GATE_WAIT` | schedule next case |
| `TEARDOWN` | `E_RUN_OK` | no cases remain | `IDLE` | emit final report paths |
| `ERROR` | `E_RETRY` | `G_retry_budget` | `GATE_WAIT` | re-run last case with updated seed/backoff |
| `ERROR` | `E_CANCEL` | `True` | `CANCELLED` | stop scheduling, persist partial report |
| `CANCELLED` | `E_UNMOUNT` | `True` | `IDLE` | flush pending writes, close handles |
| any | `E_CANCEL` | `G_cancel_requested` | `CANCELLED` | set abort flag and stop future case launches |

## Mermaid

```mermaid
stateDiagram-v2
    [*] --> IDLE
    IDLE --> PRE_FLIGHT : E_CONFIG_PARSED / capture_manifest
    PRE_FLIGHT --> GATE_WAIT : E_PRE_FLIGHT_OK
    PRE_FLIGHT --> ERROR : E_PRE_FLIGHT_FAIL
    GATE_WAIT --> RUN_READY : E_GATE_CHECK_OK
    GATE_WAIT --> RUN_READY : E_GATE_CHECK_FAIL_WARN
    GATE_WAIT --> ERROR : E_GATE_CHECK_FAIL_ERROR
    RUN_READY --> RUNNING : E_CASE_START
    RUNNING --> RUNNING : E_STEP_DONE
    RUNNING --> RUN_COMPLETED : E_RUN_OK
    RUNNING --> RUN_COMPLETED : E_RUN_FAIL
    RUN_COMPLETED --> TEARDOWN : E_CASE_START
    TEARDOWN --> GATE_WAIT : next_case
    TEARDOWN --> IDLE : all_cases_done
    ERROR --> GATE_WAIT : E_RETRY
    ERROR --> CANCELLED : E_CANCEL
    CANCELLED --> IDLE : E_UNMOUNT
    RUN_READY --> CANCELLED : E_CANCEL
    RUNNING --> CANCELLED : E_CANCEL
    RUN_COMPLETED --> CANCELLED : E_CANCEL
    IDLE --> [*] : process_end
```

## Race and stale-event handling

- Events from older in-flight runs are ignored via the `run_id` guard (`G_fresh`).
- If `E_NEW_INPUTS` arrives while `RUNNING`, the latest override is accepted only after the current run enters `TEARDOWN`.
- `E_CANCEL` always has priority over `E_STEP_DONE` and transitions directly to `CANCELLED`.
- On unmount, only the latest active run ID is allowed to persist output; stale completions are dropped.

## Edge-coverage tests

1. Start in `IDLE`; strict gate with missing CoreML in `error` mode => `GATE_WAIT -> ERROR`.
2. Warn mode with missing CoreML => `GATE_WAIT -> RUN_READY` with warning metadata.
3. Stale result arrives in `RUNNING` after the next run has started => stale event is dropped, active run continues.
4. `E_CANCEL` during `RUNNING` => no additional cases launch after the current step.
5. Retry path from `ERROR` executes when retry budget remains and clears the last failed case cache.
6. `toolchain_gate_coreml_issue` is populated only when the gate failure string contains coreml keywords.
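
## Illustrative transition sketch

The transition table and the stale-event rules can be exercised with a small table-driven dispatcher. The following is a minimal sketch, not the code in `training/kan_benchmark_suite.py`: the names `State`, `Ctx`, `TABLE`, and `step` are hypothetical, only a subset of rows is encoded, actions are omitted, and treating unmatched events as no-ops is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    IDLE = auto()
    PRE_FLIGHT = auto()
    GATE_WAIT = auto()
    RUN_READY = auto()
    RUNNING = auto()
    RUN_COMPLETED = auto()
    TEARDOWN = auto()
    ERROR = auto()
    CANCELLED = auto()


@dataclass
class Ctx:
    """Hypothetical guard inputs; field names are illustrative only."""
    strict_mode: bool = False       # G_strict_mode
    output_ok: bool = True          # G_output_ok
    cancel_requested: bool = False  # G_cancel_requested
    retry_budget: int = 1           # G_retry_budget is retry_budget > 0
    run_id: int = 0                 # compared against the event's run id


Guard = Callable[[Ctx, Optional[int]], bool]


def always(ctx: Ctx, rid: Optional[int]) -> bool:
    return True


def fresh(ctx: Ctx, rid: Optional[int]) -> bool:
    # G_fresh: the event belongs to the currently active run.
    return rid == ctx.run_id


def stale(ctx: Ctx, rid: Optional[int]) -> bool:
    return not fresh(ctx, rid)


# (state, event) -> ordered (guard, next_state) rows; first passing guard wins.
TABLE: dict[tuple[State, str], list[tuple[Guard, State]]] = {
    (State.IDLE, "E_CONFIG_PARSED"): [
        (lambda c, r: c.output_ok, State.PRE_FLIGHT),
        (always, State.ERROR),
    ],
    (State.PRE_FLIGHT, "E_PRE_FLIGHT_OK"): [(always, State.GATE_WAIT)],
    (State.PRE_FLIGHT, "E_PRE_FLIGHT_FAIL"): [(always, State.ERROR)],
    (State.GATE_WAIT, "E_GATE_CHECK_FAIL_WARN"): [
        (lambda c, r: not c.strict_mode, State.RUN_READY),
    ],
    (State.GATE_WAIT, "E_GATE_CHECK_FAIL_ERROR"): [
        (lambda c, r: c.strict_mode, State.ERROR),
    ],
    (State.RUN_READY, "E_CASE_START"): [
        (lambda c, r: fresh(c, r) and not c.cancel_requested, State.RUNNING),
    ],
    (State.RUNNING, "E_STEP_DONE"): [(fresh, State.RUNNING)],
    (State.RUNNING, "E_RUN_OK"): [(fresh, State.RUN_COMPLETED)],
    (State.RUNNING, "E_RUN_FAIL"): [
        (fresh, State.RUN_COMPLETED),
        (stale, State.RUNNING),  # drop stale result, retain active run
    ],
    (State.ERROR, "E_RETRY"): [(lambda c, r: c.retry_budget > 0, State.GATE_WAIT)],
    (State.ERROR, "E_CANCEL"): [(always, State.CANCELLED)],
    (State.CANCELLED, "E_UNMOUNT"): [(always, State.IDLE)],
}


def step(state: State, event: str, ctx: Ctx, run_id: Optional[int] = None) -> State:
    """Return the next state for one event; actions are left out of this sketch."""
    # The "any / E_CANCEL / G_cancel_requested" row is checked before all others.
    if event == "E_CANCEL" and ctx.cancel_requested:
        return State.CANCELLED
    for guard, nxt in TABLE.get((state, event), []):
        if guard(ctx, run_id):
            return nxt
    return state  # assumption: events with no matching row are ignored
```

For example, with `ctx = Ctx(run_id=7)`, calling `step(State.RUNNING, "E_RUN_FAIL", ctx, run_id=6)` leaves the controller in `RUNNING`, which mirrors edge-coverage test 3. Checking the cancel row first is one way to give `E_CANCEL` priority; the real controller may enforce that ordering differently.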