| # Benchmark Controller FSM (Strict-safe M3 Sweep) | |
| ## Component and scope | |
| Component: benchmark orchestration path in `training/kan_benchmark_suite.py` that drives matrix runs, applies toolchain gates, and writes per-cell telemetry. | |
| External dependencies: command-line arguments, toolchain manifest cache, optional training subprocess, filesystem outputs, and user cancellation. | |
| ## State list (mutually exclusive) | |
| 1. `IDLE` | |
| No run initialized. Invariants: no active case, no snapshot timer. | |
| 2. `PRE_FLIGHT` | |
| Manifest/cached toolchain status loaded. Invariants: manifest snapshot hash set. | |
| 3. `GATE_WAIT` | |
| Running toolchain gate checks for a case. Invariants: pending case id is current. | |
| 4. `RUN_READY` | |
| All non-strict checks passed and base args are frozen for run. Invariants: `kernel_profile`, `runtime_backend_plan`, and sweep params are consistent. | |
| 5. `RUNNING` | |
| One benchmark cell is executing. Invariants: `run_id` and `seed` are assigned; history stream is hot. | |
| 6. `RUN_COMPLETED` | |
| Current cell finished and result is buffered. Invariants: final metrics exist or failure row written. | |
| 7. `TEARDOWN` | |
| Persisting run row and cleaning per-run artifacts. Invariants: output files open. | |
| 8. `ERROR` | |
| Hard gate or runtime failure; may still emit failure row in non-blocking cases. | |
| 9. `CANCELLED` | |
| User-initiated cancel/unmount; active run is aborted and best-effort persisted. | |
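The state list above maps naturally onto an enumeration. The following is a minimal sketch, not the code in `training/kan_benchmark_suite.py`: the `Controller` class and its `active_case` attribute are hypothetical names introduced only to show how a single `state` field keeps the states mutually exclusive and how one invariant (the `IDLE` one) could be checked.

```python
from enum import Enum, auto


class State(Enum):
    """The nine mutually exclusive controller states from the state list."""
    IDLE = auto()
    PRE_FLIGHT = auto()
    GATE_WAIT = auto()
    RUN_READY = auto()
    RUNNING = auto()
    RUN_COMPLETED = auto()
    TEARDOWN = auto()
    ERROR = auto()
    CANCELLED = auto()


class Controller:
    """Hypothetical holder: a single `state` field means exactly one state is active."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.active_case = None  # assumed field backing the IDLE invariant

    def check_invariants(self) -> None:
        # Only the IDLE invariant is encoded; the others would follow the same pattern.
        if self.state is State.IDLE:
            assert self.active_case is None, "IDLE must not have an active case"
```

Calling `check_invariants()` after every transition is one way to catch a violated invariant close to where it was introduced.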
| ## Events | |
| - `E_INIT` | |
| - `E_CONFIG_PARSED` | |
| - `E_PRE_FLIGHT_OK` | |
| - `E_PRE_FLIGHT_FAIL` | |
| - `E_GATE_CHECK_OK` | |
| - `E_GATE_CHECK_FAIL_WARN` | |
| - `E_GATE_CHECK_FAIL_ERROR` | |
| - `E_CASE_START` | |
| - `E_STEP_DONE` | |
| - `E_RUN_OK` | |
| - `E_RUN_FAIL` | |
| - `E_RETRY` | |
| - `E_CANCEL` | |
| - `E_TIMEOUT` | |
| - `E_UNMOUNT` | |
| - `E_STALE_EVENT(older_run_id)` | |
| - `E_NEW_INPUTS` | |
| ## Guards | |
| - `G_strict_mode`: strict CoreML mode is active. | |
| - `G_gate_requires_coreml`: case path requires strict CoreML visibility. | |
| - `G_case_requires_ane`: current case runtime plan is ANE/HYBRID. | |
| - `G_retry_budget`: remaining retries > 0. | |
| - `G_fresh`: event run id matches current `run_id`. | |
| - `G_output_ok`: output directory writable. | |
| - `G_cancel_requested`: cancellation flag set. | |
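The guards above are pure predicates over controller state, so they can be sketched as small functions. This is an illustration under stated assumptions: `ControllerContext` and all of its field names are hypothetical, and the writability check via `os.access` is one possible reading of "output directory writable", not necessarily what the suite does.

```python
import os
from dataclasses import dataclass


@dataclass
class ControllerContext:
    """Hypothetical snapshot of the values the guards read."""
    strict_mode: bool = False
    gate_requires_coreml: bool = False
    runtime_backend_plan: str = "CPU"
    retries_left: int = 0
    run_id: int = 0
    output_dir: str = "."
    cancel_requested: bool = False


def g_strict_mode(ctx: ControllerContext) -> bool:
    return ctx.strict_mode


def g_gate_requires_coreml(ctx: ControllerContext) -> bool:
    return ctx.gate_requires_coreml


def g_case_requires_ane(ctx: ControllerContext) -> bool:
    # The guard list names ANE and HYBRID as the plans that require ANE.
    return ctx.runtime_backend_plan.upper() in {"ANE", "HYBRID"}


def g_retry_budget(ctx: ControllerContext) -> bool:
    return ctx.retries_left > 0


def g_fresh(ctx: ControllerContext, event_run_id: int) -> bool:
    # An event is fresh only if it carries the current run id.
    return event_run_id == ctx.run_id


def g_output_ok(ctx: ControllerContext) -> bool:
    # Assumed interpretation: the directory exists and is writable.
    return os.path.isdir(ctx.output_dir) and os.access(ctx.output_dir, os.W_OK)


def g_cancel_requested(ctx: ControllerContext) -> bool:
    return ctx.cancel_requested
```

Keeping the guards free of side effects makes each transition-table row testable in isolation.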
| ## Side effects | |
| - Build environment manifest + cache lookup (`_collect_toolchain_manifest`). | |
| - Evaluate gate (`_evaluate_toolchain_gate`). | |
| - Create per-case output directory. | |
| - Instantiate training args (`_set_args_from_base`) and invoke `run_training`. | |
| - Write per-cell run JSON + summary CSV. | |
| - Emit console warning/error lines. | |
| - On cancel/unmount: clear in-flight worker handles and skip remaining scheduled cases. | |
| ## Transition table | |
| | state | event | guard | next state | actions | | |
| |---|---|---|---|---| | |
| | `IDLE` | `E_CONFIG_PARSED` | `G_output_ok` | `PRE_FLIGHT` | capture manifest and persist suite manifest | | |
| | `IDLE` | `E_CONFIG_PARSED` | `~G_output_ok` | `ERROR` | fail fast, emit manifest I/O error | | |
| | `PRE_FLIGHT` | `E_PRE_FLIGHT_OK` | `True` | `GATE_WAIT` | compute suite defaults and base args | | |
| | `PRE_FLIGHT` | `E_PRE_FLIGHT_FAIL` | `True` | `ERROR` | add gate diagnostics row, continue if warn | | |
| | `GATE_WAIT` | `E_GATE_CHECK_OK` | `~G_gate_requires_coreml OR ~G_strict_mode` | `RUN_READY` | record toolchain_gate_issues (empty) | | |
| | `GATE_WAIT` | `E_GATE_CHECK_FAIL_WARN` | `~G_strict_mode` | `RUN_READY` | record issues; mark warning metadata | | |
| | `GATE_WAIT` | `E_GATE_CHECK_FAIL_ERROR` | `G_strict_mode` | `ERROR` | throw/fail row with `coreml` reason | | |
| | `RUN_READY` | `E_CASE_START` | `G_fresh AND ~G_cancel_requested` | `RUNNING` | set `run_id`, `seed`, case overrides | | |
| | `RUN_READY` | `E_NEW_INPUTS` | `G_fresh` | `RUN_READY` | update next-case policy and rebuild base args | | |
| | `RUNNING` | `E_STEP_DONE` | `G_fresh` | `RUNNING` | append telemetry step from history stream | | |
| | `RUNNING` | `E_RUN_OK` | `G_fresh` | `RUN_COMPLETED` | finalize metrics and compute row-level ratios | | |
| | `RUNNING` | `E_RUN_FAIL` | `G_fresh` | `RUN_COMPLETED` | persist failure row with `toolchain_gate_ok=False` | | |
| | `RUNNING` | `E_RUN_FAIL` | `~G_fresh` | `RUNNING` | drop stale result, retain active run | | |
| | `RUNNING` | `E_TIMEOUT` | `G_retry_budget` | `ERROR` | cancel/retry with backoff policy | | |
| | `RUN_COMPLETED` | `E_CASE_START` | `G_fresh` | `TEARDOWN` | collect manifest + append run_result | | |
| | `RUN_COMPLETED` | `E_CANCEL` | `~G_cancel_requested` | `TEARDOWN` | mark incomplete row and break loops | | |
| | `TEARDOWN` | `E_STEP_DONE` | `True` | `TEARDOWN` | continue writing CSV artifact updates | | |
| | `TEARDOWN` | `E_RUN_OK` | `cases remaining` | `GATE_WAIT` | schedule next case | |
| | `TEARDOWN` | `E_RUN_OK` | `~cases remaining` | `IDLE` | emit final report paths | |
| | `ERROR` | `E_RETRY` | `G_retry_budget` | `GATE_WAIT` | re-run last case with updated seed/backoff | | |
| | `ERROR` | `E_CANCEL` | `True` | `CANCELLED` | stop scheduling, persist partial report | | |
| | `CANCELLED` | `E_UNMOUNT` | `True` | `IDLE` | flush pending writes, close handles | | |
| | any | `E_CANCEL` | `G_cancel_requested` | `CANCELLED` | set abort flag and stop future case launches | | |
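A transition table like the one above can be driven by a lookup keyed on `(state, event)`. The sketch below encodes only a handful of rows and is not the suite's actual dispatcher: the `step` function, the dictionary-based context, and its key names are all hypothetical. It also assumes that an event with no matching row leaves the state unchanged, which the table itself does not specify.

```python
from typing import Callable, Dict, List, Tuple

Guard = Callable[[dict], bool]
Row = Tuple[Guard, str]  # (guard, next state)

# A subset of the transition table; ctx keys are assumed names.
TRANSITIONS: Dict[Tuple[str, str], List[Row]] = {
    ("IDLE", "E_CONFIG_PARSED"): [
        (lambda c: c.get("output_ok", False), "PRE_FLIGHT"),
        (lambda c: not c.get("output_ok", False), "ERROR"),
    ],
    ("PRE_FLIGHT", "E_PRE_FLIGHT_OK"): [(lambda c: True, "GATE_WAIT")],
    ("PRE_FLIGHT", "E_PRE_FLIGHT_FAIL"): [(lambda c: True, "ERROR")],
    ("GATE_WAIT", "E_GATE_CHECK_FAIL_WARN"): [
        (lambda c: not c.get("strict_mode", False), "RUN_READY"),
    ],
    ("GATE_WAIT", "E_GATE_CHECK_FAIL_ERROR"): [
        (lambda c: c.get("strict_mode", False), "ERROR"),
    ],
    ("ERROR", "E_RETRY"): [
        (lambda c: c.get("retries_left", 0) > 0, "GATE_WAIT"),
    ],
    ("CANCELLED", "E_UNMOUNT"): [(lambda c: True, "IDLE")],
}


def step(state: str, event: str, ctx: dict) -> str:
    """Return the next state for (state, event) under the current guards."""
    # Wildcard row: E_CANCEL from any state once the cancel flag is set.
    if event == "E_CANCEL" and ctx.get("cancel_requested", False):
        return "CANCELLED"
    for guard, next_state in TRANSITIONS.get((state, event), []):
        if guard(ctx):
            return next_state
    # Assumption: unmatched events are ignored rather than raising.
    return state
```

For example, `step("GATE_WAIT", "E_GATE_CHECK_FAIL_WARN", {"strict_mode": True})` stays in `GATE_WAIT` because the only row for that pair requires `~G_strict_mode`.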
| ## Mermaid | |
| ```mermaid | |
| stateDiagram-v2 | |
| [*] --> IDLE | |
| IDLE --> PRE_FLIGHT : E_CONFIG_PARSED / capture_manifest | |
| IDLE --> ERROR : E_CONFIG_PARSED / fail_fast | |
| PRE_FLIGHT --> GATE_WAIT : E_PRE_FLIGHT_OK | |
| PRE_FLIGHT --> ERROR : E_PRE_FLIGHT_FAIL | |
| GATE_WAIT --> RUN_READY : E_GATE_CHECK_OK | |
| GATE_WAIT --> RUN_READY : E_GATE_CHECK_FAIL_WARN | |
| GATE_WAIT --> ERROR : E_GATE_CHECK_FAIL_ERROR | |
| RUN_READY --> RUNNING : E_CASE_START | |
| RUNNING --> RUNNING : E_STEP_DONE | |
| RUNNING --> RUN_COMPLETED : E_RUN_OK | |
| RUNNING --> RUN_COMPLETED : E_RUN_FAIL | |
| RUNNING --> ERROR : E_TIMEOUT | |
| RUN_COMPLETED --> TEARDOWN : E_CASE_START | |
| TEARDOWN --> GATE_WAIT : E_RUN_OK / schedule_next_case | |
| TEARDOWN --> IDLE : E_RUN_OK / emit_final_report | |
| ERROR --> GATE_WAIT : E_RETRY | |
| ERROR --> CANCELLED : E_CANCEL | |
| CANCELLED --> IDLE : E_UNMOUNT | |
| RUN_READY --> CANCELLED : E_CANCEL | |
| RUNNING --> CANCELLED : E_CANCEL | |
| RUN_COMPLETED --> CANCELLED : E_CANCEL | |
| IDLE --> [*] : process_end | |
| ``` | |
| ## Race and stale-event handling | |
| - Events from older in-flight runs are ignored via the `run_id` guard (`G_fresh`). | |
| - If `E_NEW_INPUTS` arrives while `RUNNING`, the latest override is accepted only after the current run enters `TEARDOWN`. | |
| - `E_CANCEL` always has priority over `E_STEP_DONE` and transitions directly to `CANCELLED`. | |
| - On unmount, only the latest active run ID is allowed to persist output; stale completions are dropped. | |
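The rules above can be sketched as a small event handler. This is an illustrative model only: `RunTracker`, `process_batch`, the tuple event shape `(name, run_id, payload)`, and the returned status strings are all hypothetical. It shows stale events being dropped by `run_id`, `E_NEW_INPUTS` being deferred until teardown, and `E_CANCEL` being handled ahead of other events in the same batch.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Event = Tuple[str, int, Any]  # (event name, run id, payload); assumed shape


@dataclass
class RunTracker:
    run_id: int = 0
    state: str = "RUNNING"
    pending_inputs: Optional[dict] = None
    applied_inputs: Optional[dict] = None
    telemetry: List[Any] = field(default_factory=list)

    def handle(self, name: str, event_run_id: int, payload: Any = None) -> str:
        if name == "E_CANCEL":
            self.state = "CANCELLED"
            return "cancelled"
        if self.state == "CANCELLED":
            return "ignored"  # nothing is processed after a cancel
        if name == "E_NEW_INPUTS" and self.state == "RUNNING":
            self.pending_inputs = payload  # latest override wins, applied later
            return "deferred"
        if event_run_id != self.run_id:
            return "dropped"  # stale event from an older run
        if name == "E_STEP_DONE":
            self.telemetry.append(payload)
        return "accepted"

    def enter_teardown(self) -> None:
        # Deferred inputs only take effect once the run reaches TEARDOWN.
        self.state = "TEARDOWN"
        if self.pending_inputs is not None:
            self.applied_inputs = self.pending_inputs
            self.pending_inputs = None


def process_batch(tracker: RunTracker, batch: List[Event]) -> List[str]:
    # E_CANCEL jumps the queue; sorted() is stable, so other events keep their order.
    ordered = sorted(batch, key=lambda ev: ev[0] != "E_CANCEL")
    return [tracker.handle(*ev) for ev in ordered]
```

Sorting on a boolean key is a compact way to express "cancel first" without a full priority queue; a real controller with more priority levels would likely want an explicit ordering table.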
| ## Edge-coverage tests | |
| 1. Start in `IDLE`; strict gate missing-coreml in `error` mode => `GATE_WAIT -> ERROR`. | |
| 2. Warn mode missing-coreml => `GATE_WAIT -> RUN_READY` with warning metadata. | |
| 3. `RUNNING` stale result while next run started => stale event dropped, active run continues. | |
| 4. `E_CANCEL` during `RUNNING` => no additional case launches after current step. | |
| 5. Retry path from `ERROR` executes when retry budget remains and clears last failed case cache. | |
| 6. `toolchain_gate_coreml_issue` populated only when gate failure string contains coreml keywords. | |
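Tests 1, 2, and 6 above can be sketched as plain assertion-based tests. The helpers here are stand-ins, not the suite's real functions: `gate_transition`, `coreml_issue`, and the `COREML_KEYWORDS` tuple are hypothetical, and the keyword list in particular is an assumption, since the document does not enumerate the keywords.

```python
from typing import List, Optional, Tuple

COREML_KEYWORDS = ("coreml", "coremltools")  # assumed keyword list


def coreml_issue(issues: List[str]) -> Optional[str]:
    """Return the first gate issue mentioning CoreML, else None."""
    for issue in issues:
        if any(keyword in issue.lower() for keyword in COREML_KEYWORDS):
            return issue
    return None


def gate_transition(issues: List[str], gate_mode: str) -> Tuple[str, bool]:
    """Map gate issues and mode ('error' or 'warn') to (next state, warning flag)."""
    if not issues:
        return "RUN_READY", False
    if gate_mode == "error":
        return "ERROR", False
    return "RUN_READY", True


def test_error_mode_missing_coreml_goes_to_error() -> None:
    assert gate_transition(["coreml not visible"], "error") == ("ERROR", False)


def test_warn_mode_missing_coreml_runs_with_warning() -> None:
    assert gate_transition(["coreml not visible"], "warn") == ("RUN_READY", True)


def test_coreml_issue_only_set_for_coreml_failures() -> None:
    assert coreml_issue(["CoreML runtime missing"]) == "CoreML runtime missing"
    assert coreml_issue(["compiler version mismatch"]) is None
    assert coreml_issue([]) is None
```

These functions follow the `test_*` naming convention, so a runner such as pytest would collect them if the sketch were saved as a test module.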