# Adding Benchmarks v0

Clean product one-liner: benchmark features belong in the shipped product when they directly tune or prove the control loop.

Layman version: if a benchmark changes how the harness steers, it is on-product, not just lab work.

## Is this on product?

Yes, when the benchmark does one of these:
| Benchmark feature | On product? | Why |
| --- | --- | --- |
| lane-selection check | yes | it tunes runtime control |
| memory / freshness check | yes | it tunes graph-backed memory |
| tensor posture check | yes | it tunes the control language |
| receipt coverage check | yes | it governs trust |
| random research probe with no routing consequence | no | keep it in research |
## How to add a benchmark feature

1. Add a focused runner under `benchmarks/`.
2. Write outputs under `runs/benchmark/<name>-<timestamp>/`.
3. Emit:
   - `summary.json`
   - `report.md`
4. Add the benchmark to the README if it informs product behavior.
5. If it changes steering, connect its result back into:
   - `policy/`
   - `benchmarks/control_scorecard_v0.*`
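Steps 1-3 can be sketched as a minimal shell runner. The name `my_check` and every JSON field are hypothetical illustrations; only the `runs/benchmark/<name>-<timestamp>/` layout and the two output files come from the contract above.

```shell
#!/usr/bin/env bash
# Hypothetical runner sketch: "my_check" and the JSON fields are
# illustrative; only the output layout comes from the contract.
set -euo pipefail

name="my_check"
stamp="$(date +%Y%m%d-%H%M%S)"
out="runs/benchmark/${name}-${stamp}"
mkdir -p "$out"

# summary.json: the machine-readable metric surface
cat > "$out/summary.json" <<EOF
{"benchmark": "$name", "timestamp": "$stamp", "pass": true}
EOF

# report.md: the human-readable benchmark brief
cat > "$out/report.md" <<EOF
# $name benchmark report

Run at $stamp. See summary.json for metrics.
EOF

echo "wrote $out"
```

A real runner would compute metrics before writing `summary.json`; the point here is only the directory layout and the two emitted files.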
## Minimal output contract

| File | Purpose |
| --- | --- |
| `summary.json` | machine-readable metric surface |
| `report.md` | human-readable benchmark brief |
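This contract is easy to check mechanically. A sketch, with `check_run` and the `demo-0` directory both hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical contract check: a run directory passes only if it
# contains both files from the minimal output contract.
check_run() {
  local dir="$1"
  test -f "$dir/summary.json" || { echo "missing summary.json in $dir"; return 1; }
  test -f "$dir/report.md"    || { echo "missing report.md in $dir"; return 1; }
  echo "contract ok: $dir"
}

# Usage sketch against a hypothetical run directory.
mkdir -p runs/benchmark/demo-0
echo '{}' > runs/benchmark/demo-0/summary.json
echo '# demo' > runs/benchmark/demo-0/report.md
check_run runs/benchmark/demo-0
```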
## Current shipped benchmark

Use:

```bash
./bin/bvtctl benchmark
```

Why: this proves the standalone repo can benchmark its own graph-first and execution lanes through the local CLI.