Adding Benchmarks v0
Clean product one-liner: benchmark features belong in the shipped product when they directly tune or prove the control loop.
Layman version: if a benchmark changes how the harness steers, it is on-product, not just lab work.
Is this on product?
Yes, when the benchmark does one of these:
| Benchmark feature | On product? | Why |
|---|---|---|
| lane-selection check | yes | it tunes runtime control |
| memory / freshness check | yes | it tunes graph-backed memory |
| tensor posture check | yes | it tunes the control language |
| receipt coverage check | yes | it governs trust |
| random research probe with no routing consequence | no | keep it in research |
How to add a benchmark feature
- Add a focused runner under `benchmarks/`.
- Write outputs under `runs/benchmark/<name>-<timestamp>/`.
- Emit:
  - `summary.json`
  - `report.md`
- Add the benchmark to the README if it informs product behavior.
- If it changes steering, connect its result back into `policy/benchmarks/control_scorecard_v0.*`.
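
The steps above can be sketched as a small runner helper. This is a minimal illustration, not the repo's actual runner: the function name `write_benchmark_outputs`, the timestamp format, the keys inside `summary.json`, and the layout of `report.md` are all assumptions. Only the two file names and the `runs/benchmark/<name>-<timestamp>/` directory pattern come from this document.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_benchmark_outputs(name, metrics, root=Path("runs/benchmark")):
    """Write summary.json and report.md under <root>/<name>-<timestamp>/.

    Hypothetical helper: the schema and timestamp format are assumptions.
    """
    # The document only says <timestamp>; a compact UTC stamp is assumed here.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(root) / f"{name}-{stamp}"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Machine-readable metric surface (key names are illustrative).
    summary = {"benchmark": name, "timestamp": stamp, "metrics": metrics}
    (out_dir / "summary.json").write_text(
        json.dumps(summary, indent=2, sort_keys=True) + "\n"
    )

    # Human-readable brief: a heading-free metric table.
    lines = [f"Benchmark: {name}", f"Timestamp: {stamp}", "", "| Metric | Value |", "|---|---|"]
    lines += [f"| {key} | {value} |" for key, value in sorted(metrics.items())]
    (out_dir / "report.md").write_text("\n".join(lines) + "\n")
    return out_dir


if __name__ == "__main__":
    # The demo writes into a temporary directory so nothing lands in a real repo.
    # The metric name and value are placeholders, not measured results.
    with tempfile.TemporaryDirectory() as tmp:
        out = write_benchmark_outputs("example", {"placeholder_metric": 0}, root=Path(tmp))
        print(sorted(p.name for p in out.iterdir()))
```

Keeping the write logic in one helper means every runner under `benchmarks/` can share it and produce the same two files.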
Minimal output contract
| File | Purpose |
|---|---|
| `summary.json` | machine-readable metric surface |
| `report.md` | human-readable benchmark brief |
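
One way to enforce this contract is a small check that runs after each benchmark. The sketch below is an assumption-laden illustration: `check_output_contract` is a hypothetical helper, not part of the repo, and it only verifies that both files exist and that `summary.json` parses as JSON. It does not assume any particular schema.

```python
import json
from pathlib import Path

# The two files named in the output contract.
REQUIRED_FILES = ("summary.json", "report.md")


def check_output_contract(run_dir):
    """Return a list of problems; an empty list means the contract is met."""
    run_dir = Path(run_dir)
    problems = [
        f"missing {name}" for name in REQUIRED_FILES if not (run_dir / name).is_file()
    ]

    # summary.json must at least be parseable to count as machine-readable.
    summary_path = run_dir / "summary.json"
    if summary_path.is_file():
        try:
            json.loads(summary_path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"summary.json is not valid JSON: {exc.msg}")
    return problems
```

Calling `check_output_contract("runs/benchmark/<name>-<timestamp>")` on a finished run directory returns an empty list when both files are present and the summary parses.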
Current shipped benchmark
Use:
`./bin/bvtctl benchmark`
Why: this proves the standalone repo can benchmark its own graph-first and execution lanes through the local CLI.