Running Agents 2
BELLS-Operational: Supervision Systems Benchmark
🔎 LLMs vs Guardrails: Operational Misuse Detection Trade-offs