Deploy Verifier
Objective
Watch a rollout after deployment, compare live signals against release expectations, and escalate quickly when anomalies appear.
Trigger
- Event: deployment starts, canary advances, feature flag flips, migration completes, or release tag is created.
- Schedule: every 5-15 minutes during the rollout window.
- Manual bootstrap/debug command: "verify deploy until stable or blocked."
Intake
- Release notes, expected changes, dashboards, logs, traces, error budgets, synthetic checks, feature flags, and rollback instructions.
- Known risky areas and owner contacts.
Agents
- Observer: gathers metrics, logs, traces, and check status.
- Comparator: compares observed signals against release expectations and baseline.
- Reporter: writes concise rollout status.
- Escalator: notifies humans when thresholds or unknowns are crossed.
Workspace And Permissions
- Prefer read-only access to observability systems.
- Allow comments, status updates, and issue creation.
- Require human approval for rollback, config changes, traffic shifting, database actions, or incident declaration.
Durable State
- Release version, rollout phase, baseline window, checked dashboards, anomalies, decisions, and timestamps.
Loop Steps
- Load release expectations and rollback criteria.
- Snapshot key metrics before or at rollout start.
- Delegate observation, comparison, reporting, and escalation to separate roles when the rollout is complex.
- Poll signals on the defined cadence.
- Compare against baseline and explicit thresholds.
- Report stable, degraded, blocked, or unknown.
- Escalate if thresholds are crossed or evidence is missing.
- Stop when the rollout window completes or a human takes over.
Verification Gates
- Synthetic checks pass.
- Error rate, latency, saturation, and business metrics stay within thresholds.
- Logs/traces show no new dominant failure mode.
- Feature flag or deployment status matches the expected phase.
Budget And Exit
- Max runtime: the rollout window plus one confirmation interval.
- Max unknown intervals: 2 before escalation.
- Stop when the deploy is stable, rolled back, paused, or handed to an incident owner.
Escalation
Escalate on threshold breach, missing telemetry, unclear ownership, migration errors, customer-impacting regressions, or rollback criteria.
Loop Instruction
Verify deploy <version>. Use the release notes and rollout checklist as the contract.
Poll the required dashboards, logs, traces, synthetic checks, and feature flag state.
Report stable/degraded/blocked with evidence. Do not roll back or change production without human approval.
Example automation: trigger when a deployment starts or a feature flag changes, then poll every 5-15 minutes until stable, paused, rolled back, or handed to a human.
Failure Modes
- The loop trusts a single green dashboard while logs show a new error class.
- Baseline windows are mismatched to traffic patterns.
- Missing telemetry is treated as healthy.
- The agent takes production action without approval.
References
- OpenAI Agents SDK human review - Describes approval boundaries for sensitive agent actions.
- OpenTelemetry Semantic Conventions for Generative AI Systems - Provides tracing vocabulary for model, tool, and agent workflow receipts.