Melika Kheirieh
feat(observability): add Prometheus-Grafana stack with auto-provisioning and docs

Observability and Metrics

This module adds observability for the NL2SQL Copilot pipeline: Prometheus metrics for each stage and outcome, plus recording and alerting rules built on them.

📊 Metrics exposed

| Metric | Type | Labels | Description |
|---|---|---|---|
| stage_duration_ms | histogram | stage | Duration per stage (detector, planner, generator, safety, executor, verifier) |
| pipeline_runs_total | counter | status | Pipeline runs by outcome (ok, error, ambiguous) |
| safety_checks_total, safety_blocks_total | counter | reason | Number of safety checks and blocked queries |
| verifier_checks_total, verifier_failures_total | counter | reason | Number of verification passes and failures |
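
To make the histogram and counter rows concrete, here is a minimal sketch, using only the Python standard library, of how one stage duration and one pipeline outcome could be recorded. It is not the project's instrumentation code: the bucket bounds and the names `BUCKETS_MS`, `observe_stage`, `timed_stage`, and `record_run` are hypothetical, and a real setup would normally use a Prometheus client library rather than plain dictionaries.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical bucket upper bounds in milliseconds (assumption: the
# real bounds are not stated in this document).
BUCKETS_MS = (50, 100, 250, 500, 1000, 2500, float("inf"))

# In-memory stand-ins for the exported series.
stage_buckets = defaultdict(int)   # (stage, upper_bound) -> cumulative count
pipeline_runs = defaultdict(int)   # status -> count


def observe_stage(stage, duration_ms):
    """Record one duration the way a cumulative histogram does:
    every bucket whose upper bound is >= the value is incremented."""
    for upper_bound in BUCKETS_MS:
        if duration_ms <= upper_bound:
            stage_buckets[(stage, upper_bound)] += 1


@contextmanager
def timed_stage(stage):
    """Time the wrapped block and record it under the given stage label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        observe_stage(stage, (time.perf_counter() - start) * 1000.0)


def record_run(status):
    """Count one pipeline run by outcome (ok, error, ambiguous)."""
    pipeline_runs[status] += 1
```

A stage would then be wrapped as `with timed_stage("generator"): ...`, followed by `record_run("ok")` once the pipeline finishes. Because the buckets are cumulative, a 120 ms observation increments the 250 ms bucket and every larger one, but not the 50 ms or 100 ms buckets.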

βš™οΈ Recording & Alerting Rules

Defined in prometheus/rules.yml:

  • nl2sql:stage_p95_ms – 95th percentile latency per stage
  • nl2sql:pipeline_success_ratio – 5-minute success ratio
  • Alerts:
    • PipelineLowSuccessRatio (<90% for 10m)
    • GeneratorLatencyHigh (>1500 ms for 5m)
    • SafetyBlocksSpike (>0.5/min)
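
The arithmetic behind these rules can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of prometheus/rules.yml: the function names are hypothetical, the quantile estimate imitates the linear interpolation that PromQL's `histogram_quantile` applies to cumulative buckets, the latency alert is assumed to compare the generator's p95 against 1500 ms, and the `for` durations (10m, 5m) are not modelled.

```python
def quantile_from_buckets(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) pairs by
    interpolating linearly inside the bucket that contains the rank."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


def success_ratio(runs_by_status):
    """Share of runs with status 'ok' among all runs in the window."""
    total = sum(runs_by_status.values())
    return runs_by_status.get("ok", 0) / total if total else float("nan")


def breached_thresholds(ratio, generator_p95_ms, safety_blocks_per_min):
    """Return the names of alerts whose threshold is currently exceeded."""
    names = []
    if ratio < 0.90:
        names.append("PipelineLowSuccessRatio")
    if generator_p95_ms > 1500:
        names.append("GeneratorLatencyHigh")
    if safety_blocks_per_min > 0.5:
        names.append("SafetyBlocksSpike")
    return names
```

For example, with made-up generator buckets `[(500, 40), (1000, 80), (2500, 100), (float("inf"), 100)]`, the p95 rank falls three quarters of the way through the 1000 to 2500 ms bucket, giving an estimate of roughly 2125 ms. Combined with a success ratio of 0.85 and 0.2 safety blocks per minute, `breached_thresholds` would report the first two alerts but not SafetyBlocksSpike.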

🧪 Local Testing

  1. Start Prometheus
    make prom-up