nl2sql-copilot/benchmarks/evaluate_spider_pro.py

Commit History

feat(core): stabilize benchmark pipeline with accurate latency tracking, retry-empty handling, and refined plots
bf06cf7

Melika Kheirieh committed on

refactor(core): trace schema upgrade, verifier/executor sync, benchmark plot polish
e3e0ac5

Melika Kheirieh committed on

feat(bench): gold-aware EM/SM/ExecAcc + p50/p95; write per-stage means; richer plots
296a94d

Melika Kheirieh committed on

feat(core): refine pipeline & verifier; improve Spider benchmark accuracy
b794494

Melika Kheirieh committed on

fix(grafana): move nl2sql.json into provisioning folder and fix dashboard mount path
454d146

Melika Kheirieh committed on

feat(benchmarks): add pro evaluator with EM, structural match, execution accuracy, and safety consistency metrics
ebc7457

Melika Kheirieh committed on