# Evaluation Report

**Total cases evaluated:** 35

## Metrics

| Metric | Value | Target | Pass |
|--------|-------|--------|------|
| schema_pass_rate | 100.00% | 98.00% | PASS |
| evidence_coverage_rate | 100.00% | 90.00% | PASS |
| review_required_rate | 0.00% | — | — |
| unsupported_recommendation_rate | 0.00% | 2.00% | PASS |
| root_cause_consistency | 100.00% | 70.00% | PASS |

## Gate Distribution

- Auto-routed: 35
- Review-routed: 0

## Failure Modes

**Total failures detected:** 26

**Cases affected:** 24

| Mode | Count | Examples |
|------|-------|----------|
| hallucination | 23 | `case-05a46709`: Evidence not found in source: ['I was charged twice for the ; `case-0d2ab501`: Evidence not found in source: ['I was charged twice for the |
| omission | 3 | `case-380fd7e4`: Urgent signals in text but risk_level=medium; `case-652870dc`: Outage signals in text but root_cause=billing |
| ambiguity | 0 | — |
| overconfidence | 0 | — |
| language_drift | 0 | — |

---

*Generated by eval/run_eval.py*