# Evaluation Report
**Total cases evaluated:** 35
## Metrics
| Metric | Value | Target | Pass |
|--------|-------|--------|------|
| schema_pass_rate | 100.00% | 98.00% | PASS |
| evidence_coverage_rate | 100.00% | 90.00% | PASS |
| review_required_rate | 0.00% | — | — |
| unsupported_recommendation_rate | 0.00% | 2.00% | PASS |
| root_cause_consistency | 100.00% | 70.00% | PASS |
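
The Pass column above can be read as a threshold comparison. The following is a minimal sketch of that comparison, not the logic in `eval/run_eval.py`. The function name `check_metric` and the `higher_is_better` flag are assumptions made for illustration; the flag reflects that `unsupported_recommendation_rate` passes at 0.00% against a 2.00% target, which suggests the target is a ceiling for that metric.

```python
from typing import Optional


def check_metric(value: float, target: Optional[float],
                 higher_is_better: bool = True) -> Optional[str]:
    """Return "PASS"/"FAIL", or None when the metric has no target."""
    if target is None:
        return None
    ok = value >= target if higher_is_better else value <= target
    return "PASS" if ok else "FAIL"


# (name, value, target, higher_is_better); values copied from the table above.
# The direction flags are assumed, not taken from the report.
REPORT = [
    ("schema_pass_rate", 1.00, 0.98, True),
    ("evidence_coverage_rate", 1.00, 0.90, True),
    ("review_required_rate", 0.00, None, True),
    ("unsupported_recommendation_rate", 0.00, 0.02, False),
    ("root_cause_consistency", 1.00, 0.70, True),
]

results = {name: check_metric(v, t, hib) for name, v, t, hib in REPORT}
for name, status in results.items():
    print(name, status)
```
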
## Gate Distribution
- Auto-routed: 35
- Review-routed: 0
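
The gate counts appear to line up with `review_required_rate` in the Metrics table. As a hedged sketch, assuming the rate is simply the review-routed share of all cases (the report does not state the formula), it could be computed like this; `review_required_rate` as a function name is hypothetical.

```python
def review_required_rate(auto_routed: int, review_routed: int) -> float:
    """Share of cases sent to human review (assumed definition)."""
    total = auto_routed + review_routed
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty run
    return review_routed / total


rate = review_required_rate(auto_routed=35, review_routed=0)
print(f"{rate:.2%}")
```
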
## Failure Modes
**Total failures detected:** 26
**Cases affected:** 24
| Mode | Count | Examples |
|------|-------|----------|
| hallucination | 23 | `case-05a46709`: Evidence not found in source: ['I was charged twice for the ; `case-0d2ab501`: Evidence not found in source: ['I was charged twice for the |
| omission | 3 | `case-380fd7e4`: Urgent signals in text but risk_level=medium; `case-652870dc`: Outage signals in text but root_cause=billing |
| ambiguity | 0 | — |
| overconfidence | 0 | — |
| language_drift | 0 | — |
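
Total failures (26) exceed cases affected (24), which is consistent with some cases triggering more than one failure mode. The sketch below shows one way such a tally could be produced from a flat list of `(case_id, mode)` records. The record shape, the `summarize_failures` name, and the example case IDs are all hypothetical and are not taken from `eval/run_eval.py`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_failures(failures: List[Tuple[str, str]]) -> Dict[str, object]:
    """Count failures per mode and the number of distinct cases affected."""
    by_mode = Counter(mode for _, mode in failures)
    cases = {case_id for case_id, _ in failures}
    return {
        "total_failures": len(failures),
        "cases_affected": len(cases),
        "by_mode": dict(by_mode),
    }


# Hypothetical records: "case-a" fails in two modes, so it is counted
# twice in total_failures but once in cases_affected.
example = [
    ("case-a", "hallucination"),
    ("case-a", "omission"),
    ("case-b", "hallucination"),
]
summary = summarize_failures(example)
print(summary)
```
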
---
*Generated by eval/run_eval.py*