Evaluated on 53 gold Q&A pairs (41 grounded, 12 out-of-scope/advice), run offline against the stub provider.

| Metric | Plain LLM (no RAG) | ParaPilot (grounded) | Direction |
|---|---|---|---|
| Hallucination rate | 100.0% | 0.0% | lower is better |
| Answer correctness (grounded questions) | 0.0% | 100.0% | higher is better |
| Groundedness / faithfulness | 0.0% | 95.7% | higher is better |
| Citation accuracy | 0.0% | 100.0% | higher is better |
| Refusal correctness (out-of-scope/advice) | 0.0% | 100.0% | higher is better |
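Two of the rows are scoped to a subset of the gold set: answer correctness uses only the 41 grounded questions, and refusal correctness uses only the 12 out-of-scope/advice questions. The sketch below shows how such per-subset percentages could be aggregated. It is not the project's evaluation code. The `GoldItem` record, its fields, and the `rate`/`subset_rates` helpers are hypothetical, and the sketch assumes each item has already been judged with boolean labels.

```python
from dataclasses import dataclass


@dataclass
class GoldItem:
    # Hypothetical record: one gold Q&A item plus the judged system answer.
    in_scope: bool        # True for grounded items, False for out-of-scope/advice items
    answer_correct: bool  # judged correct against the gold answer (used for in-scope items)
    refused: bool         # the system declined to answer


def rate(flags: list[bool]) -> float:
    """Percentage of True values; returns 0.0 for an empty subset."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def subset_rates(items: list[GoldItem]) -> dict[str, float]:
    grounded = [it for it in items if it.in_scope]
    out_of_scope = [it for it in items if not it.in_scope]
    return {
        # Denominator: grounded items only.
        "answer_correctness": rate([it.answer_correct for it in grounded]),
        # Denominator: out-of-scope/advice items only; a refusal counts as correct.
        "refusal_correctness": rate([it.refused for it in out_of_scope]),
    }


# Usage: 41 grounded items all answered correctly, 12 out-of-scope items all refused.
items = [GoldItem(True, True, False) for _ in range(41)] + [
    GoldItem(False, False, True) for _ in range(12)
]
print(subset_rates(items))
```

Keeping the denominators separate means a system cannot raise its refusal score by refusing grounded questions, and it cannot raise its correctness score by answering out-of-scope ones.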