Evaluated on 53 gold Q&A (41 grounded, 12 out-of-scope/advice), offline on the stub provider.

| Metric | Plain LLM (no RAG) | ParaPilot (grounded) | Direction |
|---|---|---|---|
| Hallucination rate | 100.0% | **0.0%** | lower is better |
| Answer correctness (grounded Qs) | 0.0% | **100.0%** | higher is better |
| Groundedness / faithfulness | 0.0% | **95.7%** | higher is better |
| Citation accuracy | 0.0% | **100.0%** | higher is better |
| Refusal correctness (out-of-scope/advice) | 0.0% | **100.0%** | higher is better |
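
For readers who want to see how rate-style metrics of this kind are typically aggregated, here is a minimal sketch. It is not the project's evaluation harness: the `Result` record, its field names, and the `summarize` helper are all hypothetical, and the data is synthetic, sized only to match the 41/12 split of the gold set. It assumes answer correctness is scored over grounded questions only and refusal correctness over out-of-scope/advice questions only.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-question record; field names are illustrative only.
    grounded: bool  # True for grounded questions, False for out-of-scope/advice
    correct: bool   # answer judged correct (only meaningful for grounded questions)
    refused: bool   # system declined to answer


def rate(hits: int, total: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when the denominator is empty."""
    return round(100.0 * hits / total, 1) if total else 0.0


def summarize(results):
    # Split the gold set by question type so each metric uses its own denominator.
    grounded = [r for r in results if r.grounded]
    out_of_scope = [r for r in results if not r.grounded]
    return {
        "answer_correctness": rate(sum(r.correct for r in grounded), len(grounded)),
        "refusal_correctness": rate(sum(r.refused for r in out_of_scope), len(out_of_scope)),
    }


# Synthetic records, not real evaluation output: 41 grounded, 12 out-of-scope/advice.
results = [Result(True, True, False)] * 41 + [Result(False, False, True)] * 12
summary = summarize(results)
print(summary)
```

Keeping separate denominators per question type matters here: a refusal on a grounded question and an answer on an out-of-scope question are different failure modes and should not be pooled into one score.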