Factuality Evaluation Failed
#51 by Repaltoofficial - opened
I tested the model for hallucination resistance. I asked about a non-existent case (Harrison v. Telco-Dynamics), and instead of refusing, the model invented a summary of it.
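For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It is not the harness I used: the `ask` callable, the two stub models, and the refusal phrase list are all hypothetical placeholders, and keyword matching is only a rough proxy for detecting a real refusal.

```python
import re
from typing import Callable

# Hypothetical refusal phrases (an assumption for illustration).
# A real evaluation would likely need a broader list or a trained classifier.
REFUSAL_PATTERNS = [
    r"\bi don'?t know\b",
    r"\bno record\b",
    r"\bcould not find\b",
    r"\bcouldn'?t find\b",
    r"\bnot aware of\b",
    r"\bunable to (find|verify|locate)\b",
    r"\bdoes not (appear to )?exist\b",
]


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the refusal phrases."""
    text = response.lower()
    return any(re.search(pattern, text) for pattern in REFUSAL_PATTERNS)


def resists_hallucination(ask: Callable[[str], str], prompt: str) -> bool:
    """Send a prompt about a non-existent entity and check for a refusal."""
    return is_refusal(ask(prompt))


# Stub models standing in for a real inference call (hypothetical).
def fabricating_model(prompt: str) -> str:
    return "In Harrison v. Telco-Dynamics, the court held that ..."


def cautious_model(prompt: str) -> str:
    return "I don't know of any case by that name."


if __name__ == "__main__":
    prompt = "Summarize the case Harrison v. Telco-Dynamics."
    print("fabricating model passes:", resists_hallucination(fabricating_model, prompt))
    print("cautious model passes:", resists_hallucination(cautious_model, prompt))
```

To test an actual model, replace the stub with a function that sends the prompt to your model and returns its text output.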
This is a critical risk for Retrieval-Augmented Generation (RAG) systems. We build 'Counterfactual Knowledge' datasets (questions about fake events and papers) to train models to say 'I don't know' instead of fabricating an answer.
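As a rough illustration of what one record in such a dataset could look like, here is a short Python sketch. The field names (`question`, `target`, `entity_exists`), the prompt template, and the JSONL layout are assumptions made for this example, not our actual schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CounterfactualExample:
    """One training record: a question about a fake entity and the desired answer."""
    question: str
    target: str
    entity_exists: bool = False  # always False for counterfactual items


def build_examples(
    fake_entities: List[str],
    template: str,
    target: str = "I don't know.",
) -> List[CounterfactualExample]:
    """Fill the template with each fake entity and pair it with the refusal target."""
    return [
        CounterfactualExample(question=template.format(entity=name), target=target)
        for name in fake_entities
    ]


def to_jsonl(examples: List[CounterfactualExample]) -> str:
    """Serialize the records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(example)) for example in examples)


if __name__ == "__main__":
    records = build_examples(
        ["Harrison v. Telco-Dynamics"],
        "Summarize the case {entity}.",
    )
    print(to_jsonl(records))
```

Mixing records like these with questions about real entities would presumably matter too, so the model does not learn to refuse everything, but that balance is outside this sketch.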