Factuality Evaluation Failed

#51
by Repaltoofficial - opened

I tested the model for hallucination resistance. I asked it about a non-existent case (Harrison v. Telco-Dynamics), and instead of refusing, the model fabricated a summary.

This is a critical risk for retrieval-augmented generation (RAG) systems. We build 'Counterfactual Knowledge' datasets (questions about non-existent events and papers) to train models to say 'I don't know' instead of fabricating an answer.
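For anyone who wants to reproduce this kind of check, here is a rough sketch of how a counterfactual probe could be scored. It is not our actual pipeline: the `ask` callable, the `REFUSAL_PATTERNS` list, and the `refusal_rate` helper are all hypothetical names made up for illustration, and keyword matching is only a crude stand-in for a proper refusal classifier.

```python
import re
from typing import Callable, Iterable

# Hypothetical refusal markers (an assumption for this sketch).
# Keyword matching will miss many valid refusals and is only a first pass.
REFUSAL_PATTERNS = [
    r"\bi don'?t know\b",
    r"\b(could not|couldn't|cannot|can't|unable to) (find|locate|verify)\b",
    r"\bno record\b",
    r"\bnot aware of\b",
]


def is_refusal(answer: str) -> bool:
    """Return True if the answer looks like an 'I don't know' style refusal."""
    text = answer.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(re.search(pattern, text) for pattern in REFUSAL_PATTERNS)


def refusal_rate(ask: Callable[[str], str], questions: Iterable[str]) -> float:
    """Fraction of counterfactual questions the model declines to answer.

    `ask` is any function that sends a prompt to the model under test and
    returns its text response (hypothetical; plug in your own client).
    """
    questions = list(questions)
    if not questions:
        raise ValueError("need at least one question")
    refused = sum(is_refusal(ask(q)) for q in questions)
    return refused / len(questions)
```

Every question in the set should concern something that does not exist, so a higher refusal rate is better. A response like the invented summary for Harrison v. Telco-Dynamics would count as a failure.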
