Datasets from the five-CoT-faith benchmark for evaluating chain-of-thought faithfulness monitoring across five diverse tasks.
- mats-10-sprint-cs-jb/five-cot-faith-answer-emission
- mats-10-sprint-cs-jb/five-cot-faith-self-deletion
- mats-10-sprint-cs-jb/five-cot-faith-atypical-answer
- mats-10-sprint-cs-jb/five-cot-faith-forced-answer-entropy
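
A minimal sketch of how the datasets listed above might be loaded, assuming the Hugging Face `datasets` package is installed and that each repository is publicly readable with a default configuration (neither is confirmed by this listing). The helper names `task_name` and `load_one` are hypothetical and exist only for illustration; the repository IDs are copied verbatim from the list.

```python
# Repository IDs copied verbatim from the collection listing.
REPO_IDS = [
    "mats-10-sprint-cs-jb/five-cot-faith-answer-emission",
    "mats-10-sprint-cs-jb/five-cot-faith-self-deletion",
    "mats-10-sprint-cs-jb/five-cot-faith-atypical-answer",
    "mats-10-sprint-cs-jb/five-cot-faith-forced-answer-entropy",
]

# Shared prefix of the dataset names, used to derive a short task label.
PREFIX = "five-cot-faith-"


def task_name(repo_id: str) -> str:
    """Return the short task label, e.g. 'answer-emission' (hypothetical helper)."""
    name = repo_id.split("/")[-1]
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def load_one(repo_id: str):
    """Load all available splits of one dataset from the Hugging Face Hub.

    Assumption: the repository has a default configuration and needs no
    authentication. Split and column names are not known from the listing,
    so inspect the returned object before relying on any of them.
    """
    # Imported lazily so the ID list above is usable without the package.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Calling `load_one(REPO_IDS[0])` would download the answer-emission dataset on first use; printing the result is the simplest way to see which splits and columns it actually provides.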