goodhart-gap-benchmark / evaluate.py

Commit History

v1.1: Financial domain audit - confirms Goodhart Gap hypothesis
b684ab3
verified

Adam1010 committed on