Adam1010/goodhart-gap-benchmark

execution-vs-understanding

91.2 MB

1 contributor

History: 4 commits

Latest commit: v2.0: Combined with cgrt-consensus-5model data (8,050 disagreements, 1,556 contested)
ca5e3d7, verified, 6 months ago

Name                        Size      Updated       Last commit
data                        -         6 months ago  v2.0: Combined with cgrt-consensus-5model data (8,050 disagreements, 1,556 contested)
results                     -         6 months ago  v1.1: Financial domain audit - confirms Goodhart Gap hypothesis
.gitattributes              1.72 kB   6 months ago  v2.0: Combined with cgrt-consensus-5model data (8,050 disagreements, 1,556 contested)
README.md                   6.33 kB   6 months ago  v2.0: Combined with cgrt-consensus-5model data (8,050 disagreements, 1,556 contested)
create_combined_dataset.py  7.18 kB   6 months ago  v2.0: Combined with cgrt-consensus-5model data (8,050 disagreements, 1,556 contested)
evaluate.py                 16 kB     6 months ago  v1.1: Financial domain audit - confirms Goodhart Gap hypothesis
generate_dataset.py         45.8 kB   6 months ago  v1.1: Financial domain audit - confirms Goodhart Gap hypothesis
requirements.txt            17 Bytes  6 months ago  v1.1: Financial domain audit - confirms Goodhart Gap hypothesis