Add bootstrap_test_set.py (test-set-only consistent eval) 186cb4d verified Ouaill committed 15 days ago