Collection of datasets and evals from the paper "Making, not Taking, the Best of N" (https://arxiv.org/abs/2510.00931).