DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published Apr 15, 2025 • 20
Software Entity Recognition with Noise-Robust Learning Paper • 2308.10564 • Published Aug 21, 2023 • 1
Explanation-based Finetuning Makes Models More Robust to Spurious Cues Paper • 2305.04990 • Published May 8, 2023
DataDecide Collection A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale. • 354 items • Updated Mar 2 • 25