Paper: Improving Pretraining Data Using Perplexity Correlations (arXiv:2409.05816)
This is the fastText pretraining data filter targeting the LAMBADA DE task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816
Code: https://github.com/TristanThrush/perplexity-correlations
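
A minimal sketch of how a fastText filter like this one might be applied to candidate pretraining documents. Everything specific here is an assumption rather than something stated on this page: the label name `__label__include`, the 0.5 threshold, and the helper names `include_probability`, `filter_documents`, and `load_fasttext_predict` are hypothetical, and the model file path is left to the caller. Check the repository linked above for the actual labels and recommended usage. The only library calls used are `fasttext.load_model` and `model.predict`.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# A predict function maps one line of text to (labels, probabilities),
# which is the shape fastText's model.predict returns for a single string.
Predict = Callable[[str], Tuple[Sequence[str], Sequence[float]]]

# ASSUMPTION: the positive class is named "__label__include".
# Inspect model.get_labels() on the real model to confirm.
INCLUDE_LABEL = "__label__include"


def include_probability(predict: Predict, text: str) -> float:
    """Return the probability assigned to INCLUDE_LABEL for one document."""
    # fastText's predict rejects strings containing newlines, so flatten them.
    labels, probs = predict(text.replace("\n", " "))
    for label, prob in zip(labels, probs):
        if label == INCLUDE_LABEL:
            return float(prob)
    return 0.0  # label absent from the returned predictions


def filter_documents(
    predict: Predict, docs: Iterable[str], threshold: float = 0.5
) -> List[str]:
    """Keep documents whose include probability meets the threshold."""
    # ASSUMPTION: 0.5 is an illustrative cutoff, not a value from the paper.
    return [d for d in docs if include_probability(predict, d) >= threshold]


def load_fasttext_predict(model_path: str) -> Predict:
    """Wrap a fastText model file as a Predict callable (needs `pip install fasttext`)."""
    import fasttext  # imported lazily so the helpers above work without it

    model = fasttext.load_model(model_path)
    # k=-1 asks fastText to return every label with its probability.
    return lambda text: model.predict(text, k=-1)
```

With the model file downloaded locally, usage would look like `filter_documents(load_fasttext_predict(path_to_model), docs)`, where `path_to_model` is wherever the `.bin` file was saved.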