This is the fastText pretraining data filter targeting the LAMBADA DE task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816

Code: https://github.com/TristanThrush/perplexity-correlations
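
A minimal sketch of how a fastText filter like this one might be applied to candidate pretraining documents. It is not taken from the paper's code. The label name `__label__include`, the 0.5 threshold, and the helper names are assumptions for illustration, and the model filename inside the repository is not stated here, so it must be supplied by the caller. The only library calls relied on are `huggingface_hub.hf_hub_download`, `fasttext.load_model`, and the model's `predict` method.

```python
from typing import Iterable, List


def load_filter(filename: str, repo_id: str = "perplexity-correlations/fasttext-lambada-de-target"):
    """Download and load the fastText model (needs `fasttext` and `huggingface_hub`).

    `filename` is the model file inside the repo; it is not documented here,
    so check the repository's file list before calling this.
    """
    import fasttext
    from huggingface_hub import hf_hub_download

    return fasttext.load_model(hf_hub_download(repo_id=repo_id, filename=filename))


def include_probability(model, text: str, include_label: str = "__label__include") -> float:
    """Return the probability the model assigns to `include_label` (assumed label name)."""
    # fastText's predict() processes one line at a time, so flatten newlines first.
    labels, probs = model.predict(text.replace("\n", " "), k=-1)  # k=-1: all labels
    for label, prob in zip(labels, probs):
        if label == include_label:
            return float(prob)
    return 0.0  # the label was not among the model's outputs


def filter_documents(model, docs: Iterable[str], threshold: float = 0.5,
                     include_label: str = "__label__include") -> List[str]:
    """Keep documents whose include probability is at least `threshold` (assumed cutoff)."""
    return [d for d in docs if include_probability(model, d, include_label) >= threshold]
```

Inspecting the loaded model's `labels` attribute before filtering is a quick way to confirm the actual label names, and the threshold would normally be tuned to the fraction of data to keep.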

Paper for perplexity-correlations/fasttext-lambada-de-target: Improving Pretraining Data Using Perplexity Correlations (arXiv 2409.05816, published Sep 9, 2024)