Paper: Improving Pretraining Data Using Perplexity Correlations (arXiv:2409.05816)
How to use perplexity-correlations/fasttext-lambada-de-target with fastText:
from huggingface_hub import hf_hub_download
import fasttext
model = fasttext.load_model(hf_hub_download("perplexity-correlations/fasttext-lambada-de-target", "model.bin"))

This is the fastText pretraining data filter targeting the LAMBADA DE task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816
Code: https://github.com/TristanThrush/perplexity-correlations
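
Once loaded, the model can be used to score candidate documents. The following is a minimal sketch, not the authors' pipeline: the helper names `include_probability` and `demo` are hypothetical, and the label name `__label__include` is an assumption that should be checked against `model.labels` before relying on it. fastText's `predict` handles one line at a time, so the helper collapses whitespace first.

```python
def include_probability(model, text, include_label):
    """Return the probability the classifier assigns to `include_label`."""
    # fastText's predict() rejects newlines, so collapse the text to one line.
    single_line = " ".join(text.split())
    # k=-1 asks fastText for every label together with its probability.
    labels, probs = model.predict(single_line, k=-1)
    return float(dict(zip(labels, probs)).get(include_label, 0.0))


def demo():
    # Imported here so the helper above stays usable without these packages.
    from huggingface_hub import hf_hub_download
    import fasttext

    model = fasttext.load_model(
        hf_hub_download("perplexity-correlations/fasttext-lambada-de-target", "model.bin")
    )
    # Inspect the real label names first; the one below is an ASSUMPTION.
    print(model.labels)
    include_label = "__label__include"  # hypothetical; replace after inspecting
    doc = "Die Katze sitzt auf der Matte.\nSie schläft tief und fest."
    print(include_probability(model, doc, include_label))
```

Calling `demo()` downloads the model and prints the available labels followed by one score. Any cutoff for keeping or discarding a document is a separate choice that this sketch does not make.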