---
datasets:
- EleutherAI/lambada_openai
---

*Data influence models for [LAMBADA](https://huggingface.co/datasets/EleutherAI/lambada_openai) fine-tuned from [bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased).*

The `main` branch contains the data influence model for 10k steps.

Paper: [MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models](https://arxiv.org/pdf/2406.06046)

Official codebase: https://github.com/cxcscmu/MATES
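
A minimal usage sketch follows. It is not taken from the official codebase. It assumes the checkpoint loads through `AutoModelForSequenceClassification` with a single-output regression head, and that the `google-bert/bert-base-uncased` tokenizer is the right one to pair with it. `REPO_ID` is a hypothetical placeholder for this model's Hub repository ID, and `score_texts` is an illustrative helper name. The `revision` argument selects the branch, with `main` as the default.

```python
# Hypothetical placeholder: replace with this model's actual Hub repository ID.
REPO_ID = "<org>/<data-influence-model>"

# Base model named in this card; assumed to provide a compatible tokenizer.
BASE_TOKENIZER = "google-bert/bert-base-uncased"


def score_texts(repo_id, texts, revision="main"):
    """Return one predicted influence score per input text.

    Assumes a sequence-classification checkpoint with a single output
    (regression head); this is an assumption, not confirmed by the card.
    """
    # Imported lazily so this module can be imported without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER)
    # `revision` picks the Hub branch, tag, or commit to download.
    model = AutoModelForSequenceClassification.from_pretrained(
        repo_id, revision=revision
    )
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape (batch, 1) under the assumption above
    return logits.squeeze(-1).tolist()


if __name__ == "__main__":
    # Downloads weights from the Hub; requires a real repository ID.
    print(score_texts(REPO_ID, ["An example passage to score."]))
```

To load a different checkpoint, pass another branch name as `revision`; only `main` is named in this card, so any other branch name is something to check against the repository itself.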