This is the third in a series of GPT-2 (124M) models I pretrained on different orderings of data, showing that curriculum learning (https://arxiv.org/html/2405.07490v1) is not a viable method for improving LLM performance and in fact reduces it.

I trained the models on data ordered three ways: randomly, by ascending reading level, and by descending reading level.
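
For concreteness, here is a minimal sketch of how the three orderings could be produced. It assumes reading level is scored with `textstat`'s Flesch-Kincaid grade; the scoring method actually used for these models is not stated here, so treat the scorer as a placeholder.

```python
# Sketch: produce the three training-data orderings used in this series.
# Assumption: reading level is approximated by textstat's Flesch-Kincaid
# grade; swap in whatever readability metric the pipeline actually uses.
import random
import textstat

def order_documents(docs: list[str], mode: str) -> list[str]:
    """Return docs ordered randomly, or by ascending/descending reading level."""
    if mode == "random":
        shuffled = docs[:]
        random.shuffle(shuffled)
        return shuffled
    scored = sorted(docs, key=textstat.flesch_kincaid_grade)
    return scored if mode == "ascending" else scored[::-1]

# Example: build the three training streams from one (toy) corpus.
corpus = [
    "See Spot run.",
    "The cat sat on the mat.",
    "Quantum chromodynamics describes the strong interaction between quarks.",
]
for mode in ("random", "ascending", "descending"):
    print(mode, order_documents(corpus, mode))
```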