· GPT2-small architecture
· Randomly initialized
· Distilled on the BabyLM dataset (10M words) using GPT2-large-BabyLM as the teacher model
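The card does not say which distillation objective was used. The sketch below assumes the common soft-target formulation: KL divergence between temperature-softened teacher and student next-token distributions, mixed with ordinary cross-entropy on the true token. The names `distillation_loss`, `temperature` and `alpha` are hypothetical and do not come from the original training code. It works on a single token position in plain Python to show the arithmetic. Real training would compute this over batches of positions with a tensor library.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, using the log-sum-exp trick for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    temperature: float = 2.0,  # assumed value, not stated in the card
    alpha: float = 0.5,        # assumed weight on the soft-target term
) -> float:
    """Hypothetical soft-target distillation loss for one token position."""
    # Soft-target term: KL(teacher || student) on softened distributions,
    # scaled by T^2 so its gradient size stays comparable across temperatures.
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    soft = kl * temperature ** 2

    # Hard-target term: standard cross-entropy against the true next token.
    hard = -log_softmax(student_logits)[target]

    return alpha * soft + (1.0 - alpha) * hard
```

When the student's logits match the teacher's, the soft term is zero. With `alpha=0.0` the loss reduces to plain next-token cross-entropy, which is what a randomly initialized GPT2-small would be trained on without a teacher.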