This repo contains a simplified variant of the Llama 3.2 1B Instruct model, built for teaching in the Introduction to Modern AI course. The model is intended for instructional purposes only: it exists to test the Transformer implementation in Homework 4.
The differences from the standard Llama 3.2 1B model are:
To build this model, we made these two architecture changes and then fine-tuned the model to recover Llama 3.2 Instruct behavior, using a KL distillation loss combined with a next-token loss on a mixture of FineWeb-Edu (HuggingFaceFW/fineweb-edu, sample-350BT) and UltraChat 200k (HuggingFaceH4/ultrachat_200k).
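As a rough illustration of how a KL distillation loss and a next-token loss can be combined, here is a minimal pure-Python sketch for a single token position. It is not the training code used for this model. The function names, the mixing weight `alpha`, and the KL direction (teacher relative to student) are all assumptions made for illustration; the description above does not specify them.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(student_logits, teacher_logits, target_id, alpha=0.5):
    """Mix KL(teacher || student) with next-token cross-entropy for one position.

    alpha is a hypothetical mixing weight; the source does not state one.
    Returns (total, kl, ce) so the two terms can be inspected separately.
    """
    s_logp = log_softmax(student_logits)
    t_logp = log_softmax(teacher_logits)
    # KL(teacher || student) = sum_i p_t(i) * (log p_t(i) - log p_s(i))
    kl = sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))
    # Next-token loss: negative log-likelihood of the observed next token.
    ce = -s_logp[target_id]
    total = alpha * kl + (1.0 - alpha) * ce
    return total, kl, ce


# Toy example with a 3-token vocabulary (values are made up).
total, kl, ce = distillation_loss(
    student_logits=[1.0, 0.5, -0.5],
    teacher_logits=[2.0, 0.0, -1.0],
    target_id=0,
)
```

In an actual training loop these terms would typically be averaged over all positions in a batch and computed with a tensor library rather than Python lists. The KL term pulls the student's full output distribution toward the teacher's, while the next-token term keeps it anchored to the observed data.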
Base model
meta-llama/Llama-3.2-1B-Instruct