Llama 3.2 1B Simplified

This repo contains a simplified variant of the Llama 3.2 1B Instruct model, built for the Introduction to Modern AI course. The model is intended for instructional purposes only, specifically for testing the Transformer implementation in Homework 4.

The differences from the standard Llama 3.2 1B model are:

  1. The model replaces RoPE with an absolute positional embedding. RoPE typically works slightly better, but is somewhat cumbersome and unintuitive to implement for an introductory class.
  2. The model uses standard multi-head attention instead of grouped-query attention. Grouped-query attention is a minor architectural optimization that adds complexity with little instructional value.

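The two simplifications above can be sketched in PyTorch. This is a hypothetical illustration, not the repo's actual code: the class name, dimensions, and use of `nn.MultiheadAttention` are assumptions chosen to show the idea (learned absolute positions added once at the input, and attention where every head has its own key/value projections).

```python
import torch
import torch.nn as nn


class SimplifiedAttentionBlock(nn.Module):
    """Illustrative sketch (not the repo's code) of the two changes:
    learned absolute positional embeddings instead of RoPE, and plain
    multi-head attention instead of grouped-query attention."""

    def __init__(self, d_model: int = 2048, n_heads: int = 32, max_len: int = 2048):
        super().__init__()
        # Change 1: a learned absolute positional embedding table, added to
        # token embeddings once at the input (no per-layer rotations as in RoPE).
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Change 2: standard multi-head attention, where the number of
        # key/value heads equals the number of query heads.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        seq_len = x.shape[1]
        positions = torch.arange(seq_len, device=x.device)
        h = x + self.pos_emb(positions)  # add absolute positions
        out, _ = self.attn(h, h, h, need_weights=False)
        return out
```

A real decoder block would also apply a causal mask, normalization, and an MLP; those parts are unchanged from the original architecture and omitted here.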
To build this model, we made these two architecture changes and then finetuned the model to recover Llama 3.2 Instruct behavior using a KL distillation loss and next-token loss on a mixture of FineWebEDU (HuggingFaceFW/fineweb-edu, sample-350BT) and UltraChat200K (HuggingFaceH4/ultrachat_200k).
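The finetuning objective described above can be sketched as a weighted mix of a KL distillation term (against the original Llama 3.2 1B Instruct teacher) and a standard next-token cross-entropy loss. The function name, the mixing weight `alpha`, and the temperature `T` below are illustrative assumptions, not the values or code actually used.

```python
import torch
import torch.nn.functional as F


def distillation_loss(
    student_logits: torch.Tensor,  # (batch, seq_len, vocab)
    teacher_logits: torch.Tensor,  # (batch, seq_len, vocab)
    target_ids: torch.Tensor,      # (batch, seq_len)
    alpha: float = 0.5,            # illustrative mixing weight
    T: float = 1.0,                # illustrative distillation temperature
) -> torch.Tensor:
    # KL term: push the student's next-token distribution toward the teacher's.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (T * T)
    # Next-token term: standard cross-entropy against the data labels.
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        target_ids.reshape(-1),
    )
    return alpha * kl + (1 - alpha) * ce
```

In practice the teacher logits come from running the original Llama 3.2 1B Instruct model on the same batch with gradients disabled.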
