---
library_name: pytorch
pipeline_tag: text-generation
tags:
- text-generation
- pytorch
- fineweb-edu
- ultrachat
- homework
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
---
# Chat-Tuning-Homework
This repository holds course-homework artifacts: both model checkpoints and the derived data files used to produce them.
## Contents
- `model_base.pth`: 1.1M-step base model checkpoint in the homework's LLaMA-like single-file format.
- `model_chat.pth`: chat-tuned checkpoint in the homework model format.
- `params.json`: model architecture parameters used by the homework `LLM` loader.
- `ultrachat_short.json`: filtered short-form UltraChat conversations used for chat tuning.
- `ultrachat_dpo_pos.json`: positive DPO preference data.
- `ultrachat_dpo_neg.json`: negative DPO preference data.
## Model Card
### Architecture
The checkpoints use the Homework 5 transformer architecture with:
- dimension: 1024
- feed-forward dimension: 4096
- heads: 16
- layers: 8
- maximum sequence length: 1024
- vocabulary size: 50432
These values are also stored in `params.json`.
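As an illustration, the architecture values above could be round-tripped through a JSON file like `params.json`. Note that the key names below (`dim`, `hidden_dim`, etc.) are assumptions for the sketch; the real keys are whatever the homework `LLM` loader expects.

```python
import json

# Hypothetical contents of params.json, mirroring the architecture list
# above. Key names are illustrative assumptions, not the loader's schema.
params = {
    "dim": 1024,          # model dimension
    "hidden_dim": 4096,   # feed-forward dimension
    "n_heads": 16,
    "n_layers": 8,
    "max_seq_len": 1024,
    "vocab_size": 50432,
}

# Serialize and parse back, as a loader would when reading params.json.
text = json.dumps(params, indent=2)
loaded = json.loads(text)
assert loaded == params
```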
### Training Summary
- `model_base.pth` is the pretrained base checkpoint exported from the ~1.1T-token FineWebEDU run.
- `model_chat.pth` is the chat-tuned checkpoint saved after supervised chat tuning on a subset of the UltraChat 200k dataset.
These files are intended for use with the homework's basic exercises.
## Data Card
### Data Sources
- FineWebEDU for base pretraining
- UltraChat 200k for chat tuning and preference-style data preparation
### Included Data Files
- `ultrachat_short.json`: set of short chat-tuning responses selected from UltraChat 200k
- `ultrachat_dpo_pos.json`: preferred responses
- `ultrachat_dpo_neg.json`: dispreferred responses
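The positive and negative files are meant to be consumed as aligned preference pairs. A minimal sketch of pairing them, assuming each file is a JSON list of records with `prompt` and `response` fields (the actual schema should be checked against the files themselves):

```python
# Hypothetical pairing of the DPO files. In practice the two lists would
# come from ultrachat_dpo_pos.json and ultrachat_dpo_neg.json; the
# prompt/response field names here are assumptions for illustration.
pos = [{"prompt": "Hi", "response": "Hello! How can I help you today?"}]
neg = [{"prompt": "Hi", "response": "idk"}]

# Zip parallel entries into (chosen, rejected) preference pairs.
pairs = [
    {"prompt": p["prompt"], "chosen": p["response"], "rejected": n["response"]}
    for p, n in zip(pos, neg)
]
assert all(pair["chosen"] != pair["rejected"] for pair in pairs)
```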
## File Format Notes
- `model_base.pth` and `model_chat.pth` are PyTorch checkpoint dictionaries.
- Attention weights are stored in the homework-compatible unpacked format.
- All exported weights are stored as `bfloat16`.
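To verify these format notes locally, a checkpoint can be loaded with `torch.load` and its tensor dtypes inspected. The sketch below saves and reloads a tiny in-memory state dict; the key name `tok_embeddings.weight` is invented for illustration and the real checkpoints use their own naming scheme.

```python
import io

import torch

# Build a tiny stand-in state dict in bfloat16, mimicking the exported
# format. The key name is illustrative, not the homework's actual schema.
state = {"tok_embeddings.weight": torch.zeros(4, 2, dtype=torch.bfloat16)}

# Round-trip through an in-memory buffer, as torch.load("model_base.pth")
# would do against the real file.
buf = io.BytesIO()
torch.save(state, buf)
buf.seek(0)
loaded = torch.load(buf)

# Confirm every exported tensor is stored as bfloat16.
assert all(t.dtype == torch.bfloat16 for t in loaded.values())
```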