---
library_name: pytorch
pipeline_tag: text-generation
tags:
- text-generation
- pytorch
- fineweb-edu
- ultrachat
- homework
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
---

# Chat-Tuning-Homework

This is a course-homework model repo containing both checkpoints and derived data artifacts.

## Contents

- `model_base.pth`: 1.1M-step base model checkpoint in the homework's LLaMA-like single-file format.
- `model_chat.pth`: chat-tuned checkpoint in the homework model format.
- `params.json`: model architecture parameters used by the homework `LLM` loader.
- `ultrachat_short.json`: filtered short-form UltraChat conversations used for chat tuning.
- `ultrachat_dpo_pos.json`: positive DPO preference data.
- `ultrachat_dpo_neg.json`: negative DPO preference data.

## Model Card

### Architecture

The checkpoints use the Homework 5 transformer architecture with:

- dimension: 1024
- feed-forward dimension: 4096
- heads: 16
- layers: 8
- maximum sequence length: 1024
- vocabulary size: 50432

These values are also stored in `params.json`.

### Training Summary

- `model_base.pth` is the pretrained base checkpoint exported from the ~1.1T-token FineWeb-Edu run.
- `model_chat.pth` is the chat-tuned checkpoint saved after supervised chat tuning on a subset of the UltraChat 200k dataset.

These files are intended for use with the homework's basic exercises.

## Data Card

### Data Sources

- FineWeb-Edu for base pretraining
- UltraChat 200k for chat tuning and preference-style data preparation

### Included Data Files

- `ultrachat_short.json`: short chat-tuning conversations selected from UltraChat 200k
- `ultrachat_dpo_pos.json`: preferred responses
- `ultrachat_dpo_neg.json`: dispreferred responses

## File Format Notes

- `model_base.pth` and `model_chat.pth` are PyTorch checkpoint dictionaries
- attention weights are stored in the homework-compatible unpacked format
- all exported weights are stored as `bfloat16`
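## Usage Sketch

A minimal sketch of reading the architecture parameters. The exact key names inside `params.json` are assumptions here (they depend on the homework's `LLM` loader); the values match the architecture listed in this card.

```python
import json

# Assumed key names; actual params.json keys come from the homework loader.
params = {
    "dim": 1024,          # model dimension
    "ffn_dim": 4096,      # feed-forward dimension
    "n_heads": 16,
    "n_layers": 8,
    "max_seq_len": 1024,
    "vocab_size": 50432,
}

# Round-trip through a params.json-style file, as the loader would read it.
with open("params.json", "w") as f:
    json.dump(params, f)

with open("params.json") as f:
    loaded = json.load(f)
```

In the homework, these values would then be passed to the `LLM` constructor before loading a checkpoint.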
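Since the checkpoints are plain PyTorch state dictionaries with `bfloat16` weights, they can be inspected with `torch.load` alone. The tensor name and shape below are illustrative placeholders, not the actual keys in `model_base.pth` or `model_chat.pth`.

```python
import io
import torch

# Build a tiny stand-in checkpoint dictionary; "tok_embeddings.weight" is a
# hypothetical key used only to illustrate the save/load round trip.
state = {"tok_embeddings.weight": torch.zeros(4, 4, dtype=torch.bfloat16)}

# Serialize and reload via an in-memory buffer, as one would with the .pth files.
buf = io.BytesIO()
torch.save(state, buf)
buf.seek(0)
loaded = torch.load(buf)

# Exported weights in this repo are stored as bfloat16.
print(loaded["tok_embeddings.weight"].dtype)
```

The same pattern with a file path (`torch.load("model_base.pth", map_location="cpu")`) lists the checkpoint's keys and dtypes without instantiating the model.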