---
library_name: pytorch
pipeline_tag: text-generation
tags:
  - text-generation
  - pytorch
  - fineweb-edu
  - ultrachat
  - homework
datasets:
  - HuggingFaceFW/fineweb-edu
  - HuggingFaceH4/ultrachat_200k
---

# Chat-Tuning-Homework

This repository contains course-homework model checkpoints and the derived data artifacts used to produce them.

## Contents

- `model_base.pth`: 1.1M-step base model checkpoint in the homework's LLaMA-like single-file format.
- `model_chat.pth`: chat-tuned checkpoint in the homework model format.
- `params.json`: model architecture parameters used by the homework `LLM` loader.
- `ultrachat_short.json`: filtered short-form UltraChat conversations used for chat tuning.
- `ultrachat_dpo_pos.json`: positive DPO preference data.
- `ultrachat_dpo_neg.json`: negative DPO preference data.

## Model Card

### Architecture

The checkpoints use the Homework 5 transformer architecture with:

- dimension: 1024
- feed-forward dimension: 4096
- heads: 16
- layers: 8
- maximum sequence length: 1024
- vocabulary size: 50432

These values are also stored in `params.json`.
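The values above can be sanity-checked once `params.json` is loaded. The sketch below is a minimal illustration, not the homework loader itself; the key names (`dim`, `n_heads`, etc.) are assumptions and may differ from the actual file.

```python
import json

# Hypothetical params.json contents matching the architecture listed above;
# the real key names used by the homework `LLM` loader may differ.
PARAMS_JSON = """
{
  "dim": 1024,
  "hidden_dim": 4096,
  "n_heads": 16,
  "n_layers": 8,
  "max_seq_len": 1024,
  "vocab_size": 50432
}
"""

params = json.loads(PARAMS_JSON)

# The model dimension must divide evenly across attention heads.
assert params["dim"] % params["n_heads"] == 0
head_dim = params["dim"] // params["n_heads"]
print(head_dim)  # 64
```

With the parameters above, each of the 16 heads attends in a 64-dimensional subspace.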

### Training Summary

- `model_base.pth` is the pretrained base checkpoint exported from the ~1.1T-token FineWebEDU run.
- `model_chat.pth` is the chat-tuned checkpoint saved after supervised chat tuning on a subset of the UltraChat 200k dataset.

These files are intended for use with the homework's basic exercises.

## Data Card

### Data Sources

- FineWebEDU for base pretraining
- UltraChat 200k for chat tuning and preference-style data preparation

### Included Data Files

- `ultrachat_short.json`: set of short chat-tuning responses selected from UltraChat 200k
- `ultrachat_dpo_pos.json`: preferred responses
- `ultrachat_dpo_neg.json`: dispreferred responses
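The positive and negative files are typically combined into (prompt, chosen, rejected) triples for DPO-style training. The sketch below illustrates that pairing with inline toy records; the actual schema of `ultrachat_dpo_pos.json` / `ultrachat_dpo_neg.json` is an assumption and may differ.

```python
import json

# Toy records standing in for the real files; the actual field names
# ("prompt", "response") are assumptions, not the repo's verified schema.
pos = json.loads('[{"prompt": "Hi", "response": "Hello! How can I help?"}]')
neg = json.loads('[{"prompt": "Hi", "response": "idk"}]')

# Pair matching entries into preference triples for DPO training.
pairs = [
    {"prompt": p["prompt"], "chosen": p["response"], "rejected": n["response"]}
    for p, n in zip(pos, neg)
    if p["prompt"] == n["prompt"]
]
print(len(pairs))  # 1
```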

## File Format Notes

- `model_base.pth` and `model_chat.pth` are PyTorch checkpoint dictionaries
- attention weights are stored in the homework-compatible unpacked format
- all exported weights are stored as `bfloat16`
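Because the exported weights are `bfloat16`, hardware without native bf16 support needs an upcast after loading. A minimal sketch, demonstrated on a synthetic tensor (loading the real checkpoint would use `torch.load("model_chat.pth", map_location="cpu")`):

```python
import torch

# Synthetic stand-in for one checkpoint tensor stored in bfloat16.
w = torch.randn(4, 4).to(torch.bfloat16)

# Upcast to float32 for CPUs/GPUs without bf16 kernels.
w32 = w.float()
assert w32.dtype == torch.float32
```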