---
library_name: pytorch
pipeline_tag: text-generation
tags:
- text-generation
- pytorch
- fineweb-edu
- ultrachat
- homework
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
---
# Chat-Tuning-Homework
This repository holds course-homework artifacts: both model checkpoints and the derived data files used to produce them.
## Contents
- `model_base.pth`: 1.1M-step base model checkpoint in the homework's LLaMA-like single-file format.
- `model_chat.pth`: chat-tuned checkpoint in the homework model format.
- `params.json`: model architecture parameters used by the homework `LLM` loader.
- `ultrachat_short.json`: filtered short-form UltraChat conversations used for chat tuning.
- `ultrachat_dpo_pos.json`: positive DPO preference data.
- `ultrachat_dpo_neg.json`: negative DPO preference data.
## Model Card
### Architecture
The checkpoints use the Homework 5 transformer architecture with:
- dimension: 1024
- feed-forward dimension: 4096
- heads: 16
- layers: 8
- maximum sequence length: 1024
- vocabulary size: 50432
These values are also stored in `params.json`.
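As an illustration, the architecture values above could be round-tripped through a JSON file like `params.json`. Note that the key names below (`dim`, `hidden_dim`, etc.) are assumptions for the sketch; the real keys are whatever the homework `LLM` loader expects.

```python
import json

# Hypothetical contents of params.json, mirroring the architecture list
# above. Key names are illustrative assumptions, not the loader's schema.
params = {
    "dim": 1024,          # model dimension
    "hidden_dim": 4096,   # feed-forward dimension
    "n_heads": 16,
    "n_layers": 8,
    "max_seq_len": 1024,
    "vocab_size": 50432,
}

# Serialize and parse back, as a loader would when reading params.json.
text = json.dumps(params, indent=2)
loaded = json.loads(text)
assert loaded == params
```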
### Training Summary
- `model_base.pth` is the pretrained base checkpoint exported from the ~1.1T-token FineWebEDU run.
- `model_chat.pth` is the chat-tuned checkpoint saved after supervised chat tuning on a subset of the UltraChat 200k dataset.
These files are intended for use with the homework's basic exercises.
## Data Card
### Data Sources
- FineWebEDU for base pretraining
- UltraChat 200k for chat tuning and preference-style data preparation
### Included Data Files
- `ultrachat_short.json`: set of short chat-tuning responses selected from UltraChat 200k
- `ultrachat_dpo_pos.json`: preferred responses
- `ultrachat_dpo_neg.json`: dispreferred responses
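The positive and negative files are meant to be consumed as aligned preference pairs. A minimal sketch of pairing them, assuming each file is a JSON list of records with `prompt` and `response` fields (the actual schema should be checked against the files themselves):

```python
# Hypothetical pairing of the DPO files. In practice the two lists would
# come from ultrachat_dpo_pos.json and ultrachat_dpo_neg.json; the
# prompt/response field names here are assumptions for illustration.
pos = [{"prompt": "Hi", "response": "Hello! How can I help you today?"}]
neg = [{"prompt": "Hi", "response": "idk"}]

# Zip parallel entries into (chosen, rejected) preference pairs.
pairs = [
    {"prompt": p["prompt"], "chosen": p["response"], "rejected": n["response"]}
    for p, n in zip(pos, neg)
]
assert all(pair["chosen"] != pair["rejected"] for pair in pairs)
```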
## File Format Notes
- `model_base.pth` and `model_chat.pth` are PyTorch checkpoint dictionaries.
- Attention weights are stored in the homework-compatible unpacked format.
- All exported weights are stored as `bfloat16`.
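To verify these format notes locally, a checkpoint can be loaded with `torch.load` and its tensor dtypes inspected. The sketch below saves and reloads a tiny in-memory state dict; the key name `tok_embeddings.weight` is invented for illustration and the real checkpoints use their own naming scheme.

```python
import io

import torch

# Build a tiny stand-in state dict in bfloat16, mimicking the exported
# format. The key name is illustrative, not the homework's actual schema.
state = {"tok_embeddings.weight": torch.zeros(4, 2, dtype=torch.bfloat16)}

# Round-trip through an in-memory buffer, as torch.load("model_base.pth")
# would do against the real file.
buf = io.BytesIO()
torch.save(state, buf)
buf.seek(0)
loaded = torch.load(buf)

# Confirm every exported tensor is stored as bfloat16.
assert all(t.dtype == torch.bfloat16 for t in loaded.values())
```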