---
library_name: pytorch
pipeline_tag: text-generation
tags:
- pytorch
- text-generation
- homework
- fineweb-edu
- gsm8k
datasets:
- HuggingFaceFW/fineweb-edu
- openai/gsm8k
---
| |
| # RL-Homework |
|
|
This repository holds two homework checkpoints: a base pretrained model and an additional supervised fine-tuned (SFT) model.
|
|
| ## Files |
|
|
| - `model_base.pth`: base model checkpoint exported in the homework's LLaMA-like single-file format |
| - `model_sft.pth`: supervised fine-tuned checkpoint trained further on the GSM8K training set |
| - `params.json`: model architecture parameters for the homework loader |
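
A minimal sketch of inspecting these files. It assumes the `.pth` files are plain PyTorch state dicts of tensors; the homework loader itself defines the authoritative format:

```python
import json
import torch

# Read the architecture parameters consumed by the homework loader.
with open("params.json") as f:
    params = json.load(f)
print(params)

# Assumption: the checkpoint is a flat state dict of tensors.
# map_location="cpu" lets you inspect the weights without a GPU.
state_dict = torch.load("model_base.pth", map_location="cpu")
print(len(state_dict), "tensors; first key:", next(iter(state_dict)))
```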
|
|
| ## Model Info |
|
|
| Architecture from `params.json`: |
|
|
| - dimension: 1024 |
| - feed-forward dimension: 4096 |
| - heads: 16 |
| - layers: 8 |
| - max sequence length: 1024 |
| - vocabulary size: 50432 |
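
For reference, a config like the above is often carried as a small dataclass. The field names below (`dim`, `ffn_dim`, `n_heads`, `n_layers`, `max_seq_len`, `vocab_size`) are illustrative assumptions, not necessarily the keys used in `params.json`:

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Field names are illustrative; check params.json for the actual keys.
    dim: int = 1024          # hidden dimension
    ffn_dim: int = 4096      # feed-forward dimension
    n_heads: int = 16        # attention heads (head_dim = 1024 / 16 = 64)
    n_layers: int = 8        # transformer blocks
    max_seq_len: int = 1024  # maximum sequence length
    vocab_size: int = 50432  # vocabulary size
```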
|
|
| ## Training Summary |
|
|
| ### Base model |
|
|
`model_base.pth` is the final checkpoint pretrained on FineWeb-Edu, exported in the homework loader format.
|
|
| ### SFT model |
|
|
`model_sft.pth` starts from a checkpoint of the same base model family and is further trained on the GSM8K training split during the homework's supervised fine-tuning stage.
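
A hedged sketch of loading the GSM8K training split for SFT. The prompt/completion template below is purely illustrative; the exact formatting used for this checkpoint may differ:

```python
from datasets import load_dataset

# GSM8K's "main" config exposes "question" and "answer" columns.
train = load_dataset("openai/gsm8k", "main", split="train")

def format_example(ex):
    # Illustrative SFT formatting; the homework's actual template may differ.
    return {"text": f"Question: {ex['question']}\nAnswer: {ex['answer']}"}

train = train.map(format_example)
print(train[0]["text"][:200])
```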
|
|
| ## Intended Use |
|
|
| - homework reproduction |
| - educational experiments |
| - small-scale reasoning and RL homework pipelines |
|
|
| ## Limitations |
|
|
| - these are homework checkpoints, not production models |
| - outputs may still be repetitive or incorrect |
| - GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning |
|
|
|
|