---
library_name: pytorch
pipeline_tag: text-generation
tags:
- pytorch
- text-generation
- homework
- fineweb-edu
- gsm8k
datasets:
- HuggingFaceFW/fineweb-edu
- openai/gsm8k
---
# RL-Homework
This repository holds two checkpoints for an RL homework assignment: a base model pretrained on FineWeb-Edu and a supervised fine-tuned (SFT) model trained further on GSM8K.
## Files
- `model_base.pth`: base model checkpoint exported in the homework's LLaMA-like single-file format
- `model_sft.pth`: supervised fine-tuned checkpoint trained further on the GSM8K training set
- `params.json`: model architecture parameters for the homework loader
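The files above could be read with something like the following minimal sketch. It assumes that `params.json` is an ordinary JSON object and that each `.pth` file stores a PyTorch state dict (parameter name to tensor); neither is confirmed here, and the homework's own loader remains the reference. The helper names `load_params` and `load_state_dict` are hypothetical.

```python
import json
from pathlib import Path


def load_params(path="params.json"):
    """Read the architecture parameters as a plain dict."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def load_state_dict(path="model_base.pth"):
    """Load a checkpoint onto the CPU.

    Assumption: the file holds a state dict (name -> tensor), not a
    pickled model object. Check against the homework loader.
    """
    import torch  # imported lazily so load_params works without PyTorch

    return torch.load(path, map_location="cpu")
```

Calling `load_params()` and then iterating over `load_state_dict("model_base.pth").items()` to print each tensor's shape is a quick way to check that the checkpoint matches the dimensions listed in `params.json`.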
## Model Info
Architecture from `params.json`:
- dimension: 1024
- feed-forward dimension: 4096
- heads: 16
- layers: 8
- max sequence length: 1024
- vocabulary size: 50432
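As a rough sanity check on these numbers, the sketch below derives the per-head dimension and estimates the parameter count. The estimate rests on assumptions that this card does not state: a LLaMA-style block with four bias-free attention projections, a gated feed-forward layer with three weight matrices, two RMSNorm weight vectors per layer, and no learned position embeddings. The field names in `ModelShape` are hypothetical and may not match the keys in `params.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    # Values copied from the list above; field names are illustrative.
    dim: int = 1024
    ffn_dim: int = 4096
    n_heads: int = 16
    n_layers: int = 8
    max_seq_len: int = 1024
    vocab_size: int = 50432


def head_dim(shape):
    """Width of one attention head."""
    if shape.dim % shape.n_heads != 0:
        raise ValueError("dim must be divisible by n_heads")
    return shape.dim // shape.n_heads


def estimate_params(shape, tied_embeddings=False):
    """Approximate parameter count under the assumptions stated above."""
    embed = shape.vocab_size * shape.dim
    attn = 4 * shape.dim * shape.dim      # q, k, v, o projections (assumed)
    ffn = 3 * shape.dim * shape.ffn_dim   # gate, up, down matrices (assumed)
    norms = 2 * shape.dim                 # two norm weight vectors per layer
    per_layer = attn + ffn + norms
    final_norm = shape.dim
    out_head = 0 if tied_embeddings else shape.vocab_size * shape.dim
    return embed + shape.n_layers * per_layer + final_norm + out_head


shape = ModelShape()
print(head_dim(shape))          # 64
print(estimate_params(shape))   # about 238M if the output head is untied
```

Under these assumptions the model has roughly 238M parameters, or about 186M if the input embedding and output head share weights. Summing the tensor sizes in the actual checkpoint would settle which applies.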
## Training Summary
### Base model
`model_base.pth` is the final checkpoint from pretraining on FineWeb-Edu, exported in the homework loader format.
### SFT model
`model_sft.pth` starts from the same base model family and is additionally trained on the GSM8K training set for the homework's supervised fine-tuning stage.
## Intended Use
- homework reproduction
- educational experiments
- small-scale reasoning and RL homework pipelines
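For the reasoning and RL use case, one common way to score a completion on GSM8K is to compare its final number with the reference answer. GSM8K reference solutions end with a line of the form `#### <number>`. The sketch below assumes the model's completions follow the same convention, which this card does not confirm. The function names are hypothetical and are not part of the homework code.

```python
import re

# Matches the number after a "####" marker, allowing a sign, thousands
# separators, and a decimal part.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text):
    """Return the last '#### <number>' value as a string, or None."""
    matches = _FINAL_ANSWER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def exact_match_reward(completion, reference):
    """1.0 if both final answers parse and are numerically equal, else 0.0."""
    predicted = extract_final_answer(completion)
    gold = extract_final_answer(reference)
    if predicted is None or gold is None:
        return 0.0
    return 1.0 if float(predicted) == float(gold) else 0.0
```

A binary reward like this is simple to reason about, but it gives no credit for correct intermediate steps and scores any completion without the marker as wrong.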
## Limitations
- these are homework checkpoints, not production models
- outputs may still be repetitive or incorrect
- GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning