zkolter

Add README.md

de047dc verified 24 days ago

metadata

library_name: pytorch
pipeline_tag: text-generation
tags:
  - pytorch
  - text-generation
  - homework
  - fineweb-edu
  - gsm8k
datasets:
  - HuggingFaceFW/fineweb-edu
  - openai/gsm8k

RL-Homework

This is a homework model repository containing two checkpoints: a base pretrained model and a supervised fine-tuned (SFT) model.

Files

model_base.pth: base model checkpoint exported in the homework's LLaMA-like single-file format
model_sft.pth: supervised fine-tuned checkpoint trained further on the GSM8K training set
params.json: model architecture parameters for the homework loader
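As a minimal sketch of how these files might be inspected outside the homework loader: the code below reads params.json and lists the tensors in a checkpoint. It assumes each .pth file holds a plain mapping from tensor names to tensors, which this README does not confirm. The helper names (`read_params`, `load_state_dict`, `summarize`) are hypothetical and are not part of the homework code.

```python
import json
from pathlib import Path


def read_params(path):
    """Return the architecture parameters stored in a params.json file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def load_state_dict(path):
    """Load a single-file checkpoint onto the CPU.

    Assumption: the .pth file holds a plain mapping of names to tensors.
    """
    import torch  # imported here so read_params works without PyTorch

    return torch.load(path, map_location="cpu", weights_only=True)


def summarize(repo_dir, checkpoint="model_base.pth"):
    """Print the params and the name and shape of every stored tensor."""
    repo_dir = Path(repo_dir)
    print(read_params(repo_dir / "params.json"))
    state = load_state_dict(repo_dir / checkpoint)
    for name, tensor in state.items():
        print(name, tuple(tensor.shape))
```

Calling `summarize("path/to/RL-Homework")` on a local copy of the repo would print the parameter dictionary followed by one line per tensor. If the checkpoint stores anything other than tensors, `weights_only=True` will refuse to load it, which is the safer default for downloaded files.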

Model Info

Architecture from params.json:

dimension: 1024
feed-forward dimension: 4096
heads: 16
layers: 8
max sequence length: 1024
vocabulary size: 50432
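For a rough sense of model size, the parameter count can be estimated from the values above. This is a sketch under stated assumptions, not a figure from the repo. It assumes a standard LLaMA-style block: bias-free attention with full multi-head projections, a three-matrix SwiGLU feed-forward layer, two RMSNorm weights per layer, and an output head that is not tied to the input embeddings. The actual homework architecture may differ.

```python
def estimate_parameters(dim, ffn_dim, n_layers, vocab_size, tied_embeddings=False):
    """Estimate the parameter count of an assumed LLaMA-style transformer."""
    attention = 4 * dim * dim          # Q, K, V and output projections, no biases
    feed_forward = 3 * dim * ffn_dim   # gate, up and down projections (SwiGLU)
    norms = 2 * dim                    # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms

    embeddings = vocab_size * dim
    output_head = 0 if tied_embeddings else vocab_size * dim
    final_norm = dim
    return n_layers * per_layer + embeddings + output_head + final_norm


total = estimate_parameters(dim=1024, ffn_dim=4096, n_layers=8, vocab_size=50432)
head_dim = 1024 // 16  # model dimension divided by the number of heads

print(f"head dimension: {head_dim}")
print(f"estimated parameters: {total:,}")
```

Under these assumptions the estimate is 237,519,872 parameters, or 185,877,504 if the output head shares weights with the embeddings. The per-head dimension is 64.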

Training Summary

Base model

model_base.pth is the final checkpoint from pretraining on FineWeb-Edu (HuggingFaceFW/fineweb-edu), exported in the homework loader format.

SFT model

model_sft.pth starts from the same base model family and is trained further on the GSM8K training set (openai/gsm8k) in the homework's supervised fine-tuning stage.
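To illustrate what supervised fine-tuning data from GSM8K can look like, here is a minimal sketch. Each GSM8K record has a `question` field and an `answer` field, and the answer ends with a line of the form `#### <final answer>`. The prompt template and the helper names below are assumptions made for illustration. The README does not describe the format the homework actually uses, and the sample record is invented rather than taken from the dataset.

```python
def split_gsm8k_answer(answer):
    """Split a GSM8K answer string into (reasoning, final_answer)."""
    reasoning, marker, final = answer.rpartition("####")
    if not marker:
        raise ValueError("answer has no '####' marker")
    # Drop thousands separators so "1,000" and "1000" compare equal.
    return reasoning.strip(), final.strip().replace(",", "")


def build_sft_example(record):
    """Turn one record into a prompt/target pair (hypothetical template)."""
    prompt = f"Question: {record['question']}\nAnswer: "
    return {"prompt": prompt, "target": record["answer"]}


# Invented record in the GSM8K layout, for illustration only.
sample = {
    "question": "Tom has 3 apples and buys 4 more. How many apples does he have?",
    "answer": "Tom has 3 + 4 = <<3+4=7>>7 apples.\n#### 7",
}

example = build_sft_example(sample)
reasoning, final_answer = split_gsm8k_answer(sample["answer"])
print(example["prompt"])
print(final_answer)
```

Separating the final answer from the reasoning is also useful for evaluation, since a generated answer can be checked by comparing only the value after the `####` marker.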

Intended Use

homework reproduction
educational experiments
small-scale reasoning and RL homework pipelines

Limitations

these are homework checkpoints, not production models
outputs may still be repetitive or incorrect
GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning