metadata
library_name: pytorch
pipeline_tag: text-generation
tags:
  - pytorch
  - text-generation
  - homework
  - fineweb-edu
  - gsm8k
datasets:
  - HuggingFaceFW/fineweb-edu
  - openai/gsm8k

RL-Homework

This homework model repo contains two checkpoints: a base pretrained model and a supervised fine-tuned (SFT) model.

Files

  • model_base.pth: base model checkpoint exported in the homework's LLaMA-like single-file format
  • model_sft.pth: supervised fine-tuned checkpoint trained further on the GSM8K training set
  • params.json: model architecture parameters for the homework loader
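
The files above could be inspected with a short script like the one below. This is a minimal sketch, not the homework loader: it assumes that params.json is an ordinary JSON object and that each .pth file is a flat PyTorch state dict (parameter names mapped to tensors), neither of which this README confirms. The helper names `load_params` and `summarize_checkpoint` are hypothetical.

```python
import json
from pathlib import Path


def load_params(path):
    """Return the contents of params.json as a dict (no key names are assumed)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_checkpoint(path):
    """Map each tensor name in a checkpoint to its shape.

    Assumption: the .pth file holds a flat state dict of tensors.
    """
    import torch  # imported lazily so load_params works without PyTorch

    # weights_only=True restricts unpickling to tensors and basic containers.
    state = torch.load(path, map_location="cpu", weights_only=True)
    return {name: tuple(t.shape) for name, t in state.items() if hasattr(t, "shape")}
```

Calling `load_params("params.json")` and `summarize_checkpoint("model_base.pth")` from a directory containing the downloaded files should be enough to check the tensor shapes against the architecture values before wiring the checkpoint into the homework code.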

Model Info

Architecture from params.json:

  • dimension: 1024
  • feed-forward dimension: 4096
  • heads: 16
  • layers: 8
  • max sequence length: 1024
  • vocabulary size: 50432
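
The values above allow a rough parameter-count estimate. The sketch below assumes details this README does not state: full multi-head attention with four bias-free projections, a gated (SwiGLU-style) feed-forward block with three weight matrices, two norm weight vectors per layer plus a final norm, and an output head that is not tied to the embedding table. The function name `estimate_params` is hypothetical.

```python
def estimate_params(dim, ffn_dim, n_layers, vocab_size,
                    gated_ffn=True, tied_embeddings=False):
    """Rough parameter count for a LLaMA-like decoder (assumptions in comments)."""
    attn = 4 * dim * dim                           # Q, K, V, output projections, no biases
    ffn = (3 if gated_ffn else 2) * dim * ffn_dim  # gated FFN uses three matrices
    norms = 2 * dim                                # two norm weight vectors per layer
    per_layer = attn + ffn + norms
    embed = vocab_size * dim                       # token embedding table
    head = 0 if tied_embeddings else vocab_size * dim
    final_norm = dim
    return embed + n_layers * per_layer + final_norm + head


head_dim = 1024 // 16  # dimension divided by number of heads
total = estimate_params(dim=1024, ffn_dim=4096, n_layers=8, vocab_size=50432)
print(f"head dimension: {head_dim}")
print(f"estimated parameters: {total:,}")
```

Under these assumptions the estimate comes to about 238 million parameters, or about 186 million if the output head shares weights with the embedding table. Comparing the estimate with the actual tensor shapes in the checkpoint is the reliable way to settle which assumptions hold.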

Training Summary

Base model

model_base.pth is the final checkpoint from pretraining on FineWeb-Edu (HuggingFaceFW/fineweb-edu), exported in the homework loader format.

SFT model

model_sft.pth starts from the same base model family and is then trained further on the GSM8K training set, which is the homework's supervised fine-tuning stage.
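
To illustrate what supervised fine-tuning on GSM8K involves, the sketch below turns one record into a prompt/target pair and pulls out the final answer. GSM8K records have `question` and `answer` fields, and each answer ends with a `#### <number>` line. The prompt template and the helper names `to_sft_pair` and `split_gsm8k_answer` are assumptions for illustration; the homework's actual formatting is not described in this README. The example record is made up in GSM8K's style and is not taken from the dataset.

```python
def to_sft_pair(record, prompt_template="Question: {question}\nAnswer: "):
    """Build a (prompt, target) pair; the template is a hypothetical choice."""
    prompt = prompt_template.format(question=record["question"].strip())
    target = record["answer"].strip()
    return prompt, target


def split_gsm8k_answer(answer):
    """Split an answer into (reasoning, final_answer) at the last '####' marker."""
    reasoning, sep, final = answer.rpartition("####")
    if not sep:
        raise ValueError("no '####' marker found in answer")
    # Drop thousands separators so "1,234" and "1234" compare equal.
    return reasoning.strip(), final.strip().replace(",", "")


# Made-up record in GSM8K's format, for illustration only.
example = {
    "question": "Tom has 3 boxes with 4 pens each. How many pens does he have?",
    "answer": "He has 3 * 4 = <<3*4=12>>12 pens.\n#### 12",
}

prompt, target = to_sft_pair(example)
reasoning, final_answer = split_gsm8k_answer(example["answer"])
print(prompt + target)
print("final answer:", final_answer)
```

During fine-tuning the loss is commonly computed only on the target tokens, and the extracted final answer can be compared against a model's output for exact-match scoring. Whether the homework does either is not stated here.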

Intended Use

  • homework reproduction
  • educational experiments
  • small-scale reasoning and RL homework pipelines

Limitations

  • these are homework checkpoints, not production models
  • outputs may still be repetitive or incorrect
  • GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning