zkolter/RL-Homework

zkolter committed on Apr 17

Commit de047dc · verified

1 Parent(s): eeb1203

Add README.md

Files changed (1)

README.md +57 -0

README.md ADDED

	@@ -0,0 +1,57 @@

+---
+library_name: pytorch
+pipeline_tag: text-generation
+tags:
+  - pytorch
+  - text-generation
+  - homework
+  - fineweb-edu
+  - gsm8k
+datasets:
+  - HuggingFaceFW/fineweb-edu
+  - openai/gsm8k
+---
+# RL-Homework
+This is a homework model repo containing a base pretrained checkpoint and an additional supervised fine-tuned checkpoint.
+## Files
+- `model_base.pth`: base model checkpoint exported in the homework's LLaMA-like single-file format
+- `model_sft.pth`: supervised fine-tuned checkpoint trained further on the GSM8K training set
+- `params.json`: model architecture parameters for the homework loader
+## Model Info
+Architecture from `params.json`:
+- dimension: 1024
+- feed-forward dimension: 4096
+- heads: 16
+- layers: 8
+- max sequence length: 1024
+- vocabulary size: 50432
+## Training Summary
+### Base model
+`model_base.pth` is the final checkpoint pretrained on FineWeb-Edu, exported in the homework loader format.
+### SFT model
+`model_sft.pth` starts from the same base model family and is additionally trained on the GSM8K training set for the homework's supervised fine-tuning stage.
+## Intended Use
+- homework reproduction
+- educational experiments
+- small-scale reasoning and RL homework pipelines
+## Limitations
+- these are homework checkpoints, not production models
+- outputs may still be repetitive or incorrect
+- GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning
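
The architecture numbers under "Model Info" can be sanity-checked with a little arithmetic. The sketch below is not part of the commit and is not the homework loader. It hardcodes the values listed in the README rather than reading `params.json`, because the README does not show that file's key names. The field names in `ModelArgs` and the helpers `head_dim` and `estimate_params` are hypothetical. The parameter estimate also assumes a bias-free LLaMA-style block (four attention projections, a gated feed-forward layer with three matrices, two norm vectors per layer, untied input and output embeddings), which the README implies only loosely with "LLaMA-like".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgs:
    # Values copied from the README's "Model Info" list.
    # Field names are hypothetical; params.json may use different keys.
    dim: int = 1024
    ffn_dim: int = 4096
    n_heads: int = 16
    n_layers: int = 8
    max_seq_len: int = 1024
    vocab_size: int = 50432


def head_dim(args: ModelArgs) -> int:
    """Per-head width; the model dimension must split evenly across heads."""
    if args.dim % args.n_heads != 0:
        raise ValueError("dim must be divisible by n_heads")
    return args.dim // args.n_heads


def estimate_params(
    args: ModelArgs, gated_ffn: bool = True, tied_embeddings: bool = False
) -> int:
    """Rough parameter count under the assumptions stated in the lead-in."""
    attention = 4 * args.dim * args.dim            # Q, K, V, and output projections
    ffn_matrices = 3 if gated_ffn else 2           # gated FFN uses three matrices
    feed_forward = ffn_matrices * args.dim * args.ffn_dim
    norms = 2 * args.dim                           # two norm weight vectors per layer
    per_layer = attention + feed_forward + norms

    embedding = args.vocab_size * args.dim
    output_head = 0 if tied_embeddings else embedding
    final_norm = args.dim
    return args.n_layers * per_layer + embedding + output_head + final_norm


if __name__ == "__main__":
    args = ModelArgs()
    print("head dimension:", head_dim(args))
    print("estimated parameters:", f"{estimate_params(args):,}")
```

If the real checkpoint ties its embeddings or uses an ungated feed-forward layer, passing `tied_embeddings=True` or `gated_ffn=False` shows how much the estimate moves. Comparing the result with the tensor shapes in `model_base.pth` is the reliable way to settle which assumption holds.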