---
library_name: pytorch
pipeline_tag: text-generation
tags:
- pytorch
- text-generation
- homework
- fineweb-edu
- gsm8k
datasets:
- HuggingFaceFW/fineweb-edu
- openai/gsm8k
---
# RL-Homework
This is a homework model repo containing a base pretrained checkpoint and an additional supervised fine-tuned checkpoint.
## Files
- `model_base.pth`: base model checkpoint exported in the homework's LLaMA-like single-file format
- `model_sft.pth`: supervised fine-tuned checkpoint trained further on the GSM8K training set
- `params.json`: model architecture parameters for the homework loader
## Model Info
Architecture from `params.json`:
- dimension: 1024
- feed-forward dimension: 4096
- heads: 16
- layers: 8
- max sequence length: 1024
- vocabulary size: 50432
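
The values above can be sanity-checked against a `params.json`-style config. A minimal sketch, assuming LLaMA-style key names (`dim`, `n_heads`, etc.); the actual homework loader may spell the keys differently:

```python
import json

# Hypothetical config mirroring the architecture listed above;
# the key names are assumptions, the values come from params.json.
params_json = """
{
  "dim": 1024,
  "hidden_dim": 4096,
  "n_heads": 16,
  "n_layers": 8,
  "max_seq_len": 1024,
  "vocab_size": 50432
}
"""

params = json.loads(params_json)

# Per-head dimension: the model dimension split evenly across heads.
head_dim = params["dim"] // params["n_heads"]
print(head_dim)  # 1024 / 16 = 64
```

A non-integer `head_dim` would indicate a mismatch between `dim` and `n_heads` in the exported config.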
## Training Summary
### Base model
`model_base.pth` is the final checkpoint pretrained on FineWeb-Edu, exported in the homework loader format.
### SFT model
`model_sft.pth` starts from the same base model family and is additionally trained on the GSM8K training set for the homework's supervised fine-tuning stage.
## Intended Use
- homework reproduction
- educational experiments
- small-scale reasoning and RL homework pipelines
## Limitations
- these are homework checkpoints, not production models
- outputs may still be repetitive or incorrect
- GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning