---
library_name: pytorch
pipeline_tag: text-generation
tags:
- pytorch
- text-generation
- homework
- fineweb-edu
- gsm8k
datasets:
- HuggingFaceFW/fineweb-edu
- openai/gsm8k
---
| |
| # RL-Homework |
|
|
This repository holds two homework checkpoints: a base pretrained model and an additional supervised fine-tuned (SFT) model.
|
|
| ## Files |
|
|
| - `model_base.pth`: base model checkpoint exported in the homework's LLaMA-like single-file format |
| - `model_sft.pth`: supervised fine-tuned checkpoint trained further on the GSM8K training set |
| - `params.json`: model architecture parameters for the homework loader |
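
A minimal sketch of inspecting these files. It assumes the `.pth` files are plain PyTorch state dicts of tensors; the homework loader itself defines the authoritative format:

```python
import json
import torch

# Read the architecture parameters consumed by the homework loader.
with open("params.json") as f:
    params = json.load(f)
print(params)

# Assumption: the checkpoint is a flat state dict of tensors.
# map_location="cpu" lets you inspect the weights without a GPU.
state_dict = torch.load("model_base.pth", map_location="cpu")
print(len(state_dict), "tensors; first key:", next(iter(state_dict)))
```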
|
|
| ## Model Info |
|
|
| Architecture from `params.json`: |
|
|
| - dimension: 1024 |
| - feed-forward dimension: 4096 |
| - heads: 16 |
| - layers: 8 |
| - max sequence length: 1024 |
| - vocabulary size: 50432 |
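
For reference, a config like the above is often carried as a small dataclass. The field names below (`dim`, `ffn_dim`, `n_heads`, `n_layers`, `max_seq_len`, `vocab_size`) are illustrative assumptions, not necessarily the keys used in `params.json`:

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Field names are illustrative; check params.json for the actual keys.
    dim: int = 1024          # hidden dimension
    ffn_dim: int = 4096      # feed-forward dimension
    n_heads: int = 16        # attention heads (head_dim = 1024 / 16 = 64)
    n_layers: int = 8        # transformer blocks
    max_seq_len: int = 1024  # maximum sequence length
    vocab_size: int = 50432  # vocabulary size
```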
|
|
| ## Training Summary |
|
|
| ### Base model |
|
|
`model_base.pth` is the final checkpoint pretrained on FineWeb-Edu, exported in the homework loader format.
|
|
| ### SFT model |
|
|
`model_sft.pth` starts from a checkpoint of the same base model family and is further trained on the GSM8K training split during the homework's supervised fine-tuning stage.
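
A hedged sketch of loading the GSM8K training split for SFT. The prompt/completion template below is purely illustrative; the exact formatting used for this checkpoint may differ:

```python
from datasets import load_dataset

# GSM8K's "main" config exposes "question" and "answer" columns.
train = load_dataset("openai/gsm8k", "main", split="train")

def format_example(ex):
    # Illustrative SFT formatting; the homework's actual template may differ.
    return {"text": f"Question: {ex['question']}\nAnswer: {ex['answer']}"}

train = train.map(format_example)
print(train[0]["text"][:200])
```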
|
|
| ## Intended Use |
|
|
| - homework reproduction |
| - educational experiments |
| - small-scale reasoning and RL homework pipelines |
|
|
| ## Limitations |
|
|
| - these are homework checkpoints, not production models |
| - outputs may still be repetitive or incorrect |
| - GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning |
|
|
|
|