zkolter committed on
Commit de047dc · verified · 1 Parent(s): eeb1203

Add README.md

Files changed (1)
  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ library_name: pytorch
+ pipeline_tag: text-generation
+ tags:
+ - pytorch
+ - text-generation
+ - homework
+ - fineweb-edu
+ - gsm8k
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ - openai/gsm8k
+ ---
+
+ # RL-Homework
+
+ This is a homework model repo containing a base pretrained checkpoint and an additional supervised fine-tuned checkpoint.
+
+ ## Files
+
+ - `model_base.pth`: base model checkpoint exported in the homework's LLaMA-like single-file format
+ - `model_sft.pth`: supervised fine-tuned checkpoint trained further on the GSM8K training set
+ - `params.json`: model architecture parameters for the homework loader
+
+ ## Model Info
+
+ Architecture from `params.json`:
+
+ - dimension: 1024
+ - feed-forward dimension: 4096
+ - heads: 16
+ - layers: 8
+ - max sequence length: 1024
+ - vocabulary size: 50432
+
+ ## Training Summary
+
+ ### Base model
+
+ `model_base.pth` is the final checkpoint from pretraining on FineWeb-Edu, exported in the homework loader format.
+
+ ### SFT model
+
+ `model_sft.pth` starts from the same base model family and is additionally trained on the GSM8K training set for the homework's supervised fine-tuning stage.
+
+ ## Intended Use
+
+ - homework reproduction
+ - educational experiments
+ - small-scale reasoning and RL homework pipelines
+
+ ## Limitations
+
+ - these are homework checkpoints, not production models
+ - outputs may still be repetitive or incorrect
+ - GSM8K fine-tuning improves math-style behavior but does not guarantee reliable reasoning
+
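
As a rough sanity check on the architecture listed under Model Info, the parameter count can be estimated from those six values. The sketch below is not the homework loader. It assumes a standard LLaMA-style block (four bias-free attention projections, a gated feed-forward layer with three weight matrices, two RMSNorm weight vectors per layer, and an untied output head), none of which the README confirms. The `ModelArgs` field names are hypothetical and may not match the keys in `params.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgs:
    # Values copied from the README's "Model Info" list.
    # Field names are hypothetical; the real params.json keys may differ.
    dim: int = 1024
    ffn_dim: int = 4096
    n_heads: int = 16
    n_layers: int = 8
    max_seq_len: int = 1024
    vocab_size: int = 50432


def head_dim(args: ModelArgs) -> int:
    """Per-head width, assuming the model dimension splits evenly across heads."""
    if args.dim % args.n_heads != 0:
        raise ValueError("dim must be divisible by n_heads")
    return args.dim // args.n_heads


def estimate_parameters(args: ModelArgs, tied_embeddings: bool = False) -> int:
    """Estimate the weight count under the assumed LLaMA-style layout."""
    attention = 4 * args.dim * args.dim          # Wq, Wk, Wv, Wo, no biases (assumed)
    feed_forward = 3 * args.dim * args.ffn_dim   # gated FFN with three matrices (assumed)
    norms = 2 * args.dim                         # two RMSNorm weight vectors per layer (assumed)
    per_layer = attention + feed_forward + norms

    embedding = args.vocab_size * args.dim
    output_head = 0 if tied_embeddings else embedding  # untied head by default (assumed)
    final_norm = args.dim

    return args.n_layers * per_layer + embedding + output_head + final_norm


if __name__ == "__main__":
    args = ModelArgs()
    print(f"head dimension: {head_dim(args)}")
    print(f"estimated parameters (untied head): {estimate_parameters(args):,}")
    print(f"estimated parameters (tied head):   {estimate_parameters(args, tied_embeddings=True):,}")
```

Under these assumptions the estimate is roughly 238M parameters with an untied output head, or about 186M if the embedding and output weights are shared. Comparing the estimate against the tensor sizes in `model_base.pth` is one way to check which of the assumptions hold for the actual checkpoint.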