# IB-Physics-Mini-GPT (from scratch)

**Model type:** small GPT-2–style decoder-only LM  
**Params:** ~30M (n_layer=6, n_head=6, n_embed=384)  
**Context length:** 256  
**Training:** tiny pretrain on physics notes → SFT on instruction pairs  
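
For orientation, here are the hyperparameters above collected into a nanoGPT-style config. This is a sketch; `GPTConfig` is a hypothetical name, not code shipped in this repo:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values taken from the spec above; field names follow common
    # GPT-2 conventions, not necessarily this repo's.
    block_size: int = 256     # context length
    vocab_size: int = 16_000  # BPE vocab, see "How Trained"
    n_layer: int = 6
    n_head: int = 6
    n_embed: int = 384        # per-head dim = 384 // 6 = 64
```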

## Intended Use
Educational demo and concept explainer for IB Physics HL topics.

## Limitations
Short context (256 tokens) and a tiny training corpus; this model is not a reliable source of facts. Double-check anything it outputs.

## How Trained
1) Tokenizer: byte-pair encoding (BPE, 16k vocab) trained on `corpus_raw.txt`.  
2) Pretrain: next-token prediction on the tokenized notes.  
3) Finetune: short instruction-style Q&A pairs (SFT).  

Sketches of each step follow.
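
Step 1 could be reproduced with the Hugging Face `tokenizers` library (an assumption; the repo may implement BPE itself):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer

# Train a 16k-vocab byte-level BPE tokenizer on the raw corpus.
tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = ByteLevel()
trainer = BpeTrainer(vocab_size=16_000, special_tokens=["<|endoftext|>"])
tokenizer.train(files=["corpus_raw.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```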
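
Step 2 is plain next-token prediction: shift the token stream by one position and minimize cross-entropy. A minimal PyTorch sketch, assuming `model(inputs)` returns logits of shape (batch, seq, vocab):

```python
import torch.nn.functional as F

def pretrain_step(model, tokens, optimizer):
    """One next-token-prediction step; tokens is (batch, block_size + 1)."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets shifted by one
    logits = model(inputs)                           # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```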
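
Step 3 serializes the Q&A pairs back into plain text, then applies the same next-token loss. One plausible template (an assumption, not the repo's actual format):

```python
def format_pair(question: str, answer: str) -> str:
    # Hypothetical template; the real SFT prompt format is repo-specific.
    return f"Q: {question}\nA: {answer}<|endoftext|>"

print(format_pair("State Newton's second law.",
                  "F = ma: net force equals mass times acceleration."))
```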

## Eval
- Perplexity on held-out notes (see `eval/` scripts); a sketch follows.
- Manual Q&A sanity checks.
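
Held-out perplexity is the exponential of the mean per-token negative log-likelihood. A sketch of the computation (the actual `eval/` scripts may differ):

```python
import math

import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, batches):
    """exp(mean NLL) over an iterable of (batch, block_size + 1) token tensors."""
    total_nll, total_tokens = 0.0, 0
    for tokens in batches:
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits = model(inputs)
        total_nll += F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                     targets.reshape(-1),
                                     reduction="sum").item()
        total_tokens += targets.numel()
    return math.exp(total_nll / total_tokens)
```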

## License
MIT for code. Dataset licensing is your responsibility.