mhla committed on
Commit 54ccc78 · verified · 1 Parent(s): 1cbef60

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +41 -23
README.md CHANGED
@@ -6,34 +6,38 @@ tags:
  - gpt
  - pre-1900
  - historical
  - nanochat
  ---

- # GPT-1905 D34 Base (fully trained)

- 3.29B parameter GPT-style language model trained on pre-1905 English text. Training complete (19,103 steps, 40B tokens).

- ## Model Details

- - **Architecture:** Custom GPT with RoPE, QK-norm, ReLU², value embeddings (ResFormer), per-layer residual/skip scalars
- - **Parameters:** 3.29B
- - **Layers:** 34
- - **Hidden dim:** 2176
- - **Attention heads:** 17 (query) / 17 (kv)
- - **Head dim:** 128
- - **Context length:** 2048 tokens
- - **Vocab size:** 32,768 (BPE, GPT-4 style split pattern)
- - **Training:** Base pretraining on pre-1905 corpus, 19,103 steps, 40B tokens

- ## Checkpoint Contents

- ```
- model_019103.pt          # Model weights
- meta_019103.json         # Training config and metadata
- optim_019103_rank*.pt    # Optimizer state shards (if present, for resuming training)
- tokenizer/               # BPE tokenizer (tiktoken format) + token byte counts
- nanochat/                # Source code to load and run the model
- ```

  ## Quick Start

@@ -48,7 +52,6 @@ with open("meta_019103.json") as f:
      meta = json.load(f)

  config = GPTConfig(**meta["model_config"])
-
  with torch.device("meta"):
      model = GPT(config)
  model.to_empty(device="cuda")
@@ -58,14 +61,20 @@ state_dict = torch.load("model_019103.pt", map_location="cuda")
  state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
  model.load_state_dict(state_dict, strict=True, assign=True)
  model.eval()

  bos = tokenizer.get_bos_token_id()
- tokens = tokenizer.encode("It was a dark and stormy night", prepend=bos)
  with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
-     for token in model.generate(tokens, max_tokens=100, temperature=0.8):
          print(tokenizer.decode([token]), end="", flush=True)
  ```

  ## Dependencies

  ```
@@ -73,3 +82,12 @@ torch>=2.9
  tiktoken
  rustbpe
  ```
 
  - gpt
  - pre-1900
  - historical
+ - physics
  - nanochat
  ---

+ # GPT-1905

+ A 3.29B parameter language model trained on pre-1905 English text. Like [GPT-1900](https://huggingface.co/mhla/gpt1900-d34-22btok), but with a cutoff extended to 1905 — just before Einstein's *annus mirabilis*. This model knows of Planck's early work and Lorentz's electron theory, but has never heard of special relativity or the photon.

+ Trained on **~40B tokens** from digitized books and newspapers published before 1905.

+ ## Training

+ - **Data:** Pre-1905 English text corpus (institutional books + American Stories newspapers)
+ - **Tokens:** ~40B
+ - **Steps:** 19,103
+ - **Val BPB:** 0.787
+ - **Hardware:** 8x8 H100 GPUs
+
+ ## Architecture
+
+ Custom GPT with RoPE, QK-norm, ReLU² activation, value embeddings (ResFormer), and per-layer residual/skip scalars. Built with the [nanochat](https://github.com/karpathy/nanochat) framework.
+
+ | Parameter | Value |
+ |---|---|
+ | Parameters | 3.29B |
+ | Layers | 34 |
+ | Hidden dim | 2176 |
+ | Attention heads | 17 (query) / 17 (kv) |
+ | Head dim | 128 |
+ | Context length | 2048 tokens |
+ | Vocab size | 32,768 (BPE, GPT-4 style split pattern) |


  ## Quick Start

 
      meta = json.load(f)

  config = GPTConfig(**meta["model_config"])
  with torch.device("meta"):
      model = GPT(config)
  model.to_empty(device="cuda")
 
  state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
  model.load_state_dict(state_dict, strict=True, assign=True)
  model.eval()
+ ```
+

+ ### Generate text
+
+ ```python
  bos = tokenizer.get_bos_token_id()
+ tokens = tokenizer.encode("The luminiferous aether", prepend=bos)
  with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
+     for token in model.generate(tokens, max_tokens=200, temperature=0.8):
          print(tokenizer.decode([token]), end="", flush=True)
  ```

+
  ## Dependencies

  ```
 
  tiktoken
  rustbpe
  ```
+
+
+ ## Related
+
+ - [mhla/pre1900-corpus](https://huggingface.co/datasets/mhla/pre1900-corpus) — Pre-1900 training corpus with metadata
+ - [mhla/gpt1900-physics-clm](https://huggingface.co/datasets/mhla/gpt1900-physics-clm) — Physics texts for continued pretraining
+ - [mhla/gpt1900-instruct-v3-data](https://huggingface.co/datasets/mhla/gpt1900-instruct-v3-data) — Instruction-tuning conversation pairs
+ - [mhla/gpt1900-contradiction-eval](https://huggingface.co/datasets/mhla/gpt1900-contradiction-eval) — Physics contradiction evaluation problems
+
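
Note on the figures in the new README: the architecture table and training list can be cross-checked with a little arithmetic. The sketch below is not part of the commit. It only reuses numbers stated in the diff, treats "~40B" as exactly 40e9, and assumes every optimizer step consumes the same number of tokens, so the per-step values are rough estimates rather than the actual training configuration.

```python
# Figures copied from the README text in this commit.
n_heads = 17
head_dim = 128
hidden_dim = 2176
total_tokens = 40e9   # README says "~40B", so this is approximate
steps = 19_103
context_len = 2048

# The hidden dimension should equal query heads times head dimension.
hidden_matches = n_heads * head_dim == hidden_dim

# Implied average batch per step (assumption: constant tokens per step).
tokens_per_step = total_tokens / steps
sequences_per_step = tokens_per_step / context_len

print(f"heads x head_dim == hidden_dim: {hidden_matches}")
print(f"tokens per step: ~{tokens_per_step / 1e6:.2f}M")
print(f"full-context sequences per step: ~{sequences_per_step:.0f}")
```

The head count and head dimension multiply out to the stated hidden dimension, and the token and step counts imply roughly two million tokens per step, about a thousand full-length sequences.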