---
license: mit
language:
  - en
tags:
  - gpt
  - pre-1900
  - historical
  - physics
  - nanochat
---

GPT-1905

A 3.29B-parameter language model trained on English text published before 1905. It follows GPT-1900 but extends the cutoff to 1905, just before Einstein's annus mirabilis. The model knows of Planck's early work and Lorentz's electron theory, but has never heard of special relativity or the photon.

Trained on ~40B tokens from digitized books and newspapers published before 1905.

Training

  • Data: Pre-1905 English text corpus (institutional books + American Stories newspapers)
  • Tokens: ~40B
  • Steps: 19,103
  • Val BPB: 0.787
  • Hardware: 8x8 H100 GPUs
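
As a rough sanity check on the figures above, the sketch below derives the implied tokens per optimizer step and tokens per parameter. The batch size is not stated in this README, so the per-step value is only what the rounded totals imply; the parameter count and context length are taken from elsewhere in this card.

```python
# Back-of-the-envelope figures derived from the numbers in this card.
# "~40B" is a rounded total, so every result here is approximate.
total_tokens = 40e9   # ~40B training tokens
steps = 19_103        # optimizer steps
params = 3.29e9       # parameter count
context_len = 2048    # context length listed under Architecture

tokens_per_step = total_tokens / steps
seqs_per_step = tokens_per_step / context_len
tokens_per_param = total_tokens / params

print(f"tokens per step    ~ {tokens_per_step / 1e6:.2f}M")
print(f"sequences per step ~ {seqs_per_step:.0f}")
print(f"tokens per param   ~ {tokens_per_param:.1f}")
```

The per-step value lands close to 2^21 = 2,097,152 tokens (1,024 sequences of 2,048 tokens), which would be consistent with the rounded "~40B" total, but that batch size is an inference and not something this README states.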

Architecture

Custom GPT with RoPE, QK-norm, ReLU² activation, value embeddings (ResFormer), and per-layer residual/skip scalars. Built with the nanochat framework.

| Parameter | Value |
|---|---|
| Parameters | 3.29B |
| Layers | 34 |
| Hidden dim | 2176 |
| Attention heads | 17 (query) / 17 (kv) |
| Head dim | 128 |
| Context length | 2048 tokens |
| Vocab size | 32,768 (BPE, GPT-4 style split pattern) |
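
The table values can be cross-checked with a little arithmetic. The sketch below confirms that heads × head dim equals the hidden dim, then estimates the parameter count. The estimate rests on assumptions this README does not state (4× MLP expansion, untied input and output embeddings, value embeddings on every other layer), so treat it as a plausibility check and not as the authoritative breakdown.

```python
# Consistency check of the architecture table, plus a rough parameter estimate.
n_layer, d_model, n_head, head_dim, vocab = 34, 2176, 17, 128, 32_768

# The hidden dimension should equal heads x head dim.
assert n_head * head_dim == d_model

# ASSUMPTIONS (not stated in this README):
#   - attention uses four square projections (Q, K, V, output): 4 * d^2 per layer
#   - the MLP expands 4x with two matrices: 8 * d^2 per layer
#   - the input embedding and output head are untied: 2 * vocab * d
#   - value embeddings are vocab x d tables on every other layer
block_params = n_layer * 12 * d_model**2
embed_params = 2 * vocab * d_model
value_embed_params = (n_layer // 2) * vocab * d_model

total = block_params + embed_params + value_embed_params
print(f"blocks:           {block_params / 1e9:.2f}B")
print(f"embeddings/head:  {embed_params / 1e9:.2f}B")
print(f"value embeddings: {value_embed_params / 1e9:.2f}B")
print(f"total:            {total / 1e9:.2f}B")
```

Under these assumptions the total rounds to the listed 3.29B. The small per-layer residual/skip scalars are ignored, and agreement with the headline figure does not by itself confirm the assumed layout.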

Quick Start

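import json

import torch
from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

# Load the tokenizer from the local "tokenizer" directory.
tokenizer = RustBPETokenizer.from_directory("tokenizer")

# The metadata file stores the model configuration saved with the checkpoint.
with open("meta_019103.json") as f:
    meta = json.load(f)

# Build the model on the meta device (no memory allocated), then materialize it on the GPU.
config = GPTConfig(**meta["model_config"])
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()  # fills the empty tensors; the checkpoint below overwrites the weights

# Load the checkpoint, dropping the "_orig_mod." prefix that torch.compile adds to keys.
state_dict = torch.load("model_019103.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()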
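
The `removeprefix("_orig_mod.")` step is there because a module wrapped by `torch.compile` saves its state dict keys with an `_orig_mod.` prefix, which an uncompiled model does not expect. A minimal illustration with hypothetical key names and placeholder values (no torch required; `str.removeprefix` needs Python 3.9+):

```python
# Illustrative only: the key names and values below are hypothetical placeholders.
saved = {
    "_orig_mod.transformer.wte.weight": "tensor A",  # key as saved from a compiled model
    "lm_head.weight": "tensor B",                    # key without the prefix is left alone
}

# Same dictionary comprehension as in the Quick Start snippet.
cleaned = {k.removeprefix("_orig_mod."): v for k, v in saved.items()}
print(list(cleaned))
```

Keys without the prefix pass through unchanged, so the same line is safe for checkpoints saved from either a compiled or an uncompiled model.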

Generate text

# Encode the prompt, prepending the beginning-of-sequence token.
bos = tokenizer.get_bos_token_id()
tokens = tokenizer.encode("The luminiferous aether", prepend=bos)

# Stream up to 200 generated tokens, printing each one as it arrives.
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=200, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)
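
For intuition about the `temperature=0.8` argument, the sketch below shows the standard temperature-scaled softmax: logits are divided by the temperature before normalization, so values below 1 concentrate probability on the highest-scoring tokens. This is a generic illustration with made-up logits, not the sampling code nanochat itself uses.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
p_default = softmax_with_temperature(logits, 1.0)
p_sharper = softmax_with_temperature(logits, 0.8)
print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_sharper])
```

With the lower temperature, the top token receives a larger share of the probability mass, which makes generations somewhat more conservative than sampling at 1.0.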

Dependencies

torch>=2.9
tiktoken
rustbpe

Related