# GPT-1900 Drafts

Part of the **GPT-1900 Drafts** collection (49 items): experimental and intermediate GPT-1900 checkpoints. These are working artifacts, not intended for general use.
A 1.2B parameter GPT-style language model trained exclusively on pre-1900 English text.
```
model_007226.pt         # Model weights (4.9 GB)
meta_007226.json        # Training config and metadata
optim_007226_rank*.pt   # Optimizer state, 8 FSDP shards (for resuming training)
tokenizer/              # BPE tokenizer (tiktoken format) + token byte counts
nanochat/               # Source code to load and run the model
eval_results.csv        # Benchmark eval results at this checkpoint
```
```python
import json

import torch

from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

# Load tokenizer
tokenizer = RustBPETokenizer.from_directory("tokenizer")

# Load training config and build the model on the meta device
# (no memory is allocated until to_empty below)
with open("meta_007226.json") as f:
    meta = json.load(f)
config = GPTConfig(**meta["model_config"])
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()

# Load weights; strip the torch.compile prefix if the checkpoint
# was saved from a compiled model
state_dict = torch.load("model_007226.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()

# Generate, streaming tokens as they are sampled
bos = tokenizer.get_bos_token_id()
tokens = tokenizer.encode("It was a dark and stormy night", prepend=bos)
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=100, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)
```
Requirements:

```
torch>=2.9
tiktoken
rustbpe
```
| Task | Accuracy | Centered |
|---|---|---|
| hellaswag | 0.318 | 0.091 |
| arc_easy | 0.411 | 0.215 |
| lambada_openai | 0.332 | 0.332 |
| piqa | 0.586 | 0.172 |
| winograd | 0.674 | 0.348 |
| copa | 0.570 | 0.140 |
| CORE | | 0.126 |
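Centered accuracy rescales raw accuracy so that chance performance maps to 0.0 and perfect performance to 1.0, via (accuracy - baseline) / (1 - baseline). A minimal sketch that reproduces the Centered column above, assuming the usual chance baselines (0.25 for 4-way multiple choice, 0.5 for binary choice, 0.0 for open-ended lambada); note that CORE aggregates centered scores over a larger task suite than the six rows shown, so it is not simply their mean:

```python
def centered(acc: float, baseline: float) -> float:
    """Rescale accuracy so chance -> 0.0 and perfect -> 1.0."""
    return (acc - baseline) / (1.0 - baseline)

# (raw accuracy, assumed chance baseline) per task
results = {
    "hellaswag":      (0.318, 0.25),  # 4-way multiple choice
    "arc_easy":       (0.411, 0.25),  # 4-way multiple choice
    "lambada_openai": (0.332, 0.0),   # open-ended word prediction
    "piqa":           (0.586, 0.5),   # binary choice
    "winograd":       (0.674, 0.5),   # binary choice
    "copa":           (0.570, 0.5),   # binary choice
}
for task, (acc, base) in results.items():
    print(f"{task}: {centered(acc, base):.3f}")
```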
Trained with the nanochat framework using 8x H100 GPUs with FSDP.
To resume training, load the optimizer shards (optim_007226_rank*.pt) — one per FSDP rank.
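A minimal sketch of the resume path, assuming each rank loads the shard matching its FSDP rank; the `optimizer` variable and the distributed setup in the trailing comments are hypothetical placeholders, not part of this checkpoint's code:

```python
# Hypothetical resume sketch: each of the 8 FSDP ranks loads its own
# optimizer shard, following the optim_007226_rank*.pt naming pattern.
WORLD_SIZE = 8
STEP = 7226

def shard_path(rank: int, step: int = STEP) -> str:
    # Step numbers are zero-padded to six digits in the filenames
    return f"optim_{step:06d}_rank{rank}.pt"

paths = [shard_path(r) for r in range(WORLD_SIZE)]
print(paths[0])  # optim_007226_rank0.pt

# Inside the training script (per rank), after model and optimizer
# are rebuilt with the same FSDP sharding:
#   rank = torch.distributed.get_rank()
#   optimizer.load_state_dict(torch.load(shard_path(rank), map_location="cuda"))
```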