WARNING: THIS MODEL IS MOSTLY USELESS. IT WAS OVERTRAINED ON NEWSPAPERS AND JUST OUTPUTS NEWSPAPER-STYLE TEXT. IF THAT'S WHAT YOU WANT, CONTINUE.

GPT-1964

A 3.29B-parameter language model trained on English text from 1900–1964. Where GPT-1900 stops at the Victorian era, GPT-1964 covers the early 20th century through the postwar period: two world wars, the atomic age, the birth of quantum mechanics, and the space race.

Trained on institutional books (1900–1922) and American Stories newspapers (1900–1964).

Training

Data: mhla/pre1964-corpus — 1900–1964 English text
Steps: 19,103
Val BPB: 0.944
Hardware: 8x8 H100 GPUs
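
Val BPB is the validation loss expressed in bits per byte, which makes it comparable across tokenizers. The conversion can be sketched as below, assuming the evaluation loop accumulates summed cross-entropy in nats together with the UTF-8 byte count of the evaluated text. The function name `bits_per_byte` is hypothetical and not part of nanochat.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed negative log-likelihood (nats) into bits per byte.

    Assumption: total_nll_nats is the cross-entropy summed over all
    evaluated tokens, and total_bytes is the UTF-8 length of the same text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by text length rather than by token count.
    return total_nll_nats / (math.log(2) * total_bytes)


# Toy example: exactly one bit of loss per byte over eight bytes.
example_bpb = bits_per_byte(total_nll_nats=8 * math.log(2), total_bytes=8)
```

Because the denominator is bytes rather than tokens, a model with a larger vocabulary is not rewarded simply for emitting fewer tokens.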

Architecture

Custom GPT with RoPE, QK-norm, ReLU² activation, value embeddings (ResFormer), and per-layer residual/skip scalars. Built with the nanochat framework.

Parameter	Value
Parameters	3.29B
Layers	34
Hidden dim	2176
Attention heads	17 (query) / 17 (kv)
Head dim	128
Context length	2048 tokens
Vocab size	32,768 (BPE, GPT-4 style split pattern)
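
The figures in the table can be cross-checked with a little arithmetic. The sketch below first confirms that heads times head dim equals the hidden dim, then makes a rough parameter estimate. The estimate rests on assumptions that this card does not state: bias-free attention with four square projections, an MLP with 4x expansion, untied input and output embeddings, and value embeddings on half of the layers. Small terms such as the per-layer scalars are ignored.

```python
# Values taken from the architecture table.
n_layer, d_model, n_head, head_dim, vocab = 34, 2176, 17, 128, 32768

# The attention heads must tile the hidden dimension exactly.
assert n_head * head_dim == d_model

# ASSUMPTION: Q, K, V and output projections, each d_model x d_model, no biases.
attn_params = 4 * d_model * d_model
# ASSUMPTION: two-matrix MLP (up and down projection) with 4x expansion.
mlp_params = 2 * d_model * (4 * d_model)
block_params = n_layer * (attn_params + mlp_params)

# ASSUMPTION: the input embedding and the output head are separate matrices.
embed_params = 2 * vocab * d_model
# ASSUMPTION: a vocab x (kv heads * head dim) value embedding on every other layer.
value_embed_params = (n_layer // 2) * vocab * (n_head * head_dim)

total = block_params + embed_params + value_embed_params
print(f"estimated parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands at about 3.29B, in line with the stated count. Agreement of the total does not by itself confirm each assumption.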

Quick Start

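import json

import torch
from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

# Load the BPE tokenizer from the local "tokenizer" directory.
tokenizer = RustBPETokenizer.from_directory("tokenizer")

# Read the model hyperparameters saved alongside the checkpoint.
with open("meta_019103.json") as f:
    meta = json.load(f)

# Build the model on the meta device (no memory is allocated yet),
# then materialize it on the GPU and run nanochat's initializer.
config = GPTConfig(**meta["model_config"])
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()

# Strip the "_orig_mod." prefix that torch.compile adds to state-dict keys,
# then load the weights and switch to inference mode.
state_dict = torch.load("model_019103.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()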
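
The key-renaming step in the loader exists because a module wrapped by `torch.compile` reports its state-dict keys with an `_orig_mod.` prefix, while an uncompiled model expects the bare names. A minimal sketch of that step on a toy dictionary follows. The key names and the helper `strip_compile_prefix` are hypothetical and stand in for the real checkpoint contents. `str.removeprefix` requires Python 3.9 or newer.

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with the prefix removed from each key.

    Keys that do not start with the prefix are passed through unchanged.
    """
    return {key.removeprefix(prefix): value for key, value in state_dict.items()}


# Toy stand-in for a checkpoint; the keys are illustrative only.
compiled_state = {
    "_orig_mod.transformer.wte.weight": 1,
    "_orig_mod.lm_head.weight": 2,
    "already.clean.weight": 3,
}

clean_state = strip_compile_prefix(compiled_state)
```

With `strict=True` in `load_state_dict`, any key still carrying the prefix would raise an error instead of being silently skipped, so the renaming has to happen before loading.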

Generate text

bos = tokenizer.get_bos_token_id()
tokens = tokenizer.encode("The luminiferous aether", prepend=bos)
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=200, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)
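
The loop above streams output by decoding one token at a time. With a byte-level BPE tokenizer, a single multi-byte UTF-8 character can be split across tokens, so per-token decoding may show replacement characters. When streaming is not needed, one option is to collect the token ids and decode them in a single call. The helper `collect_text` below is a hypothetical sketch. It assumes only that the generator yields integer token ids and that the decode function accepts a list of ids, as in the snippet above.

```python
from typing import Callable, Iterable, List, Optional


def collect_text(
    token_iter: Iterable[int],
    decode: Callable[[List[int]], str],
    stop_id: Optional[int] = None,
) -> str:
    """Gather generated token ids and decode them together.

    stop_id, if given, ends collection early and is not included.
    """
    ids: List[int] = []
    for token in token_iter:
        if stop_id is not None and token == stop_id:
            break
        ids.append(token)
    # Decoding the full sequence keeps multi-byte characters intact.
    return decode(ids)


# Intended use with the objects defined earlier (not executed here):
# text = collect_text(
#     model.generate(tokens, max_tokens=200, temperature=0.8),
#     tokenizer.decode,
# )
```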

Dependencies

torch>=2.9
tiktoken
rustbpe

Related datasets

mhla/pre1900-corpus: Pre-1900 training corpus with metadata
mhla/gpt1900-physics-clm: Physics texts for continued pretraining
mhla/gpt1900-instruct-v3-data: Instruction-tuning conversation pairs
mhla/gpt1900-contradiction-eval: Physics contradiction evaluation problems
