GPT-1900 (D34, 8B token subset)
A 3.3B parameter GPT-style language model trained on ~8B tokens of pre-1900 English text (~2.4x data:param ratio).
Model Details
- Architecture: Custom GPT with RoPE, QK-norm, ReLU², value embeddings (ResFormer), per-layer residual/skip scalars
- Parameters: ~3.3B
- Layers: 34
- Hidden dim: 2176
- Attention heads: 17 (query) / 17 (kv)
- Head dim: 128
- Context length: 2048 tokens
- Vocab size: 32,768 (BPE, GPT-4 style split pattern)
- Training: FP8 (tensorwise), Muon+AdamW optimizer
- Final val BPB: 0.721
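For intuition, bits-per-byte converts directly to a per-byte perplexity via 2^BPB. BPB normalizes loss by UTF-8 byte count rather than token count, so it is comparable across tokenizers. A quick check in Python:

```python
# Convert the reported validation bits-per-byte (BPB) to per-byte perplexity.
bpb = 0.721  # final validation BPB from the table above

per_byte_perplexity = 2 ** bpb
print(f"per-byte perplexity: {per_byte_perplexity:.3f}")  # ~1.648
```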
Checkpoint Contents
model_010507.pt # Model weights
meta_010507.json # Training config and metadata
optim_010507_rank*.pt # Optimizer state shards (for resuming training)
tokenizer/ # BPE tokenizer (tiktoken format) + token byte counts
nanochat/ # Source code to load and run the model
Quick Start
import json

import torch
from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

# Load tokenizer
tokenizer = RustBPETokenizer.from_directory("tokenizer")

# Load model config and weights
with open("meta_010507.json") as f:
    meta = json.load(f)
config = GPTConfig(**meta["model_config"])
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()
state_dict = torch.load("model_010507.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()

# Generate
bos = tokenizer.get_bos_token_id()
tokens = tokenizer.encode("It was a dark and stormy night", prepend=bos)
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=100, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)
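The `temperature=0.8` argument rescales the next-token logits before sampling: values below 1 sharpen the distribution toward the most likely tokens. A minimal pure-Python sketch of the idea (the internals of `model.generate` may differ):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=random):
    """Sample an index after temperature scaling and softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the resulting distribution
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Very low temperature makes sampling nearly deterministic
print(sample_with_temperature([2.0, 0.5, -1.0], temperature=0.01))  # almost surely 0
```

At temperature 1.0 this is plain softmax sampling; as temperature approaches 0 it approaches greedy decoding.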
Dependencies
torch>=2.9
tiktoken
rustbpe
Training
Trained with the nanochat framework on H100 GPUs.
To resume training, load the optimizer shards (optim_010507_rank*.pt) — one per rank.
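The shard filenames encode the rank directly. A small helper (hypothetical, not part of nanochat) to map shard files back to ranks so each process can restore its own optimizer state:

```python
import re
from pathlib import Path

def shards_by_rank(paths):
    """Map shard filenames like 'optim_010507_rank3.pt' to {rank: path}."""
    out = {}
    for p in paths:
        m = re.fullmatch(r"optim_(\d+)_rank(\d+)\.pt", Path(p).name)
        if m:
            out[int(m.group(2))] = p
    return out

# Each rank would then load its own shard, e.g.:
#   shards = shards_by_rank(glob.glob("optim_010507_rank*.pt"))
#   optim_state = torch.load(shards[rank], map_location="cuda")
print(shards_by_rank(["optim_010507_rank0.pt", "optim_010507_rank1.pt"]))
```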