---
license: mit
language:
- en
tags:
- gpt
- pre-1900
- historical
- physics
- nanochat
- chat
---
# GPT-1900 Instruct v3
GPT-1900 fine-tuned for instruction following and multi-turn conversation. Ask it about the nature of light, the fate of empires, or the meaning of progress — and it answers as a thoughtful 19th-century mind would.
This is the default model served by the GPT-1900 chat interface.
## Training
- Base model: mhla/gpt1900-d34-22btok
- Data: mhla/gpt1900-instruct-v3-data — 53,458 synthetic multi-turn conversations (full corpus)
- Steps: 75
- Val BPB: 0.626
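
Val BPB is bits per byte: the mean validation cross-entropy converted from nats per token into bits and normalized by the byte length of the underlying text, which makes the number comparable across tokenizers. A minimal sketch of the conversion (the function is illustrative, not nanochat's exact evaluation code):

```python
import math

def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) into bits per byte of raw text."""
    return (mean_loss_nats / math.log(2)) * (num_tokens / num_bytes)
```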
## Architecture
Custom GPT with RoPE, QK-norm, ReLU² activation, value embeddings (ResFormer), and per-layer residual/skip scalars. Built with the nanochat framework.
| Parameter | Value |
|---|---|
| Parameters | 3.29B |
| Layers | 34 |
| Hidden dim | 2176 |
| Attention heads | 17 (query) / 17 (kv) |
| Head dim | 128 |
| Context length | 2048 tokens |
| Vocab size | 32,768 (BPE, GPT-4 style split pattern) |
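
Two of the less standard pieces above are the ReLU² activation and QK-norm. A minimal sketch of both, for orientation only (module and function names here are illustrative, not nanochat's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReLU2MLP(nn.Module):
    """MLP with ReLU^2 activation: ReLU applied, then squared elementwise."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.relu(self.up(x)).square())

def qk_norm(q: torch.Tensor, k: torch.Tensor):
    """QK-norm: RMS-normalize queries and keys along the head dimension
    before the attention dot product, stabilizing the attention logits."""
    q = F.rms_norm(q, (q.size(-1),))
    k = F.rms_norm(k, (k.size(-1),))
    return q, k
```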
## Notes

**Generation parameters:** you may need to experiment with the temperature to get good results. The default is 0.6 with `top_k=50`.
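
For example, once the model and tokenizer are loaded as in Quick Start below, both knobs can be passed straight to `generate` (assuming it accepts a `top_k` argument, as nanochat's does):

```python
# Sample with the default decoding settings; raise temperature for more variety
for token in model.generate(tokens, max_tokens=200, temperature=0.6, top_k=50):
    print(tokenizer.decode([token]), end="", flush=True)
```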
## Quick Start
```python
import torch, json
from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

# Load the tokenizer and the model config saved alongside the checkpoint
tokenizer = RustBPETokenizer.from_directory("tokenizer")
with open("meta_000075.json") as f:
    meta = json.load(f)
config = GPTConfig(**meta["model_config"])

# Build the model on the meta device, then materialize it on the GPU
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()

# Load the checkpoint, stripping the torch.compile prefix if present
state_dict = torch.load("model_000075.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()
```
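
As a quick sanity check after loading, the parameter count should line up with the figure in the Architecture table:

```python
# Should report roughly 3.29B parameters, matching the table above
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")
```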
## Chat
```python
# Render a single user turn in the nanochat chat format
bos = tokenizer.get_bos_token_id()
user_start = tokenizer.encode_special("<|user_start|>")
user_end = tokenizer.encode_special("<|user_end|>")
assistant_start = tokenizer.encode_special("<|assistant_start|>")

tokens = [bos, user_start]
tokens += tokenizer.encode("What is the nature of light?")
tokens += [user_end, assistant_start]

# Stream the assistant's reply token by token
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=500, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)
```
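
For multi-turn conversation, completed turns stay in the token stream and the next user message is appended after them. A minimal sketch, assuming the tokenizer also defines an `<|assistant_end|>` special token as in nanochat's chat format:

```python
# Multi-turn loop; stopping on <|assistant_end|> is handled manually here
assistant_end = tokenizer.encode_special("<|assistant_end|>")

tokens = [bos]
for question in ["What is the nature of light?", "And what of the ether?"]:
    tokens += [user_start] + tokenizer.encode(question) + [user_end, assistant_start]
    reply = []
    with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
        for token in model.generate(tokens, max_tokens=500, temperature=0.8):
            if token == assistant_end:
                break
            reply.append(token)
    print(tokenizer.decode(reply))
    tokens += reply + [assistant_end]
```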
## Dependencies
```
torch>=2.9
tiktoken
rustbpe
```
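
The imports in Quick Start also require the `nanochat` package itself, which provides `GPT`, `GPTConfig`, and `RustBPETokenizer`. A minimal setup sketch, assuming the upstream nanochat repository (the `rustbpe` backend is built as part of that project; follow its README):

```bash
pip install "torch>=2.9" tiktoken
# nanochat itself (assumed upstream location; see its README to build rustbpe)
git clone https://github.com/karpathy/nanochat
```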
## Related
- mhla/pre1900-corpus — Pre-1900 training corpus with metadata
- mhla/gpt1900-physics-clm — Physics texts for continued pretraining
- mhla/gpt1900-instruct-v3-data — Instruction-tuning conversation pairs
- mhla/gpt1900-contradiction-eval — Physics contradiction evaluation problems