GPT-1900 D34 Physics SFT

A 3.29B-parameter GPT-1900 model fine-tuned on pre-1900 physics text with a causal language modeling (CLM) objective. Base model: gpt1900-d34-22btok.

Model Details

Architecture: Custom GPT with RoPE, QK-norm, ReLU², value embeddings (ResFormer), per-layer residual/skip scalars
Parameters: 3.29B
Layers: 34
Hidden dim: 2176
Attention heads: 17 (query) / 17 (kv)
Head dim: 128
Context length: 2048 tokens
Vocab size: 32,768 (BPE, GPT-4 style split pattern)
Training: Physics CLM fine-tuning (3 epochs), bfloat16
Final validation BPB (bits per byte): 0.861
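For readers unfamiliar with the metric, bits per byte normalizes the cross-entropy loss by the byte length of the text rather than by token count, which makes it comparable across tokenizers. The following is a minimal sketch of the standard conversion, not the evaluation code used for this checkpoint; the function name `bits_per_byte` and the sample numbers are illustrative assumptions.

```python
import math


def bits_per_byte(token_nll_nats, token_byte_counts):
    """Convert per-token negative log-likelihoods (in nats) to bits per byte.

    token_nll_nats: cross-entropy of each predicted token, in nats.
    token_byte_counts: number of UTF-8 bytes each of those tokens decodes to.
    """
    total_nats = sum(token_nll_nats)
    total_bytes = sum(token_byte_counts)
    if total_bytes == 0:
        raise ValueError("no bytes to normalize by")
    # Dividing by ln(2) converts nats to bits.
    return total_nats / (math.log(2) * total_bytes)


# Illustrative numbers only, not taken from the actual evaluation.
example_nll = [2.0, 1.5, 3.0]   # three tokens, loss in nats
example_bytes = [4, 3, 5]       # byte length of each token
print(bits_per_byte(example_nll, example_bytes))
```

The per-token byte counts shipped in tokenizer/ are the kind of lookup table this normalization needs.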

Checkpoint Contents

model_000404.pt          # Model weights
meta_000404.json         # Training config and metadata
optim_000404_rank0.pt    # Optimizer state
tokenizer/               # BPE tokenizer (tiktoken format) + token byte counts
nanochat/                # Source code to load and run the model
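Before loading anything, it can help to confirm that the download is complete. The sketch below checks only the entries that the Quick Start actually reads (weights, metadata, tokenizer, and the bundled source); the optimizer state is not needed for inference. The helper name `missing_checkpoint_files` is hypothetical and not part of nanochat.

```python
from pathlib import Path

# Entries the Quick Start reads; the optimizer state is omitted on purpose.
NEEDED_FOR_INFERENCE = (
    "model_000404.pt",
    "meta_000404.json",
    "tokenizer",
    "nanochat",
)


def missing_checkpoint_files(root="."):
    """Return the names from NEEDED_FOR_INFERENCE that are absent under root."""
    root = Path(root)
    return [name for name in NEEDED_FOR_INFERENCE if not (root / name).exists()]


# An empty list means everything needed for inference is present.
print(missing_checkpoint_files("."))
```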

Quick Start

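# Run from the checkpoint directory so that the bundled nanochat/ package
# and the relative file paths below resolve.
import json

import torch
from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

tokenizer = RustBPETokenizer.from_directory("tokenizer")

# Rebuild the model configuration saved alongside the weights.
with open("meta_000404.json") as f:
    meta = json.load(f)

config = GPTConfig(**meta["model_config"])

# Build the module on the meta device (no memory allocated yet),
# then materialize its tensors on the GPU.
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()

# Strip the "_orig_mod." prefix that torch.compile adds to parameter names.
state_dict = torch.load("model_000404.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()

# Encode a prompt with the BOS token prepended, then stream generated tokens.
bos = tokenizer.get_bos_token_id()
tokens = tokenizer.encode("The laws of thermodynamics", prepend=bos)
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=100, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)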
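The `temperature=0.8` argument above controls how sharply sampling favors high-probability tokens. As a rough illustration of the general technique (not necessarily how nanochat's `generate` implements it, which may also apply other filtering), temperature sampling divides the logits by the temperature before the softmax. The function names below are hypothetical and use only the standard library.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities after scaling logits by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_token(logits, temperature=0.8, rng=random):
    """Sample a token index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy logits for a three-token vocabulary (illustrative values only).
toy_logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(toy_logits, 1.0))
print(softmax_with_temperature(toy_logits, 0.8))
```

Temperatures below 1 concentrate probability on the most likely tokens, and values above 1 flatten the distribution.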

Dependencies

torch>=2.9
tiktoken
rustbpe
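To see which of these packages are present in the current environment, a small standard-library check can be used. This is a sketch that assumes the distribution names match the list above; `installed_versions` is a hypothetical helper, and it only reports versions without enforcing the `torch>=2.9` bound.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "tiktoken", "rustbpe")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```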

Model Family

mhla/gpt1900-d34-22btok - Base pretrained model
mhla/gpt1900-d34-sft-period - SFT (period style)
mhla/gpt1900-d34-sft-modern - SFT (modern style)
mhla/gpt1900-d34-rl - RL post-training
mhla/gpt1900-d34-physics-sft - Physics CLM fine-tuning
