# NanoChat GPT Model (d20, Step 21400)

This is a language model trained with the NanoChat framework, which uses several custom architecture features:
- Rotary Position Embeddings (RoPE)
- Query-Key Normalization (QK Norm)
- ReLU^2 activation in MLP layers
- Group Query Attention (GQA) support
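As an illustration of the third feature, ReLU² (squared ReLU) simply squares the output of a standard ReLU inside the MLP block. A minimal PyTorch sketch follows; the 4x expansion factor and the absence of biases are assumptions for illustration, not read from the NanoChat code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReLU2MLP(nn.Module):
    """MLP block using the ReLU^2 activation (hypothetical sketch)."""

    def __init__(self, hidden_size: int = 1280):
        super().__init__()
        # 4x expansion is a common convention, assumed here.
        self.fc_in = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        self.fc_out = nn.Linear(4 * hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU^2: apply ReLU, then square elementwise.
        return self.fc_out(F.relu(self.fc_in(x)) ** 2)
```

Compared with GELU, ReLU² is cheap to compute and produces exact zeros for negative pre-activations.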
## Model Details

- Architecture: NanoChat GPT
- Layers: 20
- Hidden Size: 1280
- Attention Heads: 10
- KV Heads: 10
- Vocabulary Size: 65536
- Max Sequence Length: 512
- Training Step: 21400
- Validation BPB (bits per byte): 0.8233
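Bits per byte normalizes validation loss by the tokenizer's compression, making models with different vocabularies comparable. A sketch of the conversion, assuming the loss is measured in nats per token and the tokenizer's average bytes per token is known (the numbers in the example are hypothetical, not taken from this model's logs):

```python
import math

def bits_per_byte(loss_nats_per_token: float, avg_bytes_per_token: float) -> float:
    # nats/token -> bits/token (divide by ln 2),
    # then bits/token -> bits/byte (divide by bytes per token).
    return loss_nats_per_token / (math.log(2) * avg_bytes_per_token)

# Hypothetical: 2.5 nats/token at ~4.4 bytes/token is roughly 0.82 bpb.
```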
## Usage

This model uses a custom architecture; loading and running it requires the NanoChat codebase.
### Loading the Model

```python
from nanochat.checkpoint_manager import build_model
from nanochat.common import get_base_dir

# Download or clone this repo, then:
checkpoint_dir = "path/to/this/repo"  # or use get_base_dir()
step = 21400
device = "cuda"  # or "cpu"

model, tokenizer, meta_data = build_model(
    checkpoint_dir,
    step,
    device,
    phase="eval",
)
```
### Loading Weights Only

If you want to load just the weights:
```python
from safetensors.torch import load_file

weights = load_file("model.safetensors")
# Then load into your model architecture
```
## Training Details

See training_metadata.json for full training configuration and hyperparameters.
## Citation

If you use this model, please cite the NanoChat repository.