NanoChat GPT Model (d20, Step 21400)

This is a language model trained with the NanoChat framework, using several custom architecture features:

  • Rotary Position Embeddings (RoPE)
  • Query-Key Normalization (QK Norm)
  • ReLU^2 activation in MLP layers
  • Group Query Attention (GQA) support

Model Details

  • Architecture: NanoChat GPT
  • Layers: 20
  • Hidden Size: 1280
  • Attention Heads: 10
  • KV Heads: 10
  • Vocabulary Size: 65536
  • Max Sequence Length: 512
  • Training Step: 21400
  • Validation BPB: 0.8233
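As a rough back-of-envelope check, the configuration above implies a parameter count in the hundreds of millions. The sketch below is an estimate under assumptions not stated in this card (a 4x MLP expansion and untied input/output embeddings); the real NanoChat layout may differ:

```python
# Hypothetical estimate from the figures listed above; the actual
# NanoChat layout may differ (MLP ratio, tied embeddings, biases).
n_layer, d_model, vocab = 20, 1280, 65536

embed = vocab * d_model              # token embedding table
attn  = 4 * d_model * d_model        # Q, K, V, and output projections
mlp   = 2 * d_model * (4 * d_model)  # up- and down-projections (4x width assumed)
total = 2 * embed + n_layer * (attn + mlp)   # 2x embed: untied lm_head assumed

print(f"~{total / 1e6:.0f}M parameters")  # → ~561M parameters
```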

Usage

This model uses a custom architecture and requires the NanoChat codebase to load and use.

Loading the Model

from nanochat.checkpoint_manager import build_model
from nanochat.common import get_base_dir

# Download or clone this repo, then:
checkpoint_dir = "path/to/this/repo"  # or use get_base_dir()
step = 21400
device = "cuda"  # or "cpu"

model, tokenizer, meta_data = build_model(
    checkpoint_dir,
    step,
    device,
    phase="eval",
)

Loading Weights Only

If you want to load just the weights:

from safetensors.torch import load_file

weights = load_file("model.safetensors")  # dict of name -> torch.Tensor
# Then load into your model architecture, e.g. model.load_state_dict(weights)
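One common snag when loading raw weights: checkpoints saved from a `torch.compile`d model carry an `_orig_mod.` prefix on every key, which makes `load_state_dict` fail on an uncompiled model. Whether this checkpoint has that prefix is an assumption; the helper below (not part of NanoChat, just a common pattern) strips it if present:

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Remove torch.compile's wrapper prefix from checkpoint keys, if present."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Works on plain dicts, so it can be checked before any model is built:
raw = {"_orig_mod.wte.weight": 1, "lm_head.weight": 2}
print(strip_compile_prefix(raw))  # keys: 'wte.weight', 'lm_head.weight'
```

Keys without the prefix pass through unchanged, so applying it to an already-clean checkpoint is harmless.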

Training Details

See training_metadata.json for full training configuration and hyperparameters.

Citation

If you use this model, please cite the NanoChat repository.
