NanoChat GPT Model (d20, Step 21400)

This is a language model trained with the NanoChat framework, using several custom architecture features:

  • Rotary Position Embeddings (RoPE)
  • Query-Key Normalization (QK Norm)
  • ReLU^2 activation in MLP layers
  • Group Query Attention (GQA) support

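As a rough illustration of two of the features above, here is a plain-Python sketch of the ReLU^2 activation and the GQA head-grouping arithmetic (function and variable names are illustrative, not taken from the NanoChat codebase):

```python
def relu2(x):
    """ReLU^2 activation: the square of ReLU, used in the MLP layers."""
    r = max(x, 0.0)
    return r * r

def kv_group_size(n_heads, n_kv_heads):
    """With GQA, each KV head is shared by n_heads // n_kv_heads query heads."""
    assert n_heads % n_kv_heads == 0
    return n_heads // n_kv_heads

print(relu2(-1.5))            # negative inputs -> 0.0
print(relu2(2.0))             # positive inputs -> x**2 = 4.0
print(kv_group_size(10, 10))  # this checkpoint: group size 1
```

Note that because this checkpoint has as many KV heads as attention heads (10 and 10), the GQA group size is 1, i.e. it degenerates to standard multi-head attention; GQA is supported by the architecture but not exercised here.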
Model Details

  • Architecture: NanoChat GPT
  • Layers: 20
  • Hidden Size: 1280
  • Attention Heads: 10
  • KV Heads: 10
  • Vocabulary Size: 65536
  • Max Sequence Length: 512
  • Training Step: 21400
  • Validation BPB: 0.8233
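Two quick back-of-the-envelope checks on the numbers above. The parameter estimate assumes standard transformer blocks with a 4x MLP and ignores norms, biases, and embedding-tying details, so treat it as approximate rather than the exact count:

```python
import math

d, n_layers, vocab = 1280, 20, 65536

# Rough parameter count: attention (Q, K, V, O projections) is ~4*d^2 per
# layer, a 4x MLP (up + down projections) is ~8*d^2 per layer.
per_layer = 4 * d * d + 8 * d * d
blocks = n_layers * per_layer
embedding = vocab * d
print(f"~{(blocks + embedding) / 1e6:.0f}M params (tied embeddings)")
print(f"~{(blocks + 2 * embedding) / 1e6:.0f}M params (untied)")

# Validation BPB (bits per byte) converted to cross-entropy in nats per byte:
bpb = 0.8233
print(f"{bpb * math.log(2):.4f} nats/byte")
```

This lands in the high-400M to mid-500M range depending on embedding tying, which is the size class you would expect for a 20-layer, 1280-wide model with a 65536-entry vocabulary.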

Usage

This model uses a custom architecture; loading and running it requires the NanoChat codebase.

Loading the Model

from nanochat.checkpoint_manager import build_model
from nanochat.common import get_base_dir

# Download or clone this repo, then:
checkpoint_dir = "path/to/this/repo"  # or use get_base_dir()
step = 21400
device = "cuda"  # or "cpu"

model, tokenizer, meta_data = build_model(
    checkpoint_dir,
    step,
    device,
    phase="eval"
)

Loading Weights Only

If you want to load just the weights:

from safetensors.torch import load_file
import torch

weights = load_file("model.safetensors")
# Then load the state dict into your model architecture, e.g. (assuming
# `model` is a NanoChat GPT instance built with the matching config):
# model.load_state_dict(weights)

Training Details

See training_metadata.json for full training configuration and hyperparameters.

Citation

If you use this model, please cite the NanoChat repository.
