# NanoChat GPT Model (d20, Step 21400)

This is a language model trained with the NanoChat framework, which uses several custom architecture features:
- Rotary Position Embeddings (RoPE)
- Query-Key Normalization (QK Norm)
- ReLU^2 activation in MLP layers
- Group Query Attention (GQA) support
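As an illustration of the third feature, ReLU² (squared ReLU) simply squares the output of a standard ReLU inside the MLP block. A minimal PyTorch sketch follows; the 4x expansion factor and the absence of biases are assumptions for illustration, not read from the NanoChat code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReLU2MLP(nn.Module):
    """MLP block using the ReLU^2 activation (hypothetical sketch)."""

    def __init__(self, hidden_size: int = 1280):
        super().__init__()
        # 4x expansion is a common convention, assumed here.
        self.fc_in = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        self.fc_out = nn.Linear(4 * hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU^2: apply ReLU, then square elementwise.
        return self.fc_out(F.relu(self.fc_in(x)) ** 2)
```

Compared with GELU, ReLU² is cheap to compute and produces exact zeros for negative pre-activations.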
## Model Details

- Architecture: NanoChat GPT
- Layers: 20
- Hidden Size: 1280
- Attention Heads: 10
- KV Heads: 10
- Vocabulary Size: 65536
- Max Sequence Length: 512
- Training Step: 21400
- Validation BPB (bits per byte): 0.8233
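Bits per byte normalizes validation loss by the tokenizer's compression, making models with different vocabularies comparable. A sketch of the conversion, assuming the loss is measured in nats per token and the tokenizer's average bytes per token is known (the numbers in the example are hypothetical, not taken from this model's logs):

```python
import math

def bits_per_byte(loss_nats_per_token: float, avg_bytes_per_token: float) -> float:
    # nats/token -> bits/token (divide by ln 2),
    # then bits/token -> bits/byte (divide by bytes per token).
    return loss_nats_per_token / (math.log(2) * avg_bytes_per_token)

# Hypothetical: 2.5 nats/token at ~4.4 bytes/token is roughly 0.82 bpb.
```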
## Usage

This model uses a custom architecture; loading and running it requires the NanoChat codebase.
### Loading the Model

```python
from nanochat.checkpoint_manager import build_model
from nanochat.common import get_base_dir

# Download or clone this repo, then:
checkpoint_dir = "path/to/this/repo"  # or use get_base_dir()
step = 21400
device = "cuda"  # or "cpu"

model, tokenizer, meta_data = build_model(
    checkpoint_dir,
    step,
    device,
    phase="eval",
)
```
### Loading Weights Only

If you want to load just the weights:
```python
from safetensors.torch import load_file

weights = load_file("model.safetensors")
# Then load into your model architecture
```
## Training Details

See training_metadata.json for full training configuration and hyperparameters.
## Citation

If you use this model, please cite the NanoChat repository.