NanoChat-D32

A 32-layer language model trained with supervised fine-tuning (SFT) followed by reinforcement learning (RL) using GRPO (Group Relative Policy Optimization).

Model Architecture

  • Layers: 32
  • Hidden Dimension: 2048
  • Attention Heads: 16
  • KV Heads: 16
  • Vocab Size: 65,536
  • Context Length: 2048
  • MLP Type: ReLU²
  • RoPE Theta: 10,000
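The ReLU² MLP listed above can be sketched in PyTorch as follows. This is a minimal illustration: the class and attribute names, and the hidden width (assumed here to be 4× the hidden dimension), are not taken from the actual NanoChat code.

```python
import torch
import torch.nn as nn

class ReluSquaredMLP(nn.Module):
    """Sketch of a ReLU^2 MLP block (illustrative names, not NanoChat's)."""

    def __init__(self, dim: int = 2048, hidden: int = 8192):
        super().__init__()
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU^2 activation: apply ReLU, then square elementwise
        return self.down(torch.relu(self.up(x)) ** 2)

mlp = ReluSquaredMLP()
out = mlp(torch.randn(1, 4, 2048))
print(out.shape)  # torch.Size([1, 4, 2048])
```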

Checkpoints

SFT Checkpoint (sft/)

  • File: model_035900.pt (7.2GB)
  • Step: 35,900
  • Val Loss: 1.91
  • MMLU Acc: 23.3%
  • ARC-Easy Acc: 26.5%

SFT Training Config:

  • Dtype: bfloat16
  • Batch size: 4 per device, 32 examples per step
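With 4 examples per device and 32 examples per optimizer step, the implied gradient-accumulation factor on a single device would be 8 (proportionally fewer with data parallelism). A quick sanity check, assuming a single-device run:

```python
per_device_batch = 4       # examples per device per forward pass
examples_per_step = 32     # examples per optimizer step
world_size = 1             # assumption: single device

accum_steps = examples_per_step // (per_device_batch * world_size)
print(accum_steps)  # 8
```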

RL Checkpoint (rl/)

  • File: model_000499.pt (7.2GB)
  • Step: 499 (GRPO steps)
  • Training: GRPO with DAPO reward mode

RL Training Config:

  • Dtype: float32
  • Learning rate: 1e-6
  • KL coefficient: 0.0
  • Clip ratio: [0.8, 1.28]
  • Temperature: 1.0
  • Max new tokens: 2048
  • Samples per prompt: 4
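The config above can be read off against the GRPO objective with DAPO-style asymmetric clipping. The sketch below is illustrative only, not the actual training code: advantages are computed group-relative over the 4 samples per prompt, the clip range [0.8, 1.28] allows larger upward than downward ratio moves (DAPO's "clip-higher"), and with the KL coefficient at 0.0 no KL penalty term appears.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantages: normalize rewards within each prompt's
    # group of samples (last dim = samples per prompt).
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def clipped_objective(ratio: torch.Tensor, adv: torch.Tensor,
                      low: float = 0.8, high: float = 1.28) -> torch.Tensor:
    # PPO-style surrogate with asymmetric clip bounds as in the config
    unclipped = ratio * adv
    clipped = ratio.clamp(low, high) * adv
    return torch.minimum(unclipped, clipped).mean()

rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0]])  # 4 samples per prompt
adv = grpo_advantages(rewards)
```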

Usage

Loading with PyTorch

import torch

# Load SFT checkpoint
sft_state = torch.load("sft/model_035900.pt", map_location="cpu")

# Load RL checkpoint  
rl_state = torch.load("rl/model_000499.pt", map_location="cpu")

For Interpretability Research

These checkpoints are in raw PyTorch format, which makes them easy to load into interpretability tools such as TransformerLens, baukit, or nnsight.

# Example: inspect model weights
state_dict = torch.load("rl/model_000499.pt", map_location="cpu")
print(state_dict.keys())

# Access specific layer weights (key names vary by implementation;
# check state_dict.keys() for the actual names in this checkpoint)
layer_10_attn = state_dict["transformer.h.10.attn.c_attn.weight"]
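Since the checkpoints are plain state dicts, a quick parameter count is also straightforward. A small helper sketch (not part of the repository):

```python
import torch

def count_params(state_dict: dict) -> int:
    """Total parameter count from a raw state_dict, skipping non-tensor entries."""
    return sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))

# e.g. count_params(torch.load("rl/model_000499.pt", map_location="cpu"))
```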

License

Apache 2.0
