NanoChat-D32

A 32-layer language model trained with supervised fine-tuning (SFT) followed by reinforcement learning (RL) using GRPO (Group Relative Policy Optimization).

Model Architecture

  • Layers: 32
  • Hidden Dimension: 2048
  • Attention Heads: 16
  • KV Heads: 16
  • Vocab Size: 65,536
  • Context Length: 2048
  • MLP Type: ReLU²
  • RoPE Theta: 10,000
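The ReLU² MLP listed above can be sketched in PyTorch as follows. This is a minimal illustration: the class and attribute names, and the hidden width (assumed here to be 4× the hidden dimension), are not taken from the actual NanoChat code.

```python
import torch
import torch.nn as nn

class ReluSquaredMLP(nn.Module):
    """Sketch of a ReLU^2 MLP block (illustrative names, not NanoChat's)."""

    def __init__(self, dim: int = 2048, hidden: int = 8192):
        super().__init__()
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU^2 activation: apply ReLU, then square elementwise
        return self.down(torch.relu(self.up(x)) ** 2)

mlp = ReluSquaredMLP()
out = mlp(torch.randn(1, 4, 2048))
print(out.shape)  # torch.Size([1, 4, 2048])
```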

Checkpoints

SFT Checkpoint (sft/)

  • File: model_035900.pt (7.2GB)
  • Step: 35,900
  • Val Loss: 1.91
  • MMLU Acc: 23.3%
  • ARC-Easy Acc: 26.5%

SFT Training Config:

  • Dtype: bfloat16
  • Batch size: 4 per device, 32 examples per step
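With 4 examples per device and 32 examples per optimizer step, the implied gradient-accumulation factor on a single device would be 8 (proportionally fewer with data parallelism). A quick sanity check, assuming a single-device run:

```python
per_device_batch = 4       # examples per device per forward pass
examples_per_step = 32     # examples per optimizer step
world_size = 1             # assumption: single device

accum_steps = examples_per_step // (per_device_batch * world_size)
print(accum_steps)  # 8
```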

RL Checkpoint (rl/)

  • File: model_000499.pt (7.2GB)
  • Step: 499 (GRPO steps)
  • Training: GRPO with DAPO reward mode

RL Training Config:

  • Dtype: float32
  • Learning rate: 1e-6
  • KL coefficient: 0.0
  • Clip ratio: [0.8, 1.28]
  • Temperature: 1.0
  • Max new tokens: 2048
  • Samples per prompt: 4
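The config above can be read off against the GRPO objective with DAPO-style asymmetric clipping. The sketch below is illustrative only, not the actual training code: advantages are computed group-relative over the 4 samples per prompt, the clip range [0.8, 1.28] allows larger upward than downward ratio moves (DAPO's "clip-higher"), and with the KL coefficient at 0.0 no KL penalty term appears.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantages: normalize rewards within each prompt's
    # group of samples (last dim = samples per prompt).
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def clipped_objective(ratio: torch.Tensor, adv: torch.Tensor,
                      low: float = 0.8, high: float = 1.28) -> torch.Tensor:
    # PPO-style surrogate with asymmetric clip bounds as in the config
    unclipped = ratio * adv
    clipped = ratio.clamp(low, high) * adv
    return torch.minimum(unclipped, clipped).mean()

rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0]])  # 4 samples per prompt
adv = grpo_advantages(rewards)
```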

Usage

Loading with PyTorch

import torch

# Load SFT checkpoint
sft_state = torch.load("sft/model_035900.pt", map_location="cpu")

# Load RL checkpoint  
rl_state = torch.load("rl/model_000499.pt", map_location="cpu")

For Interpretability Research

These checkpoints are in raw PyTorch format, which makes them easy to load into interpretability tools such as TransformerLens, baukit, or nnsight.

# Example: inspect model weights
state_dict = torch.load("rl/model_000499.pt", map_location="cpu")
print(state_dict.keys())

# Access specific layer weights (key names vary by implementation;
# check state_dict.keys() for the actual names in this checkpoint)
layer_10_attn = state_dict["transformer.h.10.attn.c_attn.weight"]
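Since the checkpoints are plain state dicts, a quick parameter count is also straightforward. A small helper sketch (not part of the repository):

```python
import torch

def count_params(state_dict: dict) -> int:
    """Total parameter count from a raw state_dict, skipping non-tensor entries."""
    return sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))

# e.g. count_params(torch.load("rl/model_000499.pt", map_location="cpu"))
```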

License

Apache 2.0
