# NanoChat-D32
A 32-layer language model trained with supervised fine-tuning (SFT) and reinforcement learning via GRPO (Group Relative Policy Optimization).
## Model Architecture
| Parameter | Value |
|---|---|
| Layers | 32 |
| Hidden Dimension | 2048 |
| Attention Heads | 16 |
| KV Heads | 16 |
| Vocab Size | 65,536 |
| Context Length | 2048 |
| MLP Type | ReLU² |
| RoPE Theta | 10,000 |
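The table above can be captured as a small config object; the sketch below is illustrative only (the class and field names are assumptions, not nanochat's actual config API):

```python
from dataclasses import dataclass

@dataclass
class NanoChatD32Config:
    # Values taken from the architecture table above.
    n_layers: int = 32
    d_model: int = 2048
    n_heads: int = 16
    n_kv_heads: int = 16        # equal to n_heads, i.e. standard multi-head attention (no GQA)
    vocab_size: int = 65_536
    context_length: int = 2048
    rope_theta: float = 10_000.0

cfg = NanoChatD32Config()
head_dim = cfg.d_model // cfg.n_heads  # 2048 / 16 = 128
```

Note that with `n_kv_heads == n_heads`, every query head has its own key/value head, so the KV cache is not reduced relative to full multi-head attention.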
## Checkpoints
### SFT Checkpoint (`sft/`)
- File: `model_035900.pt` (7.2 GB)
- Step: 35,900
- Val Loss: 1.91
- MMLU Acc: 23.3%
- ARC-Easy Acc: 26.5%
**SFT Training Config:**
- Dtype: bfloat16
- Batch size: 4 per device, 32 examples per optimizer step
### RL Checkpoint (`rl/`)
- File: `model_000499.pt` (7.2 GB)
- Step: 499 (GRPO steps)
- Training: GRPO with DAPO reward mode
**RL Training Config:**
- Dtype: float32
- Learning rate: 1e-6
- KL coefficient: 0.0
- Clip ratio: [0.8, 1.28]
- Temperature: 1.0
- Max new tokens: 2048
- Samples per prompt: 4
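The asymmetric clip ratio `[0.8, 1.28]` above is the DAPO-style "clip-higher" variant of the PPO surrogate, which allows low-probability tokens more room to increase. A minimal sketch of that loss (function name and shapes are assumptions, not the actual training code):

```python
import torch

def grpo_clipped_loss(logp_new, logp_old, advantages, clip_lo=0.8, clip_hi=1.28):
    """Clipped surrogate loss with the asymmetric range used above.

    `logp_new` / `logp_old`: per-token log-probs under the current and
    sampling policies. `advantages`: group-normalized rewards (GRPO computes
    these per prompt across the 4 samples, with no learned value function).
    """
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, clip_lo, clip_hi)
    # Pessimistic (min) objective, negated to form a loss to minimize.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

With the KL coefficient at 0.0, no reference-policy penalty is added to this objective.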
## Usage
### Loading with PyTorch
```python
import torch

# Load the SFT checkpoint onto CPU
sft_state = torch.load("sft/model_035900.pt", map_location="cpu")

# Load the RL checkpoint onto CPU
rl_state = torch.load("rl/model_000499.pt", map_location="cpu")
```
### For Interpretability Research
These checkpoints are in raw PyTorch format, ideal for interpretability tools like TransformerLens, baukit, or nnsight.
```python
import torch

# Example: inspect model weights
state_dict = torch.load("rl/model_000499.pt", map_location="cpu")
print(state_dict.keys())

# Access a specific layer's attention weights
layer_10_attn = state_dict["transformer.h.10.attn.c_attn.weight"]
```
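A common first step when studying the effect of RL is to compare the SFT and RL checkpoints tensor-by-tensor. A small hedged helper (the function is illustrative, not part of any released tooling):

```python
import torch

def relative_weight_change(sft_state, rl_state):
    """Per-tensor relative L2 change from the SFT weights to the RL weights.

    Both arguments are plain state dicts with identical keys, as produced by
    torch.load on the two checkpoints above.
    """
    return {
        name: ((rl_state[name].float() - sft_state[name].float()).norm()
               / (sft_state[name].float().norm() + 1e-8)).item()
        for name in sft_state
    }
```

Tensors with a large relative change are natural starting points for deeper inspection with TransformerLens, baukit, or nnsight.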
## License
Apache 2.0