# mathstral-nano-sat
A tiny GPT-style causal language model (236,928 parameters) trained on SAT-level math problems, built entirely in NumPy with no PyTorch dependency. It demonstrates a complete training pipeline: byte-level tokenization, causal attention, AdamW optimization, and cross-entropy loss.
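With a 256-entry byte-level vocabulary, tokenization reduces to UTF-8 encoding: every byte of the input string is one token ID. A minimal sketch (the `encode`/`decode` names are illustrative, not the model's actual API):

```python
import numpy as np

def encode(text: str) -> np.ndarray:
    # Byte-level tokenization: each UTF-8 byte is one token ID in [0, 255].
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.int64)

def decode(ids: np.ndarray) -> str:
    # Invalid byte sequences are replaced rather than raising.
    return bytes(ids.astype(np.uint8).tolist()).decode("utf-8", errors="replace")

ids = encode("2x + 5 = 13")
print(ids.shape[0])   # one token per byte
print(decode(ids))    # round-trips to the original string
```

No merges or special tokens are needed, which is why the vocabulary is exactly 256.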
## Model Details
| Property | Value |
|---|---|
| Architecture | 4-layer causal transformer |
| Attention heads | 8 |
| Hidden dim | 64 |
| FFN dim | 256 |
| Vocabulary | 256 (byte-level, UTF-8) |
| Max seq length | 64 |
| Total parameters | 236,928 |
| Framework | Pure NumPy + SciPy |
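With a hidden dim of 64 split across 8 heads, each head works in 8 dimensions. A sketch of one causal self-attention layer in plain NumPy, showing how the card's shapes fit together (weight names and the single-sequence layout are assumptions, not the repository's actual code):

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv, Wo, n_heads=8):
    """Multi-head causal self-attention. x: (T, d) activations; W*: (d, d)."""
    T, d = x.shape
    hd = d // n_heads  # head dim = 64 / 8 = 8 for this model

    def split(a):  # (T, d) -> (n_heads, T, hd)
        return a.reshape(T, n_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)     # (H, T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)    # strictly future positions
    scores = np.where(mask, -1e9, scores)               # causal masking
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                       # row-wise softmax
    out = (w @ v).transpose(1, 0, 2).reshape(T, d)
    return out @ Wo

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal((10, d)) * 0.02
Ws = [rng.standard_normal((d, d)) * 0.02 for _ in range(4)]
y = causal_attention(x, *Ws)
print(y.shape)  # (10, 64)
```

The causal mask is what makes generation work: position *i* can only attend to positions ≤ *i*, so changing later tokens never alters earlier outputs.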
## Training
| Property | Value |
|---|---|
| Dataset | 20 SAT math Q&A examples |
| Epochs | 3 |
| Steps | 300 |
| Batch size | 8 |
| Learning rate | 0.0003 (AdamW) |
| Baseline loss | 5.5158 |
| Final loss | 2.224 |
| Loss reduction | 59.7% |
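The baseline loss of 5.5158 is close to ln(256) ≈ 5.545, the cross-entropy an untrained model with near-uniform predictions scores on a 256-token byte vocabulary. A minimal sketch of the loss (a generic implementation, not necessarily the repository's):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean next-token cross-entropy. logits: (T, V), targets: (T,) int IDs."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Uniform logits over a 256-token vocabulary give exactly ln(256) ~ 5.545,
# matching the reported baseline of ~5.52.
T, V = 32, 256
targets = np.random.default_rng(0).integers(0, V, size=T)
print(cross_entropy(np.zeros((T, V)), targets))  # ln(256) = 5.545...
```

The final loss of 2.224 corresponds to the reported 59.7% reduction: (5.5158 − 2.224) / 5.5158 ≈ 0.597.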
## Install

```shell
pip install safetensors scipy numpy
```
## Usage

```python
from modeling_mathstral_nano import MathstralNano

model = MathstralNano.from_pretrained(".")
print(model)
# MathstralNano(4L 8H 64d params=236,928)

# Raw generation
response = model.generate("Problem: If 2x + 5 = 13, find x. Solution:")
print(response)

# Convenience wrapper (formats the prompt automatically)
response = model.solve("If 3x + 7 = 22, find x.")
print(response)
```
## Files

| File | Description |
|---|---|
| `model.safetensors` | Weights in safetensors format |
| `config.json` | Architecture config |
| `modeling_mathstral_nano.py` | Pure-NumPy model class with inference |
| `training_metadata.json` | Full training run metadata and loss curve |