# mathstral-nano-sat
A tiny GPT-style causal language model (236,928 parameters) trained on SAT-level math problems, built entirely in NumPy with no PyTorch dependency. It demonstrates a complete training pipeline: byte-level tokenization, causal attention, AdamW optimization, and cross-entropy loss.
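With a 256-entry byte-level vocabulary, tokenization reduces to UTF-8 encoding: every byte of the input string is one token ID. A minimal sketch (the `encode`/`decode` names are illustrative, not the model's actual API):

```python
import numpy as np

def encode(text: str) -> np.ndarray:
    # Byte-level tokenization: each UTF-8 byte is one token ID in [0, 255].
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.int64)

def decode(ids: np.ndarray) -> str:
    # Invalid byte sequences are replaced rather than raising.
    return bytes(ids.astype(np.uint8).tolist()).decode("utf-8", errors="replace")

ids = encode("2x + 5 = 13")
print(ids.shape[0])   # one token per byte
print(decode(ids))    # round-trips to the original string
```

No merges or special tokens are needed, which is why the vocabulary is exactly 256.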
## Model Details
| Property | Value |
|---|---|
| Architecture | 4-layer causal transformer |
| Attention heads | 8 |
| Hidden dim | 64 |
| FFN dim | 256 |
| Vocabulary | 256 (byte-level, UTF-8) |
| Max seq length | 64 |
| Total parameters | 236,928 |
| Framework | Pure NumPy + SciPy |
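With a hidden dim of 64 split across 8 heads, each head works in 8 dimensions. A sketch of one causal self-attention layer in plain NumPy, showing how the card's shapes fit together (weight names and the single-sequence layout are assumptions, not the repository's actual code):

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv, Wo, n_heads=8):
    """Multi-head causal self-attention. x: (T, d) activations; W*: (d, d)."""
    T, d = x.shape
    hd = d // n_heads  # head dim = 64 / 8 = 8 for this model

    def split(a):  # (T, d) -> (n_heads, T, hd)
        return a.reshape(T, n_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)     # (H, T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)    # strictly future positions
    scores = np.where(mask, -1e9, scores)               # causal masking
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                       # row-wise softmax
    out = (w @ v).transpose(1, 0, 2).reshape(T, d)
    return out @ Wo

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal((10, d)) * 0.02
Ws = [rng.standard_normal((d, d)) * 0.02 for _ in range(4)]
y = causal_attention(x, *Ws)
print(y.shape)  # (10, 64)
```

The causal mask is what makes generation work: position *i* can only attend to positions ≤ *i*, so changing later tokens never alters earlier outputs.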
## Training
| Property | Value |
|---|---|
| Dataset | 20 SAT math Q&A examples |
| Epochs | 3 |
| Steps | 300 |
| Batch size | 8 |
| Learning rate | 0.0003 (AdamW) |
| Baseline loss | 5.5158 |
| Final loss | 2.224 |
| Loss reduction | 59.7% |
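The baseline loss of 5.5158 is close to ln(256) ≈ 5.545, the cross-entropy an untrained model with near-uniform predictions scores on a 256-token byte vocabulary. A minimal sketch of the loss (a generic implementation, not necessarily the repository's):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean next-token cross-entropy. logits: (T, V), targets: (T,) int IDs."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Uniform logits over a 256-token vocabulary give exactly ln(256) ~ 5.545,
# matching the reported baseline of ~5.52.
T, V = 32, 256
targets = np.random.default_rng(0).integers(0, V, size=T)
print(cross_entropy(np.zeros((T, V)), targets))  # ln(256) = 5.545...
```

The final loss of 2.224 corresponds to the reported 59.7% reduction: (5.5158 − 2.224) / 5.5158 ≈ 0.597.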
## Install

```shell
pip install safetensors scipy numpy
```
## Usage

```python
from modeling_mathstral_nano import MathstralNano

model = MathstralNano.from_pretrained(".")
print(model)
# MathstralNano(4L 8H 64d params=236,928)

# Raw generation
response = model.generate("Problem: If 2x + 5 = 13, find x. Solution:")
print(response)

# Convenience wrapper (formats the prompt automatically)
response = model.solve("If 3x + 7 = 22, find x.")
print(response)
```
## Files

| File | Description |
|---|---|
| `model.safetensors` | Weights in safetensors format |
| `config.json` | Architecture config |
| `modeling_mathstral_nano.py` | Pure-NumPy model class with inference |
| `training_metadata.json` | Full training run metadata and loss curve |