# mathstral-nano-sat

A tiny GPT-style causal language model (236,928 parameters) trained on SAT-level math problems. Built entirely in NumPy (plus SciPy), with no PyTorch dependency. It demonstrates the full fine-tuning pipeline: byte-level tokenization, causal attention, AdamW optimization, and cross-entropy loss.
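
The causal-attention step mentioned above can be sketched in NumPy. This is a minimal single-head illustration under assumed shapes and names, not the model's actual code (the real model uses 8 heads and a 64-dim hidden state):

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention sketch (illustrative only)."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: position t may only attend to positions <= t.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
out = causal_attention(x, *W)
print(out.shape)  # (4, 8)
```

Because of the mask, the first output position depends only on the first input token, which is what makes next-token training valid.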

## Model Details

| Property | Value |
|---|---|
| Architecture | 4-layer causal transformer |
| Attention heads | 8 |
| Hidden dim | 64 |
| FFN dim | 256 |
| Vocabulary | 256 (byte-level, UTF-8) |
| Max seq length | 64 |
| Total parameters | 236,928 |
| Framework | Pure NumPy + SciPy |
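
The 256-entry vocabulary means tokens are just raw UTF-8 bytes, so no tokenizer file is needed. A sketch of what byte-level encoding looks like (these helper names are illustrative, not the model's API):

```python
# Byte-level tokenization: each UTF-8 byte is a token id in [0, 255].
def encode(text):
    return list(text.encode("utf-8"))

def decode(ids):
    return bytes(ids).decode("utf-8", errors="replace")

ids = encode("2x + 5 = 13")
print(ids[:3])       # [50, 120, 32]
print(decode(ids))   # 2x + 5 = 13
```

The trade-off is sequence length: every character costs at least one token, so the 64-token context window covers only a short problem statement.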

## Training

| Property | Value |
|---|---|
| Dataset | 20 SAT math Q&A examples |
| Epochs | 3 |
| Steps | 300 |
| Batch size | 8 |
| Learning rate | 0.0003 (AdamW) |
| Baseline loss | 5.5158 |
| Final loss | 2.224 |
| Loss reduction | 59.7% |
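
The reported figures are internally consistent: a model that predicts uniformly over the 256-byte vocabulary scores ln(256) ≈ 5.545 nats per token, close to the measured baseline, and the 59.7% reduction follows from the table's two loss values. A minimal NumPy sketch of the cross-entropy used here (function shape is an assumption, not the training code):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood; logits: (T, V), targets: (T,) ids."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

# Uniform logits over 256 byte tokens give ln(256) ≈ 5.545 nats.
print(cross_entropy(np.zeros((10, 256)), np.zeros(10, dtype=int)))

# The 59.7% reduction follows from the table's baseline and final losses:
baseline, final = 5.5158, 2.224
print(f"{(baseline - final) / baseline:.1%}")  # 59.7%
```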

## Install

```shell
pip install safetensors scipy numpy
```

## Usage

```python
from modeling_mathstral_nano import MathstralNano

model = MathstralNano.from_pretrained(".")
print(model)
# MathstralNano(4L 8H 64d  params=236,928)

# Raw generation
response = model.generate("Problem: If 2x + 5 = 13, find x. Solution:")
print(response)

# Convenience wrapper (formats the prompt automatically)
response = model.solve("If 3x + 7 = 22, find x.")
print(response)
```
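
Under the hood, a `generate` call like the one above typically runs an autoregressive decoding loop. The sketch below shows a greedy variant with a toy stand-in for the forward pass; `logits_fn` and its signature are assumptions, and the real `MathstralNano.generate` may sample rather than take the argmax:

```python
import numpy as np

def greedy_generate(logits_fn, prompt_ids, max_new_tokens=32, max_seq_len=64):
    """Greedy decoding sketch: repeatedly append the argmax next byte.

    logits_fn(ids) -> array of shape (len(ids), 256) stands in for the
    model's forward pass.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-max_seq_len:]  # respect the 64-token context window
        logits = logits_fn(np.array(window))
        ids.append(int(logits[-1].argmax()))
    return ids

# Toy stand-in model: always predicts the byte after the last one seen.
toy = lambda ids: np.eye(256)[(ids + 1) % 256]
print(greedy_generate(toy, [65], max_new_tokens=3))  # [65, 66, 67, 68]
```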

## Files

| File | Description |
|---|---|
| model.safetensors | Weights in safetensors format |
| config.json | Architecture config |
| modeling_mathstral_nano.py | Pure-NumPy model class with inference |
| training_metadata.json | Full training run metadata and loss curve |