Pure Python GPT Model

A minimal GPT language model trained from scratch using only Python's standard library (no PyTorch/TensorFlow).

Model Description

This model implements the core GPT architecture:

  • Decoder-only Transformer with causal (autoregressive) attention
  • Multi-Head Self-Attention for capturing dependencies
  • Feed-Forward MLP with ReLU activation
  • RMSNorm for layer normalization
  • Residual connections for stable training
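
Because the whole model is written against the standard library, components like RMSNorm stay readable in plain Python. The function below is an illustrative sketch of RMSNorm (normalize by the root mean square, then apply a learned gain, with no mean subtraction or bias), not the repository's actual code:

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # Root-mean-square normalization: divide by the RMS of the vector,
    # then scale each dimension by a learned gain. Unlike LayerNorm,
    # there is no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```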

Architecture

  Parameter            Value
  Layers               6
  Embedding Dimension  192
  Attention Heads      6
  Context Length       256
  Vocabulary Size      77
  Total Parameters     112,256
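
With 6 heads over a 192-dimensional embedding, each head attends in a 192 / 6 = 32-dimensional subspace. A single head's causal (autoregressive) attention can be sketched in pure Python as below; the function name and structure are assumptions for illustration, not the repository's code:

```python
import math

def causal_attention(q, k, v):
    # q, k, v: per-token vectors for one head (seq_len x head_dim).
    # Each position attends only to itself and earlier positions.
    head_dim = len(q[0])
    out = []
    for t in range(len(q)):
        # Scaled dot-product scores against positions 0..t (causal mask).
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(head_dim)
                  for s in range(t + 1)]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][d] for s, w in enumerate(weights))
                    for d in range(head_dim)])
    return out
```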

Training

  • Dataset: OpenWebText-10k subset
  • Optimizer: Adam (β1=0.9, β2=0.999)
  • Learning Rate: Cosine decay with warmup
  • Training Steps: 2000
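
The "cosine decay with warmup" schedule can be sketched as follows. The card states only the schedule shape and the 2000-step budget, so the peak rate, warmup length, and floor below are illustrative values, not the actual training configuration:

```python
import math

def lr_schedule(step, max_lr=3e-4, warmup_steps=100, total_steps=2000, min_lr=3e-5):
    # Linear warmup to max_lr, then cosine decay down to min_lr.
    # max_lr, warmup_steps, and min_lr are assumed values for illustration.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine
```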

Usage

This model uses a custom pure-Python implementation. See the repository for the complete code.

# Load and generate
from model import gpt, generate
text = generate(prompt="Once upon a time", temperature=0.7)
print(text)
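
The temperature=0.7 argument above controls how the next token is drawn. A pure-Python sketch of temperature-scaled sampling is shown below as an illustrative stand-in, not the repository's actual generate implementation:

```python
import math
import random

def sample_next(logits, temperature=0.7):
    # Softmax over temperature-scaled logits, then sample a token id.
    # Lower temperature sharpens the distribution; 1.0 leaves it unscaled.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution.
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```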
Limitations

  • Small model size (educational purposes)
  • Character-level tokenization (not BPE)
  • Limited training data and compute
  • Pure-Python execution, so inference is slow
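
A character-level tokenizer of the kind this model uses fits in a few lines; the sketch below is illustrative, not the repository's code:

```python
def build_vocab(text):
    # Character-level vocabulary: one id per unique character.
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    # Map each character to its integer id.
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    # Map integer ids back to characters and join.
    return "".join(itos[i] for i in ids)
```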

License

Apache 2.0

Acknowledgments

Based on Andrej Karpathy's educational implementations (micrograd, makemore, nanoGPT).