# Pure Python GPT Model
A minimal GPT language model trained from scratch using only Python's standard library (no PyTorch/TensorFlow).
## Model Description
This model implements the core GPT architecture:
- Decoder-only Transformer with causal (autoregressive) attention
- Multi-Head Self-Attention for capturing dependencies
- Feed-Forward MLP with ReLU activation
- RMSNorm in place of standard LayerNorm
- Residual connections for stable training
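RMSNorm is the simplest of the components above; a minimal pure-Python sketch (function name and epsilon are illustrative, not taken from the repository):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: rescale by the reciprocal root-mean-square of the vector.
    # Unlike LayerNorm, no mean is subtracted and no bias is added.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

With `weight` initialized to all ones, the output vector always has unit root-mean-square, which keeps activations in a stable range across layers.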
## Architecture
| Parameter | Value |
|---|---|
| Layers | 6 |
| Embedding Dimension | 192 |
| Attention Heads | 6 |
| Context Length | 256 |
| Vocabulary Size | 77 |
| Total Parameters | 112,256 |
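The table above corresponds to a configuration along these lines (the dictionary and key names are illustrative, not the repository's actual API):

```python
# Hypothetical config mirroring the architecture table
config = dict(
    n_layer=6,       # transformer blocks
    d_model=192,     # embedding dimension
    n_head=6,        # attention heads
    context_len=256, # maximum sequence length
    vocab_size=77,   # character-level vocabulary
)

# The embedding dimension must split evenly across heads.
head_dim = config["d_model"] // config["n_head"]
```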
## Training
- Dataset: OpenWebText-10k subset
- Optimizer: Adam (β1=0.9, β2=0.999)
- Learning Rate: Cosine decay with warmup
- Training Steps: 2000
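A cosine-decay schedule with linear warmup can be sketched as follows; the peak/minimum learning rates and warmup length are illustrative assumptions, not the repository's settings (only the 2000-step total comes from the list above):

```python
import math

def lr_schedule(step, max_lr=3e-4, min_lr=3e-5, warmup=100, total=2000):
    # Linear warmup from 0 toward max_lr over the first `warmup` steps
    if step < warmup:
        return max_lr * (step + 1) / warmup
    # Cosine decay from max_lr down to min_lr over the remaining steps
    t = (step - warmup) / (total - warmup)  # progress in [0, 1]
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

The schedule peaks at `max_lr` when warmup ends and decays smoothly to `min_lr` at the final step.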
## Usage
This model uses a custom pure-Python implementation. See the repository for the complete code.
```python
# Load the model and generate text
from model import gpt, generate

text = generate(prompt="Once upon a time", temperature=0.7)
print(text)
```
## Limitations
- Small model size (educational purposes only)
- Character-level tokenization (not BPE)
- Limited training data and compute
- Pure-Python execution, so inference is slow
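Character-level tokenization means the vocabulary is just the set of characters seen in the training corpus (77 here). A minimal sketch of how such an encoder/decoder pair works, using a toy corpus rather than the model's actual vocabulary:

```python
# Toy corpus; the real model's vocabulary covers 77 characters.
corpus = "hello world"

# Build the character vocabulary and lookup tables.
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}  # char -> token id
itos = {i: c for c, i in stoi.items()}      # token id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

Round-tripping any string over the vocabulary is lossless, but every character costs one token, so sequences are much longer than with BPE.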
## License
Apache 2.0
## Acknowledgments
Based on Andrej Karpathy's educational implementations (micrograd, makemore, nanoGPT).