# nanoGPT SLM -- 123.8M Parameter Language Model

A small language model trained from scratch using a custom nanoGPT implementation, pretrained on 133 classic English fiction novels from Project Gutenberg.
## Quick Start
### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python nanogpt_slm_pretrained_inference.py
```
### Option 2: Import and use `ask()` in your own code

```python
from nanogpt_slm_pretrained_inference import ask, generate_text

print(ask("Once upon a time there was"))
print()
print(ask(
    "The meaning of life is",
    temperature=1.0,
    top_k=100,
    max_tokens=150,
))
print()
print(generate_text("She opened the door and saw", max_tokens=200))
```
### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download
from nanogpt_slm_pretrained_inference import GPT, GPTKV, GPTConfig

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-124m",
    filename="nanogpt_slm_best.pth",
)

config = GPTConfig()
model = GPTKV(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## Model Details

| Attribute      | Value                                     |
|----------------|-------------------------------------------|
| Parameters     | 123.8M                                    |
| Architecture   | nanoGPT (12 layers, 12 heads, 768 dim)    |
| Context length | 256 tokens                                |
| Tokenizer      | tiktoken GPT-2 BPE (50,257 tokens)        |
| Training data  | 133 English fiction novels (37.5M tokens) |
| Framework      | PyTorch                                   |
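The reported parameter count follows from the table above. Here is a back-of-the-envelope check (a sketch assuming standard nanoGPT block shapes; biases and layer norms add roughly 0.1M and are omitted):

```python
# Rough parameter count for: 12 layers, n_embd=768, vocab=50257,
# context=256, with the LM head tied to the token embeddings.
n_layer, n_embd, vocab, ctx = 12, 768, 50257, 256

tok_emb = vocab * n_embd                     # shared with LM head (weight tying)
pos_emb = ctx * n_embd
per_block = 4 * n_embd**2 + 8 * n_embd**2    # attention (qkv + proj) + 4x MLP
total = tok_emb + pos_emb + n_layer * per_block
print(f"{total / 1e6:.1f}M")                 # ~123.7M; biases bring it to ~123.8M
```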
## Files

| File                                  | Description                                            |
|---------------------------------------|--------------------------------------------------------|
| `nanogpt_slm_best.pth`                | Pretrained model weights (best validation loss)        |
| `nanogpt_slm_pretrained_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json`                         | Model configuration                                    |
## `ask()` / `generate_text()` API Reference

```python
ask(prompt, max_tokens=200, temperature=0.8, top_k=40)
generate_text(prompt, max_tokens=200, temperature=0.8, top_k=40)
```

| Parameter     | Default    | Description                                        |
|---------------|------------|----------------------------------------------------|
| `prompt`      | (required) | Text to continue from                              |
| `max_tokens`  | 200        | Maximum number of tokens to generate               |
| `temperature` | 0.8        | 0.01 = near-greedy, 0.8 = balanced, 1.5 = creative |
| `top_k`       | 40         | Top-k filtering (`None` = no filtering)            |
## Fine-tuned Variants
## Notes

- Trained entirely from scratch (no pretrained initialization)
- Uses a KV cache (`GPTKV`) for O(1) per-token decoding
- Weight tying between the token embeddings and the LM head
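Weight tying means the token-embedding matrix and the output projection share one tensor. A minimal sketch, assuming the dimensions from the table (vocab 50,257, embedding 768):

```python
import torch.nn as nn

vocab_size, n_embd = 50257, 768
tok_emb = nn.Embedding(vocab_size, n_embd)
lm_head = nn.Linear(n_embd, vocab_size, bias=False)
lm_head.weight = tok_emb.weight   # same Parameter object, trained jointly
```

At these dimensions the shared matrix has about 38.6M entries, so tying saves that many parameters versus keeping separate input and output matrices.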