nanoGPT Spam Classifier -- 123.9M Parameters

Binary spam classifier fine-tuned from the nanoGPT pretrained SLM.

Pipeline: Pretrained from scratch on 133 English fiction books -> fine-tuned for classification on the UCI SMS Spam Collection.

Quick Start

Option 1: Run directly (downloads model + runs examples)

pip install torch tiktoken huggingface_hub
python nanogpt_classifier_inference.py

Option 2: Import and use in your own code

from nanogpt_classifier_inference import classify, is_spam, classify_batch

# Full result with confidence
result = classify("You won a free iPhone! Click here to claim.")
print(result)
# {'label': 'spam', 'confidence': 0.95, 'probabilities': {'not spam': 0.05, 'spam': 0.95}}
print()

# Simple boolean check
print(is_spam("You won a free iPhone!"))        # True
print(is_spam("See you at dinner tonight!"))    # False
print()

# Batch classification
texts = ["Free prize!", "Meeting at 3pm", "Click to win $$$"]
results = classify_batch(texts)
for text, r in zip(texts, results):
    print(f"  {r['label']:>8s} ({r['confidence']:.0%}) | {text}")
    
print()

Option 3: Load weights manually

from huggingface_hub import hf_hub_download
import torch, torch.nn as nn

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-classifier",
    filename="nanogpt_classifier.pth"
)

from nanogpt_classifier_inference import GPT, GPTConfig

config = GPTConfig()
model = GPT(config)
model.lm_head = nn.Linear(768, 2)  # Replace LM head with 2-class classifier
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

How It Works

  1. Input text is tokenized (tiktoken GPT-2 BPE)
  2. Padded/truncated to 120 tokens
  3. Fed through the full transformer (12 layers)
  4. Last token's logits (shape: 2) are used for classification
  5. Argmax -> 0 = "not spam", 1 = "spam"
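Steps 2 and 5 above can be sketched in plain Python. Note the pad token id is an assumption (GPT-2 setups commonly reuse `<|endoftext|>`, id 50256); check the inference script for the exact value:

```python
PAD_ID = 50256  # assumed pad token: <|endoftext|> in the GPT-2 vocabulary
LABELS = {0: "not spam", 1: "spam"}

def pad_or_truncate(token_ids, max_length=120):
    """Step 2: truncate long inputs, right-pad short ones to max_length."""
    token_ids = token_ids[:max_length]
    return token_ids + [PAD_ID] * (max_length - len(token_ids))

def decide(last_token_logits):
    """Step 5: argmax over the two class logits -> label."""
    return LABELS[max(range(2), key=lambda i: last_token_logits[i])]
```

The model only ever sees fixed-length 120-token sequences, so the classification head always reads from the final (possibly padded) position.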

Model Details

Attribute            Value
Parameters           123.9M
Architecture         nanoGPT (12 layers, 12 heads, 768 dim)
Classification head  Linear(768, 2) replacing lm_head
Classes              0 = not spam, 1 = spam
Max sequence length  120 tokens
Context length       256 tokens
Tokenizer            tiktoken GPT-2 BPE (50,257 tokens)
Base model           nishantup/nanogpt-slm-124m
Training data        UCI SMS Spam Collection (balanced: 747 + 747)
Framework            PyTorch

Training Details

  • Base pretrained model frozen except: last transformer block + final LayerNorm + classification head
  • 5 epochs, AdamW (lr=5e-5, weight_decay=0.1), batch_size=8
  • Classification uses cross-entropy loss on last-token logits
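The selective freezing above can be sketched as follows. This is a minimal sketch that assumes nanoGPT's standard module layout (`model.transformer.h` for the block list, `model.transformer.ln_f` for the final LayerNorm), which may differ from the training code actually used:

```python
import torch.nn as nn

def freeze_for_classification(model: nn.Module) -> None:
    # Freeze every parameter of the pretrained model...
    for param in model.parameters():
        param.requires_grad = False
    # ...then unfreeze only the parts that are fine-tuned:
    for param in model.transformer.h[-1].parameters():  # last transformer block
        param.requires_grad = True
    for param in model.transformer.ln_f.parameters():   # final LayerNorm
        param.requires_grad = True
    for param in model.lm_head.parameters():            # classification head
        param.requires_grad = True
```

Training only these parameters keeps the optimizer state small and preserves the pretrained representations in the earlier blocks.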

Files

File                             Description
nanogpt_classifier.pth           Classifier weights (lm_head = Linear(768, 2))
nanogpt_classifier_inference.py  Standalone inference script
config.json                      Model + classifier configuration

API Reference

classify(text, max_length=120)

Returns dict with label, confidence, probabilities.

is_spam(text, max_length=120)

Returns True if spam, False if not.

classify_batch(texts, max_length=120)

Returns list of classify() results.
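Because classify() returns the full probability dict, callers can apply a stricter decision rule than plain argmax. A hypothetical helper (not part of the script) for reducing false positives:

```python
def is_spam_with_threshold(result: dict, threshold: float = 0.9) -> bool:
    """Flag a message as spam only when the spam probability clears `threshold`."""
    return result["probabilities"]["spam"] >= threshold
```

With threshold=0.9, the confident example above (spam probability 0.95) is still flagged, while borderline messages fall back to "not spam".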

Related Models

Variant                      Type            Repo
Pretrained (nanoGPT)         Base LM         nishantup/nanogpt-slm-124m
Instruction-tuned (nanoGPT)  SFT             nishantup/nanogpt-slm-instruct
Spam classifier (nanoGPT)    Classification  nishantup/nanogpt-slm-classifier