---
license: mit
tags:
- pytorch
- nanogpt
- text-classification
- spam-detection
- slm
- from-scratch
base_model: nishantup/nanogpt-slm-124m
---
# nanoGPT Spam Classifier (123.9M parameters)
Binary spam classifier fine-tuned from the nanoGPT pretrained SLM.
**Pipeline:** nanoGPT built from scratch -> pretrained on 133 English fiction books -> fine-tuned for binary classification on the UCI SMS Spam Collection.
## Quick Start
### Option 1: Run directly (downloads model + runs examples)
```bash
pip install torch tiktoken huggingface_hub
python nanogpt_classifier_inference.py
```
### Option 2: Import and use in your own code
```python
from nanogpt_classifier_inference import classify, is_spam, classify_batch
# Full result with confidence
result = classify("You won a free iPhone! Click here to claim.")
print(result)
# {'label': 'spam', 'confidence': 0.95, 'probabilities': {'not spam': 0.05, 'spam': 0.95}}
print()
# Simple boolean check
print(is_spam("You won a free iPhone!")) # True
print(is_spam("See you at dinner tonight!")) # False
print()
# Batch classification
texts = ["Free prize!", "Meeting at 3pm", "Click to win $$$"]
results = classify_batch(texts)
for text, r in zip(texts, results):
    print(f" {r['label']:>8s} ({r['confidence']:.0%}) | {text}")
print()
```
### Option 3: Load weights manually
```python
from huggingface_hub import hf_hub_download
import torch, torch.nn as nn
model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-classifier",
    filename="nanogpt_classifier.pth",
)
from nanogpt_classifier_inference import GPT, GPTConfig
config = GPTConfig()
model = GPT(config)
model.lm_head = nn.Linear(768, 2) # Replace LM head with 2-class classifier
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## How It Works
1. Input text is tokenized (tiktoken GPT-2 BPE)
2. Padded/truncated to 120 tokens
3. Fed through the full transformer (12 layers)
4. **Last token's logits** (shape: 2) are used for classification
5. Argmax -> 0 = "not spam", 1 = "spam"
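Steps 2 and 5 above can be sketched in plain Python. Note this is a minimal illustration, not the actual inference script: `PAD_ID = 50256` (GPT-2's `<|endoftext|>`) is an assumption, and the real script may pad differently.

```python
# Sketch of the pad/truncate step (2) and the argmax decision (5).
# PAD_ID = 50256 (<|endoftext|>) is an assumption; the shipped
# inference script may use a different padding token.
PAD_ID = 50256
MAX_LEN = 120

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate to max_len tokens, or right-pad with pad_id."""
    ids = token_ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

def decide(last_token_logits):
    """Argmax over the two class logits: index 0 = not spam, 1 = spam."""
    return "spam" if last_token_logits[1] > last_token_logits[0] else "not spam"

short = pad_or_truncate([10, 20, 30])
print(len(short), short[:4])   # 120 [10, 20, 30, 50256]
print(decide([-1.2, 3.4]))     # spam
```

In the real model the two logits come from the last token's position after the full 12-layer forward pass; everything before that position only provides context.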
## Model Details
| Attribute | Value |
|:---|:---|
| Parameters | 123.9M |
| Architecture | nanoGPT (12 layers, 12 heads, 768 dim) |
| Classification head | Linear(768, 2) replacing lm_head |
| Classes | 0 = not spam, 1 = spam |
| Max sequence length | 120 tokens |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) |
| Training data | UCI SMS Spam Collection (balanced: 747 spam + 747 ham) |
| Framework | PyTorch |
## Training Details
- All pretrained weights frozen except the last transformer block, the final LayerNorm, and the classification head
- 5 epochs, AdamW (lr=5e-5, weight_decay=0.1), batch_size=8
- Classification uses cross-entropy loss on last-token logits
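The freeze/unfreeze selection can be sketched as a prefix filter over parameter names. This assumes nanoGPT's conventional naming (`transformer.h.<i>` for blocks, `transformer.ln_f` for the final LayerNorm, `lm_head` for the head); the actual training script may select parameters differently. Plain strings are used here to keep the sketch dependency-free:

```python
# Sketch: decide which parameters stay trainable, assuming nanoGPT-style
# parameter names. Everything else is frozen during fine-tuning.
TRAINABLE_PREFIXES = ("transformer.h.11.", "transformer.ln_f.", "lm_head.")

def is_trainable(param_name):
    """True for the last block (index 11 of 12), final LayerNorm, and head."""
    return param_name.startswith(TRAINABLE_PREFIXES)

# Hypothetical parameter names, for illustration only:
names = [
    "transformer.wte.weight",              # token embeddings  -> frozen
    "transformer.h.0.attn.c_attn.weight",  # first block       -> frozen
    "transformer.h.11.mlp.c_fc.weight",    # last block        -> trainable
    "transformer.ln_f.weight",             # final LayerNorm   -> trainable
    "lm_head.weight",                      # classifier head   -> trainable
]
for n in names:
    print(f"{n}: {'train' if is_trainable(n) else 'freeze'}")
```

In PyTorch this corresponds to setting `p.requires_grad = is_trainable(name)` over `model.named_parameters()` before constructing the AdamW optimizer.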
## Files
| File | Description |
|:---|:---|
| `nanogpt_classifier.pth` | Classifier weights (lm_head = Linear(768, 2)) |
| `nanogpt_classifier_inference.py` | Standalone inference script |
| `config.json` | Model + classifier configuration |
## API Reference
### `classify(text, max_length=120)`
Returns dict with `label`, `confidence`, `probabilities`.
### `is_spam(text, max_length=120)`
Returns `True` if spam, `False` if not.
### `classify_batch(texts, max_length=120)`
Returns a list of `classify()` results, one per input text.
## Related Models
| Variant | Type | Repo |
|:---|:---|:---|
| Pretrained (nanoGPT) | Base LM | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) |
| Instruction-tuned (nanoGPT) | SFT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |
| **Spam classifier (nanoGPT)** | **Classification** | **[nishantup/nanogpt-slm-classifier](https://huggingface.co/nishantup/nanogpt-slm-classifier)** |