---
license: mit
tags:
- pytorch
- gpt2
- instruction-tuning
- sft
- slm
- from-scratch
- raschka
base_model: nishantup/nanogpt-slm-124m
---
# GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters
An instruction-fine-tuned small language model (SLM) built on the Raschka-style `GPTModel` architecture.
**Pipeline:** Trained from scratch -> Pretrained on 133 classic English fiction books -> SFT on Alpaca-format instructions.
## Quick Start
### Option 1: Run directly (downloads model + runs examples)
```bash
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
```
### Option 2: Import and use `ask()` in your own code
```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask
# Simple question
print(ask("What is the capital of France?"))
print()
# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()
# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,  # higher = more creative
    top_k=100,        # wider sampling pool
    max_tokens=150    # longer output
))
print()
```
### Option 3: Load weights manually
```python
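import torch
from huggingface_hub import hf_hub_download

# The inference script must be importable (e.g. in the working directory)
from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

# Download the checkpoint from the Hub (cached after the first call)
model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth",
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # switch to inference mode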
```
## Prompt Format
```
Below is an instruction that describes a task.
### Instruction:
{instruction}
### Response:
```
With optional input:
```
Below is an instruction that describes a task, paired with further context.
### Instruction:
{instruction}
### Input:
{input}
### Response:
```
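
The two templates above can be assembled with a small helper. The sketch below is illustrative only: `format_prompt` is a hypothetical name, not a function from the inference script, and the blank lines between sections follow the usual Alpaca layout, which is an assumption here. The template inside `gpt2_slm_instruct_inference.py` remains the authoritative one.

```python
def format_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt (hypothetical helper, not part of the repo)."""
    if input_text:
        # Variant with additional context in an "### Input:" section
        return (
            "Below is an instruction that describes a task, "
            "paired with further context.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Variant without input: instruction only
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


prompt = format_prompt("What is the capital of France?")
print(prompt)
```

The prompt ends at `### Response:` so that the model's continuation is the answer itself.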
## Model Details
| Attribute | Value |
|:---|:---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |
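
The 256-token context length means the model can attend to at most 256 token ids at a time. A common way to handle longer sequences in a generation loop is to keep only the most recent tokens. The sketch below shows that idea on plain lists; `crop_to_context` is a hypothetical helper, and whether the inference script crops in this way is an assumption, not something stated in this card.

```python
CONTEXT_LENGTH = 256  # value from the Model Details table


def crop_to_context(token_ids, context_length=CONTEXT_LENGTH):
    """Keep only the most recent `context_length` token ids (hypothetical helper)."""
    return token_ids[-context_length:]


ids = list(range(300))          # stand-in for 300 token ids
window = crop_to_context(ids)   # the 256 most recent ids
print(len(window), window[0], window[-1])
```

Sequences shorter than the context length pass through unchanged.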
## Architecture Comparison
| Feature | This model (Raschka) | nanoGPT variant |
|:---|:---|:---|
| Weights file | `gpt2_slm_instruct.pth` | `nanogpt_slm_instruct.pth` |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |
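
The attention row is a difference in layout rather than in computation: a combined projection stacks the query, key, and value weight matrices, so slicing it into three blocks gives the same outputs as three separate projections. The pure-Python sketch below illustrates this on a toy 2-dimensional example. It assumes the combined weight is stored as `(3 * d, d)` rows in query, key, value order, as in nanoGPT's `c_attn`, and it ignores biases; `matvec` and `split_qkv` are hypothetical helpers, not code from either repository.

```python
def matvec(weight, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def split_qkv(combined, d):
    """Slice a (3*d, d) combined weight into query, key, value blocks."""
    assert len(combined) == 3 * d
    return combined[:d], combined[d:2 * d], combined[2 * d:]


d = 2
combined = [
    [1.0, 0.0], [0.0, 1.0],  # query rows
    [2.0, 0.0], [0.0, 2.0],  # key rows
    [3.0, 0.0], [0.0, 3.0],  # value rows
]
x = [0.5, -1.0]

w_q, w_k, w_v = split_qkv(combined, d)
fused = matvec(combined, x)                                   # one combined projection
separate = matvec(w_q, x) + matvec(w_k, x) + matvec(w_v, x)   # three projections, concatenated
print(fused == separate)
```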
## Files
| File | Description |
|:---|:---|
| `gpt2_slm_instruct.pth` | SFT fine-tuned weights (Raschka GPTModel) |
| `gpt2_slm_instruct_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json` | Model configuration |
## `ask()` API Reference
```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```
| Parameter | Default | Description |
|:---|:---|:---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (None = no filtering) |
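
To make the `temperature` and `top_k` parameters concrete, here is a minimal, generic sketch of temperature scaling with top-k filtering over a list of logits. It is not taken from the inference script: `sample_next` is a hypothetical function, and only its defaults mirror the table above.

```python
import math
import random


def sample_next(logits, temperature=0.7, top_k=40, rng=random):
    """Pick the index of the next token from raw logits (illustrative sketch)."""
    # Temperature 0.0 means greedy decoding: always take the largest logit
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    candidates = list(range(len(logits)))
    if top_k is not None:
        # Keep only the top_k highest-scoring token indices
        candidates = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]

    # Divide by temperature, then apply a numerically stable softmax
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Sample one candidate in proportion to its softmax weight
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
print(sample_next(logits, temperature=0.0))  # greedy: index of the largest logit
```

Lower temperatures sharpen the distribution toward the highest logit, while a smaller `top_k` removes unlikely tokens from the pool before sampling.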
## Related Models
| Variant | Architecture | Repo |
|:---|:---|:---|
| Pretrained base (Raschka) | GPTModel | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Pretrained base (nanoGPT) | GPT | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`nanogpt_slm_best.pth`) |
| Instruct SFT (nanoGPT) | GPT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |