GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters
Instruction fine-tuned Small Language Model using the Raschka-style GPTModel architecture.
Pipeline: Pretrained from scratch on 133 classic English fiction books -> supervised fine-tuning (SFT) on Alpaca-format instructions.
Quick Start
Option 1: Run directly (downloads model + runs examples)
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
Option 2: Import and use ask() in your own code
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask
# Simple question
print(ask("What is the capital of France?"))
print()
# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()
# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,  # higher = more creative
    top_k=100,        # wider sampling pool
    max_tokens=150    # longer output
))
print()
Option 3: Load weights manually
import torch
from huggingface_hub import hf_hub_download
from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG
# Download the fine-tuned weights (cached locally after the first call)
model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)
# Build the architecture, load the weights on CPU, and switch to inference mode
model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
Prompt Format
Below is an instruction that describes a task.
### Instruction:
{instruction}
### Response:
With optional input:
Below is an instruction that describes a task, paired with further context.
### Instruction:
{instruction}
### Input:
{input}
### Response:
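The two templates above can be assembled programmatically. This is a minimal sketch, not the code from gpt2_slm_instruct_inference.py: the helper name build_prompt is hypothetical, and the blank lines between sections are an assumption based on the common Alpaca layout, so check the inference script for the exact whitespace used during fine-tuning.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt (hypothetical helper, not the repo's code)."""
    if input_text:
        # Variant with extra context: longer header plus an Input section.
        return (
            "Below is an instruction that describes a task, "
            "paired with further context.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Instruction-only variant.
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The model's answer is whatever it generates after the final "### Response:" line, so the prompt deliberately ends there.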
Model Details
| Attribute | Value |
|---|---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | nishantup/nanogpt-slm-124m (gpt_slm_best.pth) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |
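The 256-token context length means the prompt and the generated tokens together have to fit in a small window. The sketch below shows one common way to handle this, keeping only the most recent token IDs before each forward pass. The helper names are hypothetical, and whether ask() crops in exactly this way is an assumption.

```python
CONTEXT_LENGTH = 256  # from the Model Details table


def crop_to_context(token_ids: list[int], context_length: int = CONTEXT_LENGTH) -> list[int]:
    """Keep only the most recent `context_length` token IDs (hypothetical helper)."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    return token_ids[-context_length:]


def remaining_budget(prompt_len: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Number of new tokens that still fit in the window after the prompt."""
    return max(0, context_length - prompt_len)
```

For example, a 200-token prompt leaves room for 56 generated tokens before older tokens start falling out of the window.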
Architecture Comparison
| Feature | This model (Raschka) | nanoGPT variant |
|---|---|---|
| Weights file | gpt2_slm_instruct.pth | nanogpt_slm_instruct.pth |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |
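The weight-tying row has a direct effect on parameter count. As a rough illustration, the arithmetic below computes the size of one vocabulary-by-embedding matrix from the values in Model Details. It assumes the output head has no bias term. Reading this as the main reason the model reports 163.2M parameters while the base repo name says 124m is an inference, not something the source states.

```python
VOCAB_SIZE = 50_257  # tiktoken GPT-2 BPE vocabulary (Model Details)
EMB_DIM = 768        # embedding width (Model Details)

# One vocab-by-embedding matrix; the token embedding and the output head
# each have this shape (assuming the head carries no bias).
matrix_params = VOCAB_SIZE * EMB_DIM

# With tying the matrix is stored once; without tying it is stored twice,
# so the untied model carries one extra copy.
extra_params_untied = matrix_params

print(f"{matrix_params:,} parameters per matrix")
print(f"~{extra_params_untied / 1e6:.1f}M extra parameters when untied")
```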
Files
| File | Description |
|---|---|
| gpt2_slm_instruct.pth | SFT fine-tuned weights (Raschka GPTModel) |
| gpt2_slm_instruct_inference.py | Standalone inference script -- import and call ask() |
| config.json | Model configuration |
ask() API Reference
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
| Parameter | Default | Description |
|---|---|---|
| instruction | (required) | The task instruction |
| input_text | "" | Optional additional context |
| max_tokens | 256 | Maximum tokens to generate |
| temperature | 0.7 | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| top_k | 40 | Top-k filtering (None = no filtering) |
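To make the temperature and top_k parameters concrete, here is a pure-Python sketch of how such settings are commonly applied to a vector of logits. It is illustrative only: sample_next_token is a hypothetical name, and the actual ask() implementation in gpt2_slm_instruct_inference.py may differ in its details.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, rng=random):
    """Pick a token index from raw logits (illustrative sketch, not the repo's code)."""
    if temperature == 0.0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Top-k filtering: keep only the k highest-scoring candidates.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Draw one candidate in proportion to its softmax weight.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Lower temperatures sharpen the distribution toward the top candidate, while a larger top_k lets less likely tokens into the draw. With temperature=0.0 the function ignores top_k and returns the argmax.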
Related Models
| Variant | Architecture | Repo |
|---|---|---|
| Pretrained base (Raschka) | GPTModel | nishantup/nanogpt-slm-124m (gpt_slm_best.pth) |
| Pretrained base (nanoGPT) | GPT | nishantup/nanogpt-slm-124m (nanogpt_slm_best.pth) |
| Instruct SFT (nanoGPT) | GPT | nishantup/nanogpt-slm-instruct |