GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters

An instruction-fine-tuned small language model (SLM) built on the Raschka-style GPTModel architecture.

Pipeline: Initialized from scratch -> Pretrained on 133 classic English fiction books -> Supervised fine-tuning (SFT) on Alpaca-format instructions.

Quick Start

Option 1: Run directly (downloads model + runs examples)

pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py

Option 2: Import and use `ask()` in your own code

# Importing the module loads the model automatically (one-time download from Hugging Face)
from gpt2_slm_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))
print()

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()

# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,    # higher = more creative
    top_k=100,          # wider sampling pool
    max_tokens=150      # longer output
))
print()

Option 3: Load weights manually

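import torch
from huggingface_hub import hf_hub_download

from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

# Download the fine-tuned weights from the Hugging Face Hub (cached after the first call)
model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)

# Build the architecture, then load the state dict on CPU
model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu", weights_only=True))
model.eval()  # disable dropout for inference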

Prompt Format

Below is an instruction that describes a task.

### Instruction:
{instruction}

### Response:

With optional input:

Below is an instruction that describes a task, paired with further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
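
The two templates above can be assembled programmatically. The sketch below is illustrative only: `build_prompt` is a hypothetical helper (not taken from the inference script), and the exact whitespace (one blank line between sections, a newline after the final header) is an assumption based on how the templates are laid out here.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt following the two templates above.

    Assumption: sections are separated by one blank line and the prompt
    ends right after the "### Response:" header plus a newline.
    """
    if input_text:
        header = ("Below is an instruction that describes a task, "
                  "paired with further context.")
        body = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}"
    else:
        header = "Below is an instruction that describes a task."
        body = f"### Instruction:\n{instruction}"
    return f"{header}\n\n{body}\n\n### Response:\n"


if __name__ == "__main__":
    print(build_prompt(
        "Summarize the following text.",
        "Machine learning enables systems to learn from data.",
    ))
```

In normal use `ask()` presumably builds the prompt for you; a helper like this is mainly useful when driving the model directly, as in Option 3.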

Model Details

Attribute	Value
Parameters	163.2M
Architecture	Raschka GPTModel (12 layers, 12 heads, 768 dim)
Context length	256 tokens
Tokenizer	tiktoken GPT-2 BPE (50,257 tokens)
Base model	nishantup/nanogpt-slm-124m (`gpt_slm_best.pth`)
Fine-tuning	Supervised (Alpaca format, 1,100 examples, 2 epochs)
Framework	PyTorch
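
The 256-token context length caps how many tokens the model can attend to at once, counting the prompt and everything generated so far. A common way to handle longer sequences during generation is to keep only the most recent tokens. The sketch below shows that idea on plain lists of token IDs; `crop_to_context` and `remaining_budget` are hypothetical helpers, and whether `ask()` handles overflow this way is an assumption.

```python
CONTEXT_LENGTH = 256  # from the Model Details table


def crop_to_context(token_ids, context_length=CONTEXT_LENGTH):
    """Keep only the most recent `context_length` token IDs."""
    return list(token_ids[-context_length:])


def remaining_budget(prompt_ids, context_length=CONTEXT_LENGTH):
    """Number of tokens that can be generated before the window is full."""
    return max(0, context_length - len(prompt_ids))


if __name__ == "__main__":
    prompt_ids = list(range(300))             # stand-in for real token IDs
    print(len(crop_to_context(prompt_ids)))   # window never exceeds 256
    print(remaining_budget(list(range(60))))  # room left after a 60-token prompt
```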

Architecture Comparison

Feature	This model (Raschka)	nanoGPT variant
Weights file	`gpt2_slm_instruct.pth`	`nanogpt_slm_instruct.pth`
Attention	Separate W_query, W_key, W_value	Combined c_attn
LayerNorm	scale/shift params	weight/bias params
MLP	FeedForward (Sequential)	MLP (c_fc/c_proj)
Config	Dict (BASE_CONFIG)	Dataclass (GPTConfig)
Weight tying	No	Yes (wte = lm_head)
forward() returns	logits	(logits, loss) tuple
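
The weight-tying row explains most of the gap between the 163.2M parameters listed here and the "124m" in the base repo name: without tying, the output head is a separate matrix rather than a view of the token embedding. The arithmetic below is a rough check using the vocabulary size and embedding dimension from Model Details; it assumes the output head has no bias term.

```python
VOCAB_SIZE = 50_257  # tiktoken GPT-2 BPE vocabulary
EMB_DIM = 768        # embedding dimension

# Assumption: the untied output head is a bias-free linear layer that maps
# each 768-dim hidden state to 50,257 logits.
head_params = VOCAB_SIZE * EMB_DIM

print(f"Untied output head: {head_params:,} parameters (~{head_params / 1e6:.1f}M)")
```

With weight tying, these parameters are shared with the token embedding and counted only once.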

Files

File	Description
`gpt2_slm_instruct.pth`	SFT fine-tuned weights (Raschka GPTModel)
`gpt2_slm_instruct_inference.py`	Standalone inference script -- import and call `ask()`
`config.json`	Model configuration

`ask()` API Reference

ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)

Parameter	Default	Description
`instruction`	(required)	The task instruction
`input_text`	`""`	Optional additional context
`max_tokens`	`256`	Maximum tokens to generate
`temperature`	`0.7`	0.0 = greedy, 0.7 = balanced, 1.5 = creative
`top_k`	`40`	Top-k filtering (None = no filtering)
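
To make the `temperature` and `top_k` parameters concrete, here is a minimal standard-library sketch of how such settings are commonly applied to a vector of logits. It is not the inference script's actual code: `sample_next_token` and its `rng` argument are hypothetical, and the script presumably performs the equivalent steps on PyTorch tensors.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, rng=random):
    """Pick a token index from raw logits using top-k filtering and temperature."""
    if temperature == 0.0:
        # Greedy decoding: always take the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])

    # Top-k: keep only the indices of the k largest logits (None keeps all)
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Temperature-scaled softmax weights over the surviving candidates
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.5, -1.0, 0.7]
    print(sample_next_token(toy_logits, temperature=0.0))           # greedy pick
    print(sample_next_token(toy_logits, temperature=1.0, top_k=2))  # sampled pick
```

Lower temperatures sharpen the distribution toward the top-scoring token, while a smaller `top_k` removes unlikely tokens from consideration entirely.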

Related Models

Variant	Architecture	Repo
Pretrained base (Raschka)	GPTModel	nishantup/nanogpt-slm-124m (`gpt_slm_best.pth`)
Pretrained base (nanoGPT)	GPT	nishantup/nanogpt-slm-124m (`nanogpt_slm_best.pth`)
Instruct SFT (nanoGPT)	GPT	nishantup/nanogpt-slm-instruct
