---

license: mit
tags:
  - pytorch
  - gpt2
  - instruction-tuning
  - sft
  - slm
  - from-scratch
  - raschka
base_model: nishantup/nanogpt-slm-124m
---


# GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters

An instruction-tuned small language model (SLM) built on the Raschka-style `GPTModel` architecture.

**Pipeline:** Trained from scratch -> Pretrained on 133 classic English fiction books -> SFT on Alpaca-format instructions.

## Quick Start

### Option 1: Run directly (downloads model + runs examples)
```bash
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
```

### Option 2: Import and use `ask()` in your own code
```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))
print()

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()

# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,    # higher = more creative
    top_k=100,          # wider sampling pool
    max_tokens=150      # longer output
))
print()
```

### Option 3: Load weights manually
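```python
import torch
from huggingface_hub import hf_hub_download

# Note: per Option 1/2, importing this module may also trigger the automatic model load
from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

# Download the fine-tuned checkpoint (cached locally after the first call)
model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)

# Build the architecture, load the weights on CPU, and switch to inference mode
model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```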

## Prompt Format

```
Below is an instruction that describes a task.

### Instruction:
{instruction}

### Response:
```

With optional input:
```
Below is an instruction that describes a task, paired with further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
```
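
The two templates above can be produced by one small helper. The sketch below is illustrative only: `build_prompt` is a hypothetical name, not a function confirmed to exist in `gpt2_slm_instruct_inference.py`, and the exact whitespace (one blank line between sections, a trailing newline after `### Response:`) is an assumption.

```python
def build_prompt(instruction, input_text=""):
    """Render an instruction (and optional input) in the Alpaca-style layout.

    Hypothetical helper; the spacing between sections is an assumption.
    """
    if input_text:
        # Variant with an "### Input:" section for extra context
        return (
            "Below is an instruction that describes a task, "
            "paired with further context."
            f"\n\n### Instruction:\n{instruction}"
            f"\n\n### Input:\n{input_text}"
            "\n\n### Response:\n"
        )
    # Instruction-only variant
    return (
        "Below is an instruction that describes a task."
        f"\n\n### Instruction:\n{instruction}"
        "\n\n### Response:\n"
    )


print(build_prompt("Summarize the following text.", "Machine learning learns from data."))
```

When calling `ask()` there should be no need to build the prompt by hand; a helper like this is mainly useful when driving the model directly, as in Option 3.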

## Model Details

| Attribute | Value |
|:---|:---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |
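
With a 256-token context, the prompt plus any tokens generated so far cannot all be fed to the model once the running sequence grows past 256 tokens. One common approach in GPT-style generation loops is to keep only the most recent tokens. Whether the inference script does exactly this is an assumption, and `crop_to_context` is a hypothetical helper name.

```python
CONTEXT_LENGTH = 256  # from the Model Details table


def crop_to_context(token_ids, context_length=CONTEXT_LENGTH):
    """Keep only the most recent token ids that fit in the context window."""
    return token_ids[-context_length:]


# A 300-token sequence is trimmed to its last 256 tokens
running_ids = list(range(300))
window = crop_to_context(running_ids)
print(len(window), window[0])
```

Sequences shorter than the window pass through unchanged.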

## Architecture Comparison

| Feature | This model (Raschka) | nanoGPT variant |
|:---|:---|:---|
| Weights file | `gpt2_slm_instruct.pth` | `nanogpt_slm_instruct.pth` |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |
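
The weight-tying row has a direct effect on parameter count: without tying, the output head is stored as its own vocabulary-by-embedding matrix instead of reusing the token embedding. A quick calculation of that extra matrix, using the vocabulary size and embedding width from Model Details and assuming the head has no bias term:

```python
VOCAB_SIZE = 50_257  # tiktoken GPT-2 BPE vocabulary
EMB_DIM = 768        # embedding width

# An untied output head is a separate (VOCAB_SIZE x EMB_DIM) weight matrix
# (assumption: no bias term on the head)
untied_head_params = VOCAB_SIZE * EMB_DIM

print(f"{untied_head_params:,}")  # 38,597,376
```

That is roughly 38.6M additional parameters, which is in line with the gap between the 124M in the base repo name and the 163.2M listed for this model. The exact totals depend on configuration details not shown in this card.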

## Files

| File | Description |
|:---|:---|
| `gpt2_slm_instruct.pth` | SFT fine-tuned weights (Raschka GPTModel) |
| `gpt2_slm_instruct_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json` | Model configuration |

## `ask()` API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```

| Parameter | Default | Description |
|:---|:---|:---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (None = no filtering) |
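
To make the `temperature` and `top_k` rows concrete, here is a minimal, dependency-free sketch of how these two settings typically shape the choice of the next token. It is not taken from the inference script: `sample_next_token` is a hypothetical function, it works on a plain list of logits rather than a PyTorch tensor, and treating `temperature <= 0` as greedy decoding is an assumption based on the table above.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, rng=random):
    """Pick one token id from raw logits (illustrative sketch only)."""
    # Temperature 0 is treated as greedy decoding: always take the best token
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Keep only the top_k highest-scoring token ids (None keeps all of them)
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]

    # Scale by temperature, then apply a numerically stable softmax
    scaled = [logits[i] / temperature for i in ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Draw one id in proportion to its softmax weight
    return rng.choices(ids, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 1.5]
print(sample_next_token(toy_logits, temperature=0.0))  # greedy: index of the largest logit
print(sample_next_token(toy_logits, temperature=1.0, top_k=2))
```

Lower temperatures sharpen the distribution toward the highest-scoring tokens, while a larger `top_k` lets more low-probability tokens into the pool.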

## Related Models

| Variant | Architecture | Repo |
|:---|:---|:---|
| Pretrained base (Raschka) | GPTModel | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Pretrained base (nanoGPT) | GPT | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`nanogpt_slm_best.pth`) |
| Instruct SFT (nanoGPT) | GPT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |