---

license: mit
tags:
  - pytorch
  - gpt2
  - instruction-tuning
  - sft
  - slm
  - from-scratch
  - raschka
base_model: nishantup/nanogpt-slm-124m
---


# GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters

An instruction fine-tuned small language model (SLM) built on the Raschka-style `GPTModel` architecture.

**Pipeline:** Trained from scratch -> Pretrained on 133 classic English fiction books -> SFT on Alpaca-format instructions.

## Quick Start

### Option 1: Run directly (downloads model + runs examples)
```bash
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
```

### Option 2: Import and use `ask()` in your own code
```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))
print()

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()

# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,    # higher = more creative
    top_k=100,          # wider sampling pool
    max_tokens=150      # cap on generated tokens (default is 256)
))
print()
```

### Option 3: Load weights manually
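```python
import torch
from huggingface_hub import hf_hub_download

# Model class and config come from the inference script in this repo
from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

# Download the fine-tuned weights (cached after the first call)
model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # switch to inference mode (disables dropout)
```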

## Prompt Format

```
Below is an instruction that describes a task.

### Instruction:
{instruction}

### Response:
```

With optional input:
```
Below is an instruction that describes a task, paired with further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
```
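
The two templates can be assembled with a small helper. The sketch below is illustrative only: `format_prompt` and `extract_response` are hypothetical names, not functions from `gpt2_slm_instruct_inference.py`, and the newline after `### Response:` is an assumption about how the template ends.

```python
def format_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt following the two templates above."""
    if input_text:
        # Variant with the optional "### Input:" section.
        return (
            "Below is an instruction that describes a task, "
            "paired with further context.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"  # trailing newline is an assumption
        )
    # Instruction-only variant.
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"  # trailing newline is an assumption
    )


def extract_response(generated: str) -> str:
    """Return the text after the last '### Response:' marker, stripped."""
    marker = "### Response:"
    return generated.rsplit(marker, 1)[-1].strip()


if __name__ == "__main__":
    print(format_prompt("Summarize the following text.", "Some text."))
```

If the model echoes the prompt before its answer, something like `extract_response` can isolate the answer; whether `ask()` already does this is not stated here.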

## Model Details

| Attribute | Value |
|:---|:---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |

## Architecture Comparison

| Feature | This model (Raschka) | nanoGPT variant |
|:---|:---|:---|
| Weights file | `gpt2_slm_instruct.pth` | `nanogpt_slm_instruct.pth` |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |
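
The weight-tying row has a direct effect on parameter count. As a rough illustration, the arithmetic below uses the vocabulary size and embedding width from Model Details and assumes the output head is a single bias-free matrix of shape vocab x dim (an assumption, not confirmed by this card).

```python
VOCAB_SIZE = 50_257   # tiktoken GPT-2 BPE vocabulary (from Model Details)
EMB_DIM = 768         # embedding width (from Model Details)

# One (vocab_size x emb_dim) matrix, assuming no bias term.
matrix_params = VOCAB_SIZE * EMB_DIM

# Untied: the token embedding and the output head are separate matrices.
untied = 2 * matrix_params
# Tied: one matrix is shared by both (wte = lm_head).
tied = matrix_params

print(f"one matrix:        {matrix_params:,}")
print(f"extra when untied: {untied - tied:,}")
```

Under these assumptions, untying adds about 38.6M parameters, which is in line with a 124M-class tied model landing in the low 160M range once the head has its own weights.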

## Files

| File | Description |
|:---|:---|
| `gpt2_slm_instruct.pth` | SFT fine-tuned weights (Raschka GPTModel) |
| `gpt2_slm_instruct_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json` | Model configuration |

## `ask()` API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```

| Parameter | Default | Description |
|:---|:---|:---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (None = no filtering) |
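
To make the `temperature` and `top_k` rows concrete, here is a minimal pure-Python sketch of one common way these two settings combine when picking the next token. It is not the code used by `ask()`; `sample_next_token` is a hypothetical helper and the logits are made-up values.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, rng=random):
    """Pick a token index from raw logits using top-k and temperature."""
    # Temperature 0.0 is treated as greedy decoding (argmax).
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Keep only the top_k highest-scoring candidates (None keeps all).
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Scale by temperature, then apply a numerically stable softmax.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Draw one candidate in proportion to its softmax weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.5, -1.0, 0.7]  # made-up scores for a 4-token vocabulary
    print(sample_next_token(toy_logits, temperature=0.0))
```

Lower temperatures sharpen the distribution toward the highest logit, while a smaller `top_k` removes low-probability tokens from consideration before sampling.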

## Related Models

| Variant | Architecture | Repo |
|:---|:---|:---|
| Pretrained base (Raschka) | GPTModel | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Pretrained base (nanoGPT) | GPT | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`nanogpt_slm_best.pth`) |
| Instruct SFT (nanoGPT) | GPT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |