---
license: mit
tags:
- pytorch
- gpt2
- instruction-tuning
- sft
- slm
- from-scratch
- raschka
base_model: nishantup/nanogpt-slm-124m
---

# GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters

Instruction fine-tuned Small Language Model using the Raschka-style GPTModel architecture.

**Pipeline:** Trained from scratch -> Pretrained on 133 classic English fiction books -> SFT on Alpaca-format instructions.

## Quick Start

### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
```

### Option 2: Import and use `ask()` in your own code

```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))
print()

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()

# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,  # higher = more creative
    top_k=100,        # wider sampling pool
    max_tokens=150    # longer output
))
print()
```

### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Prompt Format

```
Below is an instruction that describes a task.

### Instruction:
{instruction}

### Response:
```

With optional input:

```
Below is an instruction that describes a task, paired with further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
```
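The two templates above can be assembled with a small helper. This is an illustrative sketch only; the name `format_prompt` is an assumption, not necessarily a function the inference script exposes:

```python
def format_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt matching the templates above.

    Hypothetical helper for illustration; the inference script's `ask()`
    performs equivalent formatting internally.
    """
    if input_text:
        return (
            "Below is an instruction that describes a task, "
            "paired with further context.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The `### Input:` block is included only when context is supplied, mirroring the two templates shown above.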

## Model Details

| Attribute | Value |
|:---|:---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |
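The table maps onto a config dict in the style the Raschka book uses. A sketch with values taken from the table — the key names follow the book's convention and are assumptions about this repo's actual `BASE_CONFIG`:

```python
# Illustrative config dict; key names assumed, values from the table above.
BASE_CONFIG = {
    "vocab_size": 50257,    # tiktoken GPT-2 BPE vocabulary
    "context_length": 256,  # maximum sequence length in tokens
    "emb_dim": 768,         # embedding / model width
    "n_heads": 12,          # attention heads per layer
    "n_layers": 12,         # transformer blocks
    "drop_rate": 0.0,       # dropout typically disabled at inference (assumption)
    "qkv_bias": True,       # GPT-2 uses biases in the QKV projections
}
```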

## Architecture Comparison

| Feature | This model (Raschka) | nanoGPT variant |
|:---|:---|:---|
| Weights file | `gpt2_slm_instruct.pth` | `nanogpt_slm_instruct.pth` |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |
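The main practical consequence of the attention difference is the weight layout: nanoGPT's `c_attn` packs the query, key, and value projections into one matrix of output dimension 3x the embedding width, while the Raschka model keeps three separate matrices. A dependency-free sketch of the conventional split — the q/k/v row ordering follows nanoGPT's convention; exact state-dict key names are not shown:

```python
def split_combined_qkv(c_attn_weight, embed_dim):
    """Split a combined QKV projection into separate Q, K, V matrices.

    In the nanoGPT layout, c_attn.weight has shape (3 * embed_dim, embed_dim):
    the first embed_dim rows project queries, the next embed_dim keys, and the
    last embed_dim values. Plain lists of rows keep this sketch dependency-free.
    """
    w_query = c_attn_weight[:embed_dim]
    w_key = c_attn_weight[embed_dim:2 * embed_dim]
    w_value = c_attn_weight[2 * embed_dim:]
    return w_query, w_key, w_value
```

Converting weights between the two variants amounts to this split (or the reverse concatenation), plus renaming the LayerNorm and MLP parameters per the table above.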

## Files

| File | Description |
|:---|:---|
| `gpt2_slm_instruct.pth` | SFT fine-tuned weights (Raschka GPTModel) |
| `gpt2_slm_instruct_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json` | Model configuration |

## `ask()` API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```

| Parameter | Default | Description |
|:---|:---|:---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (None = no filtering) |
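`temperature` and `top_k` interact in the standard way for sampled decoding: top-k first restricts the candidate pool, then temperature reshapes the softmax before sampling. A minimal stdlib-only sketch of that logic — this illustrates the common technique, not the script's actual implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=40):
    """Pick a token index from raw logits using temperature + top-k sampling."""
    # temperature 0.0 -> greedy argmax
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # top-k: keep only the k highest logits, mask the rest to -inf
    if top_k is not None:
        cutoff = sorted(logits, reverse=True)[top_k - 1] if top_k <= len(logits) else min(logits)
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    # softmax with temperature (numerically stabilized), then sample
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperature sharpens the distribution toward the argmax; a smaller `top_k` trims the tail of unlikely tokens before sampling.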

## Related Models

| Variant | Architecture | Repo |
|:---|:---|:---|
| Pretrained base (Raschka) | GPTModel | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Pretrained base (nanoGPT) | GPT | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`nanogpt_slm_best.pth`) |
| Instruct SFT (nanoGPT) | GPT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |
|