---
license: mit
tags:
- pytorch
- gpt2
- instruction-tuning
- sft
- slm
- from-scratch
- raschka
base_model: nishantup/nanogpt-slm-124m
---

# GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters

Instruction fine-tuned Small Language Model using the Raschka-style GPTModel architecture.

**Pipeline:** Trained from scratch -> Pretrained on 133 classic English fiction books -> SFT on Alpaca-format instructions.

## Quick Start

### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
```

### Option 2: Import and use `ask()` in your own code

```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))
print()

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()

# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,  # higher = more creative
    top_k=100,        # wider sampling pool
    max_tokens=150    # longer output
))
print()
```

### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

# Download the fine-tuned weights from the Hub (cached after the first call)
model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)

# Build the model from the config dict and load the weights on CPU
model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Prompt Format

```
Below is an instruction that describes a task.

### Instruction:
{instruction}

### Response:
```

With optional input:

```
Below is an instruction that describes a task, paired with further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
```

## Model Details

| Attribute | Value |
|:---|:---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |

## Architecture Comparison

| Feature | This model (Raschka) | nanoGPT variant |
|:---|:---|:---|
| Weights file | `gpt2_slm_instruct.pth` | `nanogpt_slm_instruct.pth` |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |

## Files

| File | Description |
|:---|:---|
| `gpt2_slm_instruct.pth` | SFT fine-tuned weights (Raschka GPTModel) |
| `gpt2_slm_instruct_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json` | Model configuration |

## `ask()` API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```

| Parameter | Default | Description |
|:---|:---|:---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (None = no filtering) |

## Related Models

| Variant | Architecture | Repo |
|:---|:---|:---|
| Pretrained base (Raschka) | GPTModel | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Pretrained base (nanoGPT) | GPT | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`nanogpt_slm_best.pth`) |
| Instruct SFT (nanoGPT) | GPT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |
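
If you load the weights manually and bypass `ask()`, you need to build the prompt yourself. The two templates under Prompt Format can be assembled with a small helper like the one below. This is a minimal sketch: `build_prompt` is a hypothetical name, not a function from `gpt2_slm_instruct_inference.py`, and the blank lines between sections are assumed to follow the usual Alpaca layout.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt (hypothetical helper, not the script's own code)."""
    if input_text:
        # Variant with extra context: adds an "### Input:" section.
        header = "Below is an instruction that describes a task, paired with further context."
        body = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}"
    else:
        # Instruction-only variant.
        header = "Below is an instruction that describes a task."
        body = f"### Instruction:\n{instruction}"
    # End on the response header so the model continues from that point.
    # The exact whitespace here is an assumption, not confirmed by the script.
    return f"{header}\n\n{body}\n\n### Response:\n"


if __name__ == "__main__":
    print(build_prompt("Summarize the following text.", "Machine learning enables systems to learn from data."))
```

Passing an empty `input_text` selects the instruction-only template, which mirrors the `input_text=""` default in the `ask()` signature.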
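
To make the `temperature` and `top_k` parameters in the `ask()` API reference concrete, here is one common way such sampling is implemented, written in plain Python so it runs without PyTorch. It illustrates the general technique only; `sample_next_token` is a hypothetical name, and the inference script's own sampling code may differ in detail.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, rng=random):
    """Pick a token id from raw logits (illustrative; not the script's implementation)."""
    # A temperature of 0.0 is treated as greedy decoding: take the highest logit.
    if temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Rank token ids by logit and keep the top_k best (None keeps all of them).
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]
    # Divide by temperature, then apply a numerically stable softmax.
    scaled = [logits[i] / temperature for i in ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.0, 0.5, -1.0]
    print(sample_next_token(toy_logits, temperature=0.0))  # greedy pick
    print(sample_next_token(toy_logits, temperature=1.0, top_k=2))
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while a smaller `top_k` removes low-probability tokens from consideration before sampling.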