---
license: mit
tags:
- pytorch
- gpt2
- instruction-tuning
- sft
- slm
- from-scratch
- raschka
base_model: nishantup/nanogpt-slm-124m
---

# GPT2 SLM Instruct (Raschka Architecture) -- 163.2M Parameters

Instruction fine-tuned Small Language Model using the Raschka-style GPTModel architecture.

**Pipeline:** Trained from scratch -> Pretrained on 133 classic English fiction books -> SFT on Alpaca-format instructions.

## Quick Start

### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python gpt2_slm_instruct_inference.py
```

### Option 2: Import and use `ask()` in your own code

```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt2_slm_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))
print()

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data rather than being explicitly programmed."
))
print()

# Control generation
print(ask(
    "Write a short poem about the ocean.",
    temperature=1.0,  # higher = more creative
    top_k=100,        # wider sampling pool
    max_tokens=150    # longer output
))
print()
```

### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

from gpt2_slm_instruct_inference import GPTModel, BASE_CONFIG

model_path = hf_hub_download(
    repo_id="nishantup/gpt2-slm-instruct",
    filename="gpt2_slm_instruct.pth"
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Prompt Format

```
Below is an instruction that describes a task.

### Instruction:
{instruction}

### Response:
```

With optional input:

```
Below is an instruction that describes a task, paired with further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
```
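The two templates above can be assembled with a small helper. This is an illustrative sketch only; the name `format_prompt` is an assumption, not necessarily a function the inference script exposes:

```python
def format_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt matching the templates above.

    Hypothetical helper for illustration; the inference script's `ask()`
    performs equivalent formatting internally.
    """
    if input_text:
        return (
            "Below is an instruction that describes a task, "
            "paired with further context.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The `### Input:` block is included only when context is supplied, mirroring the two templates shown above.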

## Model Details

| Attribute | Value |
|:---|:---|
| Parameters | 163.2M |
| Architecture | Raschka GPTModel (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Base model | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Fine-tuning | Supervised (Alpaca format, 1,100 examples, 2 epochs) |
| Framework | PyTorch |
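The table maps onto a config dict in the style the Raschka book uses. A sketch with values taken from the table — the key names follow the book's convention and are assumptions about this repo's actual `BASE_CONFIG`:

```python
# Illustrative config dict; key names assumed, values from the table above.
BASE_CONFIG = {
    "vocab_size": 50257,    # tiktoken GPT-2 BPE vocabulary
    "context_length": 256,  # maximum sequence length in tokens
    "emb_dim": 768,         # embedding / model width
    "n_heads": 12,          # attention heads per layer
    "n_layers": 12,         # transformer blocks
    "drop_rate": 0.0,       # dropout typically disabled at inference (assumption)
    "qkv_bias": True,       # GPT-2 uses biases in the QKV projections
}
```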

## Architecture Comparison

| Feature | This model (Raschka) | nanoGPT variant |
|:---|:---|:---|
| Weights file | `gpt2_slm_instruct.pth` | `nanogpt_slm_instruct.pth` |
| Attention | Separate W_query, W_key, W_value | Combined c_attn |
| LayerNorm | scale/shift params | weight/bias params |
| MLP | FeedForward (Sequential) | MLP (c_fc/c_proj) |
| Config | Dict (BASE_CONFIG) | Dataclass (GPTConfig) |
| Weight tying | No | Yes (wte = lm_head) |
| forward() returns | logits | (logits, loss) tuple |
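The main practical consequence of the attention difference is the weight layout: nanoGPT's `c_attn` packs the query, key, and value projections into one matrix of output dimension 3x the embedding width, while the Raschka model keeps three separate matrices. A dependency-free sketch of the conventional split — the q/k/v row ordering follows nanoGPT's convention; exact state-dict key names are not shown:

```python
def split_combined_qkv(c_attn_weight, embed_dim):
    """Split a combined QKV projection into separate Q, K, V matrices.

    In the nanoGPT layout, c_attn.weight has shape (3 * embed_dim, embed_dim):
    the first embed_dim rows project queries, the next embed_dim keys, and the
    last embed_dim values. Plain lists of rows keep this sketch dependency-free.
    """
    w_query = c_attn_weight[:embed_dim]
    w_key = c_attn_weight[embed_dim:2 * embed_dim]
    w_value = c_attn_weight[2 * embed_dim:]
    return w_query, w_key, w_value
```

Converting weights between the two variants amounts to this split (or the reverse concatenation), plus renaming the LayerNorm and MLP parameters per the table above.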

## Files

| File | Description |
|:---|:---|
| `gpt2_slm_instruct.pth` | SFT fine-tuned weights (Raschka GPTModel) |
| `gpt2_slm_instruct_inference.py` | Standalone inference script -- import and call `ask()` |
| `config.json` | Model configuration |

## `ask()` API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```

| Parameter | Default | Description |
|:---|:---|:---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (None = no filtering) |
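`temperature` and `top_k` interact in the standard way for sampled decoding: top-k first restricts the candidate pool, then temperature reshapes the softmax before sampling. A minimal stdlib-only sketch of that logic — this illustrates the common technique, not the script's actual implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=40):
    """Pick a token index from raw logits using temperature + top-k sampling."""
    # temperature 0.0 -> greedy argmax
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # top-k: keep only the k highest logits, mask the rest to -inf
    if top_k is not None:
        cutoff = sorted(logits, reverse=True)[top_k - 1] if top_k <= len(logits) else min(logits)
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    # softmax with temperature (numerically stabilized), then sample
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperature sharpens the distribution toward the argmax; a smaller `top_k` trims the tail of unlikely tokens before sampling.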

## Related Models

| Variant | Architecture | Repo |
|:---|:---|:---|
| Pretrained base (Raschka) | GPTModel | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`gpt_slm_best.pth`) |
| Pretrained base (nanoGPT) | GPT | [nishantup/nanogpt-slm-124m](https://huggingface.co/nishantup/nanogpt-slm-124m) (`nanogpt_slm_best.pth`) |
| Instruct SFT (nanoGPT) | GPT | [nishantup/nanogpt-slm-instruct](https://huggingface.co/nishantup/nanogpt-slm-instruct) |
|