---
license: mit
language:
- en
tags:
- pytorch
- nanogpt
- instruction-tuning
- sft
- slm
- from-scratch
- small-language-model
- tinystories
- text-generation
datasets:
- roneneldan/TinyStories
- nishantup/instruction-dataset-300k-nanogpt-slm
pipeline_tag: text-generation
---
# nanoGPT SLM TinyStories Instruct -- 124.0M Parameters
An instruction-tuned Small Language Model trained entirely from scratch:
- Pretrained on TinyStories (2.1M children's stories, 70K iterations)
- Instruction fine-tuned (SFT) on a 300K multi-source instruction dataset
## What This Model Does
This model follows instructions across diverse tasks: answering questions, summarizing text, creative writing, classification, translation, and more. Give it a task and it responds:
```
Task: Explain what photosynthesis is in simple terms.
Answer:
Photosynthesis is the process by which plants convert sunlight, water, and
carbon dioxide into glucose and oxygen. It occurs in the chloroplasts of
plant cells and is essential for life on Earth...
```
**Capabilities:**
- Answers factual questions
- Summarizes text
- Writes creative content (poems, stories, descriptions)
- Classifies text (sentiment, category)
- Generates lists and structured content
- Explains concepts in simple terms
**Limitations:**
- 512-token context window limits response length
- Pretrained on a children's-story corpus, so it may default to simple language
- Not as capable as larger instruction-tuned models
- English only
## Training Pipeline
| Stage | Dataset | Size | Details |
|---|---|---|---|
| Pretraining | TinyStories | 2.1M stories | 70K iters, batch 32x512, lr=6e-4, cosine to 1e-5 |
| Instruction SFT | 300K Instructions | 300K examples | 3 epochs, batch 32, lr=1e-4, AdamW |
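The pretraining learning-rate schedule in the table (cosine decay from 6e-4 to 1e-5 over 70K iterations) can be sketched as below. This is a minimal illustration of plain cosine decay; any warmup phase used in the actual training script is not shown here.

```python
import math

def cosine_lr(step, max_steps=70_000, max_lr=6e-4, min_lr=1e-5):
    """Cosine decay from max_lr down to min_lr over max_steps iterations."""
    progress = min(step / max_steps, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at 6e-4, reaches the 1e-5 floor at iteration 70,000.
print(cosine_lr(0), cosine_lr(35_000), cosine_lr(70_000))
```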
### Instruction Dataset Sources
| Source | Count | Type |
|---|---|---|
| Alpaca | 52K | Stanford instruction-following |
| Dolly | 15K | Databricks human-authored |
| UltraChat | 80K | Multi-turn conversations |
| OpenAssistant | 33K | Human-generated QA |
| FLAN | 120K | Google's diverse NLP tasks |
## Quick Start
**Option 1: Run directly**

```bash
pip install torch tiktoken huggingface_hub
python nanogpt_slm_tinystories_instruct_inference.py
```
**Option 2: Import and use `ask()` in your own code**

```python
from nanogpt_slm_tinystories_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data...",
))

# Control generation
print(ask("Write a poem about the ocean.", temperature=1.0, top_k=100))
```
**Option 3: Load weights manually**

```python
import torch
from huggingface_hub import hf_hub_download

from nanogpt_slm_tinystories_instruct_inference import GPT, GPTConfig

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-tinystories-instruct",
    filename="nanogpt_slm_tinystories_instruct.pth",
)

config = GPTConfig()
model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## Prompt Format
The model uses the unified Task/Question/Answer format:

```
Task: {instruction}
Question:
{input}          <- included only when input is non-empty
Answer:
{response}
```
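The format above can be assembled with a small helper like the following. This is a sketch for illustration; the exact whitespace the model was fine-tuned on is defined in the bundled inference script, which should be treated as authoritative.

```python
def build_prompt(instruction, input_text=""):
    """Assemble the Task/Question/Answer prompt, omitting the
    Question block entirely when no input context is given."""
    prompt = f"Task: {instruction}\n"
    if input_text:
        prompt += f"Question:\n{input_text}\n"
    prompt += "Answer:\n"
    return prompt

print(build_prompt("Summarize the following text.", "Machine learning..."))
```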
## Model Architecture
| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small: 12 layers, 12 heads, 768 dim) |
| Parameters | 124.0M (unique, with weight tying) |
| Context length | 512 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| EOS token | `<\|endoftext\|>` (50256) -- stops generation cleanly |
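The 124.0M figure in the table can be sanity-checked from the architecture numbers alone. The sketch below assumes the standard nanoGPT/GPT-2 layout (biased linear layers, LayerNorm with weight and bias, lm_head tied to the token embedding); the exact count for this checkpoint is determined by its `GPTConfig`.

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab=50_257, ctx=512):
    """Count unique parameters of a GPT-2-style model with tied embeddings."""
    wte = vocab * n_embd                       # token embedding (shared with lm_head)
    wpe = ctx * n_embd                         # learned positional embedding
    attn = n_embd * 3 * n_embd + 3 * n_embd    # fused qkv projection
    attn += n_embd * n_embd + n_embd           # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd     # MLP up-projection
    mlp += 4 * n_embd * n_embd + n_embd        # MLP down-projection
    ln = 2 * 2 * n_embd                        # two LayerNorms per block
    block = attn + mlp + ln
    return wte + wpe + n_layer * block + 2 * n_embd  # + final LayerNorm

print(gpt2_param_count())  # 124,046,592 ≈ 124.0M
```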
## ask() API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```
| Parameter | Default | Description |
|---|---|---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (`None` = no filtering) |
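The `temperature` and `top_k` knobs interact in the usual way for sampling decoders. The sketch below shows one plausible implementation of that interaction, mirroring the `ask()` defaults; it is an illustration, not the checkpoint's actual sampling code.

```python
import torch

def sample_next(logits, temperature=0.7, top_k=40):
    """Pick the next token id from raw logits: greedy when temperature
    is 0.0, otherwise temperature-scaled softmax with top-k filtering."""
    if temperature == 0.0:
        return int(torch.argmax(logits))
    logits = logits / temperature
    if top_k is not None:
        kth = torch.topk(logits, top_k).values[-1]   # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Higher temperature flattens the distribution (more creative output), while a smaller `top_k` restricts sampling to the likeliest candidates.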
## Files

| File | Description |
|---|---|
| `nanogpt_slm_tinystories_instruct.pth` | Instruction fine-tuned weights |
| `nanogpt_slm_tinystories_instruct_inference.py` | Standalone inference script |
| `config.json` | Model + training configuration |
## Related Models (Vizuara SLM Family)
| Variant | Type | Repo |
|---|---|---|
| Pretrained (TinyStories) | Base | nishantup/nanogpt-pretrained-slm-tinystories-124m |
| This model | Instruction SFT | nishantup/nanogpt-slm-tinystories-instruct |
| Spam classifier | Classification | nishantup/nanogpt-slm-tinystories-classifier |
## Citation

Eldan, R., & Li, Y. (2023). *TinyStories: How Small Can Language Models Be and Still Speak Coherent English?* arXiv preprint arXiv:2305.07759.