---
license: mit
language:
- en
tags:
- pytorch
- nanogpt
- instruction-tuning
- sft
- slm
- from-scratch
- small-language-model
- tinystories
- text-generation
datasets:
- roneneldan/TinyStories
- nishantup/instruction-dataset-300k-nanogpt-slm
pipeline_tag: text-generation
---
# nanoGPT SLM TinyStories Instruct -- 124.0M Parameters
An instruction-tuned Small Language Model trained entirely from scratch:
- Pretrained on TinyStories (2.1M children's stories, 70K iterations)
- Instruction fine-tuned (SFT) on a 300K multi-source instruction dataset
## What This Model Does
This model follows instructions across diverse tasks: answering questions, summarizing text, creative writing, classification, translation, and more. Give it a task and it responds:
```
Task: Explain what photosynthesis is in simple terms.
Answer:
Photosynthesis is the process by which plants convert sunlight, water, and
carbon dioxide into glucose and oxygen. It occurs in the chloroplasts of
plant cells and is essential for life on Earth...
```
**Capabilities:**
- Answers factual questions
- Summarizes text
- Writes creative content (poems, stories, descriptions)
- Classifies text (sentiment, category)
- Generates lists and structured content
- Explains concepts in simple terms
**Limitations:**
- 512-token context window limits response length
- Pretrained on a children's-story corpus, so it may default to simple language
- Not as capable as larger instruction-tuned models
- English only
## Training Pipeline
| Stage | Dataset | Size | Details |
|---|---|---|---|
| Pretraining | TinyStories | 2.1M stories | 70K iters, batch 32x512, lr=6e-4, cosine to 1e-5 |
| Instruction SFT | 300K Instructions | 300K examples | 3 epochs, batch 32, lr=1e-4, AdamW |
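The pretraining learning-rate schedule in the table (cosine decay from 6e-4 to 1e-5 over 70K iterations) can be sketched as below. This is a minimal illustration of plain cosine decay; any warmup phase used in the actual training script is not shown here.

```python
import math

def cosine_lr(step, max_steps=70_000, max_lr=6e-4, min_lr=1e-5):
    """Cosine decay from max_lr down to min_lr over max_steps iterations."""
    progress = min(step / max_steps, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at 6e-4, reaches the 1e-5 floor at iteration 70,000.
print(cosine_lr(0), cosine_lr(35_000), cosine_lr(70_000))
```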
### Instruction Dataset Sources
| Source | Count | Type |
|---|---|---|
| Alpaca | 52K | Stanford instruction-following |
| Dolly | 15K | Databricks human-authored |
| UltraChat | 80K | Multi-turn conversations |
| OpenAssistant | 33K | Human-generated QA |
| FLAN | 120K | Google's diverse NLP tasks |
## Quick Start
**Option 1: Run directly**

```bash
pip install torch tiktoken huggingface_hub
python nanogpt_slm_tinystories_instruct_inference.py
```
**Option 2: Import and use `ask()` in your own code**

```python
from nanogpt_slm_tinystories_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data...",
))

# Control generation
print(ask("Write a poem about the ocean.", temperature=1.0, top_k=100))
```
**Option 3: Load weights manually**

```python
import torch
from huggingface_hub import hf_hub_download

from nanogpt_slm_tinystories_instruct_inference import GPT, GPTConfig

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-tinystories-instruct",
    filename="nanogpt_slm_tinystories_instruct.pth",
)

config = GPTConfig()
model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## Prompt Format
The model uses the unified Task/Question/Answer format:

```
Task: {instruction}
Question:
{input}          <- included only when input is non-empty
Answer:
{response}
```
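The format above can be assembled with a small helper like the following. This is a sketch for illustration; the exact whitespace the model was fine-tuned on is defined in the bundled inference script, which should be treated as authoritative.

```python
def build_prompt(instruction, input_text=""):
    """Assemble the Task/Question/Answer prompt, omitting the
    Question block entirely when no input context is given."""
    prompt = f"Task: {instruction}\n"
    if input_text:
        prompt += f"Question:\n{input_text}\n"
    prompt += "Answer:\n"
    return prompt

print(build_prompt("Summarize the following text.", "Machine learning..."))
```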
## Model Architecture
| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small: 12 layers, 12 heads, 768 dim) |
| Parameters | 124.0M (unique, with weight tying) |
| Context length | 512 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| EOS token | `<\|endoftext\|>` (50256) -- stops generation cleanly |
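The 124.0M figure in the table can be sanity-checked from the architecture numbers alone. The sketch below assumes the standard nanoGPT/GPT-2 layout (biased linear layers, LayerNorm with weight and bias, lm_head tied to the token embedding); the exact count for this checkpoint is determined by its `GPTConfig`.

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab=50_257, ctx=512):
    """Count unique parameters of a GPT-2-style model with tied embeddings."""
    wte = vocab * n_embd                       # token embedding (shared with lm_head)
    wpe = ctx * n_embd                         # learned positional embedding
    attn = n_embd * 3 * n_embd + 3 * n_embd    # fused qkv projection
    attn += n_embd * n_embd + n_embd           # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd     # MLP up-projection
    mlp += 4 * n_embd * n_embd + n_embd        # MLP down-projection
    ln = 2 * 2 * n_embd                        # two LayerNorms per block
    block = attn + mlp + ln
    return wte + wpe + n_layer * block + 2 * n_embd  # + final LayerNorm

print(gpt2_param_count())  # 124,046,592 ≈ 124.0M
```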
## ask() API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```
| Parameter | Default | Description |
|---|---|---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (`None` = no filtering) |
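The `temperature` and `top_k` knobs interact in the usual way for sampling decoders. The sketch below shows one plausible implementation of that interaction, mirroring the `ask()` defaults; it is an illustration, not the checkpoint's actual sampling code.

```python
import torch

def sample_next(logits, temperature=0.7, top_k=40):
    """Pick the next token id from raw logits: greedy when temperature
    is 0.0, otherwise temperature-scaled softmax with top-k filtering."""
    if temperature == 0.0:
        return int(torch.argmax(logits))
    logits = logits / temperature
    if top_k is not None:
        kth = torch.topk(logits, top_k).values[-1]   # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Higher temperature flattens the distribution (more creative output), while a smaller `top_k` restricts sampling to the likeliest candidates.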
## Files

| File | Description |
|---|---|
| `nanogpt_slm_tinystories_instruct.pth` | Instruction fine-tuned weights |
| `nanogpt_slm_tinystories_instruct_inference.py` | Standalone inference script |
| `config.json` | Model + training configuration |
## Related Models (Vizuara SLM Family)
| Variant | Type | Repo |
|---|---|---|
| Pretrained (TinyStories) | Base | nishantup/nanogpt-pretrained-slm-tinystories-124m |
| This model | Instruction SFT | nishantup/nanogpt-slm-tinystories-instruct |
| Spam classifier | Classification | nishantup/nanogpt-slm-tinystories-classifier |
## Citation

Eldan, R., & Li, Y. (2023). *TinyStories: How Small Can Language Models Be and Still Speak Coherent English?* arXiv preprint arXiv:2305.07759.