# tiny-stories-slm-50M
A ~50M-parameter GPT-style small language model built from scratch in PyTorch and trained on the TinyStories dataset.
## Model Details
| Parameter | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT) |
| Parameters | ~50 Million |
| Layers | 6 |
| Attention Heads | 6 |
| Embedding Dim | 384 |
| Context Window | 128 tokens |
| Vocab Size | 50,257 (GPT-2 BPE) |
| Validation Loss | 2.390 |
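
As a point of reference, here is what this configuration could look like as a nanoGPT-style config dataclass. The class and field names are illustrative assumptions, not the card author's actual code:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values from the table above; the dataclass itself is a sketch.
    n_layer: int = 6          # transformer blocks
    n_head: int = 6           # attention heads per block
    n_embd: int = 384         # embedding / hidden dimension
    block_size: int = 128     # context window in tokens
    vocab_size: int = 50257   # GPT-2 BPE vocabulary
```

With this vocabulary size, the token embedding and output head dominate the parameter count (50,257 × 384 ≈ 19.3M weights each if untied), which is roughly consistent with the ~50M total for a 6-layer, 384-dim model.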
## Dataset
Trained on TinyStories, a dataset of short children's stories generated by GPT-3.5 and GPT-4, released by Microsoft Research.
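
To inspect the data yourself, the dataset can be pulled from the Hugging Face Hub. A minimal sketch, assuming the commonly used `roneneldan/TinyStories` dataset ID (an assumption, not something this card states):

```python
from datasets import load_dataset

# Assumption: the public TinyStories dataset on the Hugging Face Hub.
ds = load_dataset("roneneldan/TinyStories")
print(ds["train"][0]["text"])  # prints one short children's story
```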
## How to Use
```python
import tiktoken
import torch

# `model` is assumed to already be loaded in eval mode (see the loading
# sketch below); it should expose a nanoGPT-style generate() method.
enc = tiktoken.get_encoding("gpt2")

prompt = "Once upon a time there was a little dragon"
context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)

with torch.no_grad():
    output = model.generate(
        context,
        max_new_tokens=200,
        temperature=0.8,
        top_k=50,
    )

print(enc.decode(output.squeeze().tolist()))
```
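
The snippet above assumes `model` is already in memory. How you load it depends on how the checkpoint was saved; below is a hedged sketch for a plain `state_dict` checkpoint, where the `ckpt.pt` file name and the `GPT`/`GPTConfig` classes are assumptions about the training code, not something documented in this card:

```python
import torch

# Hypothetical loading step: assumes a checkpoint saved with
# torch.save({"model": model.state_dict()}, "ckpt.pt") and a GPT class
# matching the configuration above. Adapt to your checkpoint format.
checkpoint = torch.load("ckpt.pt", map_location="cpu")
model = GPT(GPTConfig())
model.load_state_dict(checkpoint["model"])
model.eval()
```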
## Training Details
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 1e-4 with cosine decay |
| Warmup Steps | 1,000 |
| Batch Size | 32 |
| Gradient Accumulation | 32 steps |
| Precision | Mixed (bfloat16) |
| Gradient Clipping | 0.5 |
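
Put together, a single optimizer step under these settings could look like the sketch below. `model` and `get_batch` are placeholders, and the schedule shape (linear warmup, cosine decay to zero) is a common convention rather than something this card specifies in detail:

```python
import math
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 32   # gradient accumulation steps
clip_norm = 0.5    # gradient clipping threshold

def lr_at(step, max_steps, warmup=1000, peak=1e-4):
    """Linear warmup for 1,000 steps, then cosine decay to zero."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

def train_step(step, max_steps):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step, max_steps)
    for _ in range(accum_steps):
        x, y = get_batch("train")  # placeholder: (input, target) token batches
        # Mixed precision: run forward/backward in bfloat16 via autocast.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            _, loss = model(x, y)  # assumes the model returns (logits, loss)
        (loss / accum_steps).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```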
## Limitations
- Context window limited to 128 tokens
- English only
- Generates simple stories only; not suitable for factual Q&A
- No instruction following