
# 📖 tiny-stories-3M

A 50M parameter GPT-style Small Language Model built from scratch in PyTorch, trained on the TinyStories dataset.

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT) |
| Parameters | ~50 million |
| Layers | 6 |
| Attention heads | 6 |
| Embedding dim | 384 |
| Context window | 128 tokens |
| Vocab size | 50,257 (GPT-2 BPE) |
| Validation loss | 2.390 |
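
The configuration above maps directly onto a standard decoder-only Transformer. As a rough sketch (using PyTorch's built-in `TransformerEncoder` with a causal mask, not the repository's actual code), the listed hyperparameters land near the stated ~50M parameters:

```python
import torch
import torch.nn as nn

# Hyperparameters from the table above
VOCAB_SIZE = 50257   # GPT-2 BPE vocabulary
N_LAYER    = 6
N_HEAD     = 6
N_EMBD     = 384
BLOCK_SIZE = 128     # context window


class TinyGPT(nn.Module):
    """Minimal decoder-only Transformer matching the listed config."""

    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, N_EMBD)
        self.pos_emb = nn.Embedding(BLOCK_SIZE, N_EMBD)
        layer = nn.TransformerEncoderLayer(
            d_model=N_EMBD, nhead=N_HEAD, dim_feedforward=4 * N_EMBD,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=N_LAYER)
        self.ln_f = nn.LayerNorm(N_EMBD)
        self.head = nn.Linear(N_EMBD, VOCAB_SIZE, bias=False)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask so each position attends only to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))


model = TinyGPT()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # close to the card's ~50M
```

Most of the budget sits in the token embedding and output head (50,257 × 384 each); the six Transformer blocks contribute roughly 10M parameters.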

## Dataset

Trained on TinyStories, a dataset of short children's stories generated by GPT-3.5 and GPT-4 (Microsoft Research).
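
For training, each tokenized story has to be cut into blocks that fit the 128-token context window, with targets shifted one position for next-token prediction. A minimal sketch (the `make_block` helper and the stand-in token ids are illustrative; real training would tokenize stories with `enc.encode_ordinary`):

```python
import torch

BLOCK_SIZE = 128  # context window from the model details


def make_block(ids, block_size=BLOCK_SIZE):
    """Turn a token stream into (input, next-token target) tensors."""
    ids = ids[: block_size + 1]
    x = torch.tensor(ids[:-1])   # tokens 0 .. T-1
    y = torch.tensor(ids[1:])    # tokens 1 .. T (shifted by one)
    return x, y


# Stand-in token ids; a real story would rarely fill the whole window
ids = list(range(200))
x, y = make_block(ids)
print(x.shape, y.shape)  # both capped at BLOCK_SIZE
```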

## How to Use

```python
import tiktoken
import torch

# `model` is assumed to be the loaded tiny-stories checkpoint;
# loading it is not shown here (see the repository's weights).
enc = tiktoken.get_encoding("gpt2")

prompt = "Once upon a time there was a little dragon"
context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)

with torch.no_grad():
    output = model.generate(
        context,
        max_new_tokens=200,
        temperature=0.8,
        top_k=50,
    )

print(enc.decode(output.squeeze().tolist()))
```
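
The `model.generate` call above is standard autoregressive sampling. A hedged sketch of what such a loop typically does with `temperature` and `top_k` (not the repository's exact implementation):

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, top_k=None,
             block_size=128):
    """Sample tokens one at a time, feeding back at most block_size tokens."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                # crop to context window
        logits = model(idx_cond)[:, -1, :] / temperature
        if top_k is not None:
            # Mask out everything below the k-th largest logit
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

Lower `temperature` sharpens the distribution toward likely tokens; `top_k` discards the long tail of improbable tokens before sampling.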

## Training Details

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 1e-4 with cosine decay |
| Warmup steps | 1,000 |
| Batch size | 32 |
| Gradient accumulation | 32 steps |
| Precision | Mixed (bfloat16) |
| Gradient clipping | 0.5 |
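
These settings combine in the usual way: linear warmup into cosine decay, gradient accumulation across micro-batches, bfloat16 autocast for the forward pass, and norm clipping before each optimizer step. A runnable sketch with a stand-in model (`MAX_STEPS` and the toy `nn.Linear` are illustrative, not from the card):

```python
import math
import torch

# Values from the table above; MAX_STEPS is an assumption for illustration
LR, WARMUP_STEPS, MAX_STEPS = 1e-4, 1_000, 10_000
ACCUM_STEPS, GRAD_CLIP = 32, 0.5


def lr_at(step):
    """Linear warmup for WARMUP_STEPS, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return 0.5 * LR * (1.0 + math.cos(math.pi * progress))


model = torch.nn.Linear(8, 8)            # stand-in for the Transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)

for step in range(3):                    # a few steps for illustration
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    for _ in range(ACCUM_STEPS):         # gradient accumulation
        x = torch.randn(32, 8)           # batch size 32
        with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
            loss = model(x).pow(2).mean() / ACCUM_STEPS
        loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    optimizer.zero_grad()
```

Dividing the loss by `ACCUM_STEPS` keeps the accumulated gradient equal to that of one large batch of 32 × 32 = 1,024 examples.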

## Limitations

- Context window limited to 128 tokens
- English only
- Generates simple stories only; not suitable for factual Q&A
- No instruction following