---
license: mit
tags:
- gpt2
- tinystories
- language-model
---

# TinyStories-GPT

This is a small GPT-style model trained from scratch on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories), implemented with a NanoGPT-style training loop in PyTorch.

## Model Details

- **Architecture:** 6 transformer layers, 6 attention heads, 384 hidden size (see the config sketch below)
- **Context length:** 128 tokens
- **Vocab size:** 50,257 (GPT-2 tokenizer)
- **Dataset:** TinyStories
- **Training:** ~20k steps, AdamW, cosine learning-rate decay
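
For reference, here is a minimal sketch of an equivalent `transformers` config. It assumes the checkpoint is exported in Hugging Face GPT-2 format; the values simply mirror the hyperparameters listed above:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical config mirroring the hyperparameters above; the actual
# checkpoint was trained with a NanoGPT-style loop and may differ in detail.
config = GPT2Config(
    n_layer=6,         # 6 transformer layers
    n_head=6,          # 6 attention heads
    n_embd=384,        # hidden size
    n_positions=128,   # context length
    vocab_size=50257,  # GPT-2 tokenizer vocabulary
)
model = GPT2LMHeadModel(config)  # randomly initialized, same shapes as this model
```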

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("Here2Disrupt/tiny-stories-gpt")
model = AutoModelForCausalLM.from_pretrained("Here2Disrupt/tiny-stories-gpt")

# Encode a prompt and generate a continuation
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
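
Greedy decoding can get repetitive; for more varied stories you can enable sampling via standard `generate` arguments (nothing here is specific to this checkpoint):

```python
# Sampling instead of greedy decoding; temperature/top_k values are
# illustrative defaults, not tuned for this model.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # sample from the distribution
    temperature=0.8,  # soften the logits
    top_k=50,         # restrict to the 50 most likely tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```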