---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
tags:
- custom_code
- educational
---
# Tiny GPT
Tiny GPT is an educational decoder-only Transformer trained from scratch on
the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
dataset. The implementation is intentionally small and readable.
## Model details
- Architecture: decoder-only causal language model
- Context length: 512 tokens
- Vocabulary size: 10,000
- Hidden size: 256
- Transformer layers: 6
- Attention heads: 8

Source code: https://github.com/alainbrown/tiny-gpt
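
The listed dimensions imply a per-head size of 256 / 8 = 32. The sketch below also gives a rough parameter estimate. It assumes a GPT-2-style block: a 4× feed-forward width, learned positional embeddings, biases on every projection, and an output head tied to the token embeddings. None of these choices are stated in this card, so check the source repository before relying on the figure. `TinyGPTDims` and `approx_params` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyGPTDims:
    """Dimensions from the model card; mlp_ratio is an assumption."""
    vocab_size: int = 10_000
    context_length: int = 512
    hidden_size: int = 256
    num_layers: int = 6
    num_heads: int = 8
    mlp_ratio: int = 4  # ASSUMPTION: feed-forward width is not stated in the card

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_heads:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


def approx_params(d: TinyGPTDims) -> int:
    """Rough count for a GPT-2-style block (ASSUMED layout, not confirmed)."""
    h = d.hidden_size
    m = d.mlp_ratio * h
    # Token embeddings plus learned positional embeddings (assumed).
    embeddings = d.vocab_size * h + d.context_length * h
    attention = 4 * h * h + 4 * h  # Q, K, V and output projections, with biases
    mlp = h * m + m + m * h + h    # up and down projections, with biases
    norms = 2 * (2 * h)            # two LayerNorms (weight + bias) per block
    final_norm = 2 * h
    # ASSUMPTION: output head tied to the token embeddings, so it adds nothing here.
    return embeddings + d.num_layers * (attention + mlp + norms) + final_norm
```

Under these assumptions the total is about 7.4M parameters, and roughly a third of that (about 2.69M) sits in the embeddings. An untied output head would add another 10,000 × 256 = 2.56M.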
## Usage
This repository contains custom Transformers code, which runs on your machine
when loaded. Review it before setting `trust_remote_code=True`.
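```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "alainbrown/tiny-gpt"
# trust_remote_code=True executes the repository's custom model and tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    logits = model(**inputs).logits  # next-token scores for each input position
```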
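
The snippet above returns raw logits only. A minimal greedy-decoding loop built on that same forward call might look like the following. It is a sketch under assumptions: `greedy_generate` is a hypothetical helper, not part of this repository; it assumes the custom model accepts `input_ids` on its own (no `attention_mask`), which should be fine for a single unpadded prompt but is worth confirming in the source code; and it crops the input to the 512-token context window. It avoids `model.generate()` because support for it depends on the custom code.

```python
import torch


@torch.no_grad()
def greedy_generate(model, input_ids: torch.Tensor, max_new_tokens: int = 50,
                    context_length: int = 512) -> torch.Tensor:
    """Hypothetical helper: append the most likely next token, one step at a time.

    Assumes model(input_ids=...) returns an object whose .logits has shape
    (batch, seq_len, vocab_size), as in the usage snippet.
    """
    for _ in range(max_new_tokens):
        # Keep only the most recent tokens that fit in the context window.
        window = input_ids[:, -context_length:]
        logits = model(input_ids=window).logits
        # Scores at the last position predict the next token; take the argmax.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```

With `tokenizer`, `model`, and `inputs` from the usage snippet, `ids = greedy_generate(model, inputs["input_ids"], max_new_tokens=40)` followed by `tokenizer.decode(ids[0])` yields a continuation. This loop has no stop-token handling, and greedy decoding tends to repeat itself on small models; sampling from temperature-scaled, softmaxed logits (for example with `torch.multinomial`) is a common alternative.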
## Intended use
This model is intended for education and experimentation. It is not intended
for production, factual question answering, or safety-critical applications.
## Limitations
The model is small, trained on synthetic children's stories, and has not been
comprehensively evaluated. It may produce incoherent, repetitive, incorrect,
or inappropriate text. English is the only supported language.
## Training
The training pipeline is available in the linked GitHub repository. This model
repository contains inference files only; optimizer state and training-progress
state are not included.