---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
tags:
- custom_code
- educational
---

# Tiny GPT

Tiny GPT is an educational decoder-only Transformer trained from scratch on
the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
dataset. The implementation is intentionally small and readable.

## Model details

- Architecture: decoder-only causal language model
- Context length: 512 tokens
- Vocabulary size: 10,000
- Hidden size: 256
- Transformer layers: 6
- Attention heads: 8

Source code: https://github.com/alainbrown/tiny-gpt
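
For a rough sense of scale, the hyperparameters above can be turned into a
back-of-the-envelope parameter count. The sketch below is an estimate only. It
assumes a GPT-2-style block (biased linear layers, two LayerNorms per block, an
MLP that expands to 4x the hidden size), learned positional embeddings, and an
output head tied to the token embedding. None of these choices is stated in
this card, and `estimate_params` is a hypothetical helper, so check the linked
source code for the real layout.

```python
def estimate_params(vocab_size, context_length, hidden_size, num_layers,
                    mlp_ratio=4, tie_embeddings=True):
    """Estimate parameters of a GPT-2-style decoder (assumed layout, see above)."""
    d = hidden_size
    token_emb = vocab_size * d                 # token embedding table
    pos_emb = context_length * d               # learned positions (assumption)
    attn = 4 * d * d + 4 * d                   # Q, K, V, output projections + biases
    mlp = 2 * mlp_ratio * d * d + mlp_ratio * d + d  # up and down projections + biases
    norms = 2 * 2 * d                          # two LayerNorms (scale + shift) per block
    per_layer = attn + mlp + norms
    final_norm = 2 * d                         # LayerNorm before the output head
    lm_head = 0 if tie_embeddings else vocab_size * d
    return token_emb + pos_emb + num_layers * per_layer + final_norm + lm_head


# Values taken from the "Model details" list.
hidden_size, num_heads = 256, 8
head_dim = hidden_size // num_heads            # width of each attention head

total = estimate_params(vocab_size=10_000, context_length=512,
                        hidden_size=hidden_size, num_layers=6)
print(f"head_dim={head_dim}, estimated parameters={total:,}")
```

Under these assumptions each attention head is 32 dimensions wide and the model
has roughly 7.4 million parameters, about a third of which sit in the token
embedding table. With an untied output head the estimate rises to roughly 10
million.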

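## Usage

This repository contains custom Transformers code. Loading it requires
`trust_remote_code=True`, which executes Python from the repository on your
machine, so review the code before enabling it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "alainbrown/tiny-gpt"
# trust_remote_code=True allows the custom model and tokenizer classes to load.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Tokenize a prompt and run a single forward pass to get the raw logits.
inputs = tokenizer("Once upon a time", return_tensors="pt")
logits = model(**inputs).logits
```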
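
The snippet above stops at the logits. To turn them into text, one simple
option is greedy decoding: repeatedly pick the highest-scoring next token and
append it to the input. The sketch below is illustrative and not part of this
repository. `greedy_decode` and `make_logits_fn` are hypothetical helpers, and
the adapter assumes the model accepts an `input_ids` keyword and returns logits
shaped `(batch, sequence, vocab)`, as standard causal language models in
Transformers do. Whether the custom code also supports `model.generate` is not
stated in this card.

```python
def greedy_decode(next_token_logits, prompt_ids, max_new_tokens,
                  context_length=512, eos_id=None):
    """Greedy decoding: append the arg-max token one step at a time.

    next_token_logits: callable mapping a list of token ids to a list of
    scores, one per vocabulary entry, for the token that follows.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_length:]         # stay within the 512-token context
        scores = next_token_logits(window)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break                              # stop at end-of-sequence, if one is given
    return ids


def make_logits_fn(model):
    """Hypothetical adapter from a loaded causal LM to next_token_logits."""
    import torch  # imported lazily so greedy_decode itself has no dependencies

    def next_token_logits(window):
        with torch.no_grad():                  # inference only, no gradients
            logits = model(input_ids=torch.tensor([window])).logits
        return logits[0, -1].tolist()          # scores for the last position

    return next_token_logits
```

With `tokenizer`, `model`, and `inputs` from the snippet above, a call such as
`greedy_decode(make_logits_fn(model), inputs["input_ids"][0].tolist(), max_new_tokens=50)`
returns a list of token ids, which `tokenizer.decode(...)` should convert back
to a string, assuming the custom tokenizer follows the usual interface. Greedy
decoding is deterministic and tends to repeat itself on small models, so
sampling with a temperature is a common alternative.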

## Intended use

This model is intended for education and experimentation. It is not intended
for production, factual question answering, or safety-critical applications.

## Limitations

The model is small, trained on synthetic children's stories, and has not been
comprehensively evaluated. It may produce incoherent, repetitive, incorrect,
or inappropriate text. English is the only supported language.

## Training

The training pipeline is available in the linked GitHub repository. This model
repository excludes optimizer and progress state and contains inference files
only.