tinyGPT / README.md

wasifali1

metadata

license: other
library_name: pytorch
pipeline_tag: text-generation
tags:
  - text-generation
  - story-generation
  - gpt
  - pytorch
  - custom-code
language:
  - en

TinyGPT StoryGPT

TinyGPT StoryGPT is a small GPT-style decoder-only transformer trained from scratch for children's story generation, with an instruction-tuned checkpoint for simple prompt-following.

This is a custom PyTorch model, not a Hugging Face Transformers GPT-2 checkpoint. Use the included model.py, tokenizer.py, and storyGPT.py files to load and run it.

Model Details

Architecture: decoder-only transformer
Vocabulary size: 8000
Context window: 512 tokens
Embedding size: 512
Attention heads: 8
Layers: 8
Parameters: 29,541,376
Tokenizer: custom BPE tokenizer
Base model file: story_model.pth
Instruction model file: instruct_checkpoints/instruct_model.pth

The instruction checkpoint metadata reports epoch 8, loss 1.1102, vocabulary size 8000, and context window 512.
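As a sanity check on the parameter count listed above, the sketch below recomputes it from the listed hyperparameters. The layer layout is an assumption, not something confirmed by model.py: learned position embeddings, bias-free linear layers, a 4x MLP expansion, two LayerNorms per block, one final LayerNorm, and an output head that shares weights with the token embedding. Under those assumptions the total comes out to the stated 29,541,376; the function name `count_parameters` is hypothetical.

```python
# Rough parameter count for a GPT-style decoder.
# ASSUMED layout (not taken from model.py): learned position embeddings,
# no biases in linear layers, 4x MLP expansion, two LayerNorms per block,
# a final LayerNorm, and an output head tied to the token embedding.

def count_parameters(vocab_size=8000, context=512, d_model=512,
                     n_layers=8, mlp_ratio=4):
    token_embedding = vocab_size * d_model
    position_embedding = context * d_model
    attention = 4 * d_model * d_model            # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model      # up and down projections
    layer_norms = 2 * (2 * d_model)              # weight + bias, two norms per block
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d_model                     # weight + bias
    return (token_embedding + position_embedding
            + n_layers * per_block + final_norm)


if __name__ == "__main__":
    print(f"{count_parameters():,}")  # 29,541,376
```

The number of attention heads does not appear in the formula because splitting the 512-dimensional projections into 8 heads does not change their total size.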

Usage

pip install -r requirements.txt
python storyGPT.py

For API serving:

pip install -r requirements.txt
python api.py

Example request:

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Write a bedtime story about a brave little star.","max_tokens":120}'
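The same request can be sent from Python using only the standard library. This is a minimal sketch: the URL, the `prompt` and `max_tokens` fields come from the curl example, while the helper names `build_request` and `generate` are hypothetical. The response format of api.py is not documented here, so the sketch returns the raw response body as text instead of assuming a JSON schema.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if api.py is configured differently.
API_URL = "http://localhost:8000/generate"


def build_request(prompt, max_tokens=120, url=API_URL):
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens=120, url=API_URL, timeout=60):
    """Send the request and return the raw response body as a string."""
    request = build_request(prompt, max_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires api.py to be running locally):
# print(generate("Write a bedtime story about a brave little star."))
```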

Files

model.py - transformer model definition
tokenizer.py - custom BPE tokenizer
tokenizer.json - trained tokenizer vocabulary and merges
storyGPT.py - interactive CLI generation
api.py - Flask API server
story_model.pth - base story model checkpoint
instruct_checkpoints/instruct_model.pth - instruction-tuned checkpoint

Limitations

This is an experimental small model. It can produce repetition, factual errors, malformed formatting, or unsafe/unwanted text. Review outputs before using them with children or public audiences.

Training Data

The repository contains the code for base training and instruction fine-tuning. By default, the public upload excludes the training corpora (which are large or stored only locally) and intermediate checkpoints.

License

No final license has been selected for this repository yet. Choose a license only after confirming that your training data and assets are compatible with public release.