metadata
license: other
library_name: pytorch
pipeline_tag: text-generation
tags:
  - text-generation
  - story-generation
  - gpt
  - pytorch
  - custom-code
language:
  - en

TinyGPT StoryGPT

TinyGPT StoryGPT is a small GPT-style decoder-only transformer trained from scratch for children's story generation, with an instruction-tuned checkpoint for simple prompt-following.

This is a custom PyTorch model, not a Hugging Face Transformers GPT-2 checkpoint. Use the included model.py, tokenizer.py, and storyGPT.py files to load and run it.

Model Details

  • Architecture: decoder-only transformer
  • Vocabulary size: 8000
  • Context window: 512 tokens
  • Embedding size: 512
  • Attention heads: 8
  • Layers: 8
  • Parameters: 29,541,376
  • Tokenizer: custom BPE tokenizer
  • Base model file: story_model.pth
  • Instruction model file: instruct_checkpoints/instruct_model.pth

The instruction checkpoint metadata reports epoch 8, loss 1.1102, vocabulary size 8000, and context window 512.
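The parameter count above can be cross-checked by hand. The sketch below is one decomposition that happens to reproduce 29,541,376 exactly. It is not read from model.py and rests on the following assumptions: learned positional embeddings, an output projection tied to the token embedding, bias-free linear layers, a 4x MLP hidden size, and two LayerNorms per block plus a final one. The function name `count_parameters` is hypothetical. The number of attention heads does not change the total; with 8 heads each head would be 512 / 8 = 64 dimensions wide.

```python
def count_parameters(vocab_size=8000, context=512, d_model=512, n_layers=8):
    """Count parameters under the assumptions stated above (not taken from model.py)."""
    token_embedding = vocab_size * d_model   # assumed shared with the output projection
    position_embedding = context * d_model   # assumed learned positions
    attention = 4 * d_model * d_model        # Q, K, V, output projections, no biases (assumed)
    mlp = 2 * d_model * (4 * d_model)        # up and down projections, 4x hidden, no biases (assumed)
    layer_norms = 2 * (2 * d_model)          # two LayerNorms per block, weight + bias each
    per_layer = attention + mlp + layer_norms
    final_norm = 2 * d_model                 # final LayerNorm, weight + bias
    return token_embedding + position_embedding + n_layers * per_layer + final_norm


total = count_parameters()
print(f"{total:,}")  # 29,541,376
```

If model.py differs from these assumptions (for example, an untied output head or biased linear layers), the same function can be adjusted term by term to see which configuration matches.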

Usage

Install the dependencies and start the interactive CLI:

pip install -r requirements.txt
python storyGPT.py

To serve the model over HTTP with the Flask server in api.py:

pip install -r requirements.txt
python api.py

Example request:

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Write a bedtime story about a brave little star.","max_tokens":120}'
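The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above (same URL, header, and JSON fields). The helper names `build_generate_request` and `generate` are hypothetical, and the response schema is not documented here, so the sketch assumes the server replies with JSON and returns the decoded body unchanged.

```python
import json
import urllib.request


def build_generate_request(prompt, max_tokens=120, base_url="http://localhost:8000"):
    """Build the same POST request as the curl example, without sending it."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens=120, timeout=60):
    """Send the request and return the decoded JSON body (assumed to be JSON)."""
    request = build_generate_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, which requires `python api.py` to be running locally:
# print(generate("Write a bedtime story about a brave little star."))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.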

Files

  • model.py - transformer model definition
  • tokenizer.py - custom BPE tokenizer
  • tokenizer.json - trained tokenizer vocabulary and merges
  • storyGPT.py - interactive CLI generation
  • api.py - Flask API server
  • story_model.pth - base story model checkpoint
  • instruct_checkpoints/instruct_model.pth - instruction-tuned checkpoint

Limitations

This is an experimental small model. It can produce repetition, factual errors, malformed formatting, or unsafe/unwanted text. Review outputs before using them with children or public audiences.

Training Data

The repository contains code for base training and instruction fine-tuning. The public upload excludes large/local training corpora and intermediate checkpoints by default.

License

No final license has been selected for this repository; the model card metadata currently lists license: other. Choose a license only after confirming that your training data and assets are compatible with public release.