---
license: other
library_name: pytorch
pipeline_tag: text-generation
tags:
- text-generation
- story-generation
- gpt
- pytorch
- custom-code
language:
- en
---
# TinyGPT StoryGPT
TinyGPT StoryGPT is a small GPT-style decoder-only transformer trained from scratch for children's story generation, with an instruction-tuned checkpoint for simple prompt-following.
This is a custom PyTorch model, not a Hugging Face Transformers GPT-2 checkpoint. Use the included `model.py`, `tokenizer.py`, and `storyGPT.py` files to load and run it.
## Model Details
- Architecture: decoder-only transformer
- Vocabulary size: 8000
- Context window: 512 tokens
- Embedding size: 512
- Attention heads: 8
- Layers: 8
- Parameters: 29,541,376
- Tokenizer: custom BPE tokenizer
- Base model file: `story_model.pth`
- Instruction model file: `instruct_checkpoints/instruct_model.pth`
The instruction checkpoint metadata reports epoch 8, loss `1.1102`, vocabulary size `8000`, and context window `512`.
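The parameter count above can be reproduced by hand. The following is a minimal sketch of one breakdown that matches the reported total. It assumes a layout that is not confirmed by this README: tied input/output embeddings, learned positional embeddings, bias-free linear layers, a 4x feed-forward expansion, two LayerNorms per block, and a final LayerNorm. Check `model.py` for the actual layout. The number of attention heads does not affect the count, because the heads split the same projection matrices.

```python
# Hypothetical layout: every choice marked "assumed" below is a guess that
# happens to reproduce the reported total; it is not read from model.py.
VOCAB_SIZE = 8000
CONTEXT = 512
D_MODEL = 512
N_LAYERS = 8
MLP_RATIO = 4  # assumed feed-forward expansion factor


def count_parameters() -> int:
    """Count trainable parameters under the assumed layout."""
    token_emb = VOCAB_SIZE * D_MODEL            # assumed tied with the output head
    pos_emb = CONTEXT * D_MODEL                 # assumed learned positional embeddings
    attention = 4 * D_MODEL * D_MODEL           # Q, K, V, output projections; assumed no biases
    mlp = 2 * D_MODEL * (MLP_RATIO * D_MODEL)   # up and down projections; assumed no biases
    layer_norms = 2 * (2 * D_MODEL)             # two LayerNorms per block, weight + bias each
    per_layer = attention + mlp + layer_norms
    final_norm = 2 * D_MODEL                    # assumed final LayerNorm, weight + bias
    return token_emb + pos_emb + N_LAYERS * per_layer + final_norm


total = count_parameters()
print(f"{total:,}")  # prints 29,541,376
```

Under these assumptions the embeddings contribute 4,358,144 parameters and each of the 8 blocks contributes 3,147,776.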
## Usage
```bash
pip install -r requirements.txt
python storyGPT.py
```
For API serving:
```bash
pip install -r requirements.txt
python api.py
```
Example request, with the server running locally on port 8000:
```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Write a bedtime story about a brave little star.","max_tokens":120}'
```
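The same request can be sent from Python using only the standard library. This is a minimal sketch: the URL and the `prompt` / `max_tokens` fields come from the curl example, while the helper names `build_generate_request` and `generate` are hypothetical. The response format of `api.py` is not documented here, so the sketch returns parsed JSON when possible and the raw text otherwise.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/generate"  # same endpoint as the curl example


def build_generate_request(prompt: str, max_tokens: int = 120,
                           url: str = API_URL) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_tokens: int = 120,
             url: str = API_URL, timeout: float = 60.0):
    """Send the request and return the decoded response body.

    The response schema is an unknown here, so JSON is parsed if possible
    and the raw text is returned otherwise.
    """
    request = build_generate_request(prompt, max_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

With the server started via `python api.py`, calling `generate("Write a bedtime story about a brave little star.")` should return whatever the endpoint sends back; inspect the result to see which field carries the generated text.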
## Files
- `model.py` - transformer model definition
- `tokenizer.py` - custom BPE tokenizer
- `tokenizer.json` - trained tokenizer vocabulary and merges
- `storyGPT.py` - interactive CLI generation
- `api.py` - Flask API server
- `story_model.pth` - base story model checkpoint
- `instruct_checkpoints/instruct_model.pth` - instruction-tuned checkpoint
## Limitations
This is an experimental small model. It can produce repetition, factual errors, malformed formatting, or unsafe/unwanted text. Review outputs before using them with children or public audiences.
## Training Data
The repository contains code for base training and instruction fine-tuning. The public upload excludes large/local training corpora and intermediate checkpoints by default.
## License
No final license has been selected for this repository yet. A license should be chosen only after confirming that the training data and assets are compatible with public release.