# MiniGPT — Lightweight Transformer for Text Generation
**MiniGPT** is a minimal yet powerful GPT-style language model built from scratch using PyTorch. It is designed for educational clarity, customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer architecture, including streaming capabilities and modern sampling strategies.
> Hosted with ❤️ by [@Austin207](https://huggingface.co/Austin207)
---
## Model Description
MiniGPT is a small, word-level transformer model with the following architecture:
* 4 transformer layers
* 4 attention heads
* Embedding dimension: 128
* FFN hidden size: 512
* Max sequence length: 128
* Word-level tokenizer (trained with Hugging Face `tokenizers`)
Despite its size, it supports advanced generation strategies (see the sampling sketch after this list), including:
* Repetition Penalty
* Temperature Sampling
* Top-K & Top-P (nucleus) sampling
* Real-time streaming output
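
These strategies combine in the usual GPT-style order. As a rough sketch of a single decoding step — not the exact code in `inference.py`; the `sample_next_token` name, the 1-D `logits` tensor, the `generated_ids` list, and the default values are illustrative assumptions:

```python
import torch

def sample_next_token(logits, generated_ids, temperature=1.0,
                      top_k=50, top_p=0.9, repetition_penalty=1.2):
    """One decoding step: repetition penalty -> temperature -> top-k -> top-p."""
    logits = logits.clone()

    # Repetition penalty (simplified): down-weight tokens already generated.
    for token_id in set(generated_ids):
        logits[token_id] /= repetition_penalty

    # Temperature scaling: higher values flatten the distribution.
    logits = logits / max(temperature, 1e-8)

    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability exceeds top_p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    cutoff = cumulative > top_p
    cutoff[1:] = cutoff[:-1].clone()   # always keep the single most likely token
    cutoff[0] = False
    logits[sorted_idx[cutoff]] = float("-inf")

    # Sample from the filtered distribution.
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```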
---
## Usage
Install dependencies:
```bash
pip install torch tokenizers
```
Load the model and tokenizer:
```python
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch

# Load tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")

# Load model
model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128,
)

# Load checkpoint weights
checkpoint = torch.load("model_checkpoint_step20000.pt")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Generate text
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
```
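
`torch.load` restores tensors to the device they were saved from; if the checkpoint came from a GPU run and you want CPU inference, pass `map_location` (standard PyTorch behavior, not specific to this repository):

```python
# Map all checkpoint tensors to CPU regardless of where they were saved.
checkpoint = torch.load("model_checkpoint_step20000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
```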
---
## Training
Train from scratch on any plain-text dataset:
```bash
python training.py
```
Training includes:
* Checkpointing
* Sample generation previews
* Word-level tokenization with `tokenizers` (see the sketch after this list)
* Custom datasets: start from `alphabetical_dataset.txt` or supply your own plain-text file
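
If you need to rebuild `wordlevel.json` yourself, a minimal sketch with the `tokenizers` library looks like this (the special tokens and pre-tokenizer here are assumptions; match whatever `training.py` and `Tokenizer.py` expect):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

# Word-level model with an explicit unknown token.
tokenizer = Tokenizer(WordLevel(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

# Special tokens are illustrative; align them with the training script.
trainer = WordLevelTrainer(special_tokens=["<unk>", "<pad>"])
tokenizer.train(files=["alphabetical_dataset.txt"], trainer=trainer)
tokenizer.save("wordlevel.json")
```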
---
## Files in This Repository
| File | Purpose |
| -------------------------- | ---------------------------- |
| `miniGPT.py` | Core Transformer model |
| `transformer.py` | Transformer block logic |
| `multiheadattention.py` | Multi-head attention module |
| `Tokenizer.py` | Tokenizer loader |
| `training.py` | Training loop |
| `inference.py` | CLI and streaming generation |
| `dataprocess.py` | Text preprocessing tools |
| `wordlevel.json` | Trained word-level tokenizer |
| `alphabetical_dataset.txt` | Sample dataset |
| `requirements.txt` | Required dependencies |
---
## Model Card
| Property | Value |
| ------------ | --------------------------------- |
| Model Type | Decoder-only GPT |
| Size | Small (\~4.6M params) |
| Trained On | Word-level dataset (custom) |
| Intended Use | Text generation, educational demo |
| License | MIT |
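
The ~4.6M figure depends on the tokenizer's vocabulary size; once the model is instantiated as in the Usage section, it can be checked directly:

```python
# Count trainable parameters of the instantiated MiniGPT model.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{num_params / 1e6:.2f}M parameters")
```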
---
## Intended Use and Limitations
This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out-of-the-box. Expect limitations in coherence, factuality, and long-context reasoning.
---
## Contributions
We welcome improvements, bug fixes, and new features!
```bash
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
```
Then open a pull request!
---
## License
This project is licensed under the [MIT License](https://github.com/austin207/Transformer-Virtue-v2/blob/main/LICENSE).
---
## Explore More
* Based on the decoder-only GPT architecture from OpenAI
* Inspired by [karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)
* Compatible with Hugging Face tools and tokenizer ecosystem