---
language: en
license: mit
tags:
- gpt
- transformer
- text-generation
- miniGPT
model-index:
- name: MiniGPT
results: []
---
# MiniGPT — Lightweight Transformer for Text Generation
**MiniGPT** is a minimal yet powerful GPT-style language model built from scratch using PyTorch. It is designed for educational clarity, customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer architecture, including streaming capabilities and modern sampling strategies.
> Hosted with ❤️ by [@Austin207](https://huggingface.co/Austin207)
---
## Model Description
MiniGPT is a small, word-level transformer model with the following architecture (a rough parameter estimate is sketched after the list):
* 4 Transformer layers
* 4 Attention heads
* 128 Embedding dimensions
* 512 FFN hidden size
* Max sequence length: 128
* Word-level tokenizer (trained with Hugging Face `tokenizers`)
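For a rough sense of scale, the sketch below estimates the parameter count implied by these hyperparameters. It assumes learned positional embeddings and an untied output projection, and ignores biases and LayerNorm weights, so it may differ slightly from the actual implementation; the exact figure also depends on the tokenizer's vocabulary size.

```python
def approx_param_count(vocab_size, embed_dim=128, ff_dim=512, num_layers=4, max_seq_len=128):
    """Back-of-the-envelope parameter estimate; bias and LayerNorm terms are ignored."""
    token_emb = vocab_size * embed_dim   # token embedding table
    pos_emb = max_seq_len * embed_dim    # assumed learned positional embeddings
    attn = 4 * embed_dim * embed_dim     # Q, K, V and output projections per layer
    ffn = 2 * embed_dim * ff_dim         # two feed-forward linear layers per layer
    lm_head = embed_dim * vocab_size     # assumed untied output projection
    return token_emb + pos_emb + num_layers * (attn + ffn) + lm_head
```

For example, a vocabulary of about 15K words would put this estimate near the ~4.6M figure quoted in the model card below.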
Despite its size, it supports advanced generation strategies (sketched below), including:
* Repetition Penalty
* Temperature Sampling
* Top-K & Top-P (nucleus) sampling
* Real-time streaming output
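The snippet below is a minimal, self-contained sketch of how these strategies are commonly combined at each decoding step. It is illustrative only; the function name and the simplified repetition penalty are not necessarily how `inference.py` implements them.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, generated_ids, temperature=1.0, top_k=50, top_p=0.9, repetition_penalty=1.2):
    """Filter a 1-D tensor of next-token logits and sample one token id."""
    logits = logits.clone()
    # Repetition penalty: down-weight tokens that were already generated
    # (simplified; a common variant multiplies negative logits instead of dividing)
    for token_id in set(generated_ids):
        logits[token_id] /= repetition_penalty
    # Temperature scaling
    logits = logits / temperature
    # Top-k: keep only the k most likely tokens
    if top_k > 0:
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability covers top_p
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    to_remove = cumulative > top_p
    to_remove[1:] = to_remove[:-1].clone()  # always keep the first token that crosses the threshold
    to_remove[0] = False
    logits[sorted_idx[to_remove]] = float("-inf")
    # Sample from the filtered distribution
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```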
---
## Usage
Install dependencies:
```bash
pip install torch tokenizers
```
Load the model and tokenizer:
```python
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch

# Load tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")

# Load model
model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128,
)
# map_location="cpu" lets the checkpoint load on machines without a GPU
checkpoint = torch.load("model_checkpoint_step20000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Generate text (streams tokens to stdout as they are produced)
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
```
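`generate_stream` prints tokens as they are produced. For reference, a plain (non-streaming) autoregressive loop looks roughly like the sketch below; it assumes the model returns logits of shape `(batch, seq_len, vocab_size)` and is not the repository's actual implementation.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=60, max_seq_len=128):
    """Greedy autoregressive decoding; swap the argmax for sampling if desired."""
    ids = tokenizer.encode(prompt).ids
    for _ in range(max_new_tokens):
        # Truncate the context to the model's maximum sequence length
        context = torch.tensor([ids[-max_seq_len:]])
        logits = model(context)                      # assumed shape: (1, seq_len, vocab_size)
        next_id = int(torch.argmax(logits[0, -1]))   # greedy pick of the next token
        ids.append(next_id)
    return tokenizer.decode(ids)
```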
---
## Training
Train from scratch on any plain-text dataset:
```bash
python training.py
```
Training includes:
* Checkpointing
* Sample generation previews
* Word-level tokenization with `tokenizers` (see the sketch below)
* Custom datasets via `alphabetical_dataset.txt` or your own
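If you need to rebuild the tokenizer for your own corpus, a minimal word-level tokenizer can be trained with the `tokenizers` library along these lines. The special tokens here are illustrative; check `Tokenizer.py` for the repository's own setup.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Build an empty word-level tokenizer and train it on a plain-text corpus
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["alphabetical_dataset.txt"], trainer=trainer)
tokenizer.save("wordlevel.json")
```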
---
## Files in This Repository
| File | Purpose |
| -------------------------- | ---------------------------- |
| `miniGPT.py` | Core Transformer model |
| `transformer.py` | Transformer block logic |
| `multiheadattention.py` | Multi-head attention module |
| `Tokenizer.py` | Tokenizer loader |
| `training.py` | Training loop |
| `inference.py` | CLI and streaming generation |
| `dataprocess.py` | Text preprocessing tools |
| `wordlevel.json` | Trained word-level tokenizer |
| `alphabetical_dataset.txt` | Sample dataset |
| `requirements.txt` | Required dependencies |
---
## Model Card
| Property | Value |
| ------------ | --------------------------------- |
| Model Type | Decoder-only GPT |
| Size | Small (\~4.6M params) |
| Trained On | Word-level dataset (custom) |
| Intended Use | Text generation, educational demo |
| License | MIT |
---
## Intended Use and Limitations
This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.
---
## Contributions
We welcome improvements, bug fixes, and new features!
```bash
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
```
Then open a pull request!
---
## License
This project is licensed under the [MIT License](https://github.com/austin207/Transformer-Virtue-v2/blob/main/LICENSE).
---
## Explore More
* Based on GPT architecture from OpenAI
* Inspired by [karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)
* Compatible with Hugging Face tools and tokenizer ecosystem