# MiniGPT — Lightweight Transformer for Text Generation

**MiniGPT** is a minimal yet capable GPT-style language model built from scratch in PyTorch. It is designed for educational clarity, easy customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer, including streaming output and modern sampling strategies.

> Hosted with ❤️ by [@Austin207](https://huggingface.co/Austin207)

---
## Model Description

MiniGPT is a small, word-level transformer model with the following architecture:

* 4 transformer layers
* 4 attention heads
* Embedding dimension: 128
* Feed-forward (FFN) hidden size: 512
* Maximum sequence length: 128
* Word-level tokenizer (trained with Hugging Face `tokenizers`)
Despite its size, it supports advanced generation strategies (sketched after this list):

* Repetition penalty
* Temperature sampling
* Top-K & Top-P (nucleus) sampling
* Real-time streaming output
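
These strategies all act on the next-token logits before sampling. The snippet below is an illustrative sketch of how such filtering is commonly implemented in PyTorch; it is not the exact code in `inference.py`, and the function name `filter_logits` is hypothetical.

```python
import torch

def filter_logits(logits, generated_ids, temperature=1.0, top_k=50,
                  top_p=0.9, repetition_penalty=1.2):
    """Sketch: pick the next token from 1-D logits of shape (vocab_size,)."""
    logits = logits.clone()

    # Repetition penalty: down-weight tokens that already appear in the output.
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    # Temperature: <1.0 sharpens the distribution, >1.0 flattens it.
    logits = logits / temperature

    # Top-K: keep only the K most likely tokens.
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")

    # Top-P (nucleus): keep the smallest set of tokens whose cumulative
    # probability exceeds p, always retaining the single most likely token.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[1:] = remove[:-1].clone()
        remove[0] = False
        logits[sorted_idx[remove]] = float("-inf")

    # Sample the next token from the filtered distribution.
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

In a streaming generator such as `generate_stream`, filtering of this kind typically runs once per new token, and each chosen token is decoded and printed immediately to produce the real-time effect.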
---

## Usage

Install dependencies:

```bash
pip install torch tokenizers
```
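
If you have cloned the repository, the pinned versions listed in `requirements.txt` should also work:

```bash
pip install -r requirements.txt
```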
Load the model and tokenizer:

```python
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch

# Load the trained word-level tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")

# Build the model with the same hyperparameters used during training
model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128
)

# Load trained weights (pass map_location="cpu" to torch.load if you have no GPU)
checkpoint = torch.load("model_checkpoint_step20000.pt")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Generate text with streaming output
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
```
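
`generate_stream` handles tokenization, autoregressive decoding, and printing for you. For reference, the core loop of a decoder-only model looks roughly like the sketch below; it assumes `model(input_ids)` returns logits of shape `(batch, seq_len, vocab_size)`, which may differ from the actual `MiniGPT` interface, and the helper name `generate_greedy` is hypothetical.

```python
import torch

def generate_greedy(model, tokenizer, prompt, max_new_tokens=60, max_seq_len=128):
    """Minimal autoregressive decoding sketch (greedy, no sampling)."""
    ids = tokenizer.encode(prompt).ids
    input_ids = torch.tensor([ids], dtype=torch.long)

    with torch.no_grad():
        for _ in range(max_new_tokens):
            # Feed at most the last max_seq_len tokens back into the model
            logits = model(input_ids[:, -max_seq_len:])
            next_id = logits[0, -1].argmax().item()  # most likely next token
            input_ids = torch.cat(
                [input_ids, torch.tensor([[next_id]], dtype=torch.long)], dim=1
            )

    return tokenizer.decode(input_ids[0].tolist())

# Example: print(generate_greedy(model, tokenizer, "Beneath the ancient ruins"))
```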
---

## Training

Train from scratch on any plain-text dataset:

```bash
python training.py
```

Training includes:

* Checkpointing
* Sample generation previews
* Word-level tokenization with `tokenizers` (see the sketch after this list)
* Custom datasets via `alphabetical_dataset.txt` or your own text file
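
If you want to rebuild `wordlevel.json` for a new dataset, the Hugging Face `tokenizers` library makes this a few lines. The snippet below is a sketch of a typical word-level setup; the special tokens and vocabulary size are assumptions, not necessarily what `training.py` uses.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Word-level model with an unknown-token fallback
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Special tokens and vocab size here are illustrative assumptions
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"], vocab_size=30000)
tokenizer.train(files=["alphabetical_dataset.txt"], trainer=trainer)

tokenizer.save("wordlevel.json")
```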
---

## Files in This Repository

| File                       | Purpose                      |
| -------------------------- | ---------------------------- |
| `miniGPT.py`               | Core Transformer model       |
| `transformer.py`           | Transformer block logic      |
| `multiheadattention.py`    | Multi-head attention module  |
| `Tokenizer.py`             | Tokenizer loader             |
| `training.py`              | Training loop                |
| `inference.py`             | CLI and streaming generation |
| `dataprocess.py`           | Text preprocessing tools     |
| `wordlevel.json`           | Trained word-level tokenizer |
| `alphabetical_dataset.txt` | Sample dataset               |
| `requirements.txt`         | Required dependencies        |
---

## Model Card

| Property     | Value                             |
| ------------ | --------------------------------- |
| Model Type   | Decoder-only GPT                  |
| Size         | Small (~4.6M parameters)          |
| Trained On   | Word-level dataset (custom)       |
| Intended Use | Text generation, educational demo |
| License      | MIT                               |
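
The parameter count above is approximate; you can check it for any configuration directly in PyTorch:

```python
# Assumes `model` is the MiniGPT instance built in the Usage section
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.2f}M parameters")
```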
---

## Intended Use and Limitations

This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.
---

## Contributions

We welcome improvements, bug fixes, and new features!

```bash
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
```

Then open a pull request!

---

## License

This project is licensed under the [MIT License](https://github.com/austin207/Transformer-Virtue-v2/blob/main/LICENSE).
---

## Explore More

* Based on the GPT architecture from OpenAI
* Inspired by [karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)
* Compatible with Hugging Face tools and the tokenizer ecosystem