---
language: en
license: mit
tags:
- gpt
- transformer
- text-generation
- miniGPT
model-index:
- name: MiniGPT
  results: []
---

# MiniGPT — Lightweight Transformer for Text Generation

**MiniGPT** is a minimal yet capable GPT-style language model built from scratch in PyTorch. It is designed for educational clarity, easy customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer, including streaming output and modern sampling strategies.

> Hosted with ❤️ by [@Austin207](https://huggingface.co/Austin207)

---

## Model Description

MiniGPT is a small, word-level transformer with the following architecture:

* 4 transformer layers
* 4 attention heads
* Embedding dimension: 128
* FFN hidden size: 512
* Max sequence length: 128
* Word-level tokenizer (trained with Hugging Face `tokenizers`)

Despite its size, it supports advanced generation strategies (sketched in the appendix below), including:

* Repetition penalty
* Temperature sampling
* Top-K & Top-P (nucleus) sampling
* Real-time streaming output

---

## Usage

Install dependencies:

```bash
pip install torch tokenizers
```

Load the model and tokenizer:

```python
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch

# Load the trained word-level tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")

# Build the model with the same hyperparameters used in training
model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128
)

# Restore weights (map to CPU so the checkpoint also loads on machines without a GPU)
checkpoint = torch.load("model_checkpoint_step20000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Generate text with streaming output
prompt = "Beneath the ancient ruins"
generate_stream(
    model,
    tokenizer,
    prompt,
    max_new_tokens=60,
    temperature=1.0,
    top_k=50,
    top_p=0.9
)
```

---

## Training

Train from scratch on any plain-text dataset:

```bash
python training.py
```

Training includes:

* Checkpointing
* Sample generation previews
* Word-level tokenization with `tokenizers` (a hedged sketch appears in the appendix below)
* Custom datasets via `alphabetical_dataset.txt` or your own

---

## Files in This Repository

| File | Purpose |
| -------------------------- | ---------------------------- |
| `miniGPT.py` | Core transformer model |
| `transformer.py` | Transformer block logic |
| `multiheadattention.py` | Multi-head attention module |
| `Tokenizer.py` | Tokenizer loader |
| `training.py` | Training loop |
| `inference.py` | CLI and streaming generation |
| `dataprocess.py` | Text preprocessing tools |
| `wordlevel.json` | Trained word-level tokenizer |
| `alphabetical_dataset.txt` | Sample dataset |
| `requirements.txt` | Required dependencies |

---

## Model Card

| Property | Value |
| ------------ | --------------------------------- |
| Model Type | Decoder-only GPT |
| Size | Small (~4.6M params) |
| Trained On | Word-level dataset (custom) |
| Intended Use | Text generation, educational demo |
| License | MIT |

---

## Intended Use and Limitations

This model is intended for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.

---

## Contributions

We welcome improvements, bug fixes, and new features!

```bash
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
```

Then open a pull request!
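
---

## Appendix: Implementation Sketches

### Sampling utilities

The helper below is a minimal sketch of how the four generation strategies listed above (repetition penalty, temperature, Top-K, Top-P) can combine at a single decoding step. It is illustrative only: the function name, argument names, and defaults are assumptions, and the actual logic shipped in `inference.py` may differ.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, generated_ids,
                      temperature=1.0, top_k=50, top_p=0.9,
                      repetition_penalty=1.2):
    """Pick the next token id from a 1-D logits tensor of shape [vocab_size]."""
    logits = logits.clone()

    # Repetition penalty: dampen the logits of tokens already generated.
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    # Temperature: <1 sharpens the distribution, >1 flattens it.
    logits = logits / max(temperature, 1e-8)

    # Top-K: keep only the k highest-scoring tokens.
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")

    # Top-P (nucleus): keep the smallest prefix of sorted tokens whose
    # cumulative probability exceeds top_p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        drop = cum_probs > top_p
        drop[1:] = drop[:-1].clone()  # shift so the threshold-crossing token is kept
        drop[0] = False               # always keep the single best token
        logits[sorted_idx[drop]] = float("-inf")

    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

A generation loop would call this once per step, append the returned id to `generated_ids`, and feed the extended sequence back through the model.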
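
### Training a word-level tokenizer

For reference, this is roughly how a word-level tokenizer like `wordlevel.json` can be produced with Hugging Face `tokenizers`. The special tokens and the use of `alphabetical_dataset.txt` here are assumptions; `dataprocess.py` and `training.py` may do this differently.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Word-level model: each pre-tokenized word maps to exactly one token.
tokenizer = Tokenizer(models.WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# The special-token set is an assumption; adjust to match the training pipeline.
trainer = trainers.WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["alphabetical_dataset.txt"], trainer=trainer)

tokenizer.save("wordlevel.json")
```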

---

## License

This project is licensed under the [MIT License](https://github.com/austin207/Transformer-Virtue-v2/blob/main/LICENSE).

---

## Explore More

* Based on the GPT architecture from OpenAI
* Inspired by [karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)
* Compatible with Hugging Face tools and the tokenizer ecosystem