---
language: en
license: mit
tags:
  - gpt
  - transformer
  - text-generation
  - miniGPT
model-index:
  - name: MiniGPT
    results: []
---

# MiniGPT: Lightweight Transformer for Text Generation

**MiniGPT** is a minimal yet powerful GPT-style language model built from scratch using PyTorch. It is designed for educational clarity, customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer architecture, including streaming capabilities and modern sampling strategies.

> Hosted with ❤️ by [@Austin207](https://huggingface.co/Austin207)

---

## Model Description

MiniGPT is a small, word-level transformer model with the following architecture:

* Transformer layers: 4
* Attention heads: 4
* Embedding dimension: 128
* FFN hidden size: 512
* Maximum sequence length: 128
* Word-level tokenizer (trained with Hugging Face `tokenizers`)

Despite its size, it supports advanced generation strategies (a combined sampling step is sketched after this list):

* Repetition penalty
* Temperature sampling
* Top-k and top-p (nucleus) sampling
* Real-time streaming output
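
For illustration, here is a minimal sketch of how these strategies can combine in a single decoding step. This is not the implementation in `inference.py`; the function name and defaults are assumptions chosen to mirror the usage example later in this card:

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, generated_ids, repetition_penalty=1.2,
                      temperature=1.0, top_k=50, top_p=0.9):
    """Pick one token id from a 1-D [vocab_size] logits tensor."""
    logits = logits.clone()

    # Repetition penalty (CTRL-style): dampen tokens already generated.
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    # Temperature: values < 1.0 sharpen the distribution, > 1.0 flatten it.
    logits = logits / temperature

    # Top-k: mask everything below the k-th highest logit.
    if top_k > 0:
        kth_best = torch.topk(logits, min(top_k, logits.size(-1))).values[-1]
        logits[logits < kth_best] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p; the most likely token is always kept.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        sorted_probs = F.softmax(sorted_logits, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        drop = cumulative - sorted_probs > top_p  # mass *before* each token
        sorted_logits[drop] = float("-inf")
        logits = torch.full_like(logits, float("-inf")).scatter(0, sorted_idx, sorted_logits)

    # Sample from what survives the filters.
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```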

---

## Usage

Install dependencies:

```bash
pip install torch tokenizers
```

Load the model and tokenizer:

```python
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch

# Load tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")

# Load model
model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128
)

# Load onto CPU first so this works without a GPU; move the model afterwards if needed.
checkpoint = torch.load("model_checkpoint_step20000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Generate text
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
```
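
As a quick sanity check after loading, you can count the model's parameters; the total should land near the ~4.6M figure quoted in the model card below (the exact count depends on your tokenizer's vocabulary size):

```python
# Count parameters as a sanity check against the model card.
num_params = sum(p.numel() for p in model.parameters())
print(f"MiniGPT parameters: {num_params / 1e6:.1f}M")
```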

---

## Training

Train from scratch on any plain-text dataset:

```bash
python training.py
```

Training includes:

* Checkpointing
* Sample generation previews
* Word-level tokenization with `tokenizers` (see the sketch below)
* Custom datasets via `alphabetical_dataset.txt` or your own
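
If you are starting from a fresh corpus rather than the bundled `wordlevel.json`, a word-level tokenizer can be trained along these lines. This is a minimal sketch using the Hugging Face `tokenizers` API; the special tokens and file names are assumptions, not necessarily what `training.py` uses:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Build a word-level tokenizer with an explicit unknown token.
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train on the sample dataset (or your own plain-text file).
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["alphabetical_dataset.txt"], trainer=trainer)

tokenizer.save("wordlevel.json")
```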

---

## Files in This Repository

| File                       | Purpose                      |
| -------------------------- | ---------------------------- |
| `miniGPT.py`               | Core Transformer model       |
| `transformer.py`           | Transformer block logic      |
| `multiheadattention.py`    | Multi-head attention module  |
| `Tokenizer.py`             | Tokenizer loader             |
| `training.py`              | Training loop                |
| `inference.py`             | CLI and streaming generation |
| `dataprocess.py`           | Text preprocessing tools     |
| `wordlevel.json`           | Trained word-level tokenizer |
| `alphabetical_dataset.txt` | Sample dataset               |
| `requirements.txt`         | Required dependencies        |

---

## Model Card

| Property     | Value                             |
| ------------ | --------------------------------- |
| Model Type   | Decoder-only GPT                  |
| Size         | Small (\~4.6M params)             |
| Trained On   | Custom plain-text corpus (word-level) |
| Intended Use | Text generation, educational demo |
| License      | MIT                               |

---

## Intended Use and Limitations

This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.

---

## Contributions

We welcome improvements, bug fixes, and new features!

```bash
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
```

Then open a pull request!

---

## License

This project is licensed under the [MIT License](https://github.com/austin207/Transformer-Virtue-v2/blob/main/LICENSE).

---

## Explore More

* Based on the GPT architecture from OpenAI
* Inspired by [karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)
* Compatible with Hugging Face tools and the `tokenizers` ecosystem