# MiniGPT — Lightweight Transformer for Text Generation

**MiniGPT** is a minimal yet capable GPT-style language model built from scratch in PyTorch. It is designed for educational clarity, easy customization, and efficient real-time text generation. The project demonstrates the full training and inference pipeline of a decoder-only transformer, including streaming output and modern sampling strategies.

> Hosted with ❤️ by [@Austin207](https://huggingface.co/Austin207)

---

## Model Description

MiniGPT is a small, word-level transformer model with the following architecture (a parameter-count sanity check follows the list):

* 4 transformer layers
* 4 attention heads
* Embedding dimension: 128
* FFN hidden size: 512
* Max sequence length: 128
* Word-level tokenizer (trained with Hugging Face `tokenizers`)
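
To check the footprint yourself, you can instantiate the model and count its parameters directly. This is a minimal sketch; the exact total depends on the vocabulary size stored in `wordlevel.json`:

```python
from tokenizers import Tokenizer
from miniGPT import MiniGPT

# Vocabulary size comes from the trained word-level tokenizer.
tokenizer = Tokenizer.from_file("wordlevel.json")

model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128,
)

# Sum over all parameter tensors; this should land near the ~4.6M
# reported in the model card below.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")
```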
19
+
20
+ Despite its size, it supports advanced generation strategies including:
21
+
22
+ * Repetition Penalty
23
+ * Temperature Sampling
24
+ * Top-K & Top-P (nucleus) sampling
25
+ * Real-time streaming output
26
+
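
The following is a rough, self-contained sketch of how these strategies typically combine at a single decoding step. The function name and defaults here are illustrative; the project's actual logic lives in `inference.py`:

```python
import torch

def sample_next_token(logits, generated_ids, temperature=1.0,
                      top_k=50, top_p=0.9, repetition_penalty=1.2):
    """Pick the next token id from a 1-D logits tensor (illustrative only)."""
    logits = logits.clone()
    # Repetition penalty: make already-generated tokens less likely.
    for token_id in set(generated_ids):
        logits[token_id] = (logits[token_id] / repetition_penalty
                            if logits[token_id] > 0
                            else logits[token_id] * repetition_penalty)
    # Temperature: <1.0 sharpens the distribution, >1.0 flattens it.
    logits = logits / temperature
    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth_best = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits[logits < kth_best] = float("-inf")
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability exceeds p, always retaining the single best token.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cutoff = torch.cumsum(sorted_probs, dim=-1) > top_p
    cutoff[1:] = cutoff[:-1].clone()
    cutoff[0] = False
    sorted_probs[cutoff] = 0.0
    next_pos = torch.multinomial(sorted_probs / sorted_probs.sum(), 1)
    return int(sorted_ids[next_pos])
```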
27
+ ---
28
+
29
+ ## Usage
30
+
31
+ Install dependencies:
32
+
33
+ ```bash
34
+ pip install torch tokenizers
35
+ ```
36
+
37
+ Load the model and tokenizer:
38
+
39
+ ```python
40
+ from miniGPT import MiniGPT
41
+ from inference import generate_stream
42
+ from tokenizers import Tokenizer
43
+ import torch
44
+
45
+ # Load tokenizer
46
+ tokenizer = Tokenizer.from_file("wordlevel.json")
47
+
48
+ # Load model
49
+ model = MiniGPT(
50
+ vocab_size=tokenizer.get_vocab_size(),
51
+ embed_dim=128,
52
+ num_heads=4,
53
+ ff_dim=512,
54
+ num_layers=4,
55
+ max_seq_len=128
56
+ )
57
+
58
+ checkpoint = torch.load("model_checkpoint_step20000.pt")
59
+ model.load_state_dict(checkpoint["model_state_dict"])
60
+ model.eval()
61
+
62
+ # Generate text
63
+ prompt = "Beneath the ancient ruins"
64
+ generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
65
+ ```
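
Since the tokenizer is word-level, it can help to inspect how a prompt is segmented before generating. Continuing the snippet above, an illustrative round-trip with the `tokenizers` API:

```python
# Encode the prompt to token ids, then decode back to text.
encoding = tokenizer.encode(prompt)
print(encoding.tokens)  # e.g. ['Beneath', 'the', 'ancient', 'ruins']
print(encoding.ids)     # the integer ids MiniGPT actually consumes
print(tokenizer.decode(encoding.ids))
```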

---

## Training

Train from scratch on any plain-text dataset:

```bash
python training.py
```

Training includes:

* Checkpointing
* Sample generation previews
* Word-level tokenization with `tokenizers` (see the sketch below)
* Custom datasets via `alphabetical_dataset.txt` or your own
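
If you bring your own corpus, a word-level tokenizer like `wordlevel.json` can be rebuilt with the Hugging Face `tokenizers` library. This is a minimal sketch, not necessarily the exact recipe in `Tokenizer.py`, and the special tokens are assumptions:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Word-level model: every whitespace-separated token gets its own id.
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Special tokens here are illustrative; match whatever training.py expects.
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(["alphabetical_dataset.txt"], trainer)
tokenizer.save("wordlevel.json")
```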
83
+
84
+ ---
85
+
86
+ ## Files in This Repository
87
+
88
+ | File | Purpose |
89
+ | -------------------------- | ---------------------------- |
90
+ | `miniGPT.py` | Core Transformer model |
91
+ | `transformer.py` | Transformer block logic |
92
+ | `multiheadattention.py` | Multi-head attention module |
93
+ | `Tokenizer.py` | Tokenizer loader |
94
+ | `training.py` | Training loop |
95
+ | `inference.py` | CLI and streaming generation |
96
+ | `dataprocess.py` | Text preprocessing tools |
97
+ | `wordlevel.json` | Trained word-level tokenizer |
98
+ | `alphabetical_dataset.txt` | Sample dataset |
99
+ | `requirements.txt` | Required dependencies |
100
+
101
+ ---
102
+
103
+ ## Model Card
104
+
105
+ | Property | Value |
106
+ | ------------ | --------------------------------- |
107
+ | Model Type | Decoder-only GPT |
108
+ | Size | Small (\~4.6M params) |
109
+ | Trained On | Word-level dataset (custom) |
110
+ | Intended Use | Text generation, educational demo |
111
+ | License | MIT |
112
+
113
+ ---
114
+
115
+ ## Intended Use and Limitations
116
+
117
+ This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out-of-the-box. Expect limitations in coherence, factuality, and long-context reasoning.
118
+
119
+ ---
120
+
121
+ ## Contributions
122
+
123
+ We welcome improvements, bug fixes, and new features!
124
+
125
+ ```bash
126
+ # Fork, clone, and create a branch
127
+ git clone https://github.com/austin207/Transformer-Virtue-v2.git
128
+ cd Transformer-Virtue-v2
129
+ git checkout -b feature/your-feature
130
+ ```
131
+
132
+ Then open a pull request!
133
+
134
+ ---
135
+
136
+ ## License
137
+
138
+ This project is licensed under the [MIT License](https://github.com/austin207/Transformer-Virtue-v2/blob/main/LICENSE).
139
+
140
+ ---
141
+
142
+ ## Explore More
143
+
144
+ * Based on GPT architecture from OpenAI
145
+ * Inspired by [karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)
146
+ * Compatible with Hugging Face tools and tokenizer ecosystem
147
+