---
language: en
license: mit
library_name: pytorch
tags:
- text-generation
- gpt
- transformers
- language-model
- alice-in-wonderland
- literature
datasets:
- alice-in-wonderland
metrics:
- perplexity
pipeline_tag: text-generation
---

# 1st Demo GPT Based Architecture Model

## Model Description

This is a **GPT-based transformer language model** trained from scratch on Lewis Carroll's "Alice's Adventures in Wonderland". It demonstrates a custom implementation of the GPT architecture for text generation, trained on classic literature.

## Model Details

- **Model Type**: GPT (Generative Pre-trained Transformer)
- **Architecture**: Custom transformer-based language model
- **Training Data**: Alice's Adventures in Wonderland by Lewis Carroll
- **Language**: English
- **Library**: PyTorch
- **Model Size**: ~4.2M parameters (based on `complete_gpt_model.pth`)
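The parameter count can be checked directly once the model is loaded. A minimal sketch that works for any PyTorch module (the `nn.Linear` below is just an illustrative stand-in, not this model):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Total number of trainable parameters in a PyTorch module
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Illustrative example on a small layer: 10*10 weights + 10 biases
print(count_parameters(nn.Linear(10, 10)))  # → 110
```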

## Training Details

### Dataset
- **Source**: Alice's Adventures in Wonderland (complete text)
- **Size**: 1,033 lines of text
- **Preprocessing**: Custom tokenization (character-level or subword)
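The actual tokenizer is stored in `tokenizer.pkl`; a minimal character-level sketch of the idea (an illustration of the approach, not the shipped implementation):

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one id per distinct character."""

    def __init__(self, text: str):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for i, ch in enumerate(chars)}  # id -> char

    def encode(self, s: str) -> list[int]:
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return ''.join(self.itos[i] for i in ids)

tok = CharTokenizer("Alice was beginning to get very tired")
ids = tok.encode("Alice")
print(tok.decode(ids))  # → Alice
```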

### Training Configuration
- **Epochs**: 3 (checkpoint files available for each epoch)
- **Optimizer**: Likely AdamW (standard for transformer models)
- **Training Files**:
  - `checkpoint_epoch_1.pth` (12.2MB)
  - `checkpoint_epoch_2.pth` (12.2MB)
  - `checkpoint_epoch_3.pth` (12.2MB)
  - `best_model.pth` (4.14MB): best-performing checkpoint
  - `complete_gpt_model.pth` (4.20MB): final trained model

## Files in this Repository

| File | Size | Description |
|------|------|-------------|
| `complete_gpt_model.pth` | 4.20MB | Final trained model weights |
| `best_model.pth` | 4.14MB | Best-performing model checkpoint |
| `checkpoint_epoch_1.pth` | 12.2MB | Training checkpoint after epoch 1 |
| `checkpoint_epoch_2.pth` | 12.2MB | Training checkpoint after epoch 2 |
| `checkpoint_epoch_3.pth` | 12.2MB | Training checkpoint after epoch 3 |
| `tokenizer.pkl` | 37.3KB | Custom tokenizer for the model |
| `dataset.txt` | 51KB | Training dataset (Alice in Wonderland) |
| `Notebook1.ipynb` | 4.1MB | Training notebook with implementation |

## Usage

### Loading the Model

```python
import torch
import pickle

# Load the tokenizer
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Load the model (saved as a full pickled module; on PyTorch >= 2.6,
# weights_only=False is needed to unpickle arbitrary objects)
model = torch.load('complete_gpt_model.pth', map_location='cpu', weights_only=False)
model.eval()
```

### Text Generation

The exact decoding loop depends on the model's forward signature; the sketch below assumes the model maps a `(batch, seq_len)` tensor of token ids to `(batch, seq_len, vocab_size)` logits and that the tokenizer exposes `encode()`/`decode()`:

```python
def generate_text(model, tokenizer, prompt, max_length=100):
    model.eval()
    # Tokenize input
    input_ids = tokenizer.encode(prompt)
    # Generate text one token at a time
    with torch.no_grad():
        for _ in range(max_length):
            x = torch.tensor([input_ids])
            logits = model(x)                      # (1, seq_len, vocab_size)
            next_id = int(logits[0, -1].argmax())  # greedy: most likely next token
            input_ids.append(next_id)
    return tokenizer.decode(input_ids)

# Example usage
prompt = "Alice was beginning to get very tired"
generated = generate_text(model, tokenizer, prompt)
print(generated)
```

## Model Performance

The model has been trained for 3 epochs on the Alice in Wonderland dataset. Performance metrics and loss curves can be found in the training notebook (`Notebook1.ipynb`).
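Since perplexity is the reported metric, it can be derived from the mean per-token cross-entropy loss that the training loop already computes. A minimal sketch (the loss value below is a placeholder, not a measured result):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

# Hypothetical example: a mean loss of 1.5 nats/token
print(round(perplexity(1.5), 2))  # → 4.48
```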

### Expected Outputs
Given the training on Alice in Wonderland, the model should generate text in a style similar to Lewis Carroll's writing, with:
- Victorian-era English vocabulary and sentence structure
- Whimsical and fantastical content
- Character references from the original story
- Descriptive and narrative prose style

## Training Process

The training was conducted using:
1. **Data Preprocessing**: Text cleaning and tokenization
2. **Model Architecture**: Custom GPT implementation
3. **Training Loop**: 3 epochs with checkpoint saving
4. **Validation**: Best model selection based on validation metrics
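The actual loop lives in `Notebook1.ipynb`; steps 3 and 4 can be sketched generically as below (model, data, and hyperparameters are placeholders, and AdamW is an assumption, as noted under Training Configuration):

```python
import torch
import torch.nn as nn

def train(model, train_batches, val_batches, epochs=3, lr=3e-4):
    # Generic sketch: AdamW, per-epoch checkpoints, best-model selection
    # on validation loss. File names mirror the ones in this repository.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_val = float('inf')
    for epoch in range(1, epochs + 1):
        model.train()
        for x, y in train_batches:
            optimizer.zero_grad()
            logits = model(x)  # (batch, seq_len, vocab_size)
            loss = loss_fn(logits.view(-1, logits.size(-1)), y.view(-1))
            loss.backward()
            optimizer.step()
        # Step 3: save a checkpoint after every epoch
        torch.save({'epoch': epoch, 'model_state_dict': model.state_dict()},
                   f'checkpoint_epoch_{epoch}.pth')
        # Step 4: keep the weights with the lowest validation loss
        model.eval()
        with torch.no_grad():
            losses = []
            for x, y in val_batches:
                logits = model(x)
                losses.append(loss_fn(logits.view(-1, logits.size(-1)),
                                      y.view(-1)).item())
            val_loss = sum(losses) / max(len(losses), 1)
        if val_loss < best_val:
            best_val = val_loss
            torch.save(model.state_dict(), 'best_model.pth')
    return best_val
```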

## Limitations

- **Dataset Size**: Trained on a single book, limiting vocabulary and style diversity
- **Domain Specificity**: Optimized for Lewis Carroll's writing style
- **Scale**: Relatively small model compared to modern large language models
- **Context Length**: Limited context window, typical of smaller transformer models

## Ethical Considerations

- This model is trained on public-domain literature (Alice in Wonderland)
- The training data is from 1865 and may contain outdated language or concepts
- The model is intended for educational and demonstration purposes

## Citation

If you use this model, please cite:

```bibtex
@misc{karthik2024alice_gpt,
  title={1st Demo GPT Based Architecture Model},
  author={Karthik},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/karthik-2905/1st_Demo_GPT_Based_Architecture_Model}
}
```

## License

This model is released under the MIT License. The training data (Alice's Adventures in Wonderland) is in the public domain.

## Contact

For questions or issues, please open an issue in this repository or contact the model author.

---

*This model was created as a learning exercise to demonstrate GPT architecture implementation and training on classic literature.*