Tanaybh committed on
Commit 49ded95 · verified · 1 Parent(s): e5caa51

Training in progress, step 200

Files changed (4)
  1. README.md +53 -104
  2. model.safetensors +1 -1
  3. tokenizer.json +0 -0
  4. training_args.bin +1 -1
README.md CHANGED
@@ -1,124 +1,73 @@
  ---
- language: en
- license: mit
  tags:
- - text-generation
- - gpt2
- - transformers
- - custom-tokenizer
- datasets:
- - wikitext
  ---
-
- # 🤖 Nano GPT - Built From Scratch
-
- Hey there! Welcome to my tiny language model. I built this GPT from scratch as a learning project, and honestly, it was pretty fun watching it learn to generate text!
-
- ## What is this?
-
- This is a super small GPT-2 style language model that I trained on my laptop. It's not going to write your essays or solve world hunger, but it's a cool demonstration of how these language models actually work under the hood.
-
- Think of it as a baby GPT - it can generate text, but don't expect Shakespeare. More like... an enthusiastic toddler who just learned to talk.
-
- ## Model Stats
-
- - **Parameters**: ~1,065,728 (yes, that's a million with an M, not a billion!)
- - **Layers**: 4 transformer layers
- - **Embedding Size**: 128 dimensions
- - **Attention Heads**: 4 heads
- - **Context Length**: 128 tokens
- - **Vocab Size**: 2000 tokens
- - **Training Data**: WikiText-2 (5,000 samples)
- - **Training Time**: 10 epochs on my laptop
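
If you want to see what those numbers look like in code, here is a minimal sketch of a matching config. It is not copied from the repo's config file; it just uses the standard `transformers` `GPT2Config` fields with the sizes listed above:

```python
# Sketch: a GPT-2 config with the dimensions from the stats above (assumed, not
# read from this repo's config.json).
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=2000,   # custom BPE vocab size
    n_positions=128,   # context length in tokens
    n_embd=128,        # embedding size
    n_layer=4,         # transformer layers
    n_head=4,          # attention heads
)
model = GPT2LMHeadModel(config)

# With tied input/output embeddings this lands right around the ~1.07M
# parameter figure quoted above.
print(sum(p.numel() for p in model.parameters()))
```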
 
- ## Quick Start
-
- Want to try it out? Here's how:
-
- ```python
- from transformers import pipeline
-
- # Load the model
- generator = pipeline('text-generation', model='Tanaybh/nano-gpt-from-scratch')
-
- # Generate some text
- output = generator(
-     "The meaning of life is",
-     max_new_tokens=30,
-     do_sample=True,
-     temperature=0.8
- )
-
- print(output[0]['generated_text'])
- ```
-
- ## Example Output
-
- I gave it the prompt: "**The **"
-
- And it generated:
-
- > The XVI
-
- า��ค��ร����ฃ���อ�
-
- Not bad for a tiny model trained in a few hours, right?
-
- ## Training Details
-
- I trained this model from scratch using:
- - Custom BPE tokenizer (trained on the same data)
- - GPT-2 architecture (just way smaller)
- - AdamW optimizer with a learning rate of 0.0005
- - Batch size of 8
- - Trained for 10 epochs
-
- The whole thing runs on a regular laptop - no fancy GPU clusters needed!
-
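
The training script itself isn't included in this repo, so the following is only a rough sketch of how a setup like the one described above could be wired together with the 🤗 `Trainer` API; the dataset slice, special tokens, and data handling are assumptions on my part:

```python
# Hedged reconstruction of the described recipe: a 2,000-token BPE tokenizer trained
# on the same data, a tiny GPT-2, AdamW at lr 5e-4, batch size 8, 10 epochs.
from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import (DataCollatorForLanguageModeling, GPT2Config,
                          GPT2LMHeadModel, PreTrainedTokenizerFast, Trainer,
                          TrainingArguments)

# 5,000 WikiText-2 samples, skipping empty lines
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:5000]")
raw = raw.filter(lambda x: x["text"].strip() != "")

# Train the custom BPE tokenizer on the same data
bpe = Tokenizer(models.BPE(unk_token="<unk>"))
bpe.pre_tokenizer = pre_tokenizers.ByteLevel()
bpe.train_from_iterator(
    raw["text"],
    trainers.BpeTrainer(vocab_size=2000, special_tokens=["<unk>", "<pad>"]),
)
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=bpe, unk_token="<unk>", pad_token="<pad>"
)

# Same architecture as in the Model Stats section
model = GPT2LMHeadModel(
    GPT2Config(vocab_size=2000, n_positions=128, n_embd=128, n_layer=4, n_head=4)
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="nano-gpt-from-scratch",
        learning_rate=5e-4,
        per_device_train_batch_size=8,
        num_train_epochs=10,
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
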
- ## Limitations
-
- Let's be real here:
- - This model is TINY. Like, really tiny. It has 1,065,728 parameters vs GPT-3's 175 billion.
- - It was only trained on 5,000 Wikipedia samples, so its knowledge is... limited.
- - It might generate weird or nonsensical text sometimes. That's part of the charm!
- - Maximum context length is only 128 tokens, so don't expect long conversations.
- - It's a base model with no instruction tuning, so it just continues text rather than following commands.
-
- ## Why I Made This
-
- I wanted to understand how language models work by building one myself. Sure, I could've just fine-tuned a pre-trained model, but where's the fun in that? This project taught me about:
- - Tokenizer training
- - Transformer architecture
- - Training dynamics
- - How LLMs actually generate text
-
- Plus, now I can say I trained a language model from scratch on my laptop. Pretty cool, right?
-
- ## Future Improvements
-
- Some things I might try:
- - Train on more data (maybe the full WikiText dataset)
- - Experiment with different model sizes
- - Try different tokenizer configurations
- - Add instruction tuning
- - Fine-tune it for specific tasks
-
- ## License
-
- MIT - Feel free to use this however you want! Learn from it, break it, improve it. That's what it's here for.
-
- ## Acknowledgments
-
- Built with:
- - 🤗 Hugging Face Transformers
- - PyTorch
- - The WikiText dataset
- - Too much coffee ☕
-
- ---
-
- **Note**: This is a learning project and experimental model. Use it for fun and education, not production systems!
-
- If you found this interesting or helpful, feel free to star the repo or reach out. Always happy to chat about ML stuff!
-
- *Last updated: October 05, 2025*

  ---
+ library_name: transformers
  tags:
+ - generated_from_trainer
+ model-index:
+ - name: nano-gpt-from-scratch
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # nano-gpt-from-scratch

+ This model was trained from scratch on the WikiText-2 dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 4.5459

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 0.0005
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - num_epochs: 10
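
The `training_args.bin` in this commit is the authoritative record of these settings; as a readable reference, here is a hypothetical reconstruction of how the values reported above map onto `transformers.TrainingArguments` (the eval cadence of 200 steps is inferred from the results table below):

```python
# Hypothetical mapping of the auto-reported hyperparameters onto TrainingArguments;
# values come from the list above, argument names from the standard transformers API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nano-gpt-from-scratch",
    learning_rate=5e-4,             # learning_rate: 0.0005
    per_device_train_batch_size=8,  # train_batch_size: 8
    per_device_eval_batch_size=8,   # eval_batch_size: 8
    seed=42,
    optim="adamw_torch_fused",      # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="steps",          # assumption: evaluate every 200 steps
    eval_steps=200,
)
```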
 
+ ### Training results

+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 5.9904 | 0.5510 | 200 | 5.9804 |
+ | 5.5822 | 1.1019 | 400 | 5.5805 |
+ | 5.3387 | 1.6529 | 600 | 5.3769 |
+ | 5.2461 | 2.2039 | 800 | 5.2384 |
+ | 5.1487 | 2.7548 | 1000 | 5.1084 |
+ | 4.9265 | 3.3058 | 1200 | 5.0110 |
+ | 4.8586 | 3.8567 | 1400 | 4.9200 |
+ | 4.762 | 4.4077 | 1600 | 4.8474 |
+ | 4.7138 | 4.9587 | 1800 | 4.7803 |
+ | 4.6343 | 5.5096 | 2000 | 4.7298 |
+ | 4.5071 | 6.0606 | 2200 | 4.6909 |
+ | 4.5473 | 6.6116 | 2400 | 4.6554 |
+ | 4.4326 | 7.1625 | 2600 | 4.6202 |
+ | 4.4636 | 7.7135 | 2800 | 4.5988 |
+ | 4.4093 | 8.2645 | 3000 | 4.5789 |
+ | 4.4083 | 8.8154 | 3200 | 4.5609 |
+ | 4.3798 | 9.3664 | 3400 | 4.5515 |
+ | 4.3871 | 9.9174 | 3600 | 4.5459 |

+ ### Framework versions

+ - Transformers 4.57.0
+ - Pytorch 2.8.0
+ - Datasets 4.0.0
+ - Tokenizers 0.22.1

model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4e80447004b71eb2ce844129a864a6b95ab18d416550159895541ffce6967625
+ oid sha256:d0bfd430a3a8fdeb867c532f4416848ecf83c0606cb3a46ba037f513641d35c2
  size 4267952
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b46b782388218c136ca900d01fde9a3844967eb87fcde78a45eff4f5f59e5c7b
+ oid sha256:fceb742c74051ec1b94c19ac5051f3da403efa8386edf50dce7942e91550145a
  size 5841