Commit f82a831 (verified) by Exquisique · 1 parent: bbd1dcc

Update README.md

Files changed (1): README.md (+122 −122)
---
language: en
license: mit
tags:
- gpt
- transformer
- small-model
- from-scratch
- babymodel
datasets:
- roneneldan/TinyStories
library_name: transformers
pipeline_tag: text-generation
---

# BabyLangModel

A tiny GPT-style language model trained from scratch on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. Built with PyTorch and a custom architecture inspired by [nanoGPT](https://github.com/karpathy/nanoGPT), and trained for 200,000 iterations on a consumer GPU (RTX 4060).

---
22
+
23
+ ## 🧠 Model Details
24
+
25
+ - **Architecture**: GPT (custom implementation)
26
+ - **Parameters**: ~10–15M
27
+ - **Layers**: 6
28
+ - **Heads**: 6
29
+ - **Embedding Size**: 384
30
+ - **Block Size**: 128
31
+ - **Tokenizer**: GPT-2 (`tiktoken`)
32
+ - **Training Steps**: 200,000
33
+ - **Training Loss**: ~1.80
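
The ~10–15M figure is consistent with counting non-embedding weights for this config, assuming a standard nanoGPT-style block: attention costs roughly 4·n_embd² parameters (QKV plus output projection) and the 4x-wide MLP another 8·n_embd². A back-of-the-envelope check (my arithmetic, not from the model card):

```python
# Rough non-embedding parameter count for the config in this card.
# Assumes a nanoGPT-style block: 3*d^2 (qkv) + d^2 (attn proj)
# + 2 * (d * 4d) for the MLP = 12*d^2 per layer.
n_layer, n_embd, block_size = 6, 384, 128

per_block = 12 * n_embd**2          # attention + MLP weights per layer
pos_emb = block_size * n_embd       # learned positional embeddings
total = n_layer * per_block + pos_emb

print(f"{total/1e6:.1f}M non-embedding parameters")  # → 10.7M
```

The token-embedding matrix (50257 × 384 ≈ 19.3M weights) is conventionally excluded from such counts, especially when it is tied with the output head.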

---

## πŸ“š Training Data

We trained on the open-source **[TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)** dataset from Microsoft Research: short, simple English stories written at the level of a young child (ages 2–4).

- Clean, simple narratives
- Ideal for small-model generalization
- 100% open and publicly available

---

## 🧰 Usage (with `transformers`)

This model uses a **custom architecture**, so you need to use `trust_remote_code=True`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Exquisique/BabyLangModel", trust_remote_code=True)
```

---

## ✨ Sample Generation

```text
Prompt: Once upon a time there was a tiny robot who

Output: ...lived in a far away home. One day, a little girl named Lily decided to go on a special trip in the forest. She walked and walked until she got there but suddenly she started to go. Her mom called her and said, "Don't worry, Lily. We will get you my special ride."
```

> 🗣️ Still improving, but quite readable and story-like after 200k iterations!

---

## πŸ’» Train It Yourself

You can find the full training code on [GitHub](https://github.com/Exquisique/Babymodel) or use this structure:

```bash
python -m src.tokenizer   # Tokenize TinyStories
python -m src.train       # Train model from scratch
python -m src.generate    # Generate text
```
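
A tokenizer step like `src.tokenizer` typically writes token ids to a flat binary file for fast memory-mapped loading during training, as nanoGPT does. A minimal sketch of that pattern, using a byte-level stand-in for the real GPT-2 `tiktoken` encoder (all helper names here are illustrative, not the repo's actual API):

```python
import tempfile
from array import array
from pathlib import Path

def encode_stub(text: str) -> list[int]:
    # Stand-in for tiktoken's GPT-2 encoder; real token ids differ.
    return list(text.encode("utf-8"))

def tokenize_to_bin(stories: list[str], path: str) -> int:
    # GPT-2's 50257-token vocab fits in uint16 ("H"), halving file
    # size versus 32-bit ids -- the trick nanoGPT's prepare step uses.
    ids = array("H")
    for story in stories:
        ids.extend(encode_stub(story))
    with open(path, "wb") as f:
        ids.tofile(f)
    return len(ids)

out = Path(tempfile.mkdtemp()) / "train.bin"
n = tokenize_to_bin(["Once upon a time", " there was a tiny robot."], str(out))
print(n)  # → 40 token ids written
```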

You'll also find:
- Checkpointing & resume support
- Configurable hyperparameters
- Gradient accumulation & mixed precision
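
Gradient accumulation is what lets a small GPU like the RTX 4060 simulate a larger batch: gradients from several micro-batches are summed before a single optimizer step. A framework-free sketch of why the update is equivalent (a real training loop would do this with PyTorch autograd, plus `torch.cuda.amp` for the mixed-precision part):

```python
def grad(w, batch):
    # Gradient of the mean squared loss mean((w - x)^2): 2 * mean(w - x).
    return 2.0 * sum(w - x for x in batch) / len(batch)

def step_full_batch(w, batch, lr):
    # One optimizer step on the whole batch at once.
    return w - lr * grad(w, batch)

def step_accumulated(w, batch, lr, micro_size):
    # Accumulate per-example gradients over micro-batches,
    # then apply a single optimizer step at the end.
    total, seen = 0.0, 0
    for i in range(0, len(batch), micro_size):
        micro = batch[i:i + micro_size]
        total += grad(w, micro) * len(micro)  # re-weight by micro-batch size
        seen += len(micro)
    return w - lr * (total / seen)

data = [0.5, 1.5, 2.0, 4.0, 5.0, 7.0]
a = step_full_batch(2.0, data, lr=0.1)
b = step_accumulated(2.0, data, lr=0.1, micro_size=2)
assert abs(a - b) < 1e-12  # identical update to one big batch
```

Re-weighting each micro-batch gradient by its size before dividing by the total keeps the result exact even when the last micro-batch is smaller.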

---

## πŸ”§ Config Used

```json
{
  "vocab_size": 50257,
  "block_size": 128,
  "n_layer": 6,
  "n_head": 6,
  "n_embd": 384,
  "model_type": "gpt"
}
```
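
When wiring the model up by hand, the same JSON can be parsed straight into a config object (the `GPTConfig` dataclass name here is illustrative; the repo may use a different one):

```python
import json
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int
    block_size: int
    n_layer: int
    n_head: int
    n_embd: int
    model_type: str = "gpt"

raw = """{
  "vocab_size": 50257,
  "block_size": 128,
  "n_layer": 6,
  "n_head": 6,
  "n_embd": 384,
  "model_type": "gpt"
}"""

cfg = GPTConfig(**json.loads(raw))
# A sanity check worth keeping: head dimension must divide evenly.
assert cfg.n_embd % cfg.n_head == 0
print(cfg.n_embd // cfg.n_head)  # → 64 dims per attention head
```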

---

## πŸ“¦ Inference Notes

To load the model, use:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Exquisique/BabyLangModel", trust_remote_code=True)
```

A tokenizer may be uploaded later for full text-input support (e.g. with `tiktoken`).
113
+
114
+ ---
115
+
116
+ ## πŸ§‘β€πŸ’» Author
117
+ **Exquisique** β€” GenAI explorer, poetic dreamer, and neural model whisperer.
118
+
119
+ ---
120
+
121
+ ## πŸ“œ License
122
+ MIT β€” open source, fine-tune and remix freely. ✨