**NNEngine** committed `dad5b6e` (verified) · 1 parent: `4f7e31d`

**Update README.md**

`README.md`: +147 -3. The previous three-line YAML front matter (`license: apache-2.0`) is replaced by the model card below.

# TinyWay-1.1.0

**TinyWay-1.1.0** is a lightweight **decoder-only Transformer language model** trained **from scratch** on limited compute.
The project demonstrates that meaningful language-modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.

> **Core idea:** *Understanding LLM training mechanics end to end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.*

---

## Model Details

* **Architecture:** Decoder-only Transformer (GPT-style)
* **Parameters:** ~83M
* **Layers:** 10 Transformer blocks
* **Hidden size:** 512
* **Attention heads:** 8
* **Context length:** 256 tokens
* **Activation:** GELU
* **Normalization:** Pre-LayerNorm
* **Weight tying:** Token embedding ↔ LM head
* **Precision during training:** FP16 (AMP)
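
The ~83M figure can be sanity-checked with a back-of-the-envelope count. The sketch below is hypothetical: it assumes GPT-2-style details that the list does not state (learned absolute position embeddings, a 4× feed-forward expansion, biases on every linear layer, two LayerNorms per block plus a final one). Under those assumptions, counting the tied embedding/LM-head matrix once gives about 57.4M unique parameters, while counting it twice (as an untied head would) gives about 83.1M, which is one way the reported ~83M could arise; the actual implementation may differ.

```python
def gpt_param_count(vocab=50257, ctx=256, d=512, layers=10, ffn_mult=4, tied=True):
    """Estimate the parameter count of a GPT-style decoder.

    Assumed layout (not confirmed by the model card): learned position
    embeddings, 4x MLP, biases on all linear layers, pre-LN blocks.
    """
    tok_emb = vocab * d                   # token embedding matrix
    pos_emb = ctx * d                     # learned position embeddings (assumption)
    attn = 4 * d * d + 4 * d              # Q, K, V, output projections + biases
    mlp = 2 * ffn_mult * d * d + ffn_mult * d + d  # up/down projections + biases
    norms = 2 * (2 * d)                   # two LayerNorms (weight + bias) per block
    per_block = attn + mlp + norms
    final_norm = 2 * d                    # final LayerNorm before the LM head
    lm_head = 0 if tied else vocab * d    # a tied head reuses the embedding matrix
    return tok_emb + pos_emb + layers * per_block + final_norm + lm_head


print(f"tied:   {gpt_param_count(tied=True):,}")   # tied:   57,387,520
print(f"untied: {gpt_param_count(tied=False):,}")  # untied: 83,119,104
```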

---

## Training

### Dataset

* **TinyStoriesV2 (cleaned):** natural-language short stories designed for training small language models

### Tokenization

* GPT-2 BPE tokenizer
* Vocabulary size: 50,257
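
The README does not say how stories were split to fit the 256-token context. One common approach for causal LMs is sequence packing: concatenate tokenized documents, separate them with GPT-2's `<|endoftext|>` token (id 50256), and cut the stream into fixed windows whose targets are the inputs shifted by one token. The helper below is a hypothetical sketch of that idea, not the project's preprocessing code; it drops the trailing partial window.

```python
def pack_sequences(docs, block_size=256, eos_id=50256):
    """Pack tokenized documents into (input_ids, target_ids) pairs.

    docs: list of token-id lists (e.g. GPT-2 BPE output for each story).
    Each window spans block_size + 1 tokens so that targets are the
    inputs shifted left by one position (next-token prediction).
    """
    stream = []
    for ids in docs:
        stream.extend(ids)
        stream.append(eos_id)  # mark the document boundary

    examples = []
    # Step by block_size; stop before a window would run past the stream end.
    for start in range(0, len(stream) - block_size, block_size):
        window = stream[start:start + block_size + 1]
        examples.append((window[:-1], window[1:]))
    return examples


# Toy example with a tiny block size and a made-up EOS id of 0.
print(pack_sequences([[1, 2, 3], [4, 5]], block_size=2, eos_id=0))
# [([1, 2], [2, 3]), ([3, 0], [0, 4]), ([4, 5], [5, 0])]
```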

### Training Setup

* Optimizer: AdamW
* Learning rate: tuned for stable convergence
* Gradient accumulation: enabled
* Gradient clipping: enabled
* Mixed-precision training (AMP)
* Training performed entirely in the **Kaggle GPU environment (12-hour sessions)**

### Checkpoints

Checkpoints were saved at multiple points between 5k and 30k training steps.
**TinyWay-1.1.0** corresponds to the **~30k-step checkpoint**, which showed the best balance of fluency and stability.
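
Because Kaggle sessions are capped at 12 hours, a longer run has to be resumed from saved state. The README does not describe its checkpoint format; a minimal sketch, assuming plain `torch.save` dictionaries, stores the model, optimizer, and optional AMP scaler state together with the step count. `save_checkpoint` and `load_checkpoint` are hypothetical helpers.

```python
import torch


def save_checkpoint(path, step, model, optimizer, scaler=None):
    """Save everything needed to resume training at `step`."""
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),  # includes AdamW moment estimates
            "scaler": scaler.state_dict() if scaler is not None else None,
        },
        path,
    )


def load_checkpoint(path, model, optimizer, scaler=None):
    """Restore model/optimizer (and scaler) state; return the saved step."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    if scaler is not None and ckpt["scaler"] is not None:
        scaler.load_state_dict(ckpt["scaler"])
    return ckpt["step"]
```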

---

## Example Usage

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

# trust_remote_code=True allows loading custom model code from the Hub repo.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

inputs = tok(
    "It was scared to be more of the window and dad",
    return_tensors="pt",
).to(mdl.device)

out = mdl.generate(
    **inputs,
    max_new_tokens=200,       # cap on new tokens (always reached once EOS stopping is off)
    do_sample=True,           # sampling, not greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,        # disable EOS stopping
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
```
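
The sampling arguments passed to `generate` reshape the next-token distribution before each draw. As a rough illustration of the technique (not Transformers' actual code), the pure-Python sketch below applies a CTRL-style repetition penalty, then temperature scaling, then top-k filtering, and samples from the result; this mirrors the order in which the library's corresponding logits processors are typically applied. `process_logits` and `sample_next` are hypothetical names.

```python
import math
import random


def process_logits(logits, context_ids, temperature=0.8, top_k=50, repetition_penalty=1.2):
    """Turn raw next-token logits into a sampling distribution."""
    scores = list(logits)

    # Repetition penalty: tokens already in the context become less likely.
    # Positive logits are divided by the penalty, negative ones multiplied.
    for t in set(context_ids):
        s = scores[t]
        scores[t] = s * repetition_penalty if s < 0 else s / repetition_penalty

    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scores = [s / temperature for s in scores]

    # Top-k: discard everything below the k-th highest score (ties are kept).
    k = min(top_k, len(scores))
    kth = sorted(scores, reverse=True)[k - 1]
    scores = [s if s >= kth else float("-inf") for s in scores]

    # Softmax over the surviving tokens (max subtracted for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(probs, rng=random):
    """Draw one token id according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = process_logits([2.0, 1.0, 0.5, -1.0], context_ids=[0],
                       temperature=1.0, top_k=2, repetition_penalty=2.0)
print(probs)  # [0.5, 0.5, 0.0, 0.0]
print(sample_next(probs, random.Random(0)) in (0, 1))  # True
```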

---

## Sample Output

> *Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…*

(Outputs vary between runs due to sampling.)

---

## Intended Use

* Educational purposes
* Research on small-scale language models
* Understanding Transformer internals
* Studying training dynamics under compute constraints

---

## Limitations

* Not instruction-tuned
* Not aligned for factual accuracy or safety
* May produce repetitive or incoherent text
* Trained on a limited dataset

This model is **not intended for production use** or sensitive applications.

---

## Ethical Considerations

* The model may generate fictional or incorrect information
* No explicit safety or content filtering was applied
* Users should apply downstream safeguards if deploying the model

---

## Citation

If you use this model in academic or technical work, please cite:

```bibtex
@misc{sharma2025tinyway,
  title  = {TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author = {Shivam Sharma},
  year   = {2025}
}
```

---

## Author

**Shivam Sharma**
B.Tech in Computer Science and Engineering (AIML)
ITM Gwalior, India

---

## Acknowledgements

* Hugging Face Transformers
* Kaggle GPU resources
* The open research community, for open-source inspiration