# TinyWay-1.1.0

**TinyWay-1.1.0** is a lightweight **decoder-only Transformer language model** trained **from scratch** on limited compute.
The project demonstrates that meaningful language modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.

> **Core idea:** *Understanding LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.*

---
## Model Details

* **Architecture:** Decoder-only Transformer (GPT-style)
* **Parameters:** ~83M
* **Layers:** 10 Transformer blocks
* **Hidden size:** 512
* **Attention heads:** 8
* **Context length:** 256 tokens
* **Activation:** GELU
* **Normalization:** Pre-LayerNorm
* **Weight tying:** Token embedding ↔ LM head
* **Precision during training:** FP16 (AMP)
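
For orientation, these hyperparameters correspond roughly to the GPT-2-style configuration sketched below. This is illustrative only; the checkpoint ships its own config via `trust_remote_code`, and its field names may differ.

```python
from transformers import GPT2Config

# Illustrative sketch only -- the actual TinyWay config is loaded from the repo
# via trust_remote_code; this just restates the hyperparameters listed above.
config = GPT2Config(
    vocab_size=50257,                # GPT-2 BPE vocabulary
    n_positions=256,                 # context length
    n_embd=512,                      # hidden size
    n_layer=10,                      # Transformer blocks
    n_head=8,                        # attention heads
    activation_function="gelu_new",  # GELU activation
    tie_word_embeddings=True,        # token embedding <-> LM head
)
print(config)
```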
---

## Training

### Dataset

* **TinyStoriesV2 (cleaned)**
* Natural language short stories designed for training small language models
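
The README does not pin an exact data source; one plausible public mirror is the `roneneldan/TinyStories` dataset on the Hugging Face Hub, which could be loaded roughly as below (the cleaned V2 text actually used for TinyWay may come from a different file):

```python
from datasets import load_dataset

# Assumption: the public roneneldan/TinyStories dataset on the Hugging Face Hub;
# the exact cleaned TinyStoriesV2 split used for TinyWay may differ.
ds = load_dataset("roneneldan/TinyStories", split="train")
print(ds[0]["text"][:200])
```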
### Tokenization

* GPT-2 BPE tokenizer
* Vocabulary size: 50,257
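
Because the tokenizer is standard GPT-2 BPE, the vocabulary can be inspected with the stock `gpt2` tokenizer from `transformers`; the model repo also exposes the same tokenizer through `AutoTokenizer`, as shown in the usage example below.

```python
from transformers import AutoTokenizer

# Standard GPT-2 byte-pair-encoding tokenizer (same vocabulary TinyWay was trained with).
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.vocab_size)  # 50257

ids = tok("Once upon a time, there was a little girl named Lily.")["input_ids"]
print(len(ids), tok.convert_ids_to_tokens(ids[:5]))
```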
### Training Setup

* Optimizer: AdamW
* Learning rate: tuned for stable convergence
* Gradient accumulation: enabled
* Gradient clipping: enabled
* Mixed-precision training (AMP)
* Training performed entirely in the **Kaggle GPU environment (12-hour sessions)**
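
The training script itself is not reproduced in this README; the sketch below only illustrates how these pieces (AdamW, AMP, gradient accumulation, and clipping) typically fit together. `model`, `loader`, the learning rate, and `ACCUM_STEPS` are placeholders, not the repository's actual values.

```python
import torch

# Hypothetical sketch of the training step -- not the actual TinyWay script.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is a placeholder
scaler = torch.cuda.amp.GradScaler()
ACCUM_STEPS = 4  # placeholder: effective batch = micro-batch * ACCUM_STEPS

for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast():               # FP16 forward pass
        loss = model(**batch).loss / ACCUM_STEPS  # scale loss for accumulation
    scaler.scale(loss).backward()

    if (step + 1) % ACCUM_STEPS == 0:
        scaler.unscale_(optimizer)                # so clipping sees real gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```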
### Checkpoints

Checkpoints were saved at multiple training steps (5k → 30k).
**TinyWay-1.1.0** corresponds to the **~30k-step checkpoint**, which showed the best balance of fluency and stability.
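
Continuing the hypothetical training sketch above, a save-every-5k-steps cadence would look roughly like this (the repository's actual checkpoint format is not documented here):

```python
# Hypothetical continuation of the training sketch above (every 5k optimizer steps).
if step > 0 and step % 5000 == 0:
    torch.save(
        {"step": step, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        f"tinyway_step_{step}.pt",
    )
```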
---
## Example Usage

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

# trust_remote_code=True because the repo provides its own model/config code.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

prompt = "It was scared to be more of the window and dad"
inputs = tok(prompt, return_tensors="pt").to(mdl.device)

out = mdl.generate(
    **inputs,
    max_new_tokens=200,        # always generate up to 200 new tokens
    do_sample=True,            # sample instead of greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,         # disable early stopping on EOS
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
```
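
`eos_token_id=None` keeps generation running until `max_new_tokens` is reached rather than stopping at the first end-of-text token. As a lighter-weight alternative, the same checkpoint should also work through the `pipeline` API, assuming its remote code registers the model for text generation (untested sketch):

```python
from transformers import pipeline

# Hypothetical convenience wrapper around the same checkpoint.
generator = pipeline("text-generation", model="NNEngine/TinyWay-1.1.0", trust_remote_code=True)
print(generator("Once upon a time", max_new_tokens=100, do_sample=True)[0]["generated_text"])
```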
---

## Sample Output

> *Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…*

(Outputs vary due to sampling.)

---
## Intended Use

* Educational purposes
* Research on small-scale language models
* Understanding Transformer internals
* Studying training dynamics under compute constraints

---
## Limitations

* Not instruction-tuned
* Not aligned for factual accuracy or safety
* May produce repetitive or incoherent text at times
* Trained on a limited dataset

This model is **not intended for production use** or sensitive applications.

---
## Ethical Considerations

* The model may generate fictional or incorrect information
* No explicit safety or content filtering was applied
* Users should apply downstream safeguards if deploying

---
## Citation

If you use this model in academic or technical work, please cite:

```bibtex
@misc{sharma2025tinyway,
  title={TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author={Shivam Sharma},
  year={2025}
}
```

---
## Author

**Shivam Sharma**
B.Tech in Computer Science and Engineering (AIML)
ITM Gwalior, India

---
## Acknowledgements

* Hugging Face Transformers
* Kaggle GPU resources
* The open-source research community for inspiration