---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
# TinyWay-1.1.0
**TinyWay-1.1.0** is a lightweight **decoder-only Transformer language model** trained **from scratch** on limited compute.
The project demonstrates that meaningful language modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.
> **Core idea:** *Understanding LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.*
---
## Model Details
* **Architecture:** Decoder-only Transformer (GPT-style)
* **Parameters:** ~83M
* **Layers:** 10 Transformer blocks
* **Hidden size:** 512
* **Attention heads:** 8
* **Context length:** 256 tokens
* **Activation:** GELU
* **Normalization:** Pre-LayerNorm
* **Weight tying:** Token embedding ↔ LM head
* **Precision during training:** FP16 (AMP)
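The exact implementation ships with the repo via `trust_remote_code`; purely as orientation, a pre-LayerNorm decoder block with the dimensions above might look like the following PyTorch sketch (all names hypothetical, 4x MLP expansion assumed):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-LayerNorm decoder block (hidden=512, heads=8) — a sketch only."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # 4x expansion (assumed ratio)
            nn.GELU(),                        # GELU activation, as in the card
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True = position may not be attended to
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)                       # pre-LN: normalize before attention
        a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + a                             # residual connection
        x = x + self.mlp(self.ln2(x))         # pre-LN MLP with residual
        return x
```

Weight tying then amounts to a single assignment such as `lm_head.weight = token_embedding.weight`, so the output projection reuses the embedding matrix.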
---
## Training
### Dataset
* **TinyStoriesV2 (cleaned)**
* Natural language short stories designed for training small language models
### Tokenization
* GPT-2 BPE tokenizer
* Vocabulary size: 50,257
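As a quick illustration (assuming the stock `gpt2` tokenizer, which matches the BPE vocabulary described above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # GPT-2 BPE tokenizer
ids = tok("Once upon a time")["input_ids"]
print(len(tok))                               # 50257
print(ids, tok.convert_ids_to_tokens(ids))
```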
### Training Setup
* Optimizer: AdamW
* Learning rate: tuned for stable convergence
* Gradient accumulation: enabled
* Gradient clipping: enabled
* Mixed precision training (AMP)
* Training performed entirely in the **Kaggle GPU environment (12-hour sessions)**
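The training script itself is not reproduced in this card; the following is a minimal sketch of how AMP, gradient accumulation, and clipping typically combine in a PyTorch loop (the learning rate and accumulation factor are illustrative, and `model`/`loader` are assumed to exist):

```python
import torch

scaler = torch.cuda.amp.GradScaler()                        # FP16 loss scaling (AMP)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is illustrative only
accum_steps = 4                                             # hypothetical accumulation factor

for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast():                  # mixed-precision forward pass
        # assumes an HF-style batch dict containing `labels`
        loss = model(**batch).loss / accum_steps     # scale loss for accumulation
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)                   # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```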
### Checkpoints
Checkpoints were saved at multiple training steps (5k → 30k).
**TinyWay-1.1.0** corresponds to the **~25k step checkpoint**, which showed the best balance of fluency and stability.
---
## Example Usage
```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

inputs = tok("Once upon a time", return_tensors="pt").to(mdl.device)

out = mdl.generate(
    **inputs,
    max_new_tokens=200,            # generate up to 200 new tokens
    do_sample=True,                # sample instead of greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,             # disable early stopping on EOS
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
```
---
## Sample Output
> *Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…*
(Outputs vary due to sampling.)
---
## Intended Use
* Educational purposes
* Research on small-scale language models
* Understanding Transformer internals
* Studying training dynamics under compute constraints
---
## Limitations
* Not instruction-tuned
* Not aligned for factual accuracy or safety
* May produce repetitive or incoherent text at times
* Trained on a limited dataset
This model is **not intended for production use** or sensitive applications.
---
## Ethical Considerations
* The model may generate fictional or incorrect information
* No explicit safety or content filtering was applied
* Users should apply downstream safeguards if deploying
---
## Citation
If you use this model in academic or technical work, please cite:
```bibtex
@misc{sharma2025tinyway,
  title  = {TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author = {Sharma, Shivam},
  year   = {2025},
}
```
---
## Author
**Shivam Sharma**
B.Tech in Computer Science and Engineering (AIML)
ITM Gwalior, India
---
## Acknowledgements
* Hugging Face Transformers
* Kaggle GPU resources
* The open-source research community for inspiration