---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
# TinyWay-1.1.0
**TinyWay-1.1.0** is a lightweight **decoder-only Transformer language model** trained **from scratch** on limited compute.
The project demonstrates that meaningful language modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.
> **Core idea:** *Understanding LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.*
---
## Model Details
* **Architecture:** Decoder-only Transformer (GPT-style)
* **Parameters:** ~83M
* **Layers:** 10 Transformer blocks
* **Hidden size:** 512
* **Attention heads:** 8
* **Context length:** 256 tokens
* **Activation:** GELU
* **Normalization:** Pre-LayerNorm
* **Weight tying:** Token embedding ↔ LM head
* **Precision during training:** FP16 (AMP)
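The hyperparameters above imply a fairly standard GPT-style block. The following is an illustrative pre-LayerNorm decoder block in PyTorch using those dimensions (hidden size 512, 8 heads, GELU, 256-token context); it is a sketch for orientation, not the actual TinyWay source, which may differ in details such as MLP width, dropout, and initialization.

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """One pre-LayerNorm decoder block with the dimensions listed above."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # assumed 4x MLP expansion
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, causal_mask):
        # Pre-LN: normalize before each sublayer, add residual after.
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

block = PreLNBlock()
x = torch.randn(1, 256, 512)  # (batch, context length, hidden size)
mask = nn.Transformer.generate_square_subsequent_mask(256)  # causal mask
y = block(x, mask)
print(y.shape)  # torch.Size([1, 256, 512])
```

The full model would stack 10 such blocks between a tied token embedding / LM head pair.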
---
## Training
### Dataset
* **TinyStoriesV2 (cleaned)**
* Natural language short stories designed for training small language models
### Tokenization
* GPT-2 BPE tokenizer
* Vocabulary size: 50,257
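Because the model reuses the stock GPT-2 BPE vocabulary, the tokenizer can be loaded directly from the `gpt2` repository for quick inspection (a sketch; the model's own repo ships an equivalent tokenizer):

```python
from transformers import AutoTokenizer

# GPT-2 BPE tokenizer, vocabulary size 50,257
tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok("Once upon a time")["input_ids"]
print(len(tok))                # 50257
print(tok.decode(ids))         # Once upon a time
```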
### Training Setup
* Optimizer: AdamW
* Learning rate: tuned for stable convergence
* Gradient accumulation: enabled
* Gradient clipping: enabled
* Mixed precision training (AMP)
* Training performed entirely in a **Kaggle GPU environment (12-hour sessions)**
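The recipe above (AMP, gradient accumulation, gradient clipping) composes in a specific order that is easy to get wrong. The sketch below shows that ordering with a hypothetical toy model standing in for TinyWay; the learning rate and accumulation count are illustrative placeholders, not the values actually used.

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; only the training-step recipe matters here.
model = nn.Linear(512, 50257)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # placeholder LR
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
accum_steps = 4  # placeholder accumulation factor

opt.zero_grad(set_to_none=True)
for step in range(8):
    x = torch.randn(2, 512)
    y = torch.randint(0, 50257, (2,))
    with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
        # Divide by accum_steps so the summed gradient matches a full batch.
        loss = nn.functional.cross_entropy(model(x), y) / accum_steps
    scaler.scale(loss).backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(opt)       # so clipping sees true gradient magnitudes
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(opt)
        scaler.update()
        opt.zero_grad(set_to_none=True)
```

Unscaling before `clip_grad_norm_` is the key detail: clipping scaled FP16 gradients would apply the wrong threshold.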
### Checkpoints
Models were saved at multiple training steps (5k → 30k).
**TinyWay-1.1.0** corresponds to the **~25k step checkpoint**, which showed the best balance of fluency and stability.
---
## Example Usage
```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
model_id = "NNEngine/TinyWay-1.1.0"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)
out = mdl.generate(
    **tok("Once upon a time", return_tensors="pt").to(mdl.device),
    max_new_tokens=200,        # generate up to 200 new tokens
    do_sample=True,            # sample instead of greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,         # disable early stopping on EOS
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
---
## Sample Output
> *Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…*
(Outputs vary due to sampling.)
---
## Intended Use
* Educational purposes
* Research on small-scale language models
* Understanding Transformer internals
* Studying training dynamics under compute constraints
---
## Limitations
* Not instruction-tuned
* Not aligned for factual accuracy or safety
* May produce repetitive or incoherent text at times
* Trained on a limited dataset
This model is **not intended for production use** or sensitive applications.
---
## Ethical Considerations
* The model may generate fictional or incorrect information
* No explicit safety or content filtering was applied
* Users should apply downstream safeguards if deploying
---
## Citation
If you use this model in academic or technical work, please cite:
```bibtex
@misc{sharma2025tinyway,
  title={TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author={Sharma, Shivam},
  year={2025}
}
```
---
## Author
**Shivam Sharma**
B.Tech in Computer Science and Engineering (AIML)
ITM Gwalior, India
---
## Acknowledgements
* Hugging Face Transformers
* Kaggle GPU resources
* The open-source research community for inspiration