---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---

# TinyWay-1.1.0

**TinyWay-1.1.0** is a lightweight **decoder-only Transformer language model** trained **from scratch** on limited compute. The project demonstrates that meaningful language-modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.

> **Core idea:** *Understand LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.*

---

## Model Details

* **Architecture:** Decoder-only Transformer (GPT-style)
* **Parameters:** ~83M
* **Layers:** 10 Transformer blocks
* **Hidden size:** 512
* **Attention heads:** 8
* **Context length:** 256 tokens
* **Activation:** GELU
* **Normalization:** Pre-LayerNorm
* **Weight tying:** Token embedding ↔ LM head
* **Training precision:** FP16 (AMP)

---

## Training

### Dataset

* **TinyStoriesV2 (cleaned)**
* Short natural-language stories designed for training small language models

### Tokenization

* GPT-2 BPE tokenizer
* Vocabulary size: 50,257

### Training Setup

* Optimizer: AdamW
* Learning rate: tuned for stable convergence
* Gradient accumulation: enabled
* Gradient clipping: enabled
* Mixed-precision training (AMP)
* Training performed entirely in the **Kaggle GPU environment (12-hour sessions)**

### Checkpoints

Models were saved at multiple training steps (5k → 30k). **TinyWay-1.1.0** corresponds to the **~25k-step checkpoint**, which showed the best balance of fluency and stability.
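The training setup above (AdamW, FP16 autocast, gradient accumulation, and gradient clipping) can be sketched as a minimal PyTorch loop. This is an illustrative sketch, not the actual TinyWay training script: the function name, hyperparameter values, and batch format are assumptions.

```python
import torch
import torch.nn.functional as F
from torch.cuda.amp import GradScaler, autocast

def train(model, batches, lr=3e-4, accum_steps=4, clip_norm=1.0):
    """Sketch of the setup above: AdamW, AMP, accumulation, clipping.
    All hyperparameter values here are placeholders, not TinyWay's."""
    use_amp = torch.cuda.is_available()   # AMP becomes a no-op on CPU
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    scaler = GradScaler(enabled=use_amp)
    model.train()
    opt.zero_grad(set_to_none=True)
    for i, (x, y) in enumerate(batches):
        with autocast(enabled=use_amp):   # FP16 forward/backward
            logits = model(x)             # (batch, seq, vocab)
            loss = F.cross_entropy(
                logits.view(-1, logits.size(-1)), y.view(-1)
            )
        # Divide by accum_steps so accumulated gradients average correctly.
        scaler.scale(loss / accum_steps).backward()
        if (i + 1) % accum_steps == 0:
            scaler.unscale_(opt)          # unscale before clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
            scaler.step(opt)              # optimizer update
            scaler.update()
            opt.zero_grad(set_to_none=True)
```

Stepping the optimizer only every `accum_steps` micro-batches simulates a larger effective batch size than a single 12-hour Kaggle GPU session could otherwise hold in memory.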
---

## Example Usage

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

out = mdl.generate(
    **tok("Once upon a time", return_tensors="pt").to(mdl.device),
    max_new_tokens=200,            # force length
    do_sample=True,                # sampling, not greedy
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,             # disable EOS stopping
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Sample Output

> *Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…*

(Outputs vary due to sampling.)

---

## Intended Use

* Educational purposes
* Research on small-scale language models
* Understanding Transformer internals
* Studying training dynamics under compute constraints

---

## Limitations

* Not instruction-tuned
* Not aligned for factual accuracy or safety
* May produce repetitive or incoherent text at times
* Trained on a limited dataset

This model is **not intended for production use** or sensitive applications.
---

## Ethical Considerations

* The model may generate fictional or incorrect information
* No explicit safety or content filtering was applied
* Users should apply downstream safeguards if deploying

---

## Citation

If you use this model in academic or technical work, please cite:

```bibtex
@misc{sharma2025tinyway,
  title={TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author={Shivam Sharma},
  year={2025},
}
```

---

## Author

**Shivam Sharma**\
B.Tech in Computer Science and Engineering (AIML)\
ITM Gwalior, India

---

## Acknowledgements

* Hugging Face Transformers
* Kaggle GPU resources
* Open research community for open-source inspiration