---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
|
|
|
|
|
# TinyWay-1.1.0 |
|
|
|
|
|
**TinyWay-1.1.0** is a lightweight **decoder-only Transformer language model** trained **from scratch** on limited compute. |
|
|
The project demonstrates that meaningful language modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle. |
|
|
|
|
|
> **Core idea:** *Understanding LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.* |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
* **Architecture:** Decoder-only Transformer (GPT-style) |
|
|
* **Parameters:** ~83M |
|
|
* **Layers:** 10 Transformer blocks |
|
|
* **Hidden size:** 512 |
|
|
* **Attention heads:** 8 |
|
|
* **Context length:** 256 tokens |
|
|
* **Activation:** GELU |
|
|
* **Normalization:** Pre-LayerNorm |
|
|
* **Weight tying:** Token embedding ↔ LM head |
|
|
* **Precision during training:** FP16 (AMP) |
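A minimal PyTorch sketch of what these settings imply (illustrative only: the 4× feed-forward width and learned positional embeddings are assumptions, and the actual implementation is loaded from the repo via `trust_remote_code`):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-LayerNorm decoder block: LN -> attention -> residual, LN -> MLP -> residual."""
    def __init__(self, d=512, heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x, mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                           # residual around attention
        return x + self.mlp(self.ln2(x))    # residual around MLP

class TinyLM(nn.Module):
    def __init__(self, vocab=50257, d=512, n_layers=10, ctx=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d)
        self.pos_emb = nn.Embedding(ctx, d)           # assumption: learned positions
        self.blocks = nn.ModuleList(Block(d) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d)
        self.lm_head = nn.Linear(d, vocab, bias=False)
        self.lm_head.weight = self.tok_emb.weight     # weight tying: embedding <-> head
        causal = torch.triu(torch.ones(ctx, ctx, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", causal)          # True = position is masked out

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        for blk in self.blocks:
            x = blk(x, self.mask[:T, :T])
        return self.lm_head(self.ln_f(x))             # next-token logits
```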
|
|
|
|
|
--- |
|
|
|
|
|
## Training |
|
|
|
|
|
### Dataset |
|
|
|
|
|
* **TinyStoriesV2 (cleaned)** |
|
|
* Natural language short stories designed for training small language models |
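The corpus is available on the Hub; a quick way to inspect it with the `datasets` library (split and column names come from the dataset card, so treat this as a sketch):

```python
from datasets import load_dataset

# Load and inspect the training corpus
ds = load_dataset("fhswf/TinyStoriesV2_cleaned")
print(ds)  # shows the available splits and features
```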
|
|
|
|
|
### Tokenization |
|
|
|
|
|
* GPT-2 BPE tokenizer |
|
|
* Vocabulary size: 50,257 |
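Since the tokenizer is the stock GPT-2 BPE, it can be sanity-checked independently of the model:

```python
from transformers import AutoTokenizer

# The model reuses the standard GPT-2 BPE vocabulary (50,257 tokens)
tok = AutoTokenizer.from_pretrained("gpt2")

ids = tok("Once upon a time")["input_ids"]
print(ids)              # GPT-2 BPE token IDs
print(tok.decode(ids))  # round-trips to the original text
print(len(tok))         # 50257
```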
|
|
|
|
|
### Training Setup |
|
|
|
|
|
* Optimizer: AdamW |
|
|
* Learning rate: tuned for stable convergence |
|
|
* Gradient accumulation: enabled |
|
|
* Gradient clipping: enabled |
|
|
* Mixed precision training (AMP) |
|
|
* Training performed entirely within **Kaggle's GPU environment (12-hour sessions)**
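A sketch of how these pieces typically fit together in one training step; `model`, `loader`, the learning rate, and the accumulation/clipping values below are illustrative assumptions, not the exact configuration used:

```python
import torch

def train(model, loader, accum_steps=8, lr=3e-4, clip=1.0):
    """AMP + gradient accumulation + gradient clipping (all values illustrative)."""
    scaler = torch.cuda.amp.GradScaler()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step, batch in enumerate(loader):
        with torch.cuda.amp.autocast():                # FP16 forward pass
            loss = model(**batch).loss / accum_steps   # scale loss for accumulation
        scaler.scale(loss).backward()                  # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            scaler.unscale_(opt)                       # clip the true, unscaled grads
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
            scaler.step(opt)
            scaler.update()
            opt.zero_grad(set_to_none=True)
```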
|
|
|
|
|
### Checkpoints |
|
|
|
|
|
Checkpoints were saved at multiple points during training (5k → 30k steps).
|
|
**TinyWay-1.1.0** corresponds to the **~25k step checkpoint**, which showed the best balance of fluency and stability. |
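The saving pattern behind those checkpoints is the usual periodic `torch.save`; a sketch, where the interval, file name, and variable names are assumptions:

```python
import torch

# Inside the training loop: persist model + optimizer state every 5k steps
# (interval and file name are illustrative, not the repo's actual code).
if step > 0 and step % 5_000 == 0:
    torch.save(
        {"step": step, "model": model.state_dict(), "optimizer": opt.state_dict()},
        f"tinyway_step{step}.pt",
    )
```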
|
|
|
|
|
--- |
|
|
|
|
|
## Example Usage |
|
|
|
|
|
```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

inputs = tok("Once upon a time", return_tensors="pt").to(mdl.device)

out = mdl.generate(
    **inputs,
    max_new_tokens=200,        # generate up to 200 new tokens
    do_sample=True,            # sampling, not greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,         # disable early stopping on EOS
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
```
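Setting `eos_token_id=None` disables early stopping, so generation runs for the full `max_new_tokens`; `pad_token_id` is set explicitly because the GPT-2 tokenizer ships without a dedicated padding token.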
|
|
|
|
|
--- |
|
|
|
|
|
## Sample Output |
|
|
|
|
|
> *Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…* |
|
|
|
|
|
(Outputs vary due to sampling.) |
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
* Educational purposes |
|
|
* Research on small-scale language models |
|
|
* Understanding Transformer internals |
|
|
* Studying training dynamics under compute constraints |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
* Not instruction-tuned |
|
|
* Not aligned for factual accuracy or safety |
|
|
* May produce repetitive or incoherent text at times |
|
|
* Trained on a limited dataset |
|
|
|
|
|
This model is **not intended for production use** or sensitive applications. |
|
|
|
|
|
--- |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
* The model may generate fictional or incorrect information |
|
|
* No explicit safety or content filtering was applied |
|
|
* Users should apply downstream safeguards if deploying |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in academic or technical work, please cite: |
|
|
|
|
|
```bibtex
@misc{sharma2025tinyway,
  title={TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author={Shivam Sharma},
  year={2025}
}
```
|
|
|
|
|
--- |
|
|
|
|
|
## Author |
|
|
|
|
|
**Shivam Sharma** |
|
|
B.Tech in Computer Science and Engineering (AIML) |
|
|
ITM Gwalior, India |
|
|
|
|
|
--- |
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
* Hugging Face Transformers |
|
|
* Kaggle GPU resources |
|
|
* The open-source research community for inspiration