---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---
|
|
|
|
|
# TinyWay 1.0.0 |
|
|
|
|
|
**TinyWay 1.0.0** is a **52.94M-parameter GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset. |
|
|
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures. |
|
|
|
|
|
Rather than being fine-tuned from an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture**. |
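
Exposing a custom architecture through the standard `transformers` Auto classes typically looks like the sketch below. This is an illustration only: `TinyWayConfig`, `TinyWayForCausalLM`, and the `"tinyway"` model type are hypothetical stand-ins, not the actual TinyWay source; the hyperparameter defaults are taken from the table in the next section.

```python
# Illustrative sketch of wiring a custom architecture into the Auto classes.
# TinyWayConfig / TinyWayForCausalLM are hypothetical names, not the real source.
from transformers import AutoConfig, AutoModelForCausalLM, PretrainedConfig, PreTrainedModel

class TinyWayConfig(PretrainedConfig):
    model_type = "tinyway"  # identifies the architecture in config.json

    def __init__(self, vocab_size=50257, n_layer=8, n_embd=384,
                 n_head=8, n_positions=256, dropout=0.1, **kwargs):
        self.vocab_size = vocab_size
        self.n_layer = n_layer
        self.n_embd = n_embd
        self.n_head = n_head
        self.n_positions = n_positions
        self.dropout = dropout
        super().__init__(**kwargs)

class TinyWayForCausalLM(PreTrainedModel):
    config_class = TinyWayConfig
    # ... decoder-only Transformer blocks and an LM head would go here ...

# Registration lets AutoModelForCausalLM.from_pretrained(...) resolve the class;
# on the Hub the same effect comes from auto_map entries plus trust_remote_code=True.
AutoConfig.register("tinyway", TinyWayConfig)
AutoModelForCausalLM.register(TinyWayConfig, TinyWayForCausalLM)
```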
|
|
|
|
|
--- |
|
|
|
|
|
## 📌 Model Overview |
|
|
|
|
|
| Attribute | Value | |
|
|
|---------|------| |
|
|
| Architecture | Decoder-only Transformer (GPT-style) | |
|
|
| Parameters | **52.94M** | |
|
|
| Layers | 8 | |
|
|
| Hidden size | 384 | |
|
|
| Attention heads | 8 | |
|
|
| Context length | 256 tokens | |
|
|
| Tokenizer | GPT-2 BPE | |
|
|
| Framework | PyTorch | |
|
|
| Precision | FP16 (AMP during training) | |
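
The headline parameter count is consistent with these hyperparameters as a back-of-the-envelope check, assuming a standard GPT-2-style block (4× MLP expansion), learned positional embeddings, and untied input/output embeddings (all assumptions; the exact breakdown depends on the implementation):

```python
# Rough parameter estimate from the table above.
d, layers, vocab, ctx = 384, 8, 50257, 256

embeddings = vocab * d                # token embedding matrix
positions  = ctx * d                  # learned positional embeddings
per_block  = 4 * d * d + 8 * d * d    # attention (Q,K,V,O) + MLP with 4x expansion
lm_head    = vocab * d                # untied output projection

total = embeddings + positions + layers * per_block + lm_head
print(f"{total / 1e6:.2f}M")  # ~52.85M; biases and LayerNorms close most of
                              # the small gap to the reported 52.94M
```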
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Training Details |
|
|
|
|
|
- **Dataset**: TinyStories (text file, streamed) |
|
|
- **Training strategy**: Streaming token dataset |
|
|
- **Epochs**: 1 |
|
|
- **Effective batch size**: 64 |
|
|
- **Optimizer**: AdamW |
|
|
- **Learning rate**: 3e-4 |
|
|
- **Dropout**: 0.1 |
|
|
- **Hardware**: NVIDIA Tesla P100 (16GB) |
|
|
- **Environment**: Kaggle |
|
|
|
|
|
The model was trained with a **causal language modeling** objective, predicting each next token from all preceding tokens; a minimal sketch of this setup follows. |
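
The sketch below illustrates the general pattern of streaming a text file into fixed 256-token blocks and optimizing with AdamW under FP16 AMP. It is not the exact TinyWay training script: the file path `tinystories.txt` is hypothetical, and a stock `GPT2LMHeadModel` with the card's dimensions stands in for the custom architecture.

```python
# Illustrative training-loop sketch: stream text into 256-token blocks,
# optimize with AdamW + FP16 AMP. NOT the exact TinyWay training script.
import torch
import torch.nn.functional as F
from torch.utils.data import IterableDataset, DataLoader
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 BPE, as in the card

class StreamingTokenDataset(IterableDataset):
    """Tokenizes a text file on the fly and yields (input, target) blocks."""
    def __init__(self, path, block_size=256):
        self.path, self.block_size = path, block_size

    def __iter__(self):
        buf = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                buf.extend(tokenizer.encode(line))
                while len(buf) >= self.block_size + 1:
                    chunk, buf = buf[: self.block_size + 1], buf[self.block_size:]
                    # next-token prediction: inputs chunk[:-1], targets chunk[1:]
                    yield torch.tensor(chunk[:-1]), torch.tensor(chunk[1:])

loader = DataLoader(StreamingTokenDataset("tinystories.txt"), batch_size=64)

device = "cuda"
# Stand-in model with the card's dimensions (the real model is a custom class)
model = GPT2LMHeadModel(
    GPT2Config(n_layer=8, n_embd=384, n_head=8, n_positions=256)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()

for x, y in loader:  # a single pass over the stream = 1 epoch
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # FP16 forward/backward (AMP)
        logits = model(x).logits
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```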
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Intended Use |
|
|
|
|
|
TinyWay is suitable for: |
|
|
|
|
|
- Short story generation |
|
|
- Educational demonstrations of Transformer internals |
|
|
- Research on small-scale language models |
|
|
- Understanding end-to-end LLM construction |
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Limitations |
|
|
|
|
|
- Trained only on narrative-style data (TinyStories) |
|
|
- Not instruction-tuned |
|
|
- Not suitable for factual QA or reasoning-heavy tasks |
|
|
- Limited context window (256 tokens); see the truncation sketch below |
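
Prompts longer than the context window must be truncated before generation. A minimal example using the tokenizer's built-in truncation (the prompt string and the 100-token generation budget are illustrative):

```python
# Keep prompts within the 256-token window, reserving room for generation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "shivamsharma120120/TinyWay-1.0.0", trust_remote_code=True
)
tokenizer.truncation_side = "left"  # keep the most recent text for continuation

inputs = tokenizer(
    "a very long prompt ...",
    return_tensors="pt",
    truncation=True,
    max_length=156,  # 256-token context minus 100 tokens for max_new_tokens
)
```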
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Usage |
|
|
|
|
|
### Load and generate text |
|
|
|
|
|
```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "shivamsharma120120/TinyWay-1.0.0"

# trust_remote_code=True is required because TinyWay is a custom architecture
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, trust_remote_code=True
)

# Encode a prompt and sample a continuation
inputs = tokenizer("Once upon a time", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
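
Because the architecture is custom rather than a stock `transformers` class, `trust_remote_code=True` is needed so the config and model code distributed with the repository can be loaded alongside the weights.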
|
|
|