---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---
# TinyWay 1.0.0
**TinyWay 1.0.0** is a **52.94M parameter GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset.
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures.
Unlike models fine-tuned from an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture** that loads via `trust_remote_code=True`.
---
## πŸ” Model Overview
| Attribute | Value |
|---------|------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | **52.94M** |
| Layers | 8 |
| Hidden size | 384 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Tokenizer | GPT-2 BPE |
| Framework | PyTorch |
| Precision | FP16 (AMP during training) |
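As a sanity check on the parameter count, the dimensions in the table above land near 52.94M under standard GPT-2-style assumptions (4x MLP expansion, biased linear layers, and an untied output head with bias). The sketch below is a back-of-the-envelope estimate only; the custom implementation may differ in these details.

```python
# Rough parameter estimate from the table above (illustrative assumptions only).
vocab, ctx, d, n_layers = 50257, 256, 384, 8

embeddings = vocab * d + ctx * d                      # token + position embeddings
attn       = (d * 3 * d + 3 * d) + (d * d + d)        # QKV projection + output projection
mlp        = (d * 4 * d + 4 * d) + (4 * d * d + d)    # up- and down-projections
norms      = 2 * 2 * d                                # two LayerNorms per block
block      = attn + mlp + norms
lm_head    = d * vocab + vocab                        # assumed untied, with bias

total = embeddings + n_layers * block + 2 * d + lm_head   # 2 * d = final LayerNorm
print(f"~{total / 1e6:.2f}M parameters")                  # ~52.94M
```

Under these assumptions, roughly three quarters of the budget sits in the token embedding and output matrices, which is typical at this scale.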
---
## πŸ“š Training Details
- **Dataset**: TinyStories (text file, streamed)
- **Training strategy**: Streaming token dataset
- **Epochs**: 1
- **Effective batch size**: 64
- **Optimizer**: AdamW
- **Learning rate**: 3e-4
- **Dropout**: 0.1
- **Hardware**: NVIDIA Tesla P100 (16GB)
- **Environment**: Kaggle
The model was trained with a **causal language modeling** objective, predicting the next token given all previous tokens (sketched below).
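For reference, next-token prediction reduces to shifting the targets by one position and applying cross-entropy over the vocabulary. This is a minimal sketch of that objective, not the actual training loop; `model` here is a stand-in assumed to return raw logits of shape `(batch, seq_len, vocab)`:

```python
import torch.nn.functional as F

def causal_lm_loss(model, token_ids):
    # token_ids: (batch, seq_len) chunk drawn from the streamed token dataset.
    inputs  = token_ids[:, :-1]          # what the model sees
    targets = token_ids[:, 1:]           # the same tokens shifted left by one
    logits  = model(inputs)              # assumed shape: (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```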
---
## 🎯 Intended Use
TinyWay is suitable for:
- Short story generation
- Educational demonstrations of Transformer internals
- Research on small-scale language models
- Understanding end-to-end LLM construction
---
## ⚠️ Limitations
- Trained only on narrative-style data (TinyStories)
- Not instruction-tuned
- Not suitable for factual QA or reasoning-heavy tasks
- Limited context window (256 tokens); longer prompts should be truncated (see the sketch below)
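A minimal way to stay within the window when tokenizing, assuming the standard Hugging Face truncation arguments (the tokenizer is the stock GPT-2 BPE tokenizer, so they should apply):

```python
inputs = tokenizer(
    long_prompt,          # placeholder for your own text
    return_tensors="pt",
    truncation=True,
    max_length=256,       # matches the model's context length
)
```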
---
## πŸš€ Usage
### Load and generate text
```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "shivamsharma120120/TinyWay-1.0.0"

# The custom architecture ships with the repo, so trust_remote_code is required.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
)

# Encode a prompt and sample a continuation.
inputs = tokenizer("Once upon a time", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
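The GPT-2 BPE tokenizer has no padding token, so for batched prompts you will likely need to set one yourself and pad on the left (a common convention for decoder-only generation, not something this checkpoint mandates):

```python
# Hypothetical batched-generation setup; adjust to taste.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

batch = tokenizer(
    ["Once upon a time", "One sunny morning"],
    return_tensors="pt",
    padding=True,
)
output = model.generate(
    **batch,
    max_new_tokens=100,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in output:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```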