---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---

# TinyWay 1.0.0

**TinyWay 1.0.0** is a **52.94M-parameter, GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset.  
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures.

Rather than being fine-tuned from an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture**.

---

## πŸ” Model Overview

| Attribute | Value |
|---------|------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | **52.94M** |
| Layers | 8 |
| Hidden size | 384 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Tokenizer | GPT-2 BPE |
| Framework | PyTorch |
| Precision | FP16 (AMP during training) |
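
As a sanity check, these hyperparameters roughly reproduce the reported parameter count. A minimal back-of-the-envelope sketch, assuming GPT-2's 50,257-token vocabulary, a 4× MLP expansion, and untied embedding/LM-head weights (none of which are stated explicitly above):

```python
# Back-of-the-envelope parameter count from the table above.
# Assumptions (not confirmed by the repo): GPT-2 vocab of 50,257,
# 4x MLP expansion, untied token-embedding and LM-head matrices.
vocab, d, n_layers, ctx = 50_257, 384, 8, 256

embed   = vocab * d        # token embedding matrix
pos     = ctx * d          # learned position embeddings
attn    = 4 * d * d        # Q, K, V and output projections (weights only)
mlp     = 2 * d * (4 * d)  # up- and down-projection weights
lm_head = vocab * d        # untied output projection

total = embed + pos + n_layers * (attn + mlp) + lm_head
print(f"{total / 1e6:.2f}M")  # ~52.85M; biases and LayerNorms account for
                              # the remainder of the reported 52.94M
```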

---

## πŸ“š Training Details

- **Dataset**: TinyStories (text file, streamed)
- **Training strategy**: Streaming token dataset
- **Epochs**: 1
- **Effective batch size**: 64  
- **Optimizer**: AdamW  
- **Learning rate**: 3e-4  
- **Dropout**: 0.1  
- **Hardware**: NVIDIA Tesla P100 (16GB)  
- **Environment**: Kaggle  

The model was trained using **causal language modeling**, predicting the next token given previous tokens.
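
The streaming strategy and the objective can be illustrated together. Below is a minimal sketch, assuming the TinyStories text file is tokenized on the fly into fixed 256-token blocks; the file path, class name, and training step are illustrative, not the released training code:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader
from transformers import AutoTokenizer

class StreamingTokenDataset(IterableDataset):
    """Streams a text file and yields fixed-length (input, label) blocks."""

    def __init__(self, path, tokenizer, block_size=256):
        self.path, self.tokenizer, self.block_size = path, tokenizer, block_size

    def __iter__(self):
        buffer = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                buffer.extend(self.tokenizer.encode(line))
                # Emit blocks of block_size + 1 tokens so the labels can be
                # the inputs shifted right by one position.
                while len(buffer) > self.block_size:
                    chunk = buffer[: self.block_size + 1]
                    buffer = buffer[self.block_size:]
                    yield torch.tensor(chunk[:-1]), torch.tensor(chunk[1:])

tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = StreamingTokenDataset("tinystories.txt", tokenizer)  # hypothetical path
loader = DataLoader(dataset, batch_size=64)

# One step of the causal LM objective (model and optimizer omitted):
#   logits = model(input_ids)                      # (B, 256, vocab)
#   loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
```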

---

## 🎯 Intended Use

TinyWay is suitable for:

- Short story generation
- Educational demonstrations of Transformer internals
- Research on small-scale language models
- Understanding end-to-end LLM construction

---

## ⚠️ Limitations

- Trained only on narrative-style data (TinyStories)
- Not instruction-tuned
- Not suitable for factual QA or reasoning-heavy tasks
- Limited context window (256 tokens)

---

## πŸš€ Usage

### Load and generate text

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "shivamsharma120120/TinyWay-1.0.0"

# trust_remote_code is required because TinyWay ships a custom architecture
config = AutoConfig.from_pretrained(
    model_id,
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True
)

inputs = tokenizer("Once upon a time", return_tensors="pt")

# Sample a 100-token continuation with nucleus sampling
output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
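
Because the context window is 256 tokens, longer prompts should be truncated before generation. A short example, where `long_prompt` is a placeholder for your own text:

```python
inputs = tokenizer(
    long_prompt,        # placeholder: any prompt string
    return_tensors="pt",
    truncation=True,
    max_length=256,     # TinyWay's context length
)
```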