---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---

# TinyWay 1.0.0

**TinyWay 1.0.0** is a **52.94M-parameter, GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset.  
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures.

Rather than being fine-tuned from an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture**.

---

## πŸ” Model Overview

| Attribute | Value |
|---------|------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | **52.94M** |
| Layers | 8 |
| Hidden size | 384 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Tokenizer | GPT-2 BPE |
| Framework | PyTorch |
| Precision | FP16 (AMP during training) |
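
As a sanity check, these hyperparameters roughly reproduce the reported parameter count. A minimal back-of-the-envelope sketch, assuming GPT-2's 50,257-token vocabulary, a 4× MLP expansion, and untied embedding/LM-head weights (none of which are stated explicitly above):

```python
# Back-of-the-envelope parameter count from the table above.
# Assumptions (not confirmed by the repo): GPT-2 vocab of 50,257,
# 4x MLP expansion, untied token-embedding and LM-head matrices.
vocab, d, n_layers, ctx = 50_257, 384, 8, 256

embed   = vocab * d        # token embedding matrix
pos     = ctx * d          # learned position embeddings
attn    = 4 * d * d        # Q, K, V and output projections (weights only)
mlp     = 2 * d * (4 * d)  # up- and down-projection weights
lm_head = vocab * d        # untied output projection

total = embed + pos + n_layers * (attn + mlp) + lm_head
print(f"{total / 1e6:.2f}M")  # ~52.85M; biases and LayerNorms account for
                              # the remainder of the reported 52.94M
```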

---

## πŸ“š Training Details

- **Dataset**: TinyStories (text file, streamed)
- **Training strategy**: Streaming token dataset
- **Epochs**: 1
- **Effective batch size**: 64  
- **Optimizer**: AdamW  
- **Learning rate**: 3e-4  
- **Dropout**: 0.1  
- **Hardware**: NVIDIA Tesla P100 (16GB)  
- **Environment**: Kaggle  

The model was trained using **causal language modeling**, predicting the next token given previous tokens.
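
The streaming strategy and the objective can be illustrated together. Below is a minimal sketch, assuming the TinyStories text file is tokenized on the fly into fixed 256-token blocks; the file path, class name, and training step are illustrative, not the released training code:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader
from transformers import AutoTokenizer

class StreamingTokenDataset(IterableDataset):
    """Streams a text file and yields fixed-length (input, label) blocks."""

    def __init__(self, path, tokenizer, block_size=256):
        self.path, self.tokenizer, self.block_size = path, tokenizer, block_size

    def __iter__(self):
        buffer = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                buffer.extend(self.tokenizer.encode(line))
                # Emit blocks of block_size + 1 tokens so the labels can be
                # the inputs shifted right by one position.
                while len(buffer) > self.block_size:
                    chunk = buffer[: self.block_size + 1]
                    buffer = buffer[self.block_size:]
                    yield torch.tensor(chunk[:-1]), torch.tensor(chunk[1:])

tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = StreamingTokenDataset("tinystories.txt", tokenizer)  # hypothetical path
loader = DataLoader(dataset, batch_size=64)

# One step of the causal LM objective (model and optimizer omitted):
#   logits = model(input_ids)                      # (B, 256, vocab)
#   loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
```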

---

## 🎯 Intended Use

TinyWay is suitable for:

- Short story generation
- Educational demonstrations of Transformer internals
- Research on small-scale language models
- Understanding end-to-end LLM construction

---

## ⚠️ Limitations

- Trained only on narrative-style data (TinyStories)
- Not instruction-tuned
- Not suitable for factual QA or reasoning-heavy tasks
- Limited context window (256 tokens)

---

## πŸš€ Usage

### Load and generate text

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "shivamsharma120120/TinyWay-1.0.0"

# trust_remote_code is required because TinyWay ships a custom architecture
config = AutoConfig.from_pretrained(
    model_id,
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True
)

inputs = tokenizer("Once upon a time", return_tensors="pt")

# Sample a 100-token continuation with nucleus sampling
output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
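
Because the context window is 256 tokens, longer prompts should be truncated before generation. A short example, where `long_prompt` is a placeholder for your own text:

```python
inputs = tokenizer(
    long_prompt,        # placeholder: any prompt string
    return_tensors="pt",
    truncation=True,
    max_length=256,     # TinyWay's context length
)
```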