---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---

# TinyWay 1.0.0

**TinyWay 1.0.0** is a **52.94M-parameter GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset.
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures.

Rather than being fine-tuned from an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture** loaded via `trust_remote_code`.

---

## 🔍 Model Overview

| Attribute | Value |
|-----------|-------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | **52.94M** |
| Layers | 8 |
| Hidden size | 384 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Tokenizer | GPT-2 BPE |
| Framework | PyTorch |
| Precision | FP16 (AMP during training) |

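The quoted parameter count is consistent with these dimensions. Below is a back-of-the-envelope check; it is a sketch that assumes a GPT-2 vocabulary of 50257, learned positional embeddings, a 4x MLP expansion, biases on all linear layers, and an untied LM head, none of which are confirmed by the table above.

```python
# Rough parameter count from the dimensions in the table above.
vocab, d, n_layers, ctx = 50257, 384, 8, 256

tok_emb = vocab * d                               # token embeddings
pos_emb = ctx * d                                 # learned positional embeddings
attn    = 4 * (d * d + d)                         # Q, K, V, and output projections
mlp     = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up- and down-projections
norms   = 2 * 2 * d                               # two LayerNorms per block
block   = attn + mlp + norms
lm_head = vocab * d + vocab                       # assumed untied output head with bias

total = tok_emb + pos_emb + n_layers * block + 2 * d + lm_head  # +2*d: final LayerNorm
print(f"{total / 1e6:.2f}M")                      # ~52.94M under these assumptions
```
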
---

## 📚 Training Details

- **Dataset**: TinyStories (text file, streamed)
- **Training strategy**: Streaming token dataset
- **Epochs**: 1
- **Effective batch size**: 64
- **Optimizer**: AdamW
- **Learning rate**: 3e-4
- **Dropout**: 0.1
- **Hardware**: NVIDIA Tesla P100 (16GB)
- **Environment**: Kaggle

The model was trained using **causal language modeling**, predicting the next token given previous tokens.

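A minimal sketch of how such a streaming token dataset can feed a causal-LM objective (the file name, chunking scheme, and helper class below are illustrative assumptions, not the released training code):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader
from transformers import AutoTokenizer

class StreamingTokenDataset(IterableDataset):
    """Lazily tokenizes a text file and yields fixed-length token blocks."""

    def __init__(self, path, tokenizer, block_size=256):
        self.path, self.tokenizer, self.block_size = path, tokenizer, block_size

    def __iter__(self):
        buffer = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                buffer.extend(self.tokenizer.encode(line))
                while len(buffer) > self.block_size:
                    chunk = buffer[: self.block_size + 1]
                    buffer = buffer[self.block_size:]
                    # Causal LM: targets are the inputs shifted one token left.
                    yield torch.tensor(chunk[:-1]), torch.tensor(chunk[1:])

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # GPT-2 BPE, as in the table
dataset = StreamingTokenDataset("TinyStories.txt", tokenizer, block_size=256)
loader = DataLoader(dataset, batch_size=64)                # effective batch size 64
```
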
---

## 🎯 Intended Use

TinyWay is suitable for:

- Short story generation
- Educational demonstrations of Transformer internals
- Research on small-scale language models
- Understanding end-to-end LLM construction

---

## ⚠️ Limitations

- Trained only on narrative-style data (TinyStories)
- Not instruction-tuned
- Not suitable for factual QA or reasoning-heavy tasks
- Limited context window (256 tokens)

---

## 🚀 Usage

### Load and generate text

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_USERNAME/TinyWay-1.0.0"

config = AutoConfig.from_pretrained(
    model_id,
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True
)

inputs = tokenizer("Once upon a time", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
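
Because the context window is 256 tokens, long prompts should be trimmed before generation. One way to do this is sketched below; `long_prompt` is a placeholder, and the 100-token reserve simply matches the `max_new_tokens=100` used above.

```python
# Keep the tail of the prompt so the most recent text survives truncation.
tokenizer.truncation_side = "left"
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=256 - 100,  # leave room for max_new_tokens=100
)
```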