---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---

# TinyWay 1.0.0

**TinyWay 1.0.0** is a **52.94M-parameter, GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset.
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures.

Rather than fine-tuning an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture**.
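
As a rough illustration of what "Hugging Face-compatible" involves, the sketch below shows a custom config class registered with `AutoConfig`. All class and field names here are invented for illustration (the real classes ship as remote code in the model repo and load via `trust_remote_code=True`); only the hyperparameter values come from the table below:

```python
from transformers import AutoConfig, PretrainedConfig

# Hypothetical sketch, NOT the actual TinyWay source: a custom config with a
# registered model_type is what lets the Auto* classes resolve a custom
# architecture. Values mirror the Model Overview table below.
class TinyWayConfig(PretrainedConfig):
    model_type = "tinyway"

    def __init__(self, n_layer=8, n_head=8, n_embd=384, n_positions=256,
                 vocab_size=50257, dropout=0.1, **kwargs):
        self.n_layer = n_layer
        self.n_head = n_head
        self.n_embd = n_embd
        self.n_positions = n_positions
        self.vocab_size = vocab_size
        self.dropout = dropout
        super().__init__(**kwargs)

AutoConfig.register("tinyway", TinyWayConfig)
```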

---

## 🔍 Model Overview

| Attribute | Value |
|-----------|-------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | **52.94M** |
| Layers | 8 |
| Hidden size | 384 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Tokenizer | GPT-2 BPE |
| Framework | PyTorch |
| Precision | FP16 (AMP during training) |
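
The headline parameter count can be reproduced from these numbers. The sketch below assumes a standard GPT-2-style block (4x MLP, learned positional embeddings, biases throughout, untied LM head); the card does not state these internals, so treat it as a plausibility check rather than a description of the exact architecture:

```python
# Back-of-the-envelope parameter count from the Model Overview table.
# ASSUMPTIONS (not stated in the card): GPT-2-style blocks with a 4x MLP,
# learned positional embeddings, biases everywhere, untied LM head.
vocab, d, n_layers, ctx = 50257, 384, 8, 256  # GPT-2 BPE vocab + table values

embeddings = vocab * d + ctx * d   # token + learned position embeddings
per_layer = (
    4 * (d * d + d)                # q, k, v, output projections (+ biases)
    + (d * 4 * d + 4 * d)          # MLP up-projection
    + (4 * d * d + d)              # MLP down-projection
    + 2 * 2 * d                    # two LayerNorms (weight + bias)
)
lm_head = d * vocab + vocab        # untied output head with bias
final_ln = 2 * d

total = embeddings + n_layers * per_layer + lm_head + final_ln
print(f"{total / 1e6:.2f}M")       # 52.94M, matching the table
```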

---

## 📚 Training Details

- **Dataset**: TinyStories (plain-text file, streamed)
- **Training strategy**: streaming token dataset
- **Epochs**: 1
- **Effective batch size**: 64
- **Optimizer**: AdamW
- **Learning rate**: 3e-4
- **Dropout**: 0.1
- **Hardware**: NVIDIA Tesla P100 (16 GB)
- **Environment**: Kaggle

The model was trained using **causal language modeling**, predicting the next token given the previous tokens.
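
The training pipeline itself is not included in this card. As an illustration of the streaming strategy listed above, here is a minimal, hypothetical sketch of such a dataset; the class name, file handling, and block layout are assumptions, not the actual training code:

```python
import torch
from torch.utils.data import IterableDataset
from transformers import GPT2TokenizerFast

class StreamingTokenDataset(IterableDataset):
    """Hypothetical sketch: stream a text file, yield 256-token LM blocks."""

    def __init__(self, path: str, block_size: int = 256):
        self.path = path
        self.block_size = block_size
        self.tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # GPT-2 BPE

    def __iter__(self):
        buffer = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:  # read incrementally; the corpus never sits in RAM
                buffer.extend(self.tokenizer.encode(line))
                while len(buffer) >= self.block_size + 1:
                    chunk = torch.tensor(buffer[: self.block_size + 1])
                    buffer = buffer[self.block_size:]
                    # labels are inputs shifted by one token: exactly the
                    # causal-LM objective described above
                    yield {"input_ids": chunk[:-1], "labels": chunk[1:]}
```

A dataset like this could then be wrapped in a `DataLoader` with `batch_size=64` to match the effective batch size listed above.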

---

## 🎯 Intended Use

TinyWay is suitable for:

- Short story generation
- Educational demonstrations of Transformer internals
- Research on small-scale language models
- Understanding end-to-end LLM construction

---

## ⚠️ Limitations

- Trained only on narrative-style data (TinyStories)
- Not instruction-tuned
- Not suitable for factual QA or reasoning-heavy tasks
- Limited context window (256 tokens)

---

## 🚀 Usage

### Load and generate text

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_USERNAME/TinyWay-1.0.0"

# trust_remote_code=True is required: TinyWay ships its custom architecture
# as remote code alongside the weights.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
)
model.eval()

inputs = tokenizer("Once upon a time", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 BPE defines no pad token
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
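
For quick experiments, the same generation can plausibly be driven through the standard `text-generation` pipeline; this is ordinary `transformers` usage, though whether the custom architecture resolves cleanly inside the pipeline has not been verified here:

```python
from transformers import pipeline

# Convenience wrapper around the loading code above; assumes the custom
# TinyWay classes resolve via trust_remote_code just as with AutoModel.
generator = pipeline(
    "text-generation",
    model="YOUR_USERNAME/TinyWay-1.0.0",
    trust_remote_code=True,
)
result = generator("Once upon a time", max_new_tokens=100, do_sample=True)
print(result[0]["generated_text"])
```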