---
license: apache-2.0
language: en
tags:
- causal-lm
- from-scratch
- transformer
- tiny-stories
- pytorch
- custom-architecture
- text-generation
---
|
|
|
|
|
# TinyWay 1.0.0 |
|
|
|
|
|
**TinyWay 1.0.0** is a **52.94M-parameter GPT-style causal language model** trained **from scratch** on the **TinyStories** dataset. |
|
|
The model is designed for **lightweight story generation, research, and educational exploration** of decoder-only Transformer architectures. |
|
|
|
|
|
Rather than being fine-tuned from an existing checkpoint, TinyWay was **implemented, trained, serialized, and released end-to-end**, including a **custom Hugging Face-compatible architecture**. |
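
Exposing a custom architecture through the standard `transformers` Auto classes typically looks like the sketch below. This is an illustration only: `TinyWayConfig`, `TinyWayForCausalLM`, and the `"tinyway"` model type are hypothetical stand-ins, not the actual TinyWay source; the hyperparameter defaults are taken from the table in the next section.

```python
# Illustrative sketch of wiring a custom architecture into the Auto classes.
# TinyWayConfig / TinyWayForCausalLM are hypothetical names, not the real source.
from transformers import AutoConfig, AutoModelForCausalLM, PretrainedConfig, PreTrainedModel

class TinyWayConfig(PretrainedConfig):
    model_type = "tinyway"  # identifies the architecture in config.json

    def __init__(self, vocab_size=50257, n_layer=8, n_embd=384,
                 n_head=8, n_positions=256, dropout=0.1, **kwargs):
        self.vocab_size = vocab_size
        self.n_layer = n_layer
        self.n_embd = n_embd
        self.n_head = n_head
        self.n_positions = n_positions
        self.dropout = dropout
        super().__init__(**kwargs)

class TinyWayForCausalLM(PreTrainedModel):
    config_class = TinyWayConfig
    # ... decoder-only Transformer blocks and an LM head would go here ...

# Registration lets AutoModelForCausalLM.from_pretrained(...) resolve the class;
# on the Hub the same effect comes from auto_map entries plus trust_remote_code=True.
AutoConfig.register("tinyway", TinyWayConfig)
AutoModelForCausalLM.register(TinyWayConfig, TinyWayForCausalLM)
```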
|
|
|
|
|
--- |
|
|
|
|
|
## 📌 Model Overview |
|
|
|
|
|
| Attribute | Value | |
|
|
|---------|------| |
|
|
| Architecture | Decoder-only Transformer (GPT-style) | |
|
|
| Parameters | **52.94M** | |
|
|
| Layers | 8 | |
|
|
| Hidden size | 384 | |
|
|
| Attention heads | 8 | |
|
|
| Context length | 256 tokens | |
|
|
| Tokenizer | GPT-2 BPE | |
|
|
| Framework | PyTorch | |
|
|
| Precision | FP16 (AMP during training) | |
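
The headline parameter count is consistent with these hyperparameters as a back-of-the-envelope check, assuming a standard GPT-2-style block (4× MLP expansion), learned positional embeddings, and untied input/output embeddings (all assumptions; the exact breakdown depends on the implementation):

```python
# Rough parameter estimate from the table above.
d, layers, vocab, ctx = 384, 8, 50257, 256

embeddings = vocab * d                # token embedding matrix
positions  = ctx * d                  # learned positional embeddings
per_block  = 4 * d * d + 8 * d * d    # attention (Q,K,V,O) + MLP with 4x expansion
lm_head    = vocab * d                # untied output projection

total = embeddings + positions + layers * per_block + lm_head
print(f"{total / 1e6:.2f}M")  # ~52.85M; biases and LayerNorms close most of
                              # the small gap to the reported 52.94M
```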
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Training Details |
|
|
|
|
|
- **Dataset**: TinyStories (text file, streamed) |
|
|
- **Training strategy**: Streaming token dataset |
|
|
- **Epochs**: 1 |
|
|
- **Effective batch size**: 64 |
|
|
- **Optimizer**: AdamW |
|
|
- **Learning rate**: 3e-4 |
|
|
- **Dropout**: 0.1 |
|
|
- **Hardware**: NVIDIA Tesla P100 (16GB) |
|
|
- **Environment**: Kaggle |
|
|
|
|
|
The model was trained with a **causal language modeling** objective, predicting each next token from all preceding tokens; a minimal sketch of this setup follows. |
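
The sketch below illustrates the general pattern of streaming a text file into fixed 256-token blocks and optimizing with AdamW under FP16 AMP. It is not the exact TinyWay training script: the file path `tinystories.txt` is hypothetical, and a stock `GPT2LMHeadModel` with the card's dimensions stands in for the custom architecture.

```python
# Illustrative training-loop sketch: stream text into 256-token blocks,
# optimize with AdamW + FP16 AMP. NOT the exact TinyWay training script.
import torch
import torch.nn.functional as F
from torch.utils.data import IterableDataset, DataLoader
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 BPE, as in the card

class StreamingTokenDataset(IterableDataset):
    """Tokenizes a text file on the fly and yields (input, target) blocks."""
    def __init__(self, path, block_size=256):
        self.path, self.block_size = path, block_size

    def __iter__(self):
        buf = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                buf.extend(tokenizer.encode(line))
                while len(buf) >= self.block_size + 1:
                    chunk, buf = buf[: self.block_size + 1], buf[self.block_size:]
                    # next-token prediction: inputs chunk[:-1], targets chunk[1:]
                    yield torch.tensor(chunk[:-1]), torch.tensor(chunk[1:])

loader = DataLoader(StreamingTokenDataset("tinystories.txt"), batch_size=64)

device = "cuda"
# Stand-in model with the card's dimensions (the real model is a custom class)
model = GPT2LMHeadModel(
    GPT2Config(n_layer=8, n_embd=384, n_head=8, n_positions=256)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()

for x, y in loader:  # a single pass over the stream = 1 epoch
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # FP16 forward/backward (AMP)
        logits = model(x).logits
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```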
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Intended Use |
|
|
|
|
|
TinyWay is suitable for: |
|
|
|
|
|
- Short story generation |
|
|
- Educational demonstrations of Transformer internals |
|
|
- Research on small-scale language models |
|
|
- Understanding end-to-end LLM construction |
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Limitations |
|
|
|
|
|
- Trained only on narrative-style data (TinyStories) |
|
|
- Not instruction-tuned |
|
|
- Not suitable for factual QA or reasoning-heavy tasks |
|
|
- Limited context window (256 tokens); see the truncation sketch below |
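
Prompts longer than the context window must be truncated before generation. A minimal example using the tokenizer's built-in truncation (the prompt string and the 100-token generation budget are illustrative):

```python
# Keep prompts within the 256-token window, reserving room for generation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "shivamsharma120120/TinyWay-1.0.0", trust_remote_code=True
)
tokenizer.truncation_side = "left"  # keep the most recent text for continuation

inputs = tokenizer(
    "a very long prompt ...",
    return_tensors="pt",
    truncation=True,
    max_length=156,  # 256-token context minus 100 tokens for max_new_tokens
)
```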
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Usage |
|
|
|
|
|
### Load and generate text |
|
|
|
|
|
```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "shivamsharma120120/TinyWay-1.0.0"

# trust_remote_code=True is required because TinyWay is a custom architecture
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, trust_remote_code=True
)

# Encode a prompt and sample a continuation
inputs = tokenizer("Once upon a time", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
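
Because the architecture is custom rather than a stock `transformers` class, `trust_remote_code=True` is needed so the config and model code distributed with the repository can be loaded alongside the weights.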
|
|
|