TinyWay 1.0.0

TinyWay 1.0.0 is a 52.94M-parameter GPT-style causal language model trained from scratch on the TinyStories dataset.
The model is designed for lightweight story generation, research, and educational exploration of decoder-only Transformer architectures.

Rather than being fine-tuned from an existing checkpoint, TinyWay was implemented, trained, serialized, and released end-to-end, including a custom Hugging Face-compatible model implementation.


πŸ” Model Overview

Attribute        Value
Architecture     Decoder-only Transformer (GPT-style)
Parameters       52.94M
Layers           8
Hidden size      384
Attention heads  8
Context length   256 tokens
Tokenizer        GPT-2 BPE
Framework        PyTorch
Precision        FP16 (AMP during training)
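As a sanity check, the hyperparameters above roughly reproduce the reported parameter count. This is a back-of-envelope sketch, not TinyWay's actual code; it assumes GPT-2's 50,257-token vocabulary, learned positional embeddings, a 4x MLP expansion, and an untied LM head, none of which are confirmed by this card.

```python
# Rough GPT-style parameter count from the table above.
# Assumptions (not confirmed internals): GPT-2 vocab of 50,257,
# learned positional embeddings, 4x MLP width, untied LM head.

vocab_size, n_ctx, d_model, n_layers = 50_257, 256, 384, 8

token_emb = vocab_size * d_model          # input embedding matrix
pos_emb = n_ctx * d_model                 # learned positional embeddings
lm_head = vocab_size * d_model            # untied output projection

per_layer = (
    4 * d_model * d_model + 4 * d_model   # Q, K, V, output proj (+ biases)
    + 2 * d_model * (4 * d_model)         # MLP up- and down-projections
    + 4 * d_model + d_model               # MLP biases
    + 2 * 2 * d_model                     # two LayerNorms (weight + bias)
)

final_ln = 2 * d_model                    # final LayerNorm
total = token_emb + pos_emb + lm_head + n_layers * per_layer + final_ln
print(f"{total / 1e6:.2f}M")              # lands close to the reported 52.94M
```

The small gap to the reported 52.94M would come from details this sketch omits, such as the exact bias or weight-tying configuration.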

📚 Training Details

  • Dataset: TinyStories (text file, streamed)
  • Training strategy: Streaming token dataset
  • Epochs: 1
  • Effective batch size: 64
  • Optimizer: AdamW
  • Learning rate: 3e-4
  • Dropout: 0.1
  • Hardware: NVIDIA Tesla P100 (16GB)
  • Environment: Kaggle

The model was trained with a causal language-modeling objective, predicting each next token from the tokens that precede it.
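A streaming token dataset for this objective can be sketched as follows: slice a long token stream into fixed-length windows, with labels equal to the inputs shifted by one position. The function name and shapes here are illustrative, not TinyWay's actual training code.

```python
# Minimal sketch of a streaming causal-LM dataset (illustrative only):
# consume a token stream and yield (inputs, labels) windows where each
# label is the next token after the corresponding input token.

def causal_lm_windows(token_stream, context_length):
    """Yield (inputs, labels) pairs; labels are inputs shifted left by one."""
    buffer = []
    for token in token_stream:
        buffer.append(token)
        # context_length + 1 tokens are needed to form one shifted pair.
        if len(buffer) == context_length + 1:
            inputs = buffer[:-1]   # tokens t_0 .. t_{n-1}
            labels = buffer[1:]    # tokens t_1 .. t_n (next-token targets)
            yield inputs, labels
            buffer = []

# Toy stream of 10 token ids with a context length of 4:
pairs = list(causal_lm_windows(range(10), context_length=4))
```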


🎯 Intended Use

TinyWay is suitable for:

  • Short story generation
  • Educational demonstrations of Transformer internals
  • Research on small-scale language models
  • Understanding end-to-end LLM construction

⚠️ Limitations

  • Trained only on narrative-style data (TinyStories)
  • Not instruction-tuned
  • Not suitable for factual QA or reasoning-heavy tasks
  • Limited context window (256 tokens)

🚀 Usage

Load and generate text

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "shivamsharma120120/TinyWay-1.0.0"

config = AutoConfig.from_pretrained(
    model_id,
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True
)

inputs = tokenizer("Once upon a time", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
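The generate call above uses temperature and nucleus (top-p) sampling. A minimal sketch of what those two knobs do, in plain Python rather than transformers' internal implementation:

```python
# Illustrative sketch of temperature + top-p (nucleus) sampling over a
# single logit vector. Not the transformers implementation.
import math
import random

def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Temperature-scale logits, keep the smallest set of tokens whose
    probability mass reaches top_p, and sample a token id from it."""
    scaled = [l / temperature for l in logits]      # sharpen/flatten
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]        # stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    # Highest-probability tokens first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:                                 # nucleus: smallest set
        kept.append(i)                              # whose mass >= top_p
        mass += probs[i]
        if mass >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    r = rng.random() * kept_mass                    # sample within nucleus
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

# With one dominant logit, the nucleus is a single token:
print(sample_top_p([10.0, 0.0, 0.0], temperature=1.0, top_p=0.95))  # → 0
```

Lower temperature and top_p make generation more deterministic; the values in the snippet above (0.8 and 0.95) keep story generation varied but coherent.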