Toy Transformer trained on Alice in Wonderland

A tiny GPT-style decoder-only Transformer trained on Lewis Carroll's Alice's Adventures in Wonderland for the L5 Transformers lecture of a Master's-level Deep Learning and Generative AI course.

Architecture

Hyperparameter         Value
d_model                32
n_heads                4
n_layers               2
d_ff                   128
max_seq_len            32
vocabulary size        2711
trainable parameters   111,904
positional encoding    sinusoidal
residual placement     post-norm
output projection      tied to input embedding
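
The stated parameter total can be reproduced from the hyperparameters above. The sketch below is one breakdown that matches 111,904. It assumes bias-free attention projections, feed-forward layers with biases, and two affine LayerNorms per layer. Those three choices are assumptions, not details confirmed by this card, and other layouts could give the same total. The function name `count_parameters` is hypothetical.

```python
def count_parameters(vocab_size, d_model, n_layers, d_ff):
    """Count trainable parameters under the assumptions stated in the comments."""
    # Token embedding; the tied output projection reuses it, so it is counted once.
    embedding = vocab_size * d_model
    # Q, K, V and output projections. ASSUMPTION: no bias terms.
    attention = 4 * d_model * d_model
    # Two linear layers of the feed-forward block. ASSUMPTION: both have biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # ASSUMPTION: two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + feed_forward + layer_norms
    # Sinusoidal positional encodings are fixed and add no trainable parameters.
    return embedding + n_layers * per_layer


total = count_parameters(vocab_size=2711, d_model=32, n_layers=2, d_ff=128)
print(total)  # 111904
```

Under these assumptions the embedding accounts for 86,752 of the parameters, so the vocabulary dominates the model size. n_heads does not appear in the count because splitting d_model across heads does not change the projection shapes.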

Files

  • model.pt: PyTorch state dict
  • vocab.json: vocabulary mappings (stoi: token-to-id, itos: id-to-token)
  • config.json: architecture hyperparameters
  • loss_curve.npy: training loss per step

Loading

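from huggingface_hub import hf_hub_download
import json
import torch

repo_id = 'luhres/toy-transformer-alice'
state_path  = hf_hub_download(repo_id, 'model.pt')
config_path = hf_hub_download(repo_id, 'config.json')
vocab_path  = hf_hub_download(repo_id, 'vocab.json')

# Use context managers so the file handles are closed after reading.
with open(config_path) as f:
    config = json.load(f)
with open(vocab_path) as f:
    vocab = json.load(f)

# Load the weights onto the CPU; move the model to another device afterwards if needed.
state_dict = torch.load(state_path, map_location='cpu')
# ToyTransformer is not among the files listed above, so define or import it first.
# Then instantiate ToyTransformer with config and call model.load_state_dict(state_dict).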
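
Because the ToyTransformer class has to match the parameter names and shapes stored in model.pt, it can help to list them before writing or importing the class. The helper below is a minimal sketch; `summarize_state_dict` is a hypothetical name, and it relies only on each value exposing a `.shape` attribute, as PyTorch tensors do.

```python
from math import prod


def summarize_state_dict(state_dict):
    """Return (rows, total): rows is a list of (name, shape, element count)."""
    rows = []
    for name, tensor in state_dict.items():
        shape = tuple(tensor.shape)
        # prod(()) is 1, so scalar entries count as one element.
        rows.append((name, shape, prod(shape)))
    total = sum(count for _, _, count in rows)
    return rows, total
```

Calling `summarize_state_dict(torch.load(state_path, map_location='cpu'))` returns every stored entry with its shape. The total may differ from the 111,904 trainable parameters if the state dict also stores non-trainable buffers or lists the tied embedding under two names.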

Intended for educational use. Not suitable for any real-world application.
