Toy Transformer trained on Alice in Wonderland

A tiny GPT-style decoder-only Transformer trained on Lewis Carroll's Alice's Adventures in Wonderland for the L5 Transformers lecture of a Master's-level Deep Learning and Generative AI course.

Architecture

Hyperparameter	Value
d_model	32
n_heads	4
n_layers	2
d_ff	128
max_seq_len	32
vocabulary size	2711
trainable parameters	111,904
positional encoding	sinusoidal
residual placement	post-norm
output projection	tied to input embedding
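
The parameter total in the table can be cross-checked with a little arithmetic. The sketch below is one breakdown that reproduces 111,904; it assumes bias-free Q/K/V/output projections, feed-forward layers with biases, and two LayerNorms per layer with scale and shift. The card does not state these details, and other layouts (for example biased attention projections with parameter-free norms) give the same sum, so treat this as a plausibility check rather than a description of the actual model.

```python
# Values taken from the architecture table.
d_model, n_layers, d_ff, vocab_size = 32, 2, 128, 2711

# Token embedding; the tied output projection reuses this matrix, so it adds nothing.
embedding = vocab_size * d_model

# Assumption: Q, K, V and output projections are d_model x d_model, no biases.
attention = 4 * d_model * d_model

# Assumption: two linear layers (d_model -> d_ff -> d_model), both with biases.
feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

# Assumption: two LayerNorms per layer, each with a scale and a shift vector.
layer_norms = 2 * (2 * d_model)

per_layer = attention + feed_forward + layer_norms

# Sinusoidal positional encodings are fixed, so they contribute no parameters.
total = embedding + n_layers * per_layer

print(f"embedding={embedding:,} per_layer={per_layer:,} total={total:,}")
# prints: embedding=86,752 per_layer=12,576 total=111,904
```

About 78% of the parameters sit in the embedding matrix, which is why tying the output projection to it matters so much at this scale.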

Files

model.pt: PyTorch state dict
vocab.json: vocabulary mappings, itos (id to token) and stoi (token to id)
config.json: architecture hyperparameters
loss_curve.npy: training loss per step

Loading

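import json

import torch
from huggingface_hub import hf_hub_download

repo_id = 'luhres/toy-transformer-alice'
state_path  = hf_hub_download(repo_id, 'model.pt')
config_path = hf_hub_download(repo_id, 'config.json')
vocab_path  = hf_hub_download(repo_id, 'vocab.json')

# Use context managers so the file handles are closed after reading.
with open(config_path, encoding='utf-8') as f:
    config = json.load(f)
with open(vocab_path, encoding='utf-8') as f:
    vocab = json.load(f)

# Load the weights on CPU; weights_only=True restricts unpickling to tensors.
state_dict = torch.load(state_path, map_location='cpu', weights_only=True)
# Instantiate ToyTransformer with config, then call model.load_state_dict(state_dict)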
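
Once vocab.json is loaded, text has to be mapped to ids before it reaches the model. The following is a minimal sketch of that round trip, not the course's actual tokenizer. It assumes lowercase, whitespace-split word tokens, a stoi dict from token to id, an itos list (or a dict with string keys, as JSON would store it), and an '<unk>' fallback token. None of these details are confirmed by the card, and the tiny vocabulary is a stand-in so the example runs on its own.

```python
# Stand-in for the loaded vocab.json; the real file's layout may differ (assumption).
vocab = {
    "itos": ["<unk>", "alice", "was", "beginning", "to", "get", "very", "tired"],
}
vocab["stoi"] = {tok: i for i, tok in enumerate(vocab["itos"])}

MAX_SEQ_LEN = 32  # max_seq_len from the architecture table


def encode(text, stoi, unk="<unk>"):
    """Map lowercased, whitespace-separated tokens to ids (hypothetical tokenization)."""
    unk_id = stoi[unk]
    return [stoi.get(tok, unk_id) for tok in text.lower().split()]


def decode(ids, itos):
    """Map ids back to tokens; accepts a list or a dict with string keys."""
    if isinstance(itos, dict):
        return " ".join(itos[str(i)] for i in ids)
    return " ".join(itos[i] for i in ids)


def crop_context(ids, max_len=MAX_SEQ_LEN):
    """Keep only the most recent max_len ids so the input fits the context window."""
    return ids[-max_len:]


ids = encode("Alice was beginning to get very tired", vocab["stoi"])
text = decode(ids, vocab["itos"])
long_ids = crop_context(ids * 10)  # 70 ids cropped down to the last 32
```

Cropping from the left keeps the most recent tokens, which is the usual choice when generating text with a fixed context length of 32.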

Intended for educational use. Not suitable for any real-world application.
