Toy Transformer trained on Alice in Wonderland
A tiny GPT-style decoder-only Transformer trained on Lewis Carroll's Alice's Adventures in Wonderland for the L5 Transformers lecture of a Master's-level Deep Learning and Generative AI course.
Architecture
| Hyperparameter | Value |
|---|---|
| d_model | 32 |
| n_heads | 4 |
| n_layers | 2 |
| d_ff | 128 |
| max_seq_len | 32 |
| vocabulary size | 2711 |
| trainable parameters | 111,904 |
| positional encoding | sinusoidal |
| residual placement | post-norm |
| output projection | tied to input embedding |
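The parameter total can be sanity-checked by hand from the table. The sketch below shows one breakdown that reproduces 111,904. It assumes bias-free attention projections, feed-forward layers with biases, two LayerNorms per block, and no final LayerNorm. These details are not stated in this card and are only inferred because they match the total.

```python
# Hyperparameters taken from the table above.
d_model, n_layers, d_ff, vocab_size = 32, 2, 128, 2711

# Token embedding. The tied output projection reuses this matrix, and the
# sinusoidal positional encoding is fixed, so neither adds trainable weights.
embedding = vocab_size * d_model

# ASSUMPTION: Q, K, V and output projections are d_model x d_model, no biases.
attention = 4 * d_model * d_model

# ASSUMPTION: two linear layers (d_model -> d_ff -> d_model), both with biases.
feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

# ASSUMPTION: two LayerNorms per block (weight and bias each), no final LayerNorm.
layer_norms = 2 * 2 * d_model

per_layer = attention + feed_forward + layer_norms
total = embedding + n_layers * per_layer
print(f"embedding={embedding:,} per_layer={per_layer:,} total={total:,}")
```

Under these assumptions the embedding matrix accounts for 86,752 of the 111,904 parameters, so the vocabulary dominates the model size. The number of heads does not affect the count, because the heads split the same d_model-wide projections.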
Files
- model.pt: PyTorch state dict
- vocab.json: token-to-id mapping (itos, stoi)
- config.json: architecture hyperparameters
- loss_curve.npy: training loss per step
Loading
from huggingface_hub import hf_hub_download
import torch, json
state_path = hf_hub_download('luhres/toy-transformer-alice', 'model.pt')
config_path = hf_hub_download('luhres/toy-transformer-alice', 'config.json')
vocab_path = hf_hub_download('luhres/toy-transformer-alice', 'vocab.json')
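with open(config_path) as f:
    config = json.load(f)
with open(vocab_path) as f:
    vocab = json.load(f)
# Load the weights onto the CPU, then instantiate ToyTransformer with config
# and call model.load_state_dict(state_dict).
state_dict = torch.load(state_path, map_location='cpu')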
Intended for educational use. Not suitable for any real-world application.