Proust Attention Machine

A character-level transformer trained from scratch on Marcel Proust's En busca del tiempo perdido (Spanish translation, all 7 volumes).

Built as an educational project to understand attention mechanisms in depth -- every component was implemented by hand, first in NumPy, then ported to PyTorch for training.

Model Details

Parameter        Value
---------------  --------------------------------------
Architecture     Decoder-only Transformer
Parameters       ~420,000
Layers           2
Attention heads  2
Embedding dim    128
Feedforward dim  512
Context window   256 characters
Vocabulary       94 characters (Spanish + punctuation)
Tokenization     Character-level
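
Character-level tokenization maps each distinct character to an integer id, which is what keeps the vocabulary at only 94 symbols. A minimal sketch of how such a tokenizer can be built (the class name and interface here are illustrative; the project's actual CharTokenizer may differ):

```python
import numpy as np

class CharTokenizerSketch:
    """Minimal character-level tokenizer: one integer id per unique character."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))                    # deterministic vocabulary order
        self.char_to_idx = {ch: i for i, ch in enumerate(chars)}
        self.idx_to_char = {i: ch for i, ch in enumerate(chars)}
        self.vocab_size = len(chars)

    def encode(self, text: str) -> np.ndarray:
        # Map each character to its id (assumes all characters were in the corpus)
        return np.array([self.char_to_idx[ch] for ch in text], dtype=np.int64)

    def decode(self, ids) -> str:
        # Inverse mapping: ids back to a string
        return "".join(self.idx_to_char[int(i)] for i in ids)

tok = CharTokenizerSketch("mucho tiempo")
print(tok.decode(tok.encode("tiempo")))  # prints "tiempo"
```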

Training

  • Corpus: 7.15M characters from all 7 volumes of Proust's masterwork
  • Best val loss: 1.1739 (at epoch 20)
  • Optimizer: AdamW (lr=3e-4, weight_decay=0.01)
  • Schedule: Linear warmup (500 steps) + cosine decay
  • Hardware: NVIDIA T4 GPU (Google Colab)
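
The warmup-plus-cosine schedule above can be sketched as a small function of the step index (base_lr and warmup_steps match the listed values; total_steps and min_lr are illustrative assumptions, and the project's actual scheduler may differ):

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=0.0):
    """Linear warmup to base_lr over warmup_steps, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from ~0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine anneal from base_lr down to min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```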

Usage

import torch
from model_torch import Transformer
from tokenizer import CharTokenizer

# Load checkpoint
ckpt = torch.load("best.pt", map_location="cpu", weights_only=False)

# Rebuild tokenizer
tokenizer = CharTokenizer()
tokenizer.char_to_idx = ckpt["vocab"]["char_to_idx"]
tokenizer.idx_to_char = {int(k): v for k, v in ckpt["vocab"]["idx_to_char"].items()}
tokenizer.vocab_size = ckpt["vocab"]["vocab_size"]

# Rebuild model
model = Transformer(**ckpt["model_config"])
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Generate
prompt = "Mucho tiempo he estado"
prompt_ids = torch.from_numpy(tokenizer.encode(prompt)).long().unsqueeze(0)
output = model.generate(prompt_ids, max_new_tokens=300, temperature=0.8, top_k=40)
print(tokenizer.decode(output[0].numpy()))
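
The generate call uses temperature scaling and top-k filtering. A hedged sketch of a single sampling step under those settings (sample_next is a hypothetical helper, not part of the project's API):

```python
import torch

def sample_next(logits, temperature=0.8, top_k=40):
    """One sampling step: scale logits by temperature, keep only the top_k
    candidates, and draw from the renormalized distribution."""
    logits = logits / temperature                     # sharpen (<1) or flatten (>1)
    topk_vals, topk_idx = torch.topk(logits, top_k)   # discard unlikely tokens
    probs = torch.softmax(topk_vals, dim=-1)          # renormalize the survivors
    choice = torch.multinomial(probs, num_samples=1)  # stochastic draw
    return topk_idx[choice]                           # map back to a vocabulary id
```

Lower temperatures make the output more conservative and repetitive; a small top_k cuts off the long tail of implausible characters.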

What This Project Demonstrates

  • Manual implementation of scaled dot-product attention
  • Multi-head attention with proper head splitting/concatenation
  • Causal masking for autoregressive generation
  • Sinusoidal positional encoding
  • Layer normalization and residual connections
  • Temperature and top-k sampling strategies
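
The first three items above can be illustrated together: a NumPy sketch of scaled dot-product attention for one head with a causal mask (names and shapes are illustrative, not the project's actual code):

```python
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask for a single head.
    Q, K, V: (seq_len, d_k) arrays."""
    T, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)               # (T, T) scaled similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)      # block attention to future positions
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (T, d_k) weighted sum of values
```

Because of the mask, position 0 can attend only to itself, so the first output row is exactly the first value vector.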

Author

Andres Gonzalez Ortega -- UNAM Actuarial Science graduate exploring ML through first-principles implementation.
