Makemore MLP

A character-level MLP language model trained on the names dataset, built following Andrej Karpathy's makemore series.

The model learns to generate human-like first names by predicting the next character from the previous block_size characters.
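
To make the prediction task concrete, the sketch below shows one way the (context, target) pairs could be built from a name. It assumes index 0 is the boundary token (written "." here) and indices 1 to 26 are a to z, which matches how the usage code pads the context with zeros and stops on 0. The helper build_examples and the stoi/itos mappings are hypothetical names, not part of this repository.

```python
import string

BLOCK_SIZE = 3

# Assumed vocabulary: 0 is the boundary token ".", 1..26 are a..z.
stoi = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

def build_examples(names, block_size=BLOCK_SIZE):
    """Return a list of (context_indices, target_index) pairs."""
    examples = []
    for name in names:
        context = [0] * block_size            # start with all-boundary padding
        for ch in name + ".":                 # "." marks the end of the name
            target = stoi[ch]
            examples.append((list(context), target))
            context = context[1:] + [target]  # slide the window by one character
    return examples

for context, target in build_examples(["emma"]):
    print("".join(itos[i] for i in context), "->", itos[target])
```

Each name of length n yields n + 1 examples, because the end token is also a prediction target.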

Architecture

  • Embedding table C: maps each character to a learned emb_dim-dimensional vector
  • Hidden layer W1: linear + tanh, input size block_size * emb_dim
  • Output layer W2: projects to vocab_size logits (27 classes: a–z + end token)

Hyperparameter  Value
block_size      3
emb_dim         10
hidden_dim      200
vocab_size      27
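
The three layers above can be sketched as a plain PyTorch forward pass. This is an illustration under stated assumptions, not the repository's actual modeling code: the bias vectors b1 and b2, the random initialization, and the forward function are all hypothetical. With biases included, these shapes give 11,897 parameters.

```python
import torch

block_size, emb_dim, hidden_dim, vocab_size = 3, 10, 200, 27

# Randomly initialized stand-ins for the trained weights (assumption).
g = torch.Generator().manual_seed(0)
C = torch.randn((vocab_size, emb_dim), generator=g)               # embedding table
W1 = torch.randn((block_size * emb_dim, hidden_dim), generator=g)  # hidden layer
b1 = torch.randn(hidden_dim, generator=g)                         # assumed bias
W2 = torch.randn((hidden_dim, vocab_size), generator=g)           # output layer
b2 = torch.randn(vocab_size, generator=g)                         # assumed bias
params = [C, W1, b1, W2, b2]

def forward(x):
    """x: LongTensor of shape (batch, block_size) holding character indices."""
    emb = C[x]                            # (batch, block_size, emb_dim)
    flat = emb.view(x.shape[0], -1)       # (batch, block_size * emb_dim)
    h = torch.tanh(flat @ W1 + b1)        # (batch, hidden_dim)
    return h @ W2 + b2                    # (batch, vocab_size) logits

x = torch.zeros((4, block_size), dtype=torch.long)  # four all-padding contexts
logits = forward(x)
n_params = sum(p.nelement() for p in params)
print(logits.shape, n_params)
```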

Train loss: ~2.18 · Dev loss: ~2.20
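
For context on these numbers, the sketch below compares them with a uniform-guess baseline. It assumes the reported losses are mean cross-entropy in nats, as in the makemore series; the card itself does not state the loss function or units.

```python
import math

vocab_size = 27
train_loss, dev_loss = 2.18, 2.20  # values reported above

# A model that assigns equal probability to all 27 classes has loss ln(27).
uniform_loss = math.log(vocab_size)

# Perplexity: the effective number of equally likely next characters.
dev_perplexity = math.exp(dev_loss)

print(f"uniform baseline loss: {uniform_loss:.2f}")
print(f"dev perplexity:        {dev_perplexity:.1f}")
```

Under that assumption, the model narrows the choice from 27 equally likely characters to roughly 9 on held-out names.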

Usage

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
tok   = AutoTokenizer.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
model.eval()

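def gen_name(model, tok, block_size=3):
    # Start from an all-zero context; index 0 is the start/end token.
    context = [0] * block_size
    name = ""
    with torch.no_grad():
        while True:
            x = torch.tensor([context])  # shape (1, block_size)
            logits = model(x)["logits"]
            probs = F.softmax(logits, dim=-1)
            # Sample the next character index from the predicted distribution.
            nxt = int(torch.multinomial(probs[0], num_samples=1).item())
            # Slide the context window forward by one character.
            context = context[1:] + [nxt]
            # Index 0 is the end token: the name is complete.
            if nxt == 0:
                break
            name += tok.convert_ids_to_tokens(nxt)
    return name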

for _ in range(10):
    print(gen_name(model, tok))

Model size: 11.9k params · Tensor type: F32 · Format: Safetensors