# Makemore MLP
A character-level MLP language model trained on the names dataset, built following Andrej Karpathy's makemore series.
The model learns to generate human-like first names by predicting the next character from the previous block_size characters.
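To make "the previous `block_size` characters" concrete, here is a minimal sketch of how (context, target) training pairs can be built from a single name. It assumes the makemore convention of a `.` character at index 0 that both pads the start of the context and marks the end of a name, with `a`–`z` mapped to 1–26. The `stoi`/`itos` tables and the `build_examples` helper are hypothetical and are not taken from this repository's code.

```python
import string

# Assumed vocabulary: '.' (index 0) pads the start and marks the end; a-z map to 1..26.
stoi = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

def build_examples(word, block_size=3):
    """Return (context, target) index pairs for one name (hypothetical helper)."""
    xs, ys = [], []
    context = [0] * block_size          # start with block_size padding tokens
    for ch in word + ".":               # the trailing '.' teaches the model when to stop
        ix = stoi[ch]
        xs.append(context)
        ys.append(ix)
        context = context[1:] + [ix]    # slide the window one character to the right
    return xs, ys

xs, ys = build_examples("emma")
for ctx, tgt in zip(xs, ys):
    print("".join(itos[i] for i in ctx), "->", itos[tgt])
```

For "emma" this yields five pairs, one per character plus the end token: `...` predicts `e`, `..e` predicts `m`, and so on until `mma` predicts `.`.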
## Architecture
- Embedding table `C`: maps each character to a learned `emb_dim`-dimensional vector
- Hidden layer `W1`: linear + tanh, mapping the concatenated context embeddings (input size `block_size * emb_dim`) to `hidden_dim` units
- Output layer `W2`: projects to `vocab_size` logits (27 classes: a–z plus an end token at index 0)
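The three layers above can be sketched as a raw-tensor PyTorch forward pass, in the style of the makemore lectures. This is a minimal sketch with randomly initialised (untrained) weights: the tensor names `C`, `W1`, `b1`, `W2`, `b2` and the presence of bias vectors are assumptions and may not match this repository's module or state-dict names.

```python
import torch
import torch.nn.functional as F

# Hyperparameters from the table below.
block_size, emb_dim, hidden_dim, vocab_size = 3, 10, 200, 27

g = torch.Generator().manual_seed(0)
C = torch.randn((vocab_size, emb_dim), generator=g)                # embedding table
W1 = torch.randn((block_size * emb_dim, hidden_dim), generator=g)  # hidden layer weights
b1 = torch.randn(hidden_dim, generator=g)                          # assumed hidden bias
W2 = torch.randn((hidden_dim, vocab_size), generator=g)            # output layer weights
b2 = torch.randn(vocab_size, generator=g)                          # assumed output bias

def forward(x):
    """Map (batch, block_size) character indices to (batch, vocab_size) logits."""
    emb = C[x]                                       # (batch, block_size, emb_dim)
    flat = emb.view(x.shape[0], -1)                  # concatenate: (batch, block_size * emb_dim)
    h = torch.tanh(flat @ W1 + b1)                   # (batch, hidden_dim)
    return h @ W2 + b2                               # (batch, vocab_size)

x = torch.tensor([[0, 0, 0], [0, 0, 5]])  # two example contexts
logits = forward(x)
probs = F.softmax(logits, dim=-1)         # each row is a distribution over the next character
print(logits.shape)
```

Flattening the `block_size` embeddings into one vector is what makes the hidden layer's input size `block_size * emb_dim`.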
| Hyperparameter | Value |
|---|---|
| block_size | 3 |
| emb_dim | 10 |
| hidden_dim | 200 |
| vocab_size | 27 |
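As a sanity check on model size, the parameter count follows directly from the table. This assumes both linear layers carry a bias vector, which the card does not state; if the checkpoint omits one, drop the corresponding term.

```python
# Hyperparameters from the table above.
block_size, emb_dim, hidden_dim, vocab_size = 3, 10, 200, 27

n_C = vocab_size * emb_dim                               # embedding table C
n_W1 = block_size * emb_dim * hidden_dim + hidden_dim    # W1 weights + assumed bias
n_W2 = hidden_dim * vocab_size + vocab_size              # W2 weights + assumed bias

total = n_C + n_W1 + n_W2
print(n_C, n_W1, n_W2, total)  # 270 6200 5427 11897
```

Most of the capacity sits in the two linear layers; the embedding table is only 270 parameters.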
Train loss: ~2.18 · Dev loss: ~2.20
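Assuming the reported losses are mean cross-entropy in nats (natural log, which is what PyTorch's `F.cross_entropy` computes), they can be read as perplexities and compared with a model that guesses uniformly over the 27 classes:

```python
import math

vocab_size = 27
uniform_loss = math.log(vocab_size)  # cross-entropy of a uniform guess over 27 classes

for split, loss in [("train", 2.18), ("dev", 2.20)]:
    # exp(cross-entropy) is the perplexity: the effective number of equally likely next characters
    print(f"{split}: loss {loss:.2f} -> perplexity {math.exp(loss):.2f}")

print(f"uniform baseline: loss {uniform_loss:.2f} -> perplexity {vocab_size}")
```

Under that assumption the model is, on average, about as uncertain as a uniform choice among roughly nine characters, versus 27 for the uniform baseline, and the small train/dev gap suggests little overfitting.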
## Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository ships its own model and tokenizer code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
model.eval()

def gen_name(model, tok, block_size=3):
    context = [0] * block_size  # start from block_size padding tokens (index 0)
    name = ""
    with torch.no_grad():
        while True:
            x = torch.tensor([context])          # shape (1, block_size)
            logits = model(x)["logits"]          # next-character logits
            probs = F.softmax(logits, dim=-1)
            # Sample the next character index from the predicted distribution.
            nxt = int(torch.multinomial(probs[0], num_samples=1).item())
            context = context[1:] + [nxt]        # slide the context window
            if nxt == 0:                         # index 0 is the end token
                break
            name += tok.convert_ids_to_tokens(nxt)
    return name

for _ in range(10):
    print(gen_name(model, tok))
```