---
language: en
license: mit
tags:
- character-level
- language-model
- makemore
---
# Makemore MLP
A character-level MLP language model trained on the [names dataset](https://github.com/karpathy/makemore/blob/master/names.txt), built following Andrej Karpathy's [makemore series](https://github.com/karpathy/makemore).
The model learns to generate human-like first names by predicting the next character from the previous `block_size` characters.
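As a minimal sketch of what "predicting the next character from the previous `block_size` characters" means in practice, the snippet below turns one name into (context, target) pairs. The vocabulary layout (index 0 for the start/end token, 1 to 26 for a to z) and the helper name `build_examples` are assumptions for illustration, not taken from the model's tokenizer.

```python
import string

# Assumed vocabulary: index 0 is the start/end token, 1..26 are a-z.
chars = "." + string.ascii_lowercase
stoi = {ch: i for i, ch in enumerate(chars)}

def build_examples(name, block_size=3):
    """Return (context, target) index pairs for a single name (hypothetical helper)."""
    context = [0] * block_size          # start with an all-padding context
    examples = []
    for ch in name + ".":               # the trailing "." marks the end of the name
        target = stoi[ch]
        examples.append((list(context), target))
        context = context[1:] + [target]  # slide the window by one character
    return examples

pairs = build_examples("emma")
for context, target in pairs:
    print(context, "->", target)
```

Each name of length `n` yields `n + 1` training examples, the last of which teaches the model when to stop.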
## Architecture
- **Embedding table** `C`: maps each character to a learned `emb_dim`-dimensional vector
- **Hidden layer** `W1`: linear + tanh, input size `block_size * emb_dim`
- **Output layer** `W2`: projects to `vocab_size` logits (27 classes: a–z + end token)
| Hyperparameter | Value |
|---|---|
| block_size | 3 |
| emb_dim | 10 |
| hidden_dim | 200 |
| vocab_size | 27 |
Train loss: ~2.18 · Dev loss: ~2.20
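
The architecture above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction under stated assumptions, not the uploaded model's actual code: the class name `MLPSketch` is hypothetical, and the use of bias terms in both linear layers is assumed.

```python
import torch
import torch.nn as nn

class MLPSketch(nn.Module):
    """Hypothetical re-creation of the embedding -> tanh hidden layer -> logits stack."""

    def __init__(self, vocab_size=27, block_size=3, emb_dim=10, hidden_dim=200):
        super().__init__()
        self.C = nn.Embedding(vocab_size, emb_dim)             # embedding table C
        self.W1 = nn.Linear(block_size * emb_dim, hidden_dim)  # hidden layer (bias assumed)
        self.W2 = nn.Linear(hidden_dim, vocab_size)            # output layer (bias assumed)

    def forward(self, idx):
        emb = self.C(idx)                        # (batch, block_size, emb_dim)
        h = torch.tanh(self.W1(emb.flatten(1)))  # concatenate embeddings, then linear + tanh
        return self.W2(h)                        # (batch, vocab_size) logits

model = MLPSketch()
logits = model(torch.zeros(4, 3, dtype=torch.long))  # batch of 4 all-padding contexts
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)
```

With the hyperparameters in the table and biases included, this sketch has 270 + 6,200 + 5,427 = 11,897 parameters.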
## Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
model.eval()

def gen_name(model, tok, block_size=3):
    # Start from an all-padding context (token id 0).
    context = [0] * block_size
    name = ""
    with torch.no_grad():
        while True:
            x = torch.tensor([context])
            logits = model(x)["logits"]
            probs = F.softmax(logits, dim=-1)
            # Sample the next character id from the predicted distribution.
            nxt = int(torch.multinomial(probs[0], num_samples=1).item())
            context = context[1:] + [nxt]
            if nxt == 0:  # end token: stop generating
                break
            name += tok.convert_ids_to_tokens(nxt)
    return name

for _ in range(10):
    print(gen_name(model, tok))
```