---
language: en
license: mit
tags:
  - character-level
  - language-model
  - makemore
---

# Makemore MLP

A character-level MLP language model trained on the [names dataset](https://github.com/karpathy/makemore/blob/master/names.txt), built following Andrej Karpathy's [makemore series](https://github.com/karpathy/makemore).

The model learns to generate human-like first names by predicting the next character from the previous `block_size` characters.
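
To make the "previous `block_size` characters" idea concrete, here is a minimal sketch of how one name could be turned into (context, target) pairs. The vocabulary layout is an assumption (id 0 for a boundary token written as `.`, ids 1 to 26 for a–z), and `build_examples` is a hypothetical helper, not part of this repository.

```python
import string

# Assumed vocabulary: id 0 is a boundary token ".", letters a-z are ids 1-26.
chars = "." + string.ascii_lowercase
stoi = {ch: i for i, ch in enumerate(chars)}

def build_examples(name, block_size=3):
    """Return (context, target) id pairs for a single name (hypothetical helper)."""
    context = [0] * block_size             # pad the start with boundary tokens
    examples = []
    for ch in name + ".":                  # trailing "." marks the end of the name
        target = stoi[ch]
        examples.append((list(context), target))
        context = context[1:] + [target]   # slide the window by one character
    return examples

for ctx, tgt in build_examples("emma"):
    print("".join(chars[i] for i in ctx), "->", chars[tgt])
```

Under these assumptions a four-letter name yields five training pairs, the last one predicting the boundary token.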

## Architecture

- **Embedding table** `C`: maps each character to a learned `emb_dim`-dimensional vector
- **Hidden layer** `W1`: linear + tanh, input size `block_size * emb_dim`
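- **Output layer** `W2`: projects the `hidden_dim` activations to `vocab_size` logits (27 classes: a–z plus a boundary token, id 0, which the usage code below uses both as start padding and as the end-of-name marker)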

| Hyperparameter | Value |
|---|---|
| block_size | 3 |
| emb_dim | 10 |
| hidden_dim | 200 |
| vocab_size | 27 |
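
The three layers and the hyperparameters above can be sketched as a plain PyTorch forward pass. This is an illustration with random weights, not the trained checkpoint; the bias terms `b1` and `b2` are an assumption, since the card only names `C`, `W1`, and `W2`.

```python
import torch

block_size, emb_dim, hidden_dim, vocab_size = 3, 10, 200, 27

g = torch.Generator().manual_seed(0)
C  = torch.randn((vocab_size, emb_dim), generator=g)               # embedding table
W1 = torch.randn((block_size * emb_dim, hidden_dim), generator=g)  # hidden layer
b1 = torch.randn(hidden_dim, generator=g)                          # assumed bias
W2 = torch.randn((hidden_dim, vocab_size), generator=g)            # output layer
b2 = torch.randn(vocab_size, generator=g)                          # assumed bias

def forward(x):
    emb = C[x]                                # (batch, block_size, emb_dim)
    flat = emb.view(x.shape[0], -1)           # (batch, block_size * emb_dim)
    h = torch.tanh(flat @ W1 + b1)            # (batch, hidden_dim)
    return h @ W2 + b2                        # (batch, vocab_size) logits

x = torch.zeros((4, block_size), dtype=torch.long)  # four all-boundary contexts
logits = forward(x)
n_params = sum(p.numel() for p in (C, W1, b1, W2, b2))
print(logits.shape, n_params)
```

With the biases included, this layout has 11,897 parameters; without them it would have 11,670.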

Train loss: ~2.18 · Dev loss: ~2.20
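
For a sense of scale, the reported losses can be compared with a uniform-guess baseline. The sketch below assumes the figures are mean cross-entropy in nats, as in the makemore series; the card itself does not state the unit.

```python
import math

vocab_size = 27
dev_loss = 2.20                           # dev loss reported on this card

uniform_loss = math.log(vocab_size)       # loss of a model guessing uniformly over 27 classes
perplexity = math.exp(dev_loss)           # effective number of equally likely next characters

print(f"uniform baseline loss: {uniform_loss:.3f}")
print(f"dev perplexity:        {perplexity:.2f}")
```

Under that assumption the baseline is about 3.30 nats, and a dev loss of 2.20 corresponds to choosing among roughly 9 equally likely characters instead of 27.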

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
tok   = AutoTokenizer.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
model.eval()

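def gen_name(model, tok, block_size=3):
    # Start from a context made entirely of boundary tokens (id 0).
    context = [0] * block_size
    name = ""
    with torch.no_grad():
        while True:
            x = torch.tensor([context])  # shape (1, block_size)
            logits = model(x)["logits"]
            probs = F.softmax(logits, dim=-1)
            # Sample the next character id from the predicted distribution.
            nxt = torch.multinomial(probs[0], num_samples=1).item()
            if nxt == 0:  # the boundary token ends the name
                break
            name += tok.convert_ids_to_tokens(nxt)
            context = context[1:] + [nxt]  # slide the context window
    return name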

for _ in range(10):
    print(gen_name(model, tok))
```