---
language: en
license: mit
tags:
  - character-level
  - language-model
  - makemore
---

# Makemore MLP

A character-level MLP language model trained on the [names dataset](https://github.com/karpathy/makemore/blob/master/names.txt), built following Andrej Karpathy's [makemore series](https://github.com/karpathy/makemore).

The model learns to generate human-like first names by predicting the next character from the previous `block_size` characters.
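
To make the prediction task concrete, here is a minimal sketch of how one name can be split into (context, next character) pairs. The helper name `build_examples` and the use of `.` as the start/end marker are assumptions for illustration, following the convention of the makemore series; the preprocessing actually used for this model may differ.

```python
def build_examples(name, block_size=3):
    """Return (context, target) pairs for one name.

    '.' is assumed to mark both the padding before a name and its end.
    """
    context = ["."] * block_size
    examples = []
    for ch in name + ".":
        examples.append(("".join(context), ch))
        context = context[1:] + [ch]  # slide the window by one character
    return examples

for ctx, target in build_examples("emma"):
    print(f"{ctx} -> {target}")
```

Each name of length `n` yields `n + 1` examples, the last of which teaches the model when to stop.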

## Architecture

- **Embedding table** `C`: maps each character to a learned `emb_dim`-dimensional vector
- **Hidden layer** `W1`: linear + tanh over the concatenated context embeddings, input size `block_size * emb_dim` (3 × 10 = 30)
- **Output layer** `W2`: projects the `hidden_dim` activations to `vocab_size` logits (27 classes: a–z + end token)

| Hyperparameter | Value |
|---|---|
| block_size | 3 |
| emb_dim | 10 |
| hidden_dim | 200 |
| vocab_size | 27 |
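
The layers and hyperparameters above can be sketched as a plain PyTorch forward pass. This is an illustration under assumptions, not the code shipped with the model: the weights are random stand-ins, and the bias vectors `b1` and `b2` are assumed (the list above names only `C`, `W1`, and `W2`).

```python
import torch

block_size, emb_dim, hidden_dim, vocab_size = 3, 10, 200, 27

g = torch.Generator().manual_seed(0)
# Random stand-ins for the trained parameters; b1 and b2 are assumptions.
C = torch.randn((vocab_size, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, hidden_dim), generator=g)
b1 = torch.randn(hidden_dim, generator=g)
W2 = torch.randn((hidden_dim, vocab_size), generator=g)
b2 = torch.randn(vocab_size, generator=g)

def forward(x):
    """x: (batch, block_size) character indices -> (batch, vocab_size) logits."""
    emb = C[x]                          # (batch, block_size, emb_dim)
    flat = emb.view(x.shape[0], -1)     # (batch, block_size * emb_dim)
    h = torch.tanh(flat @ W1 + b1)      # (batch, hidden_dim)
    return h @ W2 + b2                  # (batch, vocab_size)

x = torch.zeros((4, block_size), dtype=torch.long)  # four all-padding contexts
logits = forward(x)
n_params = sum(p.numel() for p in (C, W1, b1, W2, b2))
print(logits.shape, n_params)
```

Under these assumptions the layout has 270 + 6,000 + 200 + 5,400 + 27 = 11,897 parameters.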

Train loss: ~2.18 · Dev loss: ~2.20
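
For a sense of scale, the short calculation below compares these numbers with a uniform guess over the vocabulary. It assumes the reported losses are mean cross-entropy in nats (as returned by `F.cross_entropy`), which this card does not state explicitly.

```python
import math

vocab_size = 27
train_loss, dev_loss = 2.18, 2.20

# Loss of a model that assigns equal probability to all 27 classes.
uniform_loss = math.log(vocab_size)
# Perplexity: the effective number of equally likely next characters.
dev_perplexity = math.exp(dev_loss)

print(f"uniform baseline: {uniform_loss:.2f} nats")
print(f"dev perplexity:   {dev_perplexity:.2f}")
```

Read this way, the model narrows the next character from 27 equally likely options to roughly 9 on held-out names.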

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
tok   = AutoTokenizer.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
model.eval()

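def gen_name(model, tok, block_size=3, max_len=50):
    # Start from an all-padding context; index 0 is the start/end token.
    context = [0] * block_size
    chars = []
    with torch.no_grad():
        # max_len is an added safety cap so sampling always terminates.
        while len(chars) < max_len:
            x = torch.tensor([context])
            logits = model(x)["logits"]
            probs = F.softmax(logits, dim=-1)
            # Sample the next character index from the predicted distribution.
            nxt = int(torch.multinomial(probs[0], num_samples=1).item())
            if nxt == 0:  # end token sampled: the name is complete
                break
            chars.append(tok.convert_ids_to_tokens(nxt))
            context = context[1:] + [nxt]  # slide the context window
    return "".join(chars)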

for _ in range(10):
    print(gen_name(model, tok))
```