---
language: en
license: mit
tags:
- character-level
- language-model
- makemore
---

# Makemore MLP

A character-level MLP language model trained on the [names dataset](https://github.com/karpathy/makemore/blob/master/names.txt), built following Andrej Karpathy's [makemore series](https://github.com/karpathy/makemore).

The model learns to generate human-like first names by predicting the next character from the previous `block_size` characters.

## Architecture

- **Embedding table** `C`: maps each character to a learned `emb_dim`-dimensional vector
- **Hidden layer** `W1`: linear + tanh, input size `block_size * emb_dim`
- **Output layer** `W2`: projects to `vocab_size` logits (27 classes: a–z + end token)

| Hyperparameter | Value |
|---|---|
| block_size | 3 |
| emb_dim | 10 |
| hidden_dim | 200 |
| vocab_size | 27 |

Train loss: ~2.18 · Dev loss: ~2.20

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# The model and tokenizer ship custom code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("iand666/makemore-mlp", trust_remote_code=True)
model.eval()


def gen_name(model, tok, block_size=3):
    # Token id 0 pads the initial context and also marks the end of a name.
    context = [0] * block_size
    name = ""
    with torch.no_grad():
        while True:
            x = torch.tensor([context])  # shape (1, block_size)
            logits = model(x)["logits"]
            probs = F.softmax(logits, dim=-1)
            # Sample the next character id from the predicted distribution.
            nxt = int(torch.multinomial(probs[0], num_samples=1).item())
            # Slide the context window forward by one character.
            context = context[1:] + [nxt]
            if nxt == 0:
                break
            name += tok.convert_ids_to_tokens(nxt)
    return name


for _ in range(10):
    print(gen_name(model, tok))
```
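
## Forward pass sketch

The three layers listed under Architecture can be written out in a few lines of plain PyTorch. This is a minimal sketch using the hyperparameters from the table, not the code shipped with the checkpoint: the weights are random stand-ins, the bias vectors `b1` and `b2` are an assumption (the card only names `C`, `W1`, and `W2`), and the `forward` helper is hypothetical.

```python
import torch
import torch.nn.functional as F

# Hyperparameters from the table in this card.
block_size, emb_dim, hidden_dim, vocab_size = 3, 10, 200, 27

# Random stand-ins for the trained parameters. Assumption: the released
# checkpoint may name, shape, or initialise its tensors differently, and
# the biases b1/b2 are not mentioned in the card.
g = torch.Generator().manual_seed(0)
C = torch.randn((vocab_size, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, hidden_dim), generator=g) * 0.1
b1 = torch.zeros(hidden_dim)
W2 = torch.randn((hidden_dim, vocab_size), generator=g) * 0.1
b2 = torch.zeros(vocab_size)


def forward(x):
    """x: LongTensor of shape (batch, block_size) holding character ids."""
    emb = C[x]                               # (batch, block_size, emb_dim)
    flat = emb.view(x.shape[0], -1)          # (batch, block_size * emb_dim)
    h = torch.tanh(flat @ W1 + b1)           # (batch, hidden_dim)
    return h @ W2 + b2                       # (batch, vocab_size) logits


# A batch of four all-padding contexts, as used at the start of generation.
x = torch.zeros((4, block_size), dtype=torch.long)
logits = forward(x)
probs = F.softmax(logits, dim=-1)            # each row sums to 1

n_params = sum(p.numel() for p in (C, W1, b1, W2, b2))
print(logits.shape, n_params)
```

Under these assumptions the parameter count is 270 + 6000 + 200 + 5400 + 27 = 11897; a variant without biases would have 11670.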