---
language: en
tags:
- wordle
- pytorch
- reinforcement-learning
- supervised-learning
- game-ai
- nlp
license: mit
---

# 🟩 Wordle AI Solver

Neural network models for solving Wordle puzzles. This repo contains two models, a supervised baseline and a reinforcement learning variant, both deployable via the [live app](https://wordle-solver-tan.vercel.app).

---

## Files

| File | Description |
|------|-------------|
| `model_weights.pt` | Supervised model (WordleNet) |
| `config.json` | Supervised model config |
| `rl_model_weights.pt` | RL model (REINFORCE-filtered) |
| `rl_config.json` | RL model config |
| `answers.json` | 2,315 valid Wordle answers |
| `allowed.json` | 12,972 valid guess words |

---

## Model Comparison

| | 🧠 Supervised | 🤖 Reinforcement |
|---|---|---|
| **Training method** | CrossEntropy on entropy-optimal games | REINFORCE with elite game filtering |
| **Win rate** | 100% | 98.2% |
| **Avg guesses** | 3.46 | 3.75 |
| **Opener** | CRANE | CRANE |
| **Parameters** | ~13M | ~13M |

---

## Architecture

Both models share the same encoder:

```
Input:  390-dim binary vector
        (26 letters × 5 positions × 3 states: grey/yellow/green)

Hidden: Linear(390 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
        Linear(512 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
        Linear(512 → 256) → BatchNorm1d → ReLU

Output: Linear(256 → 12972)
        logits over all 12,972 allowed guess words
```

Board encoding:
```python
vec[letter_index * 15 + position * 3 + state] = 1.0
# letter_index: 0-25 (a-z)
# position: 0-4
# state: 0=grey, 1=yellow, 2=green
```
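
A minimal sketch of this encoding, assuming `history` is a list of `(guess, feedback)` pairs where `feedback` is a 5-character string of state codes (`"0"` grey, `"1"` yellow, `"2"` green); the exact input format is an assumption for illustration, not the repo's actual API:

```python
def encode_board(history):
    """Encode guess history into the 390-dim binary vector described above."""
    vec = [0.0] * 390  # 26 letters x 5 positions x 3 states
    for guess, feedback in history:
        for position, (letter, state_char) in enumerate(zip(guess, feedback)):
            letter_index = ord(letter) - ord("a")   # 0-25
            state = int(state_char)                 # 0=grey, 1=yellow, 2=green
            vec[letter_index * 15 + position * 3 + state] = 1.0
    return vec
```

Note that 26 × 15 = 390, so the index `letter_index * 15 + position * 3 + state` covers the vector exactly once per (letter, position, state) triple.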

---

## Training

### Supervised Model

Trained on ~10,000 (board_state, best_guess) pairs generated by an entropy-optimal solver that plays all 2,315 Wordle games. At each step the solver picks the guess maximising expected information gain, where $P(p)$ is the probability of feedback pattern $p$ over the remaining candidate answers:

$$E[\text{Info}] = \sum_{p} P(p) \cdot \log_2\left(\frac{1}{P(p)}\right)$$
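
The formula above can be computed directly by bucketing candidate answers by the feedback pattern a guess would produce. A sketch, assuming standard Wordle scoring rules (greens first, then yellows drawn from the remaining letter counts); `feedback` and `entropy_score` are illustrative helper names:

```python
from collections import Counter
from math import log2

def feedback(guess, answer):
    """Wordle feedback pattern: 2=green, 1=yellow, 0=grey per position."""
    pattern = [0] * 5
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 2          # green: exact match
        else:
            remaining[a] += 1       # letter still available for yellows
    for i, g in enumerate(guess):
        if pattern[i] == 0 and remaining[g] > 0:
            pattern[i] = 1          # yellow: present elsewhere
            remaining[g] -= 1
    return tuple(pattern)

def entropy_score(guess, possible):
    """E[Info] = sum_p P(p) * log2(1 / P(p)) over feedback patterns p."""
    counts = Counter(feedback(guess, answer) for answer in possible)
    n = len(possible)
    return sum((c / n) * log2(n / c) for c in counts.values())
```

If every remaining answer yields a distinct pattern, the score is $\log_2 n$, the maximum possible information for $n$ candidates.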

### RL Model
1. **Warm start** from supervised weights
2. **Elite game collection**: greedy rollouts with constraint-filtered action masking, keeping only games solved in ≤3 guesses (~11% hit rate)
3. **REINFORCE training**: supervised loss on elite (state, action) pairs
4. **Benchmark** against all 2,315 answers using constraint-filtered suggestion logic

The RL model learns purely from the reward signal (win/lose, guesses used), without access to the entropy oracle used to train the supervised model.
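
The elite-filtering stage (step 2) can be sketched generically; `play_game` here stands in for a greedy, constraint-masked rollout of the policy and is a hypothetical callable, not the repo's actual API:

```python
def collect_elite_games(play_game, n_rollouts, max_guesses=3):
    """Roll out games and keep (state, action) pairs only from games
    solved within max_guesses; these pairs become the supervised
    training set for the REINFORCE step."""
    elite_pairs = []
    for _ in range(n_rollouts):
        pairs, solved, n_guesses = play_game()
        if solved and n_guesses <= max_guesses:
            elite_pairs.extend(pairs)
    return elite_pairs
```

With the quoted ~11% hit rate, roughly 1 in 9 rollouts contributes training pairs, so the rollout budget must be an order of magnitude larger than the desired dataset.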

---

## Inference

The models are not used as raw classifiers: the backend combines model logits with constraint filtering.

```python
# 1. Get the model's top-20 words
logits = model(encode_board(history))
model_words = [ALLOWED[i] for i in logits.topk(20).indices]

# 2. Filter to words consistent with all previous guesses
possible = filter_words(ANSWERS, history)

# 3. Score by entropy against the remaining possible set
candidates = model_words + possible
best = max(candidates, key=lambda w: entropy_score(w, possible))
```

This hybrid approach is why the supervised model achieves 100%: the neural net narrows the search, and entropy scoring picks the optimal move.
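
A minimal sketch of the constraint filtering in step 2, assuming `history` holds `(guess, feedback)` pairs with per-position states 0=grey, 1=yellow, 2=green; both helpers are illustrative, not the backend's actual code:

```python
from collections import Counter

def feedback(guess, answer):
    # Standard Wordle scoring: greens first, then yellows from leftover letters.
    pattern = [0] * 5
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 2
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if pattern[i] == 0 and remaining[g] > 0:
            pattern[i] = 1
            remaining[g] -= 1
    return tuple(pattern)

def filter_words(words, history):
    # A word is still possible iff it would have produced the observed
    # feedback for every past guess.
    return [w for w in words
            if all(feedback(guess, w) == tuple(fb) for guess, fb in history)]
```

Because `candidates` in step 3 mixes the model's top-20 with the filtered set, the final `max` can still recover if the model's favourites are all inconsistent with the board.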

---

## Usage

```python
import json

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

REPO_ID = "sato2ru/wordle-solver"

with open(hf_hub_download(REPO_ID, "config.json")) as f:
    config = json.load(f)
with open(hf_hub_download(REPO_ID, "allowed.json")) as f:
    ALLOWED = json.load(f)

class WordleNet(nn.Module):
    def __init__(self):
        super().__init__()
        h = config["hidden"]
        self.net = nn.Sequential(
            nn.Linear(390, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 12972),
        )

    def forward(self, x):
        return self.net(x)

# Load the supervised model
model = WordleNet()
model.load_state_dict(
    torch.load(hf_hub_download(REPO_ID, "model_weights.pt"), map_location="cpu")
)
model.eval()
```

Or use the live API directly:
```bash
curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=supervised" \
  -H "Content-Type: application/json" \
  -d '{"history": []}'

curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=rl" \
  -H "Content-Type: application/json" \
  -d '{"history": []}'
```

---

## Results

### Supervised: all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:   59 ██
3 guesses: 1188 ██████████████████████████████████████████████
4 guesses: 1010 ███████████████████████████████████████
5 guesses:   56 ██
6 guesses:    1
FAILED   :    0

100% win rate
```

### RL: all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:  141 ███████
3 guesses:  810 ██████████████████████████████████████████
4 guesses:  893 ██████████████████████████████████████████████
5 guesses:  343 ██████████████████
6 guesses:   86 ████
FAILED   :   41

98.2% win rate
```
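
As a sanity check, the headline numbers follow from the histograms above (win rate over all 2,315 answers; average guesses over solved games only):

```python
# Guess-count distributions copied from the histograms above.
supervised = {1: 1, 2: 59, 3: 1188, 4: 1010, 5: 56, 6: 1}
rl = {1: 1, 2: 141, 3: 810, 4: 893, 5: 343, 6: 86}
TOTAL = 2315  # all Wordle answers

def win_rate(dist):
    return round(100 * sum(dist.values()) / TOTAL, 1)

def avg_guesses(dist):
    wins = sum(dist.values())
    return round(sum(g * n for g, n in dist.items()) / wins, 2)

print(win_rate(supervised), avg_guesses(supervised))  # 100.0 3.46
print(win_rate(rl))                                   # 98.2
```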

---

## Links

- **Live App:** [wordle-solver-tan.vercel.app](https://wordle-solver-tan.vercel.app)
- **GitHub:** [github.com/Jeanwrld/wordle-solver](https://github.com/Jeanwrld/wordle-solver)
- **Backend:** [github.com/Jeanwrld/wordle-api](https://github.com/Jeanwrld/wordle-api)
- **Gradio Demo:** [huggingface.co/spaces/sato2ru/wordle](https://huggingface.co/spaces/sato2ru/wordle)

---

## License

MIT