---
language: en
tags:
- wordle
- pytorch
- reinforcement-learning
- supervised-learning
- game-ai
- nlp
license: mit
---
# 🟩 Wordle AI Solver
Neural network models for solving Wordle puzzles. This repo contains two models, a supervised baseline and a reinforcement learning variant, both deployable via the [live app](https://wordle-solver-tan.vercel.app).
---
## Files
| File | Description |
|------|-------------|
| `model_weights.pt` | Supervised model (WordleNet) |
| `config.json` | Supervised model config |
| `rl_model_weights.pt` | RL model (REINFORCE-filtered) |
| `rl_config.json` | RL model config |
| `answers.json` | 2,315 valid Wordle answers |
| `allowed.json` | 12,972 valid guess words |
---
## Model Comparison
| | 🧠 Supervised | 🤖 Reinforcement |
|---|---|---|
| **Training method** | CrossEntropy on entropy-optimal games | REINFORCE with elite game filtering |
| **Win rate** | 100% | 98.2% |
| **Avg guesses** | 3.46 | 3.75 |
| **Opener** | CRANE | CRANE |
| **Parameters** | ~13M | ~13M |
---
## Architecture
Both models share the same encoder:
```
Input:  390-dim binary vector
        (26 letters × 5 positions × 3 states: grey/yellow/green)
Hidden: Linear(390 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
        Linear(512 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
        Linear(512 → 256) → BatchNorm1d → ReLU
Output: Linear(256 → 12972)
        logits over all 12,972 allowed guess words
```
Board encoding:
```python
vec[letter_index * 15 + position * 3 + state] = 1.0
# letter_index: 0-25 (a-z)
# position: 0-4
# state: 0=grey, 1=yellow, 2=green
```
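As a concrete illustration, the index formula above can be wrapped in a small encoder. The `encode_board` name and the history format used here (a list of `(guess, feedback)` pairs, feedback being five ints with 0 = grey, 1 = yellow, 2 = green) are assumptions for this sketch, not necessarily the repo's actual API:

```python
def encode_board(history):
    """Build the 390-dim binary board vector from (guess, feedback) pairs."""
    vec = [0.0] * 390  # 26 letters x 5 positions x 3 states
    for guess, feedback in history:
        for pos, (ch, state) in enumerate(zip(guess, feedback)):
            letter_index = ord(ch) - ord("a")
            vec[letter_index * 15 + pos * 3 + state] = 1.0
    return vec

# e.g. "crane" guessed, with 'r' yellow and 'e' green:
v = encode_board([("crane", [0, 1, 0, 0, 2])])
```

In practice the vector would be wrapped in a tensor (e.g. `torch.tensor(v).unsqueeze(0)`) before the forward pass.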
---
## Training
### Supervised Model
Trained on ~10,000 (board_state, best_guess) pairs generated by an entropy-optimal solver that plays all 2,315 Wordle games. At each step the solver picks the guess maximising expected information gain:
$$E[\text{Info}] = \sum_{p} P(p) \cdot \log_2\left(\frac{1}{P(p)}\right)$$
where the sum runs over feedback patterns $p$ and $P(p)$ is the fraction of remaining candidate answers that would produce pattern $p$.
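The score above can be sketched as follows. The `feedback` helper here is a simplified Wordle scorer (it ignores duplicate-letter edge cases), and both function names are illustrative rather than taken from the repo:

```python
from collections import Counter
from math import log2

def feedback(guess, answer):
    """Simplified Wordle feedback: 2=green, 1=yellow, 0=grey per position."""
    pattern = []
    for i, ch in enumerate(guess):
        if answer[i] == ch:
            pattern.append(2)
        elif ch in answer:
            pattern.append(1)
        else:
            pattern.append(0)
    return tuple(pattern)

def expected_info(guess, possible):
    """E[Info] = sum over patterns p of P(p) * log2(1 / P(p))."""
    counts = Counter(feedback(guess, ans) for ans in possible)
    total = len(possible)
    return sum((c / total) * log2(total / c) for c in counts.values())
```

A guess that splits the candidate set into many small, evenly sized partitions scores highest; a guess that leaves one big partition scores near zero.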
### RL Model
1. **Warm start** from supervised weights
2. **Elite game collection**: greedy rollouts with constraint-filtered action masking, keeping only games solved in ≤ 3 guesses (~11% hit rate)
3. **REINFORCE training**: supervised-style loss on the elite (state, action) pairs
4. **Benchmark** against all 2,315 answers using constraint-filtered suggestion logic
The RL model learns purely from reward signal (win/lose, guesses used) without access to the entropy oracle used to train the supervised model.
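A minimal sketch of steps 2–3, assuming a hypothetical `play_greedy_game()` that returns `(trajectory, n_guesses, won)` where `trajectory` is a list of (state, action) pairs; the repo's actual rollout code is not shown here:

```python
def collect_elite_games(play_greedy_game, n_rollouts=1000, max_guesses=3):
    """Keep only (state, action) pairs from games solved in <= max_guesses."""
    elite = []
    for _ in range(n_rollouts):
        trajectory, n_guesses, won = play_greedy_game()
        if won and n_guesses <= max_guesses:
            elite.extend(trajectory)
    return elite
```

With this filtering, the REINFORCE step reduces to a cross-entropy update on the elite pairs, structurally the same as the supervised phase but with self-generated labels.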
---
## Inference
The models are not used as raw classifiers; instead, the backend combines model logits with constraint filtering:
```python
# 1. Get top-20 model words
logits = model(encode_board(history))
model_words = [ALLOWED[i] for i in logits.topk(20).indices]
# 2. Filter to words consistent with all previous guesses
possible = filter_words(ANSWERS, history)
# 3. Score by entropy against remaining possible set
candidates = model_words + possible
best = max(candidates, key=lambda w: entropy_score(w, possible))
```
This hybrid approach is why the supervised model reaches a 100% win rate: the neural net narrows the search, and entropy scoring picks the optimal move.
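For reference, the constraint-filtering step can be sketched like this: keep only answers that would reproduce the observed feedback for every past guess. The `score_guess` helper is a simplified scorer (duplicate-letter handling omitted), and these names are illustrative, not the repo's actual implementation:

```python
def score_guess(guess, answer):
    """Simplified Wordle feedback: 2=green, 1=yellow, 0=grey per position."""
    return tuple(
        2 if answer[i] == g else 1 if g in answer else 0
        for i, g in enumerate(guess)
    )

def filter_words(answers, history):
    """Keep answers consistent with every (guess, feedback) pair so far."""
    return [
        a for a in answers
        if all(score_guess(g, a) == tuple(fb) for g, fb in history)
    ]
```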
---
## Usage
```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
import json
REPO_ID = "sato2ru/wordle-solver"
config = json.load(open(hf_hub_download(REPO_ID, "config.json")))
ALLOWED = json.load(open(hf_hub_download(REPO_ID, "allowed.json")))
class WordleNet(nn.Module):
    def __init__(self):
        super().__init__()
        h = config["hidden"]
        self.net = nn.Sequential(
            nn.Linear(390, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 12972),
        )

    def forward(self, x):
        return self.net(x)
# Load supervised model
model = WordleNet()
model.load_state_dict(
torch.load(hf_hub_download(REPO_ID, "model_weights.pt"), map_location="cpu")
)
model.eval()
```
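Once loaded, suggestions are read off the logits with `topk`. The standalone sketch below uses a tiny toy network and word list in place of the real `WordleNet` / `ALLOWED` so it runs without downloading weights:

```python
import torch
import torch.nn as nn

toy_vocab = ["crane", "slate", "trace"]  # stands in for the 12,972-word ALLOWED list
toy_net = nn.Linear(390, len(toy_vocab))  # stands in for WordleNet

with torch.no_grad():
    logits = toy_net(torch.zeros(1, 390))       # empty board -> all-zero encoding
    top = logits.topk(2).indices[0].tolist()    # indices of the 2 best words

suggestions = [toy_vocab[i] for i in top]
```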
Or use the live API directly:
```bash
curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=supervised" \
-H "Content-Type: application/json" \
-d '{"history": []}'
curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=rl" \
-H "Content-Type: application/json" \
-d '{"history": []}'
```
---
## Results
### Supervised β€” all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:   59 ██
3 guesses: 1188 ████████████████████████████████████████
4 guesses: 1010 ██████████████████████████████████
5 guesses:   56 ██
6 guesses:    1
FAILED   :    0   ✅ 100% win rate
```
### RL β€” all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:  141 █████
3 guesses:  810 ███████████████████████████
4 guesses:  893 ██████████████████████████████
5 guesses:  343 ███████████
6 guesses:   86 ███
FAILED   :   41   98.2% win rate
```
---
## Links
- **Live App:** [wordle-solver-tan.vercel.app](https://wordle-solver-tan.vercel.app)
- **GitHub:** [github.com/Jeanwrld/wordle-solver](https://github.com/Jeanwrld/wordle-solver)
- **Backend:** [github.com/Jeanwrld/wordle-api](https://github.com/Jeanwrld/wordle-api)
- **Gradio Demo:** [huggingface.co/spaces/sato2ru/wordle](https://huggingface.co/spaces/sato2ru/wordle)
---
## License
MIT