---
language: en
tags:
- wordle
- pytorch
- reinforcement-learning
- supervised-learning
- game-ai
- nlp
license: mit
---
# 🟩 Wordle AI Solver
Neural network models for solving Wordle puzzles. This repo contains two models, a supervised baseline and a reinforcement learning variant, both deployable via the [live app](https://wordle-solver-tan.vercel.app).
---
## Files
| File | Description |
|------|-------------|
| `model_weights.pt` | Supervised model (WordleNet) |
| `config.json` | Supervised model config |
| `rl_model_weights.pt` | RL model (REINFORCE-filtered) |
| `rl_config.json` | RL model config |
| `answers.json` | 2,315 valid Wordle answers |
| `allowed.json` | 12,972 valid guess words |
---
## Model Comparison
| | 🧠 Supervised | 🤖 Reinforcement |
|---|---|---|
| **Training method** | CrossEntropy on entropy-optimal games | REINFORCE with elite game filtering |
| **Win rate** | 100% | 98.2% |
| **Avg guesses** | 3.46 | 3.75 |
| **Opener** | CRANE | CRANE |
| **Parameters** | ~13M | ~13M |
---
## Architecture
Both models share the same encoder:
```
Input:  390-dim binary vector
        (26 letters × 5 positions × 3 states: grey/yellow/green)
Hidden: Linear(390 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
        Linear(512 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
        Linear(512 → 256) → BatchNorm1d → ReLU
Output: Linear(256 → 12972)
        logits over all 12,972 allowed guess words
```
Board encoding:
```python
vec[letter_index * 15 + position * 3 + state] = 1.0
# letter_index: 0-25 (a-z)
# position: 0-4
# state: 0=grey, 1=yellow, 2=green
```
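As a concrete illustration, the index formula above can be wrapped in a small encoder. The `encode_board` name and the history format used here (a list of `(guess, feedback)` pairs, feedback being five ints with 0 = grey, 1 = yellow, 2 = green) are assumptions for this sketch, not necessarily the repo's actual API:

```python
def encode_board(history):
    """Build the 390-dim binary board vector from (guess, feedback) pairs."""
    vec = [0.0] * 390  # 26 letters x 5 positions x 3 states
    for guess, feedback in history:
        for pos, (ch, state) in enumerate(zip(guess, feedback)):
            letter_index = ord(ch) - ord("a")
            vec[letter_index * 15 + pos * 3 + state] = 1.0
    return vec

# e.g. "crane" guessed, with 'r' yellow and 'e' green:
v = encode_board([("crane", [0, 1, 0, 0, 2])])
```

In practice the vector would be wrapped in a tensor (e.g. `torch.tensor(v).unsqueeze(0)`) before the forward pass.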
---
## Training
### Supervised Model
Trained on ~10,000 (board_state, best_guess) pairs generated by an entropy-optimal solver that plays all 2,315 Wordle games. At each step the solver picks the guess maximising expected information gain:
$$E[\text{Info}] = \sum_{p} P(p) \cdot \log_2\left(\frac{1}{P(p)}\right)$$
where the sum runs over feedback patterns $p$ and $P(p)$ is the fraction of remaining candidate answers that would produce pattern $p$.
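The score above can be sketched as follows. The `feedback` helper here is a simplified Wordle scorer (it ignores duplicate-letter edge cases), and both function names are illustrative rather than taken from the repo:

```python
from collections import Counter
from math import log2

def feedback(guess, answer):
    """Simplified Wordle feedback: 2=green, 1=yellow, 0=grey per position."""
    pattern = []
    for i, ch in enumerate(guess):
        if answer[i] == ch:
            pattern.append(2)
        elif ch in answer:
            pattern.append(1)
        else:
            pattern.append(0)
    return tuple(pattern)

def expected_info(guess, possible):
    """E[Info] = sum over patterns p of P(p) * log2(1 / P(p))."""
    counts = Counter(feedback(guess, ans) for ans in possible)
    total = len(possible)
    return sum((c / total) * log2(total / c) for c in counts.values())
```

A guess that splits the candidate set into many small, evenly sized partitions scores highest; a guess that leaves one big partition scores near zero.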
### RL Model
1. **Warm start** from supervised weights
2. **Elite game collection**: greedy rollouts with constraint-filtered action masking, keeping only games solved in ≤ 3 guesses (~11% hit rate)
3. **REINFORCE training**: supervised-style loss on the elite (state, action) pairs
4. **Benchmark** against all 2,315 answers using constraint-filtered suggestion logic
The RL model learns purely from reward signal (win/lose, guesses used) without access to the entropy oracle used to train the supervised model.
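A minimal sketch of steps 2–3, assuming a hypothetical `play_greedy_game()` that returns `(trajectory, n_guesses, won)` where `trajectory` is a list of (state, action) pairs; the repo's actual rollout code is not shown here:

```python
def collect_elite_games(play_greedy_game, n_rollouts=1000, max_guesses=3):
    """Keep only (state, action) pairs from games solved in <= max_guesses."""
    elite = []
    for _ in range(n_rollouts):
        trajectory, n_guesses, won = play_greedy_game()
        if won and n_guesses <= max_guesses:
            elite.extend(trajectory)
    return elite
```

With this filtering, the REINFORCE step reduces to a cross-entropy update on the elite pairs, structurally the same as the supervised phase but with self-generated labels.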
---
## Inference
The models are not used as raw classifiers; instead, the backend combines model logits with constraint filtering:
```python
# 1. Get top-20 model words
logits = model(encode_board(history))
model_words = [ALLOWED[i] for i in logits.topk(20).indices]
# 2. Filter to words consistent with all previous guesses
possible = filter_words(ANSWERS, history)
# 3. Score by entropy against remaining possible set
candidates = model_words + possible
best = max(candidates, key=lambda w: entropy_score(w, possible))
```
This hybrid approach is why the supervised model reaches a 100% win rate: the neural net narrows the search, and entropy scoring picks the optimal move.
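For reference, the constraint-filtering step can be sketched like this: keep only answers that would reproduce the observed feedback for every past guess. The `score_guess` helper is a simplified scorer (duplicate-letter handling omitted), and these names are illustrative, not the repo's actual implementation:

```python
def score_guess(guess, answer):
    """Simplified Wordle feedback: 2=green, 1=yellow, 0=grey per position."""
    return tuple(
        2 if answer[i] == g else 1 if g in answer else 0
        for i, g in enumerate(guess)
    )

def filter_words(answers, history):
    """Keep answers consistent with every (guess, feedback) pair so far."""
    return [
        a for a in answers
        if all(score_guess(g, a) == tuple(fb) for g, fb in history)
    ]
```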
---
## Usage
```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
import json
REPO_ID = "sato2ru/wordle-solver"
config = json.load(open(hf_hub_download(REPO_ID, "config.json")))
ALLOWED = json.load(open(hf_hub_download(REPO_ID, "allowed.json")))
class WordleNet(nn.Module):
    def __init__(self):
        super().__init__()
        h = config["hidden"]
        self.net = nn.Sequential(
            nn.Linear(390, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 12972),
        )

    def forward(self, x):
        return self.net(x)
# Load supervised model
model = WordleNet()
model.load_state_dict(
torch.load(hf_hub_download(REPO_ID, "model_weights.pt"), map_location="cpu")
)
model.eval()
```
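Once loaded, suggestions are read off the logits with `topk`. The standalone sketch below uses a tiny toy network and word list in place of the real `WordleNet` / `ALLOWED` so it runs without downloading weights:

```python
import torch
import torch.nn as nn

toy_vocab = ["crane", "slate", "trace"]  # stands in for the 12,972-word ALLOWED list
toy_net = nn.Linear(390, len(toy_vocab))  # stands in for WordleNet

with torch.no_grad():
    logits = toy_net(torch.zeros(1, 390))       # empty board -> all-zero encoding
    top = logits.topk(2).indices[0].tolist()    # indices of the 2 best words

suggestions = [toy_vocab[i] for i in top]
```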
Or use the live API directly:
```bash
curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=supervised" \
-H "Content-Type: application/json" \
-d '{"history": []}'
curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=rl" \
-H "Content-Type: application/json" \
-d '{"history": []}'
```
---
## Results
### Supervised β€” all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:   59 ██
3 guesses: 1188 ████████████████████████████████████████
4 guesses: 1010 ██████████████████████████████████
5 guesses:   56 ██
6 guesses:    1
FAILED   :    0   ✅ 100% win rate
```
### RL β€” all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:  141 █████
3 guesses:  810 ███████████████████████████
4 guesses:  893 ██████████████████████████████
5 guesses:  343 ███████████
6 guesses:   86 ███
FAILED   :   41   98.2% win rate
```
---
## Links
- **Live App:** [wordle-solver-tan.vercel.app](https://wordle-solver-tan.vercel.app)
- **GitHub:** [github.com/Jeanwrld/wordle-solver](https://github.com/Jeanwrld/wordle-solver)
- **Backend:** [github.com/Jeanwrld/wordle-api](https://github.com/Jeanwrld/wordle-api)
- **Gradio Demo:** [huggingface.co/spaces/sato2ru/wordle](https://huggingface.co/spaces/sato2ru/wordle)
---
## License
MIT