# 🎯 Wordle AI – Fine-tuned with GRPO
A language model trained to play Wordle using Group Relative Policy Optimization (GRPO) reinforcement learning.
## 📋 Overview

This model is a fine-tuned version of Qwen2-0.5B-Instruct trained to play the popular word game Wordle using reinforcement learning. Instead of supervised learning from human examples, this model learned purely from reward signals, improving its strategy game by game through the GRPO algorithm.
The model learns strategies like:
- Opening with vowel-rich words like CRANE or SLATE
- Using green letter positions in subsequent guesses
- Repositioning yellow letters correctly
- Never repeating previously guessed words
## 🏗️ Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2-0.5B-Instruct |
| Model Size | 0.5B parameters |
| Tensor Type | F16 |
| Training Algorithm | GRPO (Group Relative Policy Optimization) |
| Training Games | 20 |
| Hardware | NVIDIA T4 GPU |
| Framework | Hugging Face Transformers + TRL |
| Environment | OpenEnv + TextArena Wordle |
## 🎮 What is Wordle?
Wordle is a word guessing game where:
- A secret 5-letter word is chosen
- You have 6 attempts to guess it
- After each guess you get color-coded feedback:
  - 🟢 G (Green) – correct letter, correct position
  - 🟡 Y (Yellow) – correct letter, wrong position
  - ⬛ X (Gray) – letter not in the word
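This feedback rule, including Wordle's handling of repeated letters (a letter is only marked yellow as many times as it appears unmatched in the secret), can be sketched in Python. The function name is illustrative, not part of the model:

```python
def wordle_feedback(guess: str, secret: str) -> str:
    """Return a 5-character G/Y/X feedback string for a guess."""
    guess, secret = guess.lower(), secret.lower()
    feedback = ["X"] * 5
    remaining = []  # secret letters not already matched green
    # First pass: mark greens (exact position matches)
    for i in range(5):
        if guess[i] == secret[i]:
            feedback[i] = "G"
        else:
            remaining.append(secret[i])
    # Second pass: mark yellows, consuming each secret letter at most once
    for i in range(5):
        if feedback[i] == "X" and guess[i] in remaining:
            feedback[i] = "Y"
            remaining.remove(guess[i])
    return "".join(feedback)
```

For example, `wordle_feedback("slate", "crane")` yields `"XXGXG"`: the A and E are green, and S, L, T are absent.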
## 🏆 Reward System
The model was trained using 5 reward signals:
| Signal | Reward | Description |
|---|---|---|
| Win the game | +1.0 | All 5 letters correct (GGGGG) |
| Green letters | +0.3 | Correct letter in correct position |
| Yellow letters | +0.1 | Correct letter in wrong position |
| New guess | +0.3 | Not repeating a previous guess |
| Valid word | +0.2 | Guess is exactly 5 letters |
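A minimal sketch of how the five signals above could be combined into one scalar reward. The weights come from the table, but the exact aggregation used in training (e.g. whether greens are scored per letter or per guess) is an assumption of this sketch:

```python
def wordle_reward(guess: str, feedback: str, previous_guesses: set) -> float:
    """Combine the five reward signals from the table above.

    Assumes green/yellow rewards are counted per letter; the
    training code may aggregate differently.
    """
    reward = 0.0
    if feedback == "GGGGG":
        reward += 1.0                      # win the game
    reward += 0.3 * feedback.count("G")    # green letters
    reward += 0.1 * feedback.count("Y")    # yellow letters
    if guess not in previous_guesses:
        reward += 0.3                      # new guess
    if len(guess) == 5 and guess.isalpha():
        reward += 0.2                      # valid 5-letter word
    return round(reward, 2)
```

Under these assumptions a winning guess earns 1.0 + 5 × 0.3 + 0.3 + 0.2 = 3.0, while a repeated guess with two greens earns only 0.8.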
## 🚀 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "shaikabdulfahad/wordle-qwen2-mini",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "shaikabdulfahad/wordle-qwen2-mini"
)

# System prompt
system_prompt = """You are an expert Wordle solver.
Guess a 5-letter English word each turn.
Feedback: G=correct position, Y=wrong position, X=not in word.
Only respond with your guess in square brackets. Example: [crane]"""

# Ask for a guess
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Start! What is your first guess?"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        temperature=0.7,
        do_sample=True,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print("Model guesses:", response)
```
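Since the model is prompted to answer in square brackets, a small helper (hypothetical, not shipped with the model) can pull the guess out of the raw response:

```python
import re

def extract_guess(response: str):
    """Return the first 5-letter word in square brackets, lowercased,
    or None if the response contains no well-formed guess."""
    match = re.search(r"\[([a-zA-Z]{5})\]", response)
    return match.group(1).lower() if match else None
```

This also acts as a validity check: a response with no bracketed 5-letter word returns `None` and can be rejected.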
## 🔁 Training Pipeline

```
1. Connect to live Wordle environment (TextArena)
        ↓
2. Generate guess using current model
        ↓
3. Send guess to Wordle → get feedback (G/Y/X)
        ↓
4. Calculate reward from 5 signals
        ↓
5. Update model using GRPO
        ↓
6. Repeat for 20 games
```
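At the heart of step 5, GRPO scores each sampled guess relative to the other guesses in its group, rather than against a learned value baseline. A plain-Python sketch of that normalization (an illustration of the idea, not the TRL implementation):

```python
def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its group's mean and std.

    This 'group relative' step is what replaces the value-function
    baseline of PPO in GRPO: above-average guesses in a group get
    positive advantages, below-average ones get negative advantages.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

These advantages then weight the log-probabilities of the sampled guesses in the policy-gradient update, so the model is pushed toward guesses that beat their own group's average.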
## 📦 Built With
| Tool | Purpose |
|---|---|
| OpenEnv | RL environment framework |
| TextArena | Live Wordle environment |
| Hugging Face Transformers | Model loading and inference |
| TRL | Reinforcement learning for LLMs |
| Google Colab | Training hardware (T4 GPU) |
## ⚠️ Limitations

- Trained for only 20 games; more training would likely improve performance significantly
- Uses a 0.5B parameter model; larger models could learn stronger strategies
- Training on T4 GPU limits batch size and training speed
- Model still occasionally repeats guesses despite the repetition penalty
## 🔮 Future Improvements
- Train for 1000+ games on A100 GPU
- Use larger model (Qwen2-7B or Qwen3-1.7B)
- Add stronger repetition penalty
- Implement multi-turn conversation memory
- Train on more word games (Quordle, Wordle variants)
## 👤 Author
Shaik Abdul Fahad
- 🤗 Hugging Face: shaikabdulfahad
- 📦 Spaces: Word Game | Echo Env

Built as part of the OpenEnv Course, learning to build and deploy RL environments for LLM training.
## 📄 License
This model is released under the Apache 2.0 License.