๐ŸŽฏ Wordle AI โ€” Fine-tuned with GRPO

A language model trained to play Wordle using Group Relative Policy Optimization (GRPO) reinforcement learning.

๐Ÿ“– Overview

This model is a fine-tuned version of Qwen2-0.5B-Instruct trained to play the popular word game Wordle using reinforcement learning. Instead of supervised learning from human examples, this model learned purely from reward signals โ€” improving its strategy game by game through the GRPO algorithm.

The model learns strategies like:

  • Opening with high-coverage starter words like CRANE or SLATE
  • Using green letter positions in subsequent guesses
  • Repositioning yellow letters correctly
  • Never repeating previously guessed words

๐Ÿ—๏ธ Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen2-0.5B-Instruct |
| Model Size | 0.5B parameters |
| Tensor Type | F16 |
| Training Algorithm | GRPO (Group Relative Policy Optimization) |
| Training Games | 20 |
| Hardware | NVIDIA T4 GPU |
| Framework | Hugging Face Transformers + TRL |
| Environment | OpenEnv + TextArena Wordle |

๐ŸŽฎ What is Wordle?

Wordle is a word guessing game where:

  • A secret 5-letter word is chosen
  • You have 6 attempts to guess it
  • After each guess you get color-coded feedback:
    • ๐ŸŸข G (Green) โ€” correct letter, correct position
    • ๐ŸŸก Y (Yellow) โ€” correct letter, wrong position
    • โฌ› X (Gray) โ€” letter not in the word

๐Ÿ† Reward System

The model was trained using 5 reward signals:

| Signal | Reward | Description |
|--------|--------|-------------|
| Win the game | +1.0 | All 5 letters correct (GGGGG) |
| Green letters | +0.3 | Correct letter in correct position |
| Yellow letters | +0.1 | Correct letter in wrong position |
| New guess | +0.3 | Not repeating a previous guess |
| Valid word | +0.2 | Guess is exactly 5 letters |
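The five signals combine into a single shaped reward per guess. The sketch below assumes the green and yellow rewards apply per letter, which may differ from the exact training script; `guess_reward` is a hypothetical name, not the actual training code:

```python
def guess_reward(guess: str, feedback: str, history: list[str]) -> float:
    """Sum the 5 reward signals for one guess (assumed per-letter G/Y shaping)."""
    reward = 0.0
    if feedback == "GGGGG":
        reward += 1.0                     # win the game
    reward += 0.3 * feedback.count("G")   # green letters
    reward += 0.1 * feedback.count("Y")   # yellow letters
    if guess not in history:
        reward += 0.3                     # new guess, not a repeat
    if len(guess) == 5 and guess.isalpha():
        reward += 0.2                     # valid 5-letter word
    return round(reward, 2)
```

Under these assumptions, a winning guess such as `guess_reward("crane", "GGGGG", [])` earns 3.0: the win bonus plus five greens, novelty and validity.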

๐Ÿš€ Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "shaikabdulfahad/wordle-qwen2-mini",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "shaikabdulfahad/wordle-qwen2-mini"
)

# System prompt
system_prompt = """You are an expert Wordle solver.
Guess a 5-letter English word each turn.
Feedback: G=correct position, Y=wrong position, X=not in word.
Only respond with your guess in square brackets. Example: [crane]"""

# Ask for a guess
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Start! What is your first guess?"},
]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        temperature=0.7,
        do_sample=True,
    )

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print("Model guesses:", response)
```

๐Ÿ” Training Pipeline

1. Connect to live Wordle environment (TextArena)
       โ†“
2. Generate guess using current model
       โ†“
3. Send guess to Wordle โ€” get feedback (G/Y/X)
       โ†“
4. Calculate reward from 5 signals
       โ†“
5. Update model using GRPO
       โ†“
6. Repeat for 20 games
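The six steps above can be sketched as a toy loop. Everything here is a stand-in: `feedback_for` simplifies duplicate-letter handling, `random_policy` replaces the fine-tuned model, and the real GRPO weight update (step 5) is omitted since it lives in TRL:

```python
import random

WORDS = ["crane", "slate", "pride", "mound", "shiny"]

def feedback_for(guess: str, secret: str) -> str:
    # Simplified G/Y/X feedback (ignores duplicate-letter edge cases).
    return "".join(
        "G" if g == s else ("Y" if g in secret else "X")
        for g, s in zip(guess, secret)
    )

def play_game(policy, secret: str, max_turns: int = 6) -> float:
    """One Wordle episode: returns the total shaped reward (steps 2-4)."""
    history, total = [], 0.0
    for _ in range(max_turns):
        guess = policy(history)                             # step 2: sample a guess
        fb = feedback_for(guess, secret)                    # step 3: env feedback
        total += 0.3 * fb.count("G") + 0.1 * fb.count("Y")  # step 4: letter shaping
        if guess not in history:
            total += 0.3                                    # new-guess bonus
        total += 0.2                                        # valid 5-letter word
        history.append(guess)
        if fb == "GGGGG":
            total += 1.0                                    # win bonus
            break
    return total

# Toy policy: random non-repeating guesses; step 5 (the GRPO update)
# would adjust the real model's weights between episodes instead.
def random_policy(history):
    return random.choice([w for w in WORDS if w not in history])

random.seed(0)
rewards = [play_game(random_policy, "crane") for _ in range(20)]  # step 6
```

In actual training, `play_game` returns per-episode rewards that GRPO compares across a group of sampled completions to estimate advantages.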

๐Ÿ“ฆ Built With

| Tool | Purpose |
|------|---------|
| OpenEnv | RL environment framework |
| TextArena | Live Wordle environment |
| Hugging Face Transformers | Model loading and inference |
| TRL | Reinforcement learning for LLMs |
| Google Colab | Training hardware (T4 GPU) |

โš ๏ธ Limitations

  • Trained for only 20 games, so additional training would likely improve performance substantially
  • Uses a 0.5B-parameter model; larger models could learn stronger strategies
  • Training on a T4 GPU limits batch size and training speed
  • The model still occasionally repeats guesses despite the new-guess bonus

๐Ÿ”ฎ Future Improvements

  • Train for 1000+ games on A100 GPU
  • Use a larger model (Qwen2-7B or Qwen3-1.7B)
  • Add stronger repetition penalty
  • Implement multi-turn conversation memory
  • Train on more word games (Quordle, Wordle variants)

๐Ÿ‘ค Author

Shaik Abdul Fahad

Built as part of the OpenEnv Course โ€” learning to build and deploy RL environments for LLM training.


๐Ÿ“„ License

This model is released under the Apache 2.0 License.
