
Last commit b3ea7f0 by nathanael-fijalkow: Update README.md

metadata

title: Codenames LLM Challenge
emoji: 🕵️
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
python_version: '3.11'
app_file: app.py
pinned: true

Codenames LLM Challenge

A Python framework for students to implement guesser bots for Codenames. An LLM plays the spymaster and generates its clues using embeddings.

Game Rules

Challenge Mode (Single Team):

Goal: Guess all RED words in as few rounds as possible
Board: 25 words total (9 RED, 8 BLUE, 8 ASSASSIN)
Each round: The LLM spymaster gives a one-word clue and a number
Guesses: The guesser makes up to (number + 1) guesses
Round ends if: A BLUE word is revealed, the guess limit is reached, or the guesser stops
Game ends: WIN if all RED words are found, LOSE if an ASSASSIN word is revealed, or on timeout after --max-rounds rounds
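The round rules above can be sketched as a small function. This is a hypothetical illustration, not the framework's engine. `play_round`, the role constants, and the choice to check for a win after every guess are all assumptions. This README also does not say how invalid guesses are handled, so the sketch simply raises.

```python
from typing import Callable, Optional

# Hypothetical role labels mirroring the README's RED/BLUE/ASSASSIN roles.
RED, BLUE, ASSASSIN = "RED", "BLUE", "ASSASSIN"

Guesser = Callable[[str, list[str]], Optional[str]]

def play_round(roles: dict[str, str], revealed: set[str], clue: str,
               number: int, guesser: Guesser) -> str:
    """Resolve one round and return "win", "lose", or "continue".

    Mutates `revealed` with every word guessed this round.
    """
    for _ in range(number + 1):  # up to number + 1 guesses
        board_state = [w for w in roles if w not in revealed]
        guess = guesser(clue, board_state)
        if guess is None:  # guesser chose to stop
            return "continue"
        if guess not in board_state:
            # Assumption: invalid guesses are rejected; the README does not say how.
            raise ValueError(f"invalid guess: {guess!r}")
        revealed.add(guess)
        role = roles[guess]
        if role == ASSASSIN:  # revealing an ASSASSIN loses the game
            return "lose"
        if all(w in revealed for w, r in roles.items() if r == RED):
            return "win"  # assumption: the win is checked after every guess
        if role == BLUE:  # a BLUE word ends the round
            return "continue"
    return "continue"  # guess limit reached
```

The caller would loop over rounds, requesting a new clue each time, until the function returns "win" or "lose" or the round limit is hit.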

Setup

uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

Dictionary: Fixed list of 420 Codenames words. Clues and board words must be from this dictionary (case-insensitive).
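A case-insensitive dictionary check can be as simple as normalizing both sides before comparing. The five-word `DICTIONARY` and the `in_dictionary` helper below are hypothetical stand-ins for the project's real 420-word list and its validation code.

```python
# Hypothetical five-word excerpt; the project's real list has 420 words.
DICTIONARY = ["Apple", "Bank", "Car", "Bomb", "Moon"]

# Normalize once so every lookup ignores case.
_NORMALIZED = {word.casefold() for word in DICTIONARY}

def in_dictionary(word: str) -> bool:
    """Return True if `word` is in the dictionary, ignoring case."""
    return word.casefold() in _NORMALIZED
```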

Pre-build Embedding Cache (Recommended):

python -m codenames.cli init-cache

Downloads the embedding model and computes vectors for all 420 words (about 30 seconds). The vectors are cached for reuse.
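The compute-once, reuse-later idea behind `init-cache` looks roughly like this. This README does not document the cache format or location used by `codenames.cli`. So `load_or_build_cache`, the JSON file layout, and the injected `embed_fn` are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable

# Any function mapping a batch of words to one vector per word.
EmbedFn = Callable[[list[str]], list[list[float]]]

def load_or_build_cache(words: list[str], embed_fn: EmbedFn, path: Path) -> dict[str, list[float]]:
    """Return {word: vector}, computing vectors only if the cache file is missing."""
    if path.exists():
        return json.loads(path.read_text())  # reuse earlier results
    vectors = embed_fn(words)  # one batched call for the whole word list
    # float() also converts NumPy scalars, so array-like output works too.
    cache = {w: [float(x) for x in v] for w, v in zip(words, vectors)}
    path.write_text(json.dumps(cache))
    return cache
```

With sentence-transformers installed, `embed_fn` could wrap the default model: create `model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")` and pass `lambda ws: model.encode(ws).tolist()`. Injecting the function keeps the caching logic testable without downloading a model.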

Test Your Guesser

Create a Python file with a guesser function:

# my_guesser.py
def guesser(clue: str, board_state: list[str]) -> str | None:
    """
    Args:
        clue: The spymaster's one-word clue (from dictionary)
        board_state: List of unrevealed words on the board
    
    Returns:
        A word to guess from board_state, or None to stop the round
    """
    # Your embedding-based or heuristic logic here
    return board_state[0]  # Simple example: always guess first word

Run it against the LLM spymaster:

python -m codenames.cli challenge my_guesser.py --seed 42 --output log.json

Options:

--seed: Random seed for reproducible boards
--model: Embedding model (default: sentence-transformers/all-MiniLM-L6-v2)
--max-rounds: Maximum rounds before timeout (default: 10)
--output: Save JSON log with board state, clues, guesses, and result
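To compare a guesser across several boards, you can run the documented command once per seed from Python. `build_command` and `run_seeds` are hypothetical helpers that use only the flags listed above. The log naming scheme is an assumption. The exit code is merely recorded, because this README does not say what the CLI returns on a lost game.

```python
import subprocess
import sys

def build_command(guesser_path: str, seed: int, output: str, max_rounds: int = 10) -> list[str]:
    """Assemble the documented `challenge` invocation for one seed."""
    return [
        sys.executable, "-m", "codenames.cli", "challenge", guesser_path,
        "--seed", str(seed),
        "--max-rounds", str(max_rounds),
        "--output", output,
    ]

def run_seeds(guesser_path: str, seeds: list[int]) -> list[tuple[int, int, str]]:
    """Run one challenge per seed; return (seed, exit code, log path) triples."""
    results = []
    for seed in seeds:
        output = f"log_seed{seed}.json"  # hypothetical naming scheme
        completed = subprocess.run(build_command(guesser_path, seed, output))
        results.append((seed, completed.returncode, output))
    return results
```

Calling `run_seeds("my_guesser.py", [1, 2, 3])` inside the activated virtualenv writes one log per seed. Using `sys.executable` ensures the CLI runs under the same interpreter.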

Log Format

The JSON output contains:

seed: Random seed used
board_words: All 25 words on the board
board_roles: Role for each word (RED/BLUE/ASSASSIN)
rounds: Array of rounds with clue, number, and guesses
final_state: Win/loss status and rounds taken

Use this data to analyze performance or train ML models.
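As a starting point for analysis, the log fields above can be summarized per round. The sketch makes three assumptions:

- `board_roles` is either a list aligned with `board_words` or a word-to-role mapping.
- Each round's `guesses` is a list of guessed words.
- The keys inside `final_state` are undocumented, so that field is ignored.

```python
import json
from pathlib import Path

def summarize_log(log: dict) -> dict:
    """Per-round and overall stats from one challenge log."""
    roles = log["board_roles"]
    if isinstance(roles, list):  # assumption: list aligned with board_words
        roles = dict(zip(log["board_words"], roles))
    per_round = []
    for rnd in log["rounds"]:
        guesses = rnd["guesses"]  # assumption: list of guessed words
        red_hits = sum(1 for g in guesses if roles.get(g) == "RED")
        per_round.append({
            "clue": rnd["clue"],
            "number": rnd["number"],
            "guesses": len(guesses),
            "red_hits": red_hits,
        })
    total_guesses = sum(r["guesses"] for r in per_round)
    red_found = sum(r["red_hits"] for r in per_round)
    return {
        "seed": log["seed"],
        "rounds_played": len(per_round),
        "red_found": red_found,
        # Fraction of all guesses that hit a RED word.
        "precision": red_found / total_guesses if total_guesses else 0.0,
        "per_round": per_round,
    }

def summarize_file(path: str) -> dict:
    """Load a log written with --output and summarize it."""
    return summarize_log(json.loads(Path(path).read_text()))
```

For example, `summarize_file("log.json")` summarizes the run from the command above. Aggregating these dictionaries across seeds gives a simple benchmark for comparing guessers.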