diff --git a/README.md b/README.md index 56779643445ae4bbaf1706325ba4b7896f5b9ed2..b26f5a32cdbc8d72de91ea6383ec75c3ef73fa56 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ --- -title: Kantbench Environment Server -emoji: đŸ•šī¸ +title: KantBench Environment Server +emoji: 🎮 colorFrom: green colorTo: yellow sdk: docker @@ -10,245 +10,85 @@ tags: - openenv --- -# Kantbench Environment +# KantBench: 90+ Game Theory Environments for LLM Training -A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns. +A comprehensive game theory environment for training and evaluating LLM strategic reasoning via OpenEnv. Supports GRPO/DPO training with the environment as a reward oracle. -## Quick Start - -The simplest way to use the Kantbench environment is through the `KantbenchEnv` class: - -```python -from KantBench import KantbenchAction, KantbenchEnv - -try: - # Create environment from Docker image - KantBenchenv = KantbenchEnv.from_docker_image("KantBench-env:latest") - - # Reset - result = KantBenchenv.reset() - print(f"Reset: {result.observation.echoed_message}") - - # Send multiple messages - messages = ["Hello, World!", "Testing echo", "Final message"] - - for msg in messages: - result = KantBenchenv.step(KantbenchAction(message=msg)) - print(f"Sent: '{msg}'") - print(f" → Echoed: '{result.observation.echoed_message}'") - print(f" → Length: {result.observation.message_length}") - print(f" → Reward: {result.reward}") - -finally: - # Always clean up - KantBenchenv.close() -``` - -That's it! The `KantbenchEnv.from_docker_image()` method handles: -- Starting the Docker container -- Waiting for the server to be ready -- Connecting to the environment -- Container cleanup when you call `close()` - -## Building the Docker Image - -Before using the environment, you need to build the Docker image: - -```bash -# From project root -docker build -t KantBench-env:latest -f server/Dockerfile . 
-``` - -## Deploying to Hugging Face Spaces +## Games (90+) -You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command: +| Category | Examples | Count | +|---|---|---| +| **Classic Matrix** | Prisoner's Dilemma, Stag Hunt, Hawk-Dove, Battle of the Sexes | 20+ | +| **Economic/Market** | Cournot, Bertrand, Hotelling, Nash Demand, Double Auction | 23 | +| **Information & Signaling** | Beer-Quiche, Spence Signaling, Bayesian Persuasion, Moral Hazard | 21 | +| **Cooperative & Repeated** | Shapley Allocation, Stable Matching, Discounted PD, Stochastic PD | 23 | +| **Auctions & Contests** | First-Price, Vickrey, All-Pay, Colonel Blotto, Tullock Contest | 10+ | +| **Sequential** | Ultimatum, Trust, Centipede, Stackelberg, Dictator | 6 | -```bash -# From the environment directory (where openenv.yaml is located) -openenv push - -# Or specify options -openenv push --namespace my-org --private -``` - -The `openenv push` command will: -1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`) -2. Prepare a custom build for Hugging Face Docker space (enables web interface) -3. 
Upload to Hugging Face (ensuring you're logged in) - -### Prerequisites - -- Authenticate with Hugging Face: The command will prompt for login if not already authenticated - -### Options - -- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory) -- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml) -- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM) -- `--private`: Deploy the space as private (default: public) - -### Examples - -```bash -# Push to your personal namespace (defaults to username/env-name from openenv.yaml) -openenv push - -# Push to a specific repository -openenv push --repo-id my-org/my-env - -# Push with a custom base image -openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest - -# Push as a private space -openenv push --private - -# Combine options -openenv push --repo-id my-org/my-env --base-image custom-base:latest --private -``` +## Opponent Strategies (17) -After deployment, your space will be available at: -`https://huggingface.co/spaces/` +`random`, `always_cooperate`, `always_defect`, `tit_for_tat`, `tit_for_two_tats`, `grudger`, `pavlov`, `suspicious_tit_for_tat`, `generous_tit_for_tat`, `adaptive`, `mixed`, `ultimatum_fair`, `ultimatum_low`, `trust_fair`, `trust_generous`, `public_goods_fair`, `public_goods_free_rider` -The deployed space includes: -- **Web Interface** at `/web` - Interactive UI for exploring the environment -- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface -- **Health Check** at `/health` - Container health monitoring -- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions - -## Environment Details - -### Action -**KantbenchAction**: Contains a single field -- `message` (str) - The message to echo back - -### Observation -**KantbenchObservation**: Contains the echo response and metadata -- `echoed_message` (str) - The message 
echoed back -- `message_length` (int) - Length of the message -- `reward` (float) - Reward based on message length (length × 0.1) -- `done` (bool) - Always False for echo environment -- `metadata` (dict) - Additional info like step count - -### Reward -The reward is calculated as: `message_length × 0.1` -- "Hi" → reward: 0.2 -- "Hello, World!" → reward: 1.3 -- Empty message → reward: 0.0 - -## Advanced Usage - -### Connecting to an Existing Server - -If you already have a Kantbench environment server running, you can connect directly: +## Quick Start ```python -from KantBench import KantbenchEnv - -# Connect to existing server -KantBenchenv = KantbenchEnv(base_url="") - -# Use as normal -result = KantBenchenv.reset() -result = KantBenchenv.step(KantbenchAction(message="Hello!")) +from KantBench import KantBenchAction, KantBenchEnv + +with KantBenchEnv(base_url="https://openenv-community-kantbench.hf.space") as env: + # Reset with a specific game and opponent strategy + result = env.reset(game="prisoners_dilemma", strategy="tit_for_tat") + print(f"Game: {result.observation.game_name}") + print(f"Moves: {result.observation.available_moves}") + + # Play rounds until done + while not result.done: + result = env.step(KantBenchAction(move="cooperate")) + print(f"Round {result.observation.round_number}: " + f"you={result.observation.your_move}, " + f"opp={result.observation.opponent_move}, " + f"payoff={result.observation.your_payoff}") + + print(f"Final score: {result.observation.cumulative_score}") ``` -Note: When connecting to an existing server, `KantBenchenv.close()` will NOT stop the server. 
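The opponent strategies listed earlier follow their textbook iterated-game definitions. As a self-contained sanity check that runs without the server (a sketch only: the payoff numbers below are the classic Prisoner's Dilemma values T=5, R=3, P=1, S=0, which may differ from the constants the environment actually ships), here is `tit_for_tat` playing five rounds against `always_defect`:

```python
# Local simulation of two built-in opponent strategies.
# Payoff values are the textbook PD matrix, NOT read from the environment.
PD = {
    ("cooperate", "cooperate"): (3.0, 3.0),
    ("cooperate", "defect"): (0.0, 5.0),
    ("defect", "cooperate"): (5.0, 0.0),
    ("defect", "defect"): (1.0, 1.0),
}

def tit_for_tat(history):
    # Cooperate first, then mirror the opponent's previous move.
    return "cooperate" if not history else history[-1][1]

def always_defect(history):
    return "defect"

def play(p1, p2, rounds=5):
    h1, h2 = [], []          # per-player views: (my_move, their_move)
    s1 = s2 = 0.0
    for _ in range(rounds):
        m1, m2 = p1(h1), p2(h2)
        r1, r2 = PD[(m1, m2)]
        s1, s2 = s1 + r1, s2 + r2
        h1.append((m1, m2))
        h2.append((m2, m1))
    return s1, s2

print(play(tit_for_tat, always_defect))  # → (4.0, 9.0)
```

Tit-for-tat loses only the first round, then both sides defect; mutual tit-for-tat, by contrast, sustains cooperation for all five rounds (15.0 each under these assumed payoffs).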
- -### Using the Context Manager - -The client supports context manager usage for automatic connection management: +## Reset Parameters ```python -from KantBench import KantbenchAction, KantbenchEnv - -# Connect with context manager (auto-connects and closes) -with KantbenchEnv(base_url="http://localhost:8000") as env: - result = env.reset() - print(f"Reset: {result.observation.echoed_message}") - # Multiple steps with low latency - for msg in ["Hello", "World", "!"]: - result = env.step(KantbenchAction(message=msg)) - print(f"Echoed: {result.observation.echoed_message}") -``` +# Specific game and strategy +result = env.reset(game="stag_hunt", strategy="grudger") -The client uses WebSocket connections for: -- **Lower latency**: No HTTP connection overhead per request -- **Persistent session**: Server maintains your environment state -- **Efficient for episodes**: Better for many sequential steps - -### Concurrent WebSocket Sessions - -The server supports multiple concurrent WebSocket connections. 
To enable this, -modify `server/app.py` to use factory mode: - -```python -# In server/app.py - use factory mode for concurrent sessions -app = create_app( - KantbenchEnvironment, # Pass class, not instance - KantbenchAction, - KantbenchObservation, - max_concurrent_envs=4, # Allow 4 concurrent sessions -) +# Random game and strategy (default) +result = env.reset() ``` -Then multiple clients can connect simultaneously: +## API Endpoints -```python -from KantBench import KantbenchAction, KantbenchEnv -from concurrent.futures import ThreadPoolExecutor - -def run_episode(client_id: int): - with KantbenchEnv(base_url="http://localhost:8000") as env: - result = env.reset() - for i in range(10): - result = env.step(KantbenchAction(message=f"Client {client_id}, step {i}")) - return client_id, result.observation.message_length - -# Run 4 episodes concurrently -with ThreadPoolExecutor(max_workers=4) as executor: - results = list(executor.map(run_episode, range(4))) -``` +- **Web Interface** at `/web` — Interactive UI for exploring the environment +- **API Docs** at `/docs` — Full OpenAPI/Swagger interface +- **Health Check** at `/health` — Container health monitoring +- **WebSocket** at `/ws` — Persistent session endpoint -## Development & Testing - -### Direct Environment Testing +## Environment Details -Test the environment logic directly without starting the HTTP server: +### Action -```bash -# From the server directory -python3 server/KantBench_environment.py -``` +**KantBenchAction**: Single field +- `move` (str) — Your move (e.g. 
`"cooperate"`, `"defect"`, `"hawk"`, `"produce_5"`) -This verifies that: -- Environment resets correctly -- Step executes actions properly -- State tracking works -- Rewards are calculated correctly +### Observation -### Running Locally +**KantBenchObservation**: Full round result and episode state +- `game_name`, `game_description` — Current game info +- `available_moves` — Valid moves for this game +- `your_move`, `opponent_move` — Moves played this round +- `your_payoff`, `opponent_payoff` — Payoffs this round +- `cumulative_score` — Your total score +- `round_number`, `max_rounds` — Episode progress +- `opponent_strategy` — Opponent strategy name +- `history` — Full round-by-round history -Run the server locally for development: +## Deployment ```bash -uvicorn server.app:app --reload -``` - -## Project Structure - -``` -KantBench/ -├── .dockerignore # Docker build exclusions -├── __init__.py # Module exports -├── README.md # This file -├── openenv.yaml # OpenEnv manifest -├── pyproject.toml # Project metadata and dependencies -├── uv.lock # Locked dependencies (generated) -├── client.py # KantbenchEnv client -├── models.py # Action and Observation models -└── server/ - ├── __init__.py # Server module exports - ├── KantBench_environment.py # Core environment logic - ├── app.py # FastAPI application (HTTP + WebSocket endpoints) - └── Dockerfile # Container image definition +python spaces/kant/deploy.py ``` diff --git a/__init__.py b/__init__.py index 92e11a7d590d51ab6dc67dfe04033ab1f7ee527d..d8219f27c0287040da2988da889b4e7c84742565 100644 --- a/__init__.py +++ b/__init__.py @@ -1,16 +1,10 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the BSD-style license found in the -# LICENSE file in the root directory of this source tree. 
+"""KantBench Environment — 90+ game theory games for LLM training.""" -"""Kantbench Environment.""" - -from .client import KantbenchEnv -from .models import KantbenchAction, KantbenchObservation +from .client import KantBenchEnv +from .models import KantBenchAction, KantBenchObservation __all__ = [ - "KantbenchAction", - "KantbenchObservation", - "KantbenchEnv", + "KantBenchAction", + "KantBenchObservation", + "KantBenchEnv", ] diff --git a/client.py b/client.py index 2b9d5327dd3e214b9594612d17793374f839823f..2206d46ad40d97d8fd1c7689c398acde3a747ef3 100644 --- a/client.py +++ b/client.py @@ -1,10 +1,4 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the BSD-style license found in the -# LICENSE file in the root directory of this source tree. - -"""Kantbench Environment Client.""" +"""KantBench Environment Client.""" from typing import Dict @@ -12,69 +6,54 @@ from openenv.core.client_types import StepResult from openenv.core.env_server.types import State from openenv.core import EnvClient -from .models import KantbenchAction, KantbenchObservation +from .models import KantBenchAction, KantBenchObservation -class KantbenchEnv( - EnvClient[KantbenchAction, KantbenchObservation] +class KantBenchEnv( + EnvClient[KantBenchAction, KantBenchObservation] ): """ - Client for the Kantbench Environment. + Client for the KantBench game theory environment. - This client maintains a persistent WebSocket connection to the environment server, - enabling efficient multi-step interactions with lower latency. - Each client instance has its own dedicated environment session on the server. + Maintains a persistent WebSocket connection to the environment server. + Each client instance has its own dedicated environment session. Example: - >>> # Connect to a running server - >>> with KantbenchEnv(base_url="http://localhost:8000") as client: + >>> with KantBenchEnv(base_url="http://localhost:8000") as client: ... 
result = client.reset() - ... print(result.observation.echoed_message) + ... print(result.observation.game_name) + ... print(result.observation.available_moves) ... - ... result = client.step(KantbenchAction(message="Hello!")) - ... print(result.observation.echoed_message) + ... result = client.step(KantBenchAction(move="cooperate")) + ... print(result.observation.your_payoff) - Example with Docker: - >>> # Automatically start container and connect - >>> client = KantbenchEnv.from_docker_image("KantBench-env:latest") - >>> try: + Example with HF Space: + >>> with KantBenchEnv(base_url="https://openenv-community-kantbench.hf.space") as client: ... result = client.reset() - ... result = client.step(KantbenchAction(message="Test")) - ... finally: - ... client.close() + ... result = client.step(KantBenchAction(move="cooperate")) """ - def _step_payload(self, action: KantbenchAction) -> Dict: - """ - Convert KantbenchAction to JSON payload for step message. - - Args: - action: KantbenchAction instance - - Returns: - Dictionary representation suitable for JSON encoding - """ - return { - "message": action.message, - } - - def _parse_result(self, payload: Dict) -> StepResult[KantbenchObservation]: - """ - Parse server response into StepResult[KantbenchObservation]. 
+ def _step_payload(self, action: KantBenchAction) -> Dict: + return {"move": action.move} - Args: - payload: JSON response data from server - - Returns: - StepResult with KantbenchObservation - """ + def _parse_result(self, payload: Dict) -> StepResult[KantBenchObservation]: obs_data = payload.get("observation", {}) - observation = KantbenchObservation( - echoed_message=obs_data.get("echoed_message", ""), - message_length=obs_data.get("message_length", 0), + observation = KantBenchObservation( + game_name=obs_data.get("game_name", ""), + game_description=obs_data.get("game_description", ""), + available_moves=obs_data.get("available_moves", []), + your_move=obs_data.get("your_move", ""), + opponent_move=obs_data.get("opponent_move", ""), + your_payoff=obs_data.get("your_payoff", 0.0), + opponent_payoff=obs_data.get("opponent_payoff", 0.0), + cumulative_score=obs_data.get("cumulative_score", 0.0), + round_number=obs_data.get("round_number", 0), + max_rounds=obs_data.get("max_rounds", 10), + opponent_strategy=obs_data.get("opponent_strategy", ""), + history=obs_data.get("history", []), done=payload.get("done", False), reward=payload.get("reward"), - metadata=obs_data.get("metadata", {}), + message=obs_data.get("message", ""), ) return StepResult( @@ -84,15 +63,6 @@ class KantbenchEnv( ) def _parse_state(self, payload: Dict) -> State: - """ - Parse server response into State object. 
- - Args: - payload: JSON response from state request - - Returns: - State object with episode_id and step_count - """ return State( episode_id=payload.get("episode_id"), step_count=payload.get("step_count", 0), diff --git a/common/__init__.py b/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..062baecf713831b0073ffa275628b98c7d9e97fb --- /dev/null +++ b/common/__init__.py @@ -0,0 +1 @@ +"""Shared game infrastructure: game definitions, strategies, and extensions.""" diff --git a/common/__pycache__/__init__.cpython-311.pyc b/common/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..25062291075173dc77dff31d7e5783d35eae7a9d Binary files /dev/null and b/common/__pycache__/__init__.cpython-311.pyc differ diff --git a/common/__pycache__/games.cpython-311.pyc b/common/__pycache__/games.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8900ee3410b450d528c0eb68211a350517fe8d6e Binary files /dev/null and b/common/__pycache__/games.cpython-311.pyc differ diff --git a/common/__pycache__/strategies.cpython-311.pyc b/common/__pycache__/strategies.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b2263be20c72035c9d9178a12c8027ea1fe8622a Binary files /dev/null and b/common/__pycache__/strategies.cpython-311.pyc differ diff --git a/common/games.py b/common/games.py new file mode 100644 index 0000000000000000000000000000000000000000..ed9aecb93e221523064c9404e482b8e0c77cc4f9 --- /dev/null +++ b/common/games.py @@ -0,0 +1,298 @@ +"""Game configuration registry and payoff computation for KantBench.""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import Callable + +from constant_definitions.game_constants import ( + DEFAULT_ZERO_FLOAT, + DEFAULT_ZERO_INT, + # Prisoner's Dilemma + PD_CC_PAYOFF, + PD_CD_PAYOFF, + PD_DC_PAYOFF, + PD_DD_PAYOFF, + # Stag Hunt + SH_SS_PAYOFF, + 
SH_SH_PAYOFF, + SH_HS_PAYOFF, + SH_HH_PAYOFF, + # Hawk-Dove + HD_HH_PAYOFF, + HD_HD_PAYOFF, + HD_DH_PAYOFF, + HD_DD_PAYOFF, + # Ultimatum + ULTIMATUM_POT, + # Trust + TRUST_MULTIPLIER, + TRUST_ENDOWMENT, + # Public Goods + PG_MULTIPLIER_NUMERATOR, + PG_MULTIPLIER_DENOMINATOR, + PG_ENDOWMENT, + PG_DEFAULT_NUM_PLAYERS, + # Round counts + DEFAULT_NUM_ROUNDS, + SINGLE_SHOT_ROUNDS, +) + +# --------------------------------------------------------------------------- +# GameConfig dataclass +# --------------------------------------------------------------------------- + + +@dataclass(frozen=True) +class GameConfig: + """Immutable specification for a single game type.""" + + name: str + description: str + actions: list[str] + game_type: str # "matrix" | "ultimatum" | "trust" | "public_goods" + default_rounds: int + payoff_fn: Callable[[str, str], tuple[float, float]] + + +# --------------------------------------------------------------------------- +# Matrix-game payoff helpers +# --------------------------------------------------------------------------- + +_PD_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(PD_CC_PAYOFF), float(PD_CC_PAYOFF)), + ("cooperate", "defect"): (float(PD_CD_PAYOFF), float(PD_DC_PAYOFF)), + ("defect", "cooperate"): (float(PD_DC_PAYOFF), float(PD_CD_PAYOFF)), + ("defect", "defect"): (float(PD_DD_PAYOFF), float(PD_DD_PAYOFF)), +} + +_SH_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("stag", "stag"): (float(SH_SS_PAYOFF), float(SH_SS_PAYOFF)), + ("stag", "hare"): (float(SH_SH_PAYOFF), float(SH_HS_PAYOFF)), + ("hare", "stag"): (float(SH_HS_PAYOFF), float(SH_SH_PAYOFF)), + ("hare", "hare"): (float(SH_HH_PAYOFF), float(SH_HH_PAYOFF)), +} + +_HD_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("hawk", "hawk"): (float(HD_HH_PAYOFF), float(HD_HH_PAYOFF)), + ("hawk", "dove"): (float(HD_HD_PAYOFF), float(HD_DH_PAYOFF)), + ("dove", "hawk"): (float(HD_DH_PAYOFF), float(HD_HD_PAYOFF)), + ("dove", 
"dove"): (float(HD_DD_PAYOFF), float(HD_DD_PAYOFF)),
+}
+
+
+def _matrix_payoff_fn(
+    matrix: dict[tuple[str, str], tuple[float, float]],
+) -> Callable[[str, str], tuple[float, float]]:
+    """Return a payoff function backed by a pre-built matrix dict."""
+
+    def _payoff(player_action: str, opponent_action: str) -> tuple[float, float]:
+        return matrix[(player_action, opponent_action)]
+
+    return _payoff
+
+
+# ---------------------------------------------------------------------------
+# Computed payoff functions
+# ---------------------------------------------------------------------------
+
+
+def _parse_action_amount(action: str) -> int:
+    """Extract the integer suffix from an action string like 'offer_5'."""
+    parts = action.rsplit("_", maxsplit=1)
+    return int(parts[1])
+
+
+def _ultimatum_payoff(player_action: str, opponent_action: str) -> tuple[float, float]:
+    """Compute Ultimatum Game payoffs.
+
+    The player chooses an offer amount; the opponent accepts or rejects.
+    """
+    offer = _parse_action_amount(player_action)
+
+    if opponent_action == "reject":
+        return (DEFAULT_ZERO_FLOAT, DEFAULT_ZERO_FLOAT)
+
+    # accepted
+    player_payoff = float(ULTIMATUM_POT - offer)
+    opponent_payoff = float(offer)
+    return (player_payoff, opponent_payoff)
+
+
+def _trust_payoff(player_action: str, opponent_action: str) -> tuple[float, float]:
+    """Compute Trust Game payoffs.
+
+    The player invests X from their endowment. The opponent receives
+    X * multiplier and returns Y of that amount.
+    """
+    investment = _parse_action_amount(player_action)
+    returned = _parse_action_amount(opponent_action)
+
+    player_payoff = float(TRUST_ENDOWMENT - investment + returned)
+    opponent_payoff = float(investment * TRUST_MULTIPLIER - returned)
+    return (player_payoff, opponent_payoff)
+
+
+def _public_goods_payoff(
+    player_action: str, opponent_action: str,
+) -> tuple[float, float]:
+    """Compute Public Goods Game payoffs.
+
+    Each participant contributes from their endowment. The total pot is
+    multiplied by (numerator / denominator) then split equally among all
+    participants.
+    """
+    player_contrib = _parse_action_amount(player_action)
+    opponent_contrib = _parse_action_amount(opponent_action)
+
+    total_contributions = player_contrib + opponent_contrib
+    multiplied_pot = (
+        total_contributions * PG_MULTIPLIER_NUMERATOR / PG_MULTIPLIER_DENOMINATOR
+    )
+    share = multiplied_pot / PG_DEFAULT_NUM_PLAYERS
+
+    player_payoff = float(PG_ENDOWMENT - player_contrib) + share
+    opponent_payoff = float(PG_ENDOWMENT - opponent_contrib) + share
+    return (player_payoff, opponent_payoff)
+
+
+# ---------------------------------------------------------------------------
+# Action lists for computed games
+# ---------------------------------------------------------------------------
+
+_ULTIMATUM_OFFERS: list[str] = [
+    f"offer_{i}" for i in range(ULTIMATUM_POT + 1)
+]
+
+_TRUST_INVESTMENTS: list[str] = [
+    f"invest_{i}" for i in range(TRUST_ENDOWMENT + 1)
+]
+
+_PG_CONTRIBUTIONS: list[str] = [
+    f"contribute_{i}" for i in range(PG_ENDOWMENT + 1)
+]
+
+
+# ---------------------------------------------------------------------------
+# Game registry
+# ---------------------------------------------------------------------------
+
+GAMES: dict[str, GameConfig] = {
+    "prisoners_dilemma": GameConfig(
+        name="Prisoner's Dilemma",
+        description=(
+            "Two players simultaneously choose to cooperate or defect. "
+            "Mutual cooperation yields a moderate reward, mutual defection "
+            "yields a low reward, and unilateral defection tempts with the "
+            "highest individual payoff at the other player's expense."
+ ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_PD_MATRIX), + ), + "stag_hunt": GameConfig( + name="Stag Hunt", + description=( + "Two players choose between hunting stag (risky but rewarding " + "if both participate) or hunting hare (safe but less rewarding). " + "Coordination on stag yields the highest joint payoff." + ), + actions=["stag", "hare"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_SH_MATRIX), + ), + "hawk_dove": GameConfig( + name="Hawk-Dove", + description=( + "Two players choose between aggressive (hawk) and passive (dove) " + "strategies over a shared resource. Two hawks suffer mutual harm; " + "a hawk facing a dove claims the resource; two doves share it." + ), + actions=["hawk", "dove"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_HD_MATRIX), + ), + "ultimatum": GameConfig( + name="Ultimatum Game", + description=( + "The proposer offers a split of a fixed pot. The responder " + "either accepts (both receive their shares) or rejects " + "(both receive nothing)." + ), + actions=_ULTIMATUM_OFFERS, + game_type="ultimatum", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_ultimatum_payoff, + ), + "trust": GameConfig( + name="Trust Game", + description=( + "The investor sends part of an endowment; the amount is " + "multiplied and given to the trustee, who then decides how " + "much to return." + ), + actions=_TRUST_INVESTMENTS, + game_type="trust", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_trust_payoff, + ), + "public_goods": GameConfig( + name="Public Goods Game", + description=( + "Each participant decides how much of their endowment to " + "contribute to a common pool. The pool is multiplied and " + "distributed equally, creating tension between individual " + "free-riding and collective benefit." 
+ ), + actions=_PG_CONTRIBUTIONS, + game_type="public_goods", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_public_goods_payoff, + ), +} + + +def get_game(name: str) -> GameConfig: + """Retrieve a GameConfig by its registry key. + + Args: + name: Key in the GAMES registry (e.g. ``"prisoners_dilemma"``). + + Returns: + The corresponding :class:`GameConfig` instance. + + Raises: + KeyError: If *name* is not present in the registry. + """ + return GAMES[name] + + +def _load_extensions() -> None: + """Import extension modules that register additional games.""" + import importlib + for mod in [ + "common.games_ext.matrix_games", "common.games_ext.sequential", + "common.games_ext.auction", "common.games_ext.nplayer", + "common.games_ext.generated", "common.games_info.signaling", + "common.games_info.contracts", "common.games_info.communication", + "common.games_info.bayesian", "common.games_info.network", + "common.games_market.oligopoly", "common.games_market.contests", + "common.games_market.classic", "common.games_market.generated_v2", + "common.games_market.advanced", "common.games_coop.cooperative", + "common.games_coop.dynamic", "common.games_coop.pd_variants", + "common.games_coop.infinite", "common.games_coop.stochastic", + ]: + try: + importlib.import_module(mod) + except ImportError: + pass + + +_load_extensions() + +from common.games_meta.dynamic import ( # noqa: E402,F401 + create_matrix_game, create_symmetric_game, create_custom_game, +) diff --git a/common/games_coop/__pycache__/cooperative.cpython-311.pyc b/common/games_coop/__pycache__/cooperative.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6990213e66e0385e52a7c18d56aaf3f8a62543f6 Binary files /dev/null and b/common/games_coop/__pycache__/cooperative.cpython-311.pyc differ diff --git a/common/games_coop/__pycache__/dynamic.cpython-311.pyc b/common/games_coop/__pycache__/dynamic.cpython-311.pyc new file mode 100644 index 
0000000000000000000000000000000000000000..a07d07cd39bdc1ffd154f30c90a45faf739e64db Binary files /dev/null and b/common/games_coop/__pycache__/dynamic.cpython-311.pyc differ diff --git a/common/games_coop/__pycache__/infinite.cpython-311.pyc b/common/games_coop/__pycache__/infinite.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..38f7f36ab720fa792a93578f18fb8b05caa4cb69 Binary files /dev/null and b/common/games_coop/__pycache__/infinite.cpython-311.pyc differ diff --git a/common/games_coop/__pycache__/pd_variants.cpython-311.pyc b/common/games_coop/__pycache__/pd_variants.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3b5adc25e31ec655f8c6510266e2de5b0cb83f27 Binary files /dev/null and b/common/games_coop/__pycache__/pd_variants.cpython-311.pyc differ diff --git a/common/games_coop/__pycache__/stochastic.cpython-311.pyc b/common/games_coop/__pycache__/stochastic.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c7546ecfd1bf7e2e509c710959a27ef9fe97e145 Binary files /dev/null and b/common/games_coop/__pycache__/stochastic.cpython-311.pyc differ diff --git a/common/games_coop/cooperative.py b/common/games_coop/cooperative.py new file mode 100644 index 0000000000000000000000000000000000000000..8509bf49ba2580164b0888970baade19751665b7 --- /dev/null +++ b/common/games_coop/cooperative.py @@ -0,0 +1,169 @@ +"""Cooperative game theory and social choice games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.ext.cooperative_constants import ( + SHAPLEY_GRAND_COALITION_VALUE, SHAPLEY_SINGLE_VALUE, + SHAPLEY_MAX_CLAIM, + CORE_POT, + WV_QUOTA, WV_PLAYER_WEIGHT, WV_OPPONENT_WEIGHT, + WV_PASS_BENEFIT, WV_FAIL_PAYOFF, WV_OPPOSITION_BONUS, + SM_TOP_MATCH_PAYOFF, SM_MID_MATCH_PAYOFF, 
SM_LOW_MATCH_PAYOFF,
+    MV_POSITION_RANGE, MV_DISTANCE_COST,
+    AV_PREFERRED_WIN, AV_ACCEPTABLE_WIN, AV_DISLIKED_WIN,
+    AV_NUM_CANDIDATES,
+)
+
+_ONE = 1
+_TWO = 2
+_ZERO_F = 0.0
+
+
+# -- Shapley Value Allocation --
+def _shapley_payoff(pa: str, oa: str) -> tuple[float, float]:
+    """Each proposes a claim. Compatible claims split; else disagreement."""
+    c_p = int(pa.rsplit("_", _ONE)[_ONE])
+    c_o = int(oa.rsplit("_", _ONE)[_ONE])
+    if c_p + c_o <= SHAPLEY_GRAND_COALITION_VALUE:
+        return (float(c_p), float(c_o))
+    return (float(SHAPLEY_SINGLE_VALUE), float(SHAPLEY_SINGLE_VALUE))
+
+
+_SHAPLEY_ACTS = [f"claim_{i}" for i in range(SHAPLEY_MAX_CLAIM + _ONE)]
+
+
+# -- Core / Divide-the-Dollar --
+def _core_payoff(pa: str, oa: str) -> tuple[float, float]:
+    """Each proposes how much they want. If feasible, they get it."""
+    d_p = int(pa.rsplit("_", _ONE)[_ONE])
+    d_o = int(oa.rsplit("_", _ONE)[_ONE])
+    if d_p + d_o <= CORE_POT:
+        return (float(d_p), float(d_o))
+    return (_ZERO_F, _ZERO_F)
+
+
+_CORE_ACTS = [f"claim_{i}" for i in range(CORE_POT + _ONE)]
+
+
+# -- Weighted Voting --
+def _weighted_voting_payoff(pa: str, oa: str) -> tuple[float, float]:
+    """Players vote yes or no; proposal passes if weighted votes meet quota."""
+    p_yes = pa == "vote_yes"
+    o_yes = oa == "vote_yes"
+    total_weight = 0
+    if p_yes:
+        total_weight += WV_PLAYER_WEIGHT
+    if o_yes:
+        total_weight += WV_OPPONENT_WEIGHT
+    passes = total_weight >= WV_QUOTA
+    if passes:
+        return (float(WV_PASS_BENEFIT), float(WV_PASS_BENEFIT))
+    p_pay = float(WV_OPPOSITION_BONUS) if not p_yes else float(WV_FAIL_PAYOFF)
+    o_pay = float(WV_OPPOSITION_BONUS) if not o_yes else float(WV_FAIL_PAYOFF)
+    return (p_pay, o_pay)
+
+
+# -- Stable Matching (preference revelation) --
+_SM_MATRIX: dict[tuple[str, str], tuple[float, float]] = {
+    ("rank_abc", "rank_abc"): (float(SM_TOP_MATCH_PAYOFF), float(SM_TOP_MATCH_PAYOFF)),
+    ("rank_abc", "rank_bac"): (float(SM_MID_MATCH_PAYOFF),
float(SM_TOP_MATCH_PAYOFF)), + ("rank_abc", "rank_cab"): (float(SM_LOW_MATCH_PAYOFF), float(SM_MID_MATCH_PAYOFF)), + ("rank_bac", "rank_abc"): (float(SM_TOP_MATCH_PAYOFF), float(SM_MID_MATCH_PAYOFF)), + ("rank_bac", "rank_bac"): (float(SM_MID_MATCH_PAYOFF), float(SM_MID_MATCH_PAYOFF)), + ("rank_bac", "rank_cab"): (float(SM_LOW_MATCH_PAYOFF), float(SM_LOW_MATCH_PAYOFF)), + ("rank_cab", "rank_abc"): (float(SM_MID_MATCH_PAYOFF), float(SM_LOW_MATCH_PAYOFF)), + ("rank_cab", "rank_bac"): (float(SM_LOW_MATCH_PAYOFF), float(SM_LOW_MATCH_PAYOFF)), + ("rank_cab", "rank_cab"): (float(SM_TOP_MATCH_PAYOFF), float(SM_TOP_MATCH_PAYOFF)), +} + + +# -- Median Voter -- +def _median_voter_payoff(pa: str, oa: str) -> tuple[float, float]: + """Each picks a policy position; outcome is the median.""" + pos_p = int(pa.rsplit("_", _ONE)[_ONE]) + pos_o = int(oa.rsplit("_", _ONE)[_ONE]) + median = (pos_p + pos_o) // _TWO + p_pay = float(-MV_DISTANCE_COST * abs(pos_p - median)) + o_pay = float(-MV_DISTANCE_COST * abs(pos_o - median)) + return (p_pay, o_pay) + + +_MV_ACTS = [f"position_{i}" for i in range(MV_POSITION_RANGE + _ONE)] + + +# -- Approval Voting -- +def _approval_voting_payoff(pa: str, oa: str) -> tuple[float, float]: + """Each approves a candidate. Candidate with most approvals wins.""" + if pa == oa: + return (float(AV_PREFERRED_WIN), float(AV_PREFERRED_WIN)) + return (float(AV_DISLIKED_WIN), float(AV_DISLIKED_WIN)) + + +_AV_ACTS = [f"approve_{chr(ord('a') + i)}" for i in range(AV_NUM_CANDIDATES)] + +COOPERATIVE_GAMES: dict[str, GameConfig] = { + "shapley_allocation": GameConfig( + name="Shapley Value Allocation", + description=( + "Players claim shares of a coalition surplus. If claims are " + "compatible, each receives their claim; otherwise both receive " + "only their standalone value. Tests fair division reasoning." 
+ ), + actions=_SHAPLEY_ACTS, game_type="shapley", + default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_shapley_payoff, + ), + "core_divide_dollar": GameConfig( + name="Core / Divide-the-Dollar", + description=( + "Players simultaneously claim shares of a pot. If total " + "claims are feasible, each gets their share; otherwise " + "both get nothing. Tests coalition stability reasoning." + ), + actions=_CORE_ACTS, game_type="core", + default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_core_payoff, + ), + "weighted_voting": GameConfig( + name="Weighted Voting Game", + description=( + "Players with different voting weights decide yes or no on " + "a proposal. The proposal passes if the weighted total meets " + "a quota. Tests understanding of pivotal power dynamics." + ), + actions=["vote_yes", "vote_no"], game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_weighted_voting_payoff, + ), + "stable_matching": GameConfig( + name="Stable Matching", + description=( + "Players report preference rankings over potential partners. " + "The matching outcome depends on reported preferences. Tests " + "whether agents report truthfully or strategically manipulate." + ), + actions=["rank_abc", "rank_bac", "rank_cab"], game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_SM_MATRIX), + ), + "median_voter": GameConfig( + name="Median Voter Game", + description=( + "Players choose policy positions on a line. The implemented " + "policy is the median. Each player's payoff decreases with " + "distance from the outcome. Tests strategic positioning." + ), + actions=_MV_ACTS, game_type="median_voter", + default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_median_voter_payoff, + ), + "approval_voting": GameConfig( + name="Approval Voting", + description=( + "Players approve one candidate from a set. The candidate " + "with the most approvals wins. Tests strategic vs sincere " + "voting behavior and preference aggregation." 
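With two players, the "median" used by the median-voter payoff is the floored midpoint, so each player is pulled toward the other. A standalone sketch of that rule, assuming a distance cost of 1 in place of `MV_DISTANCE_COST`:

```python
DISTANCE_COST = 1  # stand-in for MV_DISTANCE_COST (assumed value)

def median_voter_payoff(pos_p: int, pos_o: int) -> tuple[int, int]:
    """Implemented policy is the floored midpoint of the two positions;
    each player loses DISTANCE_COST per unit of distance from it."""
    median = (pos_p + pos_o) // 2
    return (-DISTANCE_COST * abs(pos_p - median),
            -DISTANCE_COST * abs(pos_o - median))
```

Converging on the same position is the only way both players reach zero loss, mirroring the median-voter pull toward the center.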
+ ), + actions=_AV_ACTS, game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_approval_voting_payoff, + ), +} + +GAMES.update(COOPERATIVE_GAMES) diff --git a/common/games_coop/dynamic.py b/common/games_coop/dynamic.py new file mode 100644 index 0000000000000000000000000000000000000000..1cdee99595629d45a8cd72f1de8122d9dc33d836 --- /dev/null +++ b/common/games_coop/dynamic.py @@ -0,0 +1,162 @@ +"""Dynamic, behavioral, and repeated games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.ext.dynamic_constants import ( + BR_PATIENCE_REWARD, BR_EARLY_WITHDRAW, BR_BANK_FAIL_PAYOFF, + GSH_STAG_PAYOFF, GSH_HARE_PAYOFF, GSH_STAG_ALONE_PAYOFF, + BC_MAX_NUMBER, BC_TARGET_FRACTION_NUM, BC_TARGET_FRACTION_DEN, + BC_WIN_PAYOFF, BC_LOSE_PAYOFF, BC_TIE_PAYOFF, + HDB_RESOURCE_VALUE, HDB_FIGHT_COST, HDB_SHARE_DIVISOR, +) +from constant_definitions.game_constants import ( + PD_CC_PAYOFF, PD_CD_PAYOFF, PD_DC_PAYOFF, PD_DD_PAYOFF, +) + +_ONE = int(bool(True)) +_TWO = _ONE + _ONE +_ZERO_F = float() + + +# -- Bank Run (Diamond-Dybvig) -- +_BR_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("wait", "wait"): (float(BR_PATIENCE_REWARD), float(BR_PATIENCE_REWARD)), + ("wait", "withdraw"): (float(BR_BANK_FAIL_PAYOFF), float(BR_EARLY_WITHDRAW)), + ("withdraw", "wait"): (float(BR_EARLY_WITHDRAW), float(BR_BANK_FAIL_PAYOFF)), + ("withdraw", "withdraw"): (float(BR_BANK_FAIL_PAYOFF), float(BR_BANK_FAIL_PAYOFF)), +} + + +# -- Global Stag Hunt (higher stakes variant) -- +_GSH_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("stag", "stag"): (float(GSH_STAG_PAYOFF), float(GSH_STAG_PAYOFF)), + ("stag", "hare"): (float(GSH_STAG_ALONE_PAYOFF), float(GSH_HARE_PAYOFF)), + ("hare", "stag"): (float(GSH_HARE_PAYOFF), float(GSH_STAG_ALONE_PAYOFF)), + ("hare", "hare"): (float(GSH_HARE_PAYOFF), 
float(GSH_HARE_PAYOFF)), +} + + +# -- Beauty Contest (p-Guessing Game) -- +def _beauty_contest_payoff(pa: str, oa: str) -> tuple[float, float]: + """Each picks a number. Closest to p * average wins.""" + n_p = int(pa.rsplit("_", _ONE)[_ONE]) + n_o = int(oa.rsplit("_", _ONE)[_ONE]) + avg = float(n_p + n_o) / _TWO + target = avg * BC_TARGET_FRACTION_NUM / BC_TARGET_FRACTION_DEN + dist_p = abs(float(n_p) - target) + dist_o = abs(float(n_o) - target) + if dist_p < dist_o: + return (float(BC_WIN_PAYOFF), float(BC_LOSE_PAYOFF)) + if dist_o < dist_p: + return (float(BC_LOSE_PAYOFF), float(BC_WIN_PAYOFF)) + return (float(BC_TIE_PAYOFF), float(BC_TIE_PAYOFF)) + + +_BC_ACTS = [f"guess_{i}" for i in range(BC_MAX_NUMBER + _ONE)] + + +# -- Hawk-Dove-Bourgeois -- +_V = float(HDB_RESOURCE_VALUE) +_C = float(HDB_FIGHT_COST) +_S = _V / float(HDB_SHARE_DIVISOR) +# Against Bourgeois, Hawk meets a fight half the time, so its expected +# payoff is (V - C) / 4 + V / 2 = (3V - C) / 4, not V / 2. +_HAWK_VS_B = (_V - _C) / (float(_TWO) * _TWO) + _V / _TWO +_HDB_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("hawk", "hawk"): ((_V - _C) / _TWO, (_V - _C) / _TWO), + ("hawk", "dove"): (_V, _ZERO_F), + ("hawk", "bourgeois"): (_HAWK_VS_B, (_V - _C) / (float(_TWO) * _TWO)), + ("dove", "hawk"): (_ZERO_F, _V), + ("dove", "dove"): (_S, _S), + ("dove", "bourgeois"): (_S / _TWO, _S + _V / (float(_TWO) * _TWO)), + ("bourgeois", "hawk"): ((_V - _C) / (float(_TWO) * _TWO), _HAWK_VS_B), + ("bourgeois", "dove"): (_S + _V / (float(_TWO) * _TWO), _S / _TWO), + ("bourgeois", "bourgeois"): (_S, _S), +} + + +# -- Finitely Repeated PD (same payoffs, explicit short horizon) -- +_FPD_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(PD_CC_PAYOFF), float(PD_CC_PAYOFF)), + ("cooperate", "defect"): (float(PD_CD_PAYOFF), float(PD_DC_PAYOFF)), + ("defect", "cooperate"): (float(PD_DC_PAYOFF), float(PD_CD_PAYOFF)), + ("defect", "defect"): (float(PD_DD_PAYOFF), float(PD_DD_PAYOFF)), +} + +_FIVE = _TWO + _TWO + _ONE +_MARKOV_ROUNDS = _FIVE + _FIVE + _FIVE + +DYNAMIC_GAMES: dict[str, GameConfig] = { + "bank_run": GameConfig( + name="Bank Run (Diamond-Dybvig)", 
+ description=( + "Depositors simultaneously decide whether to withdraw early. " + "If both wait, the bank survives and both earn a premium. If " + "both withdraw, the bank fails. Models coordination failure " + "in financial systems." + ), + actions=["wait", "withdraw"], game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_BR_MATRIX), + ), + "global_stag_hunt": GameConfig( + name="Global Stag Hunt", + description=( + "A higher-stakes Stag Hunt modeling coordination under " + "uncertainty. Both hunting stag yields a large payoff but " + "hunting stag alone yields nothing. Models bank runs, " + "currency attacks, and regime change dynamics." + ), + actions=["stag", "hare"], game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_GSH_MATRIX), + ), + "beauty_contest": GameConfig( + name="Keynesian Beauty Contest", + description=( + "Each player picks a number. The winner is closest to a " + "target fraction of the average. Tests depth of strategic " + "reasoning and level-k thinking. The unique Nash equilibrium " + "is zero, reached through iterated elimination." + ), + actions=_BC_ACTS, game_type="beauty_contest", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_beauty_contest_payoff, + ), + "hawk_dove_bourgeois": GameConfig( + name="Hawk-Dove-Bourgeois", + description=( + "Extended Hawk-Dove with a Bourgeois strategy that plays " + "Hawk when incumbent and Dove when intruder. The Bourgeois " + "strategy is an evolutionarily stable strategy. Tests " + "reasoning about ownership conventions." + ), + actions=["hawk", "dove", "bourgeois"], game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_HDB_MATRIX), + ), + "finitely_repeated_pd": GameConfig( + name="Finitely Repeated Prisoner's Dilemma", + description=( + "A Prisoner's Dilemma played for a known finite number of " + "rounds. 
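The "iterated elimination" claim in the Beauty Contest description can be made concrete: the target can never exceed p times the current maximum guess, so repeatedly applying that bound drives rationalizable guesses to zero. A sketch assuming p = 2/3 and integer guesses (the repo's `BC_TARGET_FRACTION_*` constants may encode a different fraction):

```python
from fractions import Fraction

def rationalizable_upper_bound(max_number: int, p: Fraction, steps: int) -> int:
    """Upper bound on guesses surviving `steps` rounds of iterated
    elimination of dominated strategies in a p-guessing game."""
    bound = max_number
    for _ in range(steps):
        bound = int(p * bound)  # the target cannot exceed p * (current max)
    return bound
```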
Backward induction predicts mutual defection in " + "every round, yet cooperation often emerges experimentally. " + "Tests backward induction versus cooperation heuristics." + ), + actions=["cooperate", "defect"], game_type="matrix", + default_rounds=_FIVE, + payoff_fn=_matrix_payoff_fn(_FPD_MATRIX), + ), + "markov_game": GameConfig( + name="Markov Decision Game", + description=( + "A long-horizon repeated Prisoner's Dilemma. The extended " + "round count gives history-dependent strategies room to " + "operate. Tests dynamic programming and Markov-perfect " + "equilibrium reasoning over multiple rounds." + ), + actions=["cooperate", "defect"], game_type="matrix", + default_rounds=_MARKOV_ROUNDS, + payoff_fn=_matrix_payoff_fn(_FPD_MATRIX), + ), +} + +GAMES.update(DYNAMIC_GAMES) diff --git a/common/games_coop/infinite.py b/common/games_coop/infinite.py new file mode 100644 index 0000000000000000000000000000000000000000..08757bbb70e8387ae8abd52fb068b8e8a382a91e --- /dev/null +++ b/common/games_coop/infinite.py @@ -0,0 +1,72 @@ +"""Infinite-horizon and continuous games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS +from constant_definitions.var.infinite_constants import ( + CPD_BENEFIT_NUMERATOR, CPD_COST_NUMERATOR, CPD_DENOMINATOR, + CPD_MAX_LEVEL, + DPD_TEMPTATION, DPD_REWARD, DPD_PUNISHMENT, DPD_SUCKER, + DPD_DEFAULT_ROUNDS, +) + +_ONE = int(bool(True)) + + +# -- Continuous PD (variable contribution levels) -- +def _continuous_pd_payoff(pa: str, oa: str) -> tuple[float, float]: + """Each player chooses a cooperation level. 
Higher = costlier but benefits opponent.""" + lvl_p = int(pa.rsplit("_", _ONE)[_ONE]) + lvl_o = int(oa.rsplit("_", _ONE)[_ONE]) + p_pay = float(lvl_o * CPD_BENEFIT_NUMERATOR) / CPD_DENOMINATOR + p_pay -= float(lvl_p * CPD_COST_NUMERATOR) / CPD_DENOMINATOR + o_pay = float(lvl_p * CPD_BENEFIT_NUMERATOR) / CPD_DENOMINATOR + o_pay -= float(lvl_o * CPD_COST_NUMERATOR) / CPD_DENOMINATOR + return (p_pay, o_pay) + + +_CPD_ACTS = [f"level_{i}" for i in range(CPD_MAX_LEVEL + _ONE)] + + +# -- Discounted PD (high-stakes, long-horizon) -- +_DPD_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(DPD_REWARD), float(DPD_REWARD)), + ("cooperate", "defect"): (float(DPD_SUCKER), float(DPD_TEMPTATION)), + ("defect", "cooperate"): (float(DPD_TEMPTATION), float(DPD_SUCKER)), + ("defect", "defect"): (float(DPD_PUNISHMENT), float(DPD_PUNISHMENT)), +} + + +# -- Register -- +INFINITE_GAMES: dict[str, GameConfig] = { + "continuous_pd": GameConfig( + name="Continuous Prisoner's Dilemma", + description=( + "A generalization of the Prisoner's Dilemma with variable " + "cooperation levels instead of binary choices. Each unit of " + "cooperation costs the player but benefits the opponent more. " + "Tests whether agents find intermediate cooperation strategies " + "in continuous action spaces." + ), + actions=_CPD_ACTS, + game_type="continuous_pd", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_continuous_pd_payoff, + ), + "discounted_pd": GameConfig( + name="Discounted Prisoner's Dilemma", + description=( + "A high-stakes Prisoner's Dilemma with many rounds, modeling " + "an effectively infinite repeated interaction. The shadow of " + "the future makes cooperation sustainable under folk theorem " + "conditions. Tests long-horizon strategic reasoning with " + "higher temptation and reward differentials." 
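The folk-theorem condition behind the Discounted PD has a closed form under grim trigger: defecting gains T - R once but forfeits R - P in every later round, so cooperation is sustainable when the discount factor satisfies delta >= (T - R) / (T - P). A sketch with textbook payoffs rather than the repo's `DPD_*` constants:

```python
def min_discount_for_cooperation(T: float, R: float, P: float) -> float:
    """Smallest discount factor at which grim trigger sustains mutual
    cooperation in a repeated PD: delta >= (T - R) / (T - P)."""
    return (T - R) / (T - P)
```

Raising the temptation payoff raises the required patience, which is why the high-differential variant stresses long-horizon reasoning.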
+ ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DPD_DEFAULT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_DPD_MATRIX), + ), +} + +GAMES.update(INFINITE_GAMES) diff --git a/common/games_coop/pd_variants.py b/common/games_coop/pd_variants.py new file mode 100644 index 0000000000000000000000000000000000000000..ea2e37b2d181f91d97834167300914ed13974ff4 --- /dev/null +++ b/common/games_coop/pd_variants.py @@ -0,0 +1,145 @@ +"""Prisoner's Dilemma variants for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import ( + PD_CC_PAYOFF, PD_CD_PAYOFF, PD_DC_PAYOFF, PD_DD_PAYOFF, + DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS, +) +from constant_definitions.var.pd_variant_constants import ( + OPD_EXIT_PAYOFF, + APD_A_TEMPTATION, APD_A_REWARD, APD_A_PUNISHMENT, APD_A_SUCKER, + APD_B_TEMPTATION, APD_B_REWARD, APD_B_PUNISHMENT, APD_B_SUCKER, + DONATION_BENEFIT, DONATION_COST, + FOF_SHARE_PAYOFF, FOF_STEAL_WIN_PAYOFF, + PW_DISARM_DISARM, PW_DISARM_ARM, PW_ARM_DISARM, PW_ARM_ARM, +) + +_ZERO_F = float() + + +# -- Optional PD (cooperate / defect / exit) -- +_OPD_EXIT_F = float(OPD_EXIT_PAYOFF) +_OPD_BASE: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(PD_CC_PAYOFF), float(PD_CC_PAYOFF)), + ("cooperate", "defect"): (float(PD_CD_PAYOFF), float(PD_DC_PAYOFF)), + ("defect", "cooperate"): (float(PD_DC_PAYOFF), float(PD_CD_PAYOFF)), + ("defect", "defect"): (float(PD_DD_PAYOFF), float(PD_DD_PAYOFF)), +} + + +def _optional_pd_payoff(pa: str, oa: str) -> tuple[float, float]: + if pa == "exit" or oa == "exit": + return (_OPD_EXIT_F, _OPD_EXIT_F) + return _OPD_BASE[(pa, oa)] + + +# -- Asymmetric PD (alibi game: different payoffs per player) -- +_ASYM_PD: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(APD_A_REWARD), float(APD_B_REWARD)), + ("cooperate", "defect"): (float(APD_A_SUCKER), 
float(APD_B_TEMPTATION)), + ("defect", "cooperate"): (float(APD_A_TEMPTATION), float(APD_B_SUCKER)), + ("defect", "defect"): (float(APD_A_PUNISHMENT), float(APD_B_PUNISHMENT)), +} + + +# -- Donation Game (pay cost c to give benefit b to opponent) -- +_DG: dict[tuple[str, str], tuple[float, float]] = { + ("donate", "donate"): ( + float(DONATION_BENEFIT - DONATION_COST), + float(DONATION_BENEFIT - DONATION_COST), + ), + ("donate", "keep"): (float(-DONATION_COST), float(DONATION_BENEFIT)), + ("keep", "donate"): (float(DONATION_BENEFIT), float(-DONATION_COST)), + ("keep", "keep"): (_ZERO_F, _ZERO_F), +} + + +# -- Friend or Foe (game show: both defect yields zero) -- +_FOF: dict[tuple[str, str], tuple[float, float]] = { + ("friend", "friend"): (float(FOF_SHARE_PAYOFF), float(FOF_SHARE_PAYOFF)), + ("friend", "foe"): (_ZERO_F, float(FOF_STEAL_WIN_PAYOFF)), + ("foe", "friend"): (float(FOF_STEAL_WIN_PAYOFF), _ZERO_F), + ("foe", "foe"): (_ZERO_F, _ZERO_F), +} + + +# -- Peace-War Game (arms race framing from international relations) -- +_PW: dict[tuple[str, str], tuple[float, float]] = { + ("disarm", "disarm"): (float(PW_DISARM_DISARM), float(PW_DISARM_DISARM)), + ("disarm", "arm"): (float(PW_DISARM_ARM), float(PW_ARM_DISARM)), + ("arm", "disarm"): (float(PW_ARM_DISARM), float(PW_DISARM_ARM)), + ("arm", "arm"): (float(PW_ARM_ARM), float(PW_ARM_ARM)), +} + + +# -- Register -- +PD_VARIANT_GAMES: dict[str, GameConfig] = { + "optional_pd": GameConfig( + name="Optional Prisoner's Dilemma", + description=( + "A Prisoner's Dilemma with a third action: exit. Exiting gives " + "a safe intermediate payoff regardless of the opponent's choice. " + "Tests whether outside options change cooperation dynamics and " + "models situations where players can walk away from interactions." 
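The exit option's effect can be checked directly: with an exit payoff strictly between P and R, exit is the safe reply to a defector while mutual cooperation still pays more. A standalone sketch using illustrative payoffs in place of the `PD_*` and `OPD_EXIT_PAYOFF` constants:

```python
# Illustrative payoffs, assuming P < EXIT < R as in the standard setup.
T, R, P, S, EXIT = 5.0, 3.0, 1.0, 0.0, 2.0

def optional_pd_payoff(a: str, b: str) -> tuple[float, float]:
    """Either player exiting ends the interaction at the outside option."""
    if "exit" in (a, b):
        return (EXIT, EXIT)
    base = {
        ("cooperate", "cooperate"): (R, R),
        ("cooperate", "defect"): (S, T),
        ("defect", "cooperate"): (T, S),
        ("defect", "defect"): (P, P),
    }
    return base[(a, b)]
```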
+ ), + actions=["cooperate", "defect", "exit"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_optional_pd_payoff, + ), + "asymmetric_pd": GameConfig( + name="Asymmetric Prisoner's Dilemma", + description=( + "A Prisoner's Dilemma where players have unequal payoff " + "structures. The first player has an alibi advantage with a " + "higher punishment payoff. Tests strategic reasoning under " + "asymmetric incentive conditions." + ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_ASYM_PD), + ), + "donation_game": GameConfig( + name="Donation Game", + description=( + "A simplified cooperation model: each player independently " + "decides whether to donate. Donating costs the donor but " + "gives a larger benefit to the recipient. The dominant " + "strategy is to keep, but mutual donation is Pareto superior." + ), + actions=["donate", "keep"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_DG), + ), + "friend_or_foe": GameConfig( + name="Friend or Foe", + description=( + "A game show variant of the Prisoner's Dilemma. If both choose " + "friend, winnings are shared. If one steals (foe), they take all. " + "If both choose foe, neither gets anything. Unlike standard PD, " + "mutual defection yields zero, creating a weak equilibrium." + ), + actions=["friend", "foe"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_FOF), + ), + "peace_war": GameConfig( + name="Peace-War Game", + description=( + "An international relations framing of the Prisoner's Dilemma. " + "Players choose to arm or disarm. Mutual disarmament yields the " + "best joint outcome but unilateral arming dominates. Models " + "the security dilemma and arms race escalation dynamics." 
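The Donation Game's tension (keeping dominates, yet mutual donation is Pareto superior) holds whenever the benefit exceeds the cost. A quick check with illustrative b and c in place of `DONATION_BENEFIT` and `DONATION_COST`:

```python
B, C = 3.0, 1.0  # stand-ins for DONATION_BENEFIT / DONATION_COST, assuming B > C

def donation_payoff(a: str, b: str) -> tuple[float, float]:
    """Donating costs the donor C and grants the recipient B."""
    p = (B if b == "donate" else 0.0) - (C if a == "donate" else 0.0)
    o = (B if a == "donate" else 0.0) - (C if b == "donate" else 0.0)
    return (p, o)
```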
+ ), + actions=["disarm", "arm"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_PW), + ), +} + +GAMES.update(PD_VARIANT_GAMES) diff --git a/common/games_coop/stochastic.py b/common/games_coop/stochastic.py new file mode 100644 index 0000000000000000000000000000000000000000..0cc50125a9b94bfb0a43fe151172c9a1c53be315 --- /dev/null +++ b/common/games_coop/stochastic.py @@ -0,0 +1,128 @@ +"""Stochastic and evolutionary game variants for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.batch4.stochastic_constants import ( + SPD_CC, SPD_CD, SPD_DC, SPD_DD, + RD_PAYOFF_DOMINANT, RD_RISK_DOMINANT, RD_MISCOORDINATION, + TPG_ENDOWMENT, TPG_THRESHOLD, TPG_SUCCESS_BONUS, + EPD_COOP_COOP, EPD_COOP_DEFECT, EPD_DEFECT_COOP, EPD_DEFECT_DEFECT, + EPD_TFT_DEFECT, EPD_DEFECT_TFT, +) + +_ONE = int(bool(True)) + + +# -- Stochastic PD (expected payoffs under action noise) -- +_SPD: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(SPD_CC), float(SPD_CC)), + ("cooperate", "defect"): (float(SPD_CD), float(SPD_DC)), + ("defect", "cooperate"): (float(SPD_DC), float(SPD_CD)), + ("defect", "defect"): (float(SPD_DD), float(SPD_DD)), +} + + +# -- Risk Dominance (payoff-dominant vs risk-dominant equilibria) -- +_RD: dict[tuple[str, str], tuple[float, float]] = { + ("risky", "risky"): (float(RD_PAYOFF_DOMINANT), float(RD_PAYOFF_DOMINANT)), + ("risky", "safe"): (float(RD_MISCOORDINATION), float(RD_MISCOORDINATION)), + ("safe", "risky"): (float(RD_MISCOORDINATION), float(RD_MISCOORDINATION)), + ("safe", "safe"): (float(RD_RISK_DOMINANT), float(RD_RISK_DOMINANT)), +} + + +# -- Threshold Public Goods (step-function provision) -- +_TPG_ENDOW_F = float(TPG_ENDOWMENT) +_TPG_THRESH = TPG_THRESHOLD +_TPG_BONUS = float(TPG_SUCCESS_BONUS) + + +def 
_tpg_payoff(pa: str, oa: str) -> tuple[float, float]: + p_c = int(pa.rsplit("_", _ONE)[_ONE]) + o_c = int(oa.rsplit("_", _ONE)[_ONE]) + total = p_c + o_c + if total >= _TPG_THRESH: + p_pay = _TPG_ENDOW_F - float(p_c) + _TPG_BONUS + o_pay = _TPG_ENDOW_F - float(o_c) + _TPG_BONUS + else: + p_pay = _TPG_ENDOW_F - float(p_c) + o_pay = _TPG_ENDOW_F - float(o_c) + return (p_pay, o_pay) + + +_TPG_ACTS = [f"contribute_{i}" for i in range(TPG_ENDOWMENT + _ONE)] + + +# -- Evolutionary PD (always_coop / always_defect / tit_for_tat) -- +_EPD: dict[tuple[str, str], tuple[float, float]] = { + ("always_coop", "always_coop"): (float(EPD_COOP_COOP), float(EPD_COOP_COOP)), + ("always_coop", "always_defect"): (float(EPD_COOP_DEFECT), float(EPD_DEFECT_COOP)), + ("always_coop", "tit_for_tat"): (float(EPD_COOP_COOP), float(EPD_COOP_COOP)), + ("always_defect", "always_coop"): (float(EPD_DEFECT_COOP), float(EPD_COOP_DEFECT)), + ("always_defect", "always_defect"): (float(EPD_DEFECT_DEFECT), float(EPD_DEFECT_DEFECT)), + ("always_defect", "tit_for_tat"): (float(EPD_DEFECT_TFT), float(EPD_TFT_DEFECT)), + ("tit_for_tat", "always_coop"): (float(EPD_COOP_COOP), float(EPD_COOP_COOP)), + ("tit_for_tat", "always_defect"): (float(EPD_TFT_DEFECT), float(EPD_DEFECT_TFT)), + ("tit_for_tat", "tit_for_tat"): (float(EPD_COOP_COOP), float(EPD_COOP_COOP)), +} + + +# -- Register -- +STOCHASTIC_GAMES: dict[str, GameConfig] = { + "stochastic_pd": GameConfig( + name="Stochastic Prisoner's Dilemma", + description=( + "A Prisoner's Dilemma variant where action execution is noisy. " + "With some probability each player's intended action is flipped. " + "Expected payoffs differ from the standard PD, reflecting the " + "tremble probabilities. Tests robustness of strategies to noise." 
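The "expected payoffs under action noise" framing of the Stochastic PD can be derived from a base PD by flipping each intended action with probability eps and averaging. A sketch with textbook payoffs (the `SPD_*` constants are presumably such precomputed expectations, but that is an assumption):

```python
BASE = {  # standard PD payoffs from the row player's perspective (illustrative)
    ("cooperate", "cooperate"): 3.0,
    ("cooperate", "defect"): 0.0,
    ("defect", "cooperate"): 5.0,
    ("defect", "defect"): 1.0,
}
FLIP = {"cooperate": "defect", "defect": "cooperate"}

def trembled_payoff(a: str, b: str, eps: float) -> float:
    """Expected row-player payoff when each intended action is flipped
    independently with probability eps."""
    total = 0.0
    for ra, pa in ((a, 1 - eps), (FLIP[a], eps)):
        for rb, pb in ((b, 1 - eps), (FLIP[b], eps)):
            total += pa * pb * BASE[(ra, rb)]
    return total
```

At eps = 0.5 intent becomes irrelevant: every cell is equally likely, so all action pairs yield the grand mean.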
+ ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_SPD), + ), + "risk_dominance": GameConfig( + name="Risk Dominance Game", + description=( + "A coordination game with two pure Nash equilibria: one " + "payoff-dominant (risky-risky yields higher mutual payoff) and " + "one risk-dominant (safe-safe is more robust to uncertainty). " + "Tests whether agents optimize for payoff or safety under " + "strategic uncertainty about the opponent's behavior." + ), + actions=["risky", "safe"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_RD), + ), + "threshold_public_goods": GameConfig( + name="Threshold Public Goods Game", + description=( + "A public goods game with a provision threshold. Each player " + "contributes from an endowment. If total contributions meet the " + "threshold a bonus is provided to all. Otherwise contributions " + "are spent without the bonus. Tests coordination on provision." + ), + actions=_TPG_ACTS, + game_type="threshold_public_goods", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_tpg_payoff, + ), + "evolutionary_pd": GameConfig( + name="Evolutionary Prisoner's Dilemma", + description=( + "A multi-strategy Prisoner's Dilemma representing long-run " + "evolutionary dynamics. Players choose from always cooperate " + "and always defect and tit-for-tat. Payoffs represent expected " + "long-run fitness across many interactions between strategies." 
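The "expected long-run fitness" entries of the evolutionary matrix can be reproduced for the TFT-versus-ALLD pairing: tit-for-tat is exploited exactly once, then both strategies defect. A sketch with textbook payoffs rather than the `EPD_*` constants:

```python
def tft_vs_alld_average(T: float, R: float, P: float, S: float,
                        rounds: int) -> tuple[float, float]:
    """Per-round average payoffs of (tit_for_tat, always_defect):
    TFT takes the sucker payoff once, then mutual defection forever."""
    tft = (S + P * (rounds - 1)) / rounds
    alld = (T + P * (rounds - 1)) / rounds
    return (tft, alld)
```

As the horizon grows both averages converge to the punishment payoff P, so ALLD's edge over TFT vanishes in the long run.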
+ ), + actions=["always_coop", "always_defect", "tit_for_tat"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_EPD), + ), +} + +GAMES.update(STOCHASTIC_GAMES) diff --git a/common/games_ext/auction.py b/common/games_ext/auction.py new file mode 100644 index 
0000000000000000000000000000000000000000..fd2f6b64199f297c3ee95228dab1c28695f8254c --- /dev/null +++ b/common/games_ext/auction.py @@ -0,0 +1,138 @@ +"""Auction mechanism games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig +from constant_definitions.game_constants import SINGLE_SHOT_ROUNDS +from constant_definitions.auction_nplayer_constants import ( + AUCTION_ITEM_VALUE, AUCTION_MAX_BID, AUCTION_BID_INCREMENT, +) + +_ONE = int(bool(True)) +_ZERO = int() +_ZERO_F = float() + + +def _parse_bid(action: str) -> int: + """Extract bid amount from action string like 'bid_5'.""" + return int(action.rsplit("_", _ONE)[_ONE]) + + +# -- First-Price Sealed Bid Auction -- + +def _first_price_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Highest bidder wins and pays their own bid.""" + p_bid = _parse_bid(player_action) + o_bid = _parse_bid(opponent_action) + + if p_bid > o_bid: + p_pay = float(AUCTION_ITEM_VALUE - p_bid) + o_pay = _ZERO_F + elif o_bid > p_bid: + p_pay = _ZERO_F + o_pay = float(AUCTION_ITEM_VALUE - o_bid) + else: + half_surplus = float(AUCTION_ITEM_VALUE - p_bid) / (_ONE + _ONE) + p_pay = half_surplus + o_pay = half_surplus + return (p_pay, o_pay) + + +# -- Second-Price (Vickrey) Auction -- + +def _vickrey_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Highest bidder wins but pays the second-highest bid.""" + p_bid = _parse_bid(player_action) + o_bid = _parse_bid(opponent_action) + + if p_bid > o_bid: + p_pay = float(AUCTION_ITEM_VALUE - o_bid) + o_pay = _ZERO_F + elif o_bid > p_bid: + p_pay = _ZERO_F + o_pay = float(AUCTION_ITEM_VALUE - p_bid) + else: + half_surplus = float(AUCTION_ITEM_VALUE - p_bid) / (_ONE + _ONE) + p_pay = half_surplus + o_pay = half_surplus + return (p_pay, o_pay) + + +# -- All-Pay Auction -- + +def _allpay_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Both bidders pay their 
bids; only the winner gets the item.""" + p_bid = _parse_bid(player_action) + o_bid = _parse_bid(opponent_action) + + if p_bid > o_bid: + p_pay = float(AUCTION_ITEM_VALUE - p_bid) + o_pay = float(-o_bid) + elif o_bid > p_bid: + p_pay = float(-p_bid) + o_pay = float(AUCTION_ITEM_VALUE - o_bid) + else: + half_value = float(AUCTION_ITEM_VALUE) / (_ONE + _ONE) + p_pay = half_value - float(p_bid) + o_pay = half_value - float(o_bid) + return (p_pay, o_pay) + + +# -- Action lists -- + +_BID_ACTIONS = [ + f"bid_{i}" for i in range( + _ZERO, AUCTION_MAX_BID + AUCTION_BID_INCREMENT, AUCTION_BID_INCREMENT, + ) +] + + +# -- Register -- + +AUCTION_GAMES: dict[str, GameConfig] = { + "first_price_auction": GameConfig( + name="First-Price Sealed-Bid Auction", + description=( + "Two bidders simultaneously submit sealed bids for an item. " + "The highest bidder wins and pays their own bid. Strategic " + "bidding requires shading below true value to maximize surplus " + "while still winning." + ), + actions=_BID_ACTIONS, + game_type="auction", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_first_price_payoff, + ), + "vickrey_auction": GameConfig( + name="Second-Price (Vickrey) Auction", + description=( + "Two bidders submit sealed bids. The highest bidder wins but " + "pays the second-highest bid. The dominant strategy is to bid " + "one's true valuation, making this a strategy-proof mechanism." + ), + actions=_BID_ACTIONS, + game_type="auction", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_vickrey_payoff, + ), + "allpay_auction": GameConfig( + name="All-Pay Auction", + description=( + "Two bidders submit sealed bids. Both pay their bids regardless " + "of outcome, but only the highest bidder receives the item. " + "Models contests, lobbying, and rent-seeking where effort is " + "spent whether or not you win." 
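The "strategy-proof" claim for the Vickrey auction can be verified by brute force: against every possible opponent bid, bidding one's true valuation does at least as well as any alternative bid. A sketch mirroring the payoff rule above, with an assumed valuation of 10 in place of `AUCTION_ITEM_VALUE` and the same surplus-splitting tie rule:

```python
def vickrey_utility(value: int, my_bid: int, other_bid: int) -> float:
    """Winner pays the losing bid; ties split the surplus at the tying bid."""
    if my_bid > other_bid:
        return float(value - other_bid)
    if my_bid < other_bid:
        return 0.0
    return (value - my_bid) / 2

def truthful_is_weakly_dominant(value: int, max_bid: int) -> bool:
    """Check truthful bidding against every (alternative bid, opponent bid) pair."""
    return all(
        vickrey_utility(value, value, other) >= vickrey_utility(value, alt, other)
        for other in range(max_bid + 1)
        for alt in range(max_bid + 1)
    )
```

The same exhaustive check fails for the first-price rule, where shading below value strictly helps, which is exactly the contrast the two configs are meant to expose.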
+ ), + actions=_BID_ACTIONS, + game_type="auction", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_allpay_payoff, + ), +} + +GAMES.update(AUCTION_GAMES) diff --git a/common/games_ext/generated.py b/common/games_ext/generated.py new file mode 100644 index 0000000000000000000000000000000000000000..63520f7dd98301dd08c803e8afe9c53350007cea --- /dev/null +++ b/common/games_ext/generated.py @@ -0,0 +1,144 @@ +"""Procedurally generated games for KantBench.""" +from __future__ import annotations + +import random as _rand +from common.games import GAMES, GameConfig +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS +from constant_definitions.auction_nplayer_constants import ( + GENERATED_DEFAULT_ACTIONS, GENERATED_PAYOFF_MIN, GENERATED_PAYOFF_MAX, + GENERATED_SEED_DEFAULT, +) + +_ONE = int(bool(True)) + + +def _action_label(index: int) -> str: + """Generate action label: a, b, c, ... z, aa, ab, ...""" + alphabet_size = ord("z") - ord("a") + _ONE + if index < alphabet_size: + return chr(ord("a") + index) + first = index // alphabet_size - _ONE + second = index % alphabet_size + return chr(ord("a") + first) + chr(ord("a") + second) + + +def generate_random_symmetric( + num_actions: int = GENERATED_DEFAULT_ACTIONS, + payoff_min: int = GENERATED_PAYOFF_MIN, + payoff_max: int = GENERATED_PAYOFF_MAX, + seed: int = GENERATED_SEED_DEFAULT, +) -> GameConfig: + """Generate a random symmetric NxN matrix game. + + In a symmetric game, the payoff for the first player choosing (a, b) + equals the payoff for the second player facing (b, a). 
+ """ + rng = _rand.Random(seed) + actions = [_action_label(i) for i in range(num_actions)] + + matrix: dict[tuple[str, str], tuple[float, float]] = {} + for a in actions: + for b in actions: + if (a, b) in matrix: + continue + p_first = float(rng.randint(payoff_min, payoff_max)) + if a == b: + # Diagonal cells must give both players the same payoff, + # or the generated game is not actually symmetric. + matrix[(a, b)] = (p_first, p_first) + continue + p_second = float(rng.randint(payoff_min, payoff_max)) + matrix[(a, b)] = (p_first, p_second) + matrix[(b, a)] = (p_second, p_first) + + def _payoff(pa: str, oa: str) -> tuple[float, float]: + return matrix[(pa, oa)] + + return GameConfig( + name=f"Random Symmetric {num_actions}x{num_actions} (seed={seed})", + description=( + f"A randomly generated {num_actions}x{num_actions} symmetric " + f"matrix game with payoffs in [{payoff_min}, {payoff_max}]. " + f"Tests generalization to novel strategic structures." + ), + actions=actions, + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_payoff, + ) + + +def generate_random_asymmetric( + num_actions: int = GENERATED_DEFAULT_ACTIONS, + payoff_min: int = GENERATED_PAYOFF_MIN, + payoff_max: int = GENERATED_PAYOFF_MAX, + seed: int = GENERATED_SEED_DEFAULT, +) -> GameConfig: + """Generate a random asymmetric NxN matrix game. + + Each cell has independently drawn payoffs for both players. + """ + rng = _rand.Random(seed) + actions = [_action_label(i) for i in range(num_actions)] + + matrix: dict[tuple[str, str], tuple[float, float]] = {} + for a in actions: + for b in actions: + p_first = float(rng.randint(payoff_min, payoff_max)) + p_second = float(rng.randint(payoff_min, payoff_max)) + matrix[(a, b)] = (p_first, p_second) + + def _payoff(pa: str, oa: str) -> tuple[float, float]: + return matrix[(pa, oa)] + + return GameConfig( + name=f"Random Asymmetric {num_actions}x{num_actions} (seed={seed})", + description=( + f"A randomly generated {num_actions}x{num_actions} asymmetric " + f"matrix game with independent payoffs in [{payoff_min}, {payoff_max}]. 
" + f"Tests reasoning in novel non-symmetric strategic settings." + ), + actions=actions, + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_payoff, + ) + + +def generate_parameterized_pd( + temptation: int, + reward: int, + punishment: int, + sucker: int, + seed: int = GENERATED_SEED_DEFAULT, +) -> GameConfig: + """Create a Prisoner's Dilemma with custom T > R > P > S payoffs.""" + matrix: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(reward), float(reward)), + ("cooperate", "defect"): (float(sucker), float(temptation)), + ("defect", "cooperate"): (float(temptation), float(sucker)), + ("defect", "defect"): (float(punishment), float(punishment)), + } + + def _payoff(pa: str, oa: str) -> tuple[float, float]: + return matrix[(pa, oa)] + + return GameConfig( + name=f"PD(T={temptation},R={reward},P={punishment},S={sucker})", + description=( + f"A parameterized Prisoner's Dilemma with T={temptation}, " + f"R={reward}, P={punishment}, S={sucker}. Tests sensitivity " + f"to varying incentive structures." 
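`generate_parameterized_pd` documents the T > R > P > S requirement but does not enforce it. A small validity check (a hypothetical helper, not part of the module) that also covers the 2R > T + S condition making mutual cooperation beat alternating exploitation:

```python
def is_valid_pd(temptation: int, reward: int, punishment: int, sucker: int) -> bool:
    """Standard Prisoner's Dilemma ordering T > R > P > S, plus
    2R > T + S so mutual cooperation beats taking turns defecting."""
    return (
        temptation > reward > punishment > sucker
        and 2 * reward > temptation + sucker
    )
```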
+ ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_payoff, + ) + + +# -- Register default generated instances -- + +_DEFAULT_SYMMETRIC = generate_random_symmetric() +_DEFAULT_ASYMMETRIC = generate_random_asymmetric(seed=GENERATED_SEED_DEFAULT + _ONE) + +GENERATED_GAMES: dict[str, GameConfig] = { + "random_symmetric_3x3": _DEFAULT_SYMMETRIC, + "random_asymmetric_3x3": _DEFAULT_ASYMMETRIC, +} + +GAMES.update(GENERATED_GAMES) diff --git a/common/games_ext/matrix_games.py b/common/games_ext/matrix_games.py new file mode 100644 index 0000000000000000000000000000000000000000..e8895765dfa7062be545572cd2efe538d06ead6b --- /dev/null +++ b/common/games_ext/matrix_games.py @@ -0,0 +1,152 @@ +"""Extended matrix (normal-form) games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.zero_sum_constants import ( + MP_MATCH_PAYOFF, MP_MISMATCH_PAYOFF, + RPS_WIN_PAYOFF, RPS_LOSE_PAYOFF, RPS_DRAW_PAYOFF, +) +from constant_definitions.coordination_constants import ( + BOS_PREFERRED_PAYOFF, BOS_COMPROMISE_PAYOFF, BOS_MISMATCH_PAYOFF, + PC_MATCH_PAYOFF, PC_MISMATCH_PAYOFF, + DL_DC_PAYOFF, DL_DD_PAYOFF, DL_CC_PAYOFF, DL_CD_PAYOFF, + HM_CC_PAYOFF, HM_DC_PAYOFF, HM_CD_PAYOFF, HM_DD_PAYOFF, +) + +# -- Matching Pennies -- +_MP_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("heads", "heads"): (float(MP_MATCH_PAYOFF), float(MP_MISMATCH_PAYOFF)), + ("heads", "tails"): (float(MP_MISMATCH_PAYOFF), float(MP_MATCH_PAYOFF)), + ("tails", "heads"): (float(MP_MISMATCH_PAYOFF), float(MP_MATCH_PAYOFF)), + ("tails", "tails"): (float(MP_MATCH_PAYOFF), float(MP_MISMATCH_PAYOFF)), +} + +# -- Rock-Paper-Scissors -- +_W, _L, _D = float(RPS_WIN_PAYOFF), float(RPS_LOSE_PAYOFF), float(RPS_DRAW_PAYOFF) +_RPS_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + 
("rock", "rock"): (_D, _D), + ("rock", "scissors"): (_W, _L), + ("rock", "paper"): (_L, _W), + ("scissors", "rock"): (_L, _W), + ("scissors", "scissors"): (_D, _D), + ("scissors", "paper"): (_W, _L), + ("paper", "rock"): (_W, _L), + ("paper", "scissors"): (_L, _W), + ("paper", "paper"): (_D, _D), +} + +# -- Battle of the Sexes -- +_BOS_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("opera", "opera"): (float(BOS_PREFERRED_PAYOFF), float(BOS_COMPROMISE_PAYOFF)), + ("opera", "football"): (float(BOS_MISMATCH_PAYOFF), float(BOS_MISMATCH_PAYOFF)), + ("football", "opera"): (float(BOS_MISMATCH_PAYOFF), float(BOS_MISMATCH_PAYOFF)), + ("football", "football"): (float(BOS_COMPROMISE_PAYOFF), float(BOS_PREFERRED_PAYOFF)), +} + +# -- Pure Coordination -- +_PC_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("left", "left"): (float(PC_MATCH_PAYOFF), float(PC_MATCH_PAYOFF)), + ("left", "right"): (float(PC_MISMATCH_PAYOFF), float(PC_MISMATCH_PAYOFF)), + ("right", "left"): (float(PC_MISMATCH_PAYOFF), float(PC_MISMATCH_PAYOFF)), + ("right", "right"): (float(PC_MATCH_PAYOFF), float(PC_MATCH_PAYOFF)), +} + +# -- Deadlock -- +_DL_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(DL_CC_PAYOFF), float(DL_CC_PAYOFF)), + ("cooperate", "defect"): (float(DL_CD_PAYOFF), float(DL_DC_PAYOFF)), + ("defect", "cooperate"): (float(DL_DC_PAYOFF), float(DL_CD_PAYOFF)), + ("defect", "defect"): (float(DL_DD_PAYOFF), float(DL_DD_PAYOFF)), +} + +# -- Harmony -- +_HM_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(HM_CC_PAYOFF), float(HM_CC_PAYOFF)), + ("cooperate", "defect"): (float(HM_CD_PAYOFF), float(HM_DC_PAYOFF)), + ("defect", "cooperate"): (float(HM_DC_PAYOFF), float(HM_CD_PAYOFF)), + ("defect", "defect"): (float(HM_DD_PAYOFF), float(HM_DD_PAYOFF)), +} + +# -- Register all games -- + +EXTENDED_MATRIX_GAMES: dict[str, GameConfig] = { + "matching_pennies": GameConfig( + name="Matching Pennies", + 
description=( + "A pure zero-sum game. The matcher wins if both choose the same " + "side; the mismatcher wins if they differ. The only Nash " + "equilibrium is a mixed strategy of equal randomization." + ), + actions=["heads", "tails"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_MP_MATRIX), + ), + "rock_paper_scissors": GameConfig( + name="Rock-Paper-Scissors", + description=( + "A three-action zero-sum game: rock beats scissors, scissors " + "beats paper, paper beats rock. The unique Nash equilibrium " + "is uniform randomization over all three actions." + ), + actions=["rock", "paper", "scissors"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_RPS_MATRIX), + ), + "battle_of_the_sexes": GameConfig( + name="Battle of the Sexes", + description=( + "Two players want to coordinate but have different preferences. " + "The first player prefers opera, the second prefers football. " + "Both prefer any coordination over miscoordination. Two pure " + "Nash equilibria exist at (opera, opera) and (football, football)." + ), + actions=["opera", "football"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_BOS_MATRIX), + ), + "pure_coordination": GameConfig( + name="Pure Coordination", + description=( + "Two players receive a positive payoff only when they choose " + "the same action. Both (left, left) and (right, right) are " + "Nash equilibria. Tests whether agents can converge on a focal " + "point without communication." + ), + actions=["left", "right"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_PC_MATRIX), + ), + "deadlock": GameConfig( + name="Deadlock", + description=( + "Similar to the Prisoner's Dilemma but with different payoff " + "ordering: DC > DD > CC > CD. Both players prefer mutual " + "defection over mutual cooperation. 
The unique Nash equilibrium " + "is (defect, defect) and it is also Pareto optimal." + ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_DL_MATRIX), + ), + "harmony": GameConfig( + name="Harmony", + description=( + "The opposite of a social dilemma: cooperation is the dominant " + "strategy for both players. Payoff ordering CC > DC > CD > DD " + "means rational self-interest naturally leads to the socially " + "optimal outcome of mutual cooperation." + ), + actions=["cooperate", "defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_HM_MATRIX), + ), +} + +GAMES.update(EXTENDED_MATRIX_GAMES) diff --git a/common/games_ext/nplayer.py b/common/games_ext/nplayer.py new file mode 100644 index 0000000000000000000000000000000000000000..88636e87e45ff6a6cc93d0a9af32d0259b2ca54b --- /dev/null +++ b/common/games_ext/nplayer.py @@ -0,0 +1,143 @@ +"""N-player social dilemma games for KantBench. + +Modeled as one agent vs one opponent (representing aggregate of others). +""" +from __future__ import annotations + +from common.games import GAMES, GameConfig +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.auction_nplayer_constants import ( + COMMONS_RESOURCE_CAPACITY, COMMONS_MAX_EXTRACTION, + COMMONS_DEPLETION_PENALTY, + VOLUNTEER_BENEFIT, VOLUNTEER_COST, VOLUNTEER_NO_VOL, + EL_FAROL_ATTEND_REWARD, EL_FAROL_CROWD_PENALTY, EL_FAROL_STAY_HOME, + EL_FAROL_CAPACITY, +) + +_ONE = int(bool(True)) +_ZERO_F = float() + + +# -- Tragedy of the Commons -- + +def _commons_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Resource extraction game. + + Each player extracts from a shared resource. If total extraction + exceeds capacity, both suffer a depletion penalty. 
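Both Deadlock and Harmony are dominance-solvable: one action is strictly better regardless of the opponent. A generic dominance check over any 2-action matrix (standalone sketch; the matrices below use illustrative numbers satisfying the stated orderings, not the `DL_*`/`HM_*` constants):

```python
def strictly_dominant_action(matrix, actions):
    """Return the row player's strictly dominant action in a
    2-action matrix game, or None if neither action dominates."""
    a, b = actions
    if all(matrix[(a, o)][0] > matrix[(b, o)][0] for o in actions):
        return a
    if all(matrix[(b, o)][0] > matrix[(a, o)][0] for o in actions):
        return b
    return None

ACTS = ("cooperate", "defect")
# Deadlock ordering DC > DD > CC > CD: defect dominates.
deadlock = {("cooperate", "cooperate"): (2.0, 2.0), ("cooperate", "defect"): (0.0, 4.0),
            ("defect", "cooperate"): (4.0, 0.0), ("defect", "defect"): (3.0, 3.0)}
# Harmony ordering CC > DC > CD > DD: cooperate dominates.
harmony = {("cooperate", "cooperate"): (4.0, 4.0), ("cooperate", "defect"): (2.0, 3.0),
           ("defect", "cooperate"): (3.0, 2.0), ("defect", "defect"): (1.0, 1.0)}

assert strictly_dominant_action(deadlock, ACTS) == "defect"
assert strictly_dominant_action(harmony, ACTS) == "cooperate"
```

Games without a dominant action (e.g. Matching Pennies) return `None`, which is a cheap way to classify the matrices registered here.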
+ """ + p_extract = int(player_action.rsplit("_", _ONE)[_ONE]) + o_extract = int(opponent_action.rsplit("_", _ONE)[_ONE]) + total = p_extract + o_extract + + if total > COMMONS_RESOURCE_CAPACITY: + return (float(COMMONS_DEPLETION_PENALTY), float(COMMONS_DEPLETION_PENALTY)) + + return (float(p_extract), float(o_extract)) + + +_COMMONS_ACTIONS = [ + f"extract_{i}" for i in range(COMMONS_MAX_EXTRACTION + _ONE) +] + + +# -- Volunteer's Dilemma -- + +def _volunteer_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """At least one must volunteer for everyone to benefit. + + Volunteering costs the volunteer but benefits all. + If nobody volunteers, everyone gets nothing. + """ + p_vol = player_action == "volunteer" + o_vol = opponent_action == "volunteer" + + if not p_vol and not o_vol: + return (float(VOLUNTEER_NO_VOL), float(VOLUNTEER_NO_VOL)) + + p_pay = float(VOLUNTEER_BENEFIT - VOLUNTEER_COST) if p_vol else float(VOLUNTEER_BENEFIT) + o_pay = float(VOLUNTEER_BENEFIT - VOLUNTEER_COST) if o_vol else float(VOLUNTEER_BENEFIT) + return (p_pay, o_pay) + + +# -- El Farol Bar Problem -- + +def _el_farol_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Bar attendance decision game. + + Going to the bar is fun if few attend (under capacity), but + unpleasant if crowded. Staying home gives a moderate fixed payoff. 
+ """ + p_goes = player_action == "attend" + o_goes = opponent_action == "attend" + + attendees = int(p_goes) + int(o_goes) + crowded = attendees > _ONE + + if not p_goes: + p_pay = float(EL_FAROL_STAY_HOME) + elif crowded: + p_pay = float(EL_FAROL_CROWD_PENALTY) + else: + p_pay = float(EL_FAROL_ATTEND_REWARD) + + if not o_goes: + o_pay = float(EL_FAROL_STAY_HOME) + elif crowded: + o_pay = float(EL_FAROL_CROWD_PENALTY) + else: + o_pay = float(EL_FAROL_ATTEND_REWARD) + + return (p_pay, o_pay) + + +# -- Register -- + +NPLAYER_GAMES: dict[str, GameConfig] = { + "tragedy_of_commons": GameConfig( + name="Tragedy of the Commons", + description=( + "Players extract resources from a shared pool. Individual " + "incentive is to extract more, but if total extraction exceeds " + "the sustainable capacity, the resource collapses and everyone " + "suffers. Models environmental and resource management dilemmas." + ), + actions=_COMMONS_ACTIONS, + game_type="commons", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_commons_payoff, + ), + "volunteer_dilemma": GameConfig( + name="Volunteer's Dilemma", + description=( + "At least one player must volunteer (at personal cost) for " + "everyone to receive a benefit. If nobody volunteers, all get " + "nothing. Models bystander effects and public good provision." + ), + actions=["volunteer", "abstain"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_volunteer_payoff, + ), + "el_farol": GameConfig( + name="El Farol Bar Problem", + description=( + "Each player decides whether to attend a bar. If attendance " + "is below capacity, going is better than staying home. If the " + "bar is crowded, staying home is better. Models minority games " + "and congestion dynamics." 
+ ), + actions=["attend", "stay_home"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_el_farol_payoff, + ), +} + +GAMES.update(NPLAYER_GAMES) diff --git a/common/games_ext/sequential.py b/common/games_ext/sequential.py new file mode 100644 index 0000000000000000000000000000000000000000..1e86ad7fb075d7230ecae07389a45c2014b3f2f8 --- /dev/null +++ b/common/games_ext/sequential.py @@ -0,0 +1,140 @@ +"""Sequential (extensive-form) games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig +from constant_definitions.game_constants import SINGLE_SHOT_ROUNDS, DEFAULT_NUM_ROUNDS +from constant_definitions.sequential_constants import ( + DICTATOR_ENDOWMENT, + CENTIPEDE_INITIAL_POT, CENTIPEDE_GROWTH_MULTIPLIER, CENTIPEDE_MAX_STAGES, + CENTIPEDE_LARGE_SHARE_NUMERATOR, CENTIPEDE_LARGE_SHARE_DENOMINATOR, + CENTIPEDE_SMALL_SHARE_NUMERATOR, CENTIPEDE_SMALL_SHARE_DENOMINATOR, + STACKELBERG_DEMAND_INTERCEPT, STACKELBERG_DEMAND_SLOPE, + STACKELBERG_MARGINAL_COST, STACKELBERG_MAX_QUANTITY, +) + +_ONE = int(bool(True)) + + +# -- Dictator Game -- + +def _dictator_payoff(player_action: str, opponent_action: str) -> tuple[float, float]: + """Dictator allocates from endowment; recipient has no choice.""" + amount = int(player_action.rsplit("_", _ONE)[_ONE]) + dictator_keeps = float(DICTATOR_ENDOWMENT - amount) + recipient_gets = float(amount) + return (dictator_keeps, recipient_gets) + + +_DICTATOR_ACTIONS = [ + f"give_{i}" for i in range(DICTATOR_ENDOWMENT + _ONE) +] + + +# -- Centipede Game -- + +def _centipede_payoff(player_action: str, opponent_action: str) -> tuple[float, float]: + """Alternating pass/take game with growing pot. + + Actions encode the stage: 'take_N' means take at stage N, + 'pass_all' means pass through all stages. + The opponent strategy similarly responds with take or pass. 
+ """ + if player_action == "pass_all": + player_stage = CENTIPEDE_MAX_STAGES + _ONE + else: + player_stage = int(player_action.rsplit("_", _ONE)[_ONE]) + + if opponent_action == "pass_all": + opp_stage = CENTIPEDE_MAX_STAGES + _ONE + else: + opp_stage = int(opponent_action.rsplit("_", _ONE)[_ONE]) + + take_stage = min(player_stage, opp_stage) + + pot = CENTIPEDE_INITIAL_POT + for _ in range(take_stage): + pot = pot * CENTIPEDE_GROWTH_MULTIPLIER + + large = pot * CENTIPEDE_LARGE_SHARE_NUMERATOR // CENTIPEDE_LARGE_SHARE_DENOMINATOR + small = pot * CENTIPEDE_SMALL_SHARE_NUMERATOR // CENTIPEDE_SMALL_SHARE_DENOMINATOR + + if player_stage <= opp_stage: + return (float(large), float(small)) + return (float(small), float(large)) + + +_CENTIPEDE_ACTIONS = [ + f"take_{i}" for i in range(CENTIPEDE_MAX_STAGES + _ONE) +] + ["pass_all"] + + +# -- Stackelberg Competition -- + +def _stackelberg_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Stackelberg duopoly: leader (player) and follower (opponent). + + Profit = (demand_intercept - slope * (q_leader + q_follower) - cost) * q + """ + q_leader = int(player_action.rsplit("_", _ONE)[_ONE]) + q_follower = int(opponent_action.rsplit("_", _ONE)[_ONE]) + + total_q = q_leader + q_follower + price = STACKELBERG_DEMAND_INTERCEPT - STACKELBERG_DEMAND_SLOPE * total_q + + leader_profit = float((price - STACKELBERG_MARGINAL_COST) * q_leader) + follower_profit = float((price - STACKELBERG_MARGINAL_COST) * q_follower) + return (leader_profit, follower_profit) + + +_STACKELBERG_ACTIONS = [ + f"produce_{i}" for i in range(STACKELBERG_MAX_QUANTITY + _ONE) +] + + +# -- Register -- + +SEQUENTIAL_GAMES: dict[str, GameConfig] = { + "dictator": GameConfig( + name="Dictator Game", + description=( + "One player (the dictator) decides how to split an endowment " + "with a passive recipient who has no say. Tests fairness " + "preferences and altruistic behavior when there is no strategic " + "incentive to share." 
+ ), + actions=_DICTATOR_ACTIONS, + game_type="dictator", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_dictator_payoff, + ), + "centipede": GameConfig( + name="Centipede Game", + description=( + "Players alternate deciding to take or pass. Each pass doubles " + "the pot. The taker gets the larger share while the other gets " + "the smaller share. Backward induction predicts immediate taking, " + "but cooperation through passing yields higher joint payoffs." + ), + actions=_CENTIPEDE_ACTIONS, + game_type="centipede", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_centipede_payoff, + ), + "stackelberg": GameConfig( + name="Stackelberg Competition", + description=( + "A quantity-setting duopoly where the leader commits to a " + "production quantity first, and the follower observes and " + "responds. The leader can exploit first-mover advantage. " + "Price is determined by total market quantity." + ), + actions=_STACKELBERG_ACTIONS, + game_type="stackelberg", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_stackelberg_payoff, + ), +} + +GAMES.update(SEQUENTIAL_GAMES) diff --git a/common/games_info/__pycache__/bayesian.cpython-311.pyc b/common/games_info/__pycache__/bayesian.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0b41e8c4925e11230c1732c02b234e5acaa36e91 Binary files /dev/null and b/common/games_info/__pycache__/bayesian.cpython-311.pyc differ diff --git a/common/games_info/__pycache__/communication.cpython-311.pyc b/common/games_info/__pycache__/communication.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ea04ef6176f67f7ff5d52328428359e21185f0b0 Binary files /dev/null and b/common/games_info/__pycache__/communication.cpython-311.pyc differ diff --git a/common/games_info/__pycache__/contracts.cpython-311.pyc b/common/games_info/__pycache__/contracts.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..517080ab6fe437762af5991b5aad6dd3e29bf97c 
Binary files /dev/null and b/common/games_info/__pycache__/contracts.cpython-311.pyc differ diff --git a/common/games_info/__pycache__/network.cpython-311.pyc b/common/games_info/__pycache__/network.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1e947a6cb8eb0af3f19c0499e6a78063b084eb35 Binary files /dev/null and b/common/games_info/__pycache__/network.cpython-311.pyc differ diff --git a/common/games_info/__pycache__/signaling.cpython-311.pyc b/common/games_info/__pycache__/signaling.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e542f178769051d2d5017d753cf373876370d588 Binary files /dev/null and b/common/games_info/__pycache__/signaling.cpython-311.pyc differ diff --git a/common/games_info/bayesian.py b/common/games_info/bayesian.py new file mode 100644 index 0000000000000000000000000000000000000000..8dba764c41a46addbc81b28703cacaa8e119049e --- /dev/null +++ b/common/games_info/bayesian.py @@ -0,0 +1,125 @@ +"""Bayesian and incomplete information games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.batch4.bayesian_constants import ( + GG_ATTACK_ATTACK, GG_ATTACK_WAIT, GG_WAIT_ATTACK, GG_WAIT_WAIT, + JV_CONVICT_CONVICT, JV_ACQUIT_ACQUIT, JV_SPLIT_VOTE, + IC_SIGNAL_SIGNAL, IC_SIGNAL_CROWD, IC_CROWD_SIGNAL, IC_CROWD_CROWD, + ASI_REVEAL_REVEAL, ASI_REVEAL_HIDE, ASI_HIDE_REVEAL, ASI_HIDE_HIDE, +) + + +# -- Global Game (regime change / bank run under private signals) -- +_GG: dict[tuple[str, str], tuple[float, float]] = { + ("attack", "attack"): (float(GG_ATTACK_ATTACK), float(GG_ATTACK_ATTACK)), + ("attack", "wait"): (float(GG_ATTACK_WAIT), float(GG_WAIT_ATTACK)), + ("wait", "attack"): (float(GG_WAIT_ATTACK), float(GG_ATTACK_WAIT)), + ("wait", "wait"): (float(GG_WAIT_WAIT), float(GG_WAIT_WAIT)), +} + + +# -- 
Jury Voting (unanimity rule for conviction) -- +_JV: dict[tuple[str, str], tuple[float, float]] = { + ("guilty", "guilty"): (float(JV_CONVICT_CONVICT), float(JV_CONVICT_CONVICT)), + ("guilty", "acquit"): (float(JV_SPLIT_VOTE), float(JV_SPLIT_VOTE)), + ("acquit", "guilty"): (float(JV_SPLIT_VOTE), float(JV_SPLIT_VOTE)), + ("acquit", "acquit"): (float(JV_ACQUIT_ACQUIT), float(JV_ACQUIT_ACQUIT)), +} + + +# -- Information Cascade (follow own signal vs follow crowd) -- +_IC: dict[tuple[str, str], tuple[float, float]] = { + ("follow_signal", "follow_signal"): ( + float(IC_SIGNAL_SIGNAL), float(IC_SIGNAL_SIGNAL), + ), + ("follow_signal", "follow_crowd"): ( + float(IC_SIGNAL_CROWD), float(IC_CROWD_SIGNAL), + ), + ("follow_crowd", "follow_signal"): ( + float(IC_CROWD_SIGNAL), float(IC_SIGNAL_CROWD), + ), + ("follow_crowd", "follow_crowd"): ( + float(IC_CROWD_CROWD), float(IC_CROWD_CROWD), + ), +} + + +# -- Adverse Selection (reveal or hide private type) -- +_ASI: dict[tuple[str, str], tuple[float, float]] = { + ("reveal_type", "reveal_type"): ( + float(ASI_REVEAL_REVEAL), float(ASI_REVEAL_REVEAL), + ), + ("reveal_type", "hide_type"): ( + float(ASI_REVEAL_HIDE), float(ASI_HIDE_REVEAL), + ), + ("hide_type", "reveal_type"): ( + float(ASI_HIDE_REVEAL), float(ASI_REVEAL_HIDE), + ), + ("hide_type", "hide_type"): ( + float(ASI_HIDE_HIDE), float(ASI_HIDE_HIDE), + ), +} + + +# -- Register -- +BAYESIAN_GAMES: dict[str, GameConfig] = { + "global_game": GameConfig( + name="Global Game", + description=( + "A coordination game modeling regime change or bank runs under " + "incomplete information. Players receive private signals about " + "fundamentals and choose to attack or wait. Successful coordination " + "on attack yields high payoffs but unilateral attack is costly. " + "Tests strategic behavior under private information." 
+ ), + actions=["attack", "wait"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_GG), + ), + "jury_voting": GameConfig( + name="Jury Voting Game", + description=( + "Two jurors simultaneously vote guilty or acquit under a unanimity " + "rule. Conviction requires both voting guilty. Each juror has a " + "private signal about the defendant. Strategic voting may differ " + "from sincere voting. Tests information aggregation under voting." + ), + actions=["guilty", "acquit"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_JV), + ), + "information_cascade": GameConfig( + name="Information Cascade Game", + description=( + "Players choose whether to follow their own private signal or " + "follow the crowd. Independent signal-following leads to better " + "information aggregation while crowd-following creates herding. " + "Asymmetric payoffs reflect the benefit of diverse information. " + "Tests independence of judgment under social influence." + ), + actions=["follow_signal", "follow_crowd"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_IC), + ), + "adverse_selection_insurance": GameConfig( + name="Adverse Selection Insurance Game", + description=( + "An insurance market game with asymmetric information. Each player " + "can reveal their private risk type for efficient pricing or hide " + "it to exploit information asymmetry. Mutual revelation enables " + "fair pricing. Hiding while the other reveals creates adverse " + "selection profit. Tests screening and pooling dynamics." 
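The Global Game config above is a stag-hunt-style coordination structure with two pure equilibria. A standalone best-response enumerator for pure Nash equilibria of a 2x2 game (the payoff numbers are illustrative stand-ins for the `GG_*` constants: coordinated attack pays, lone attack is costly):

```python
M = {("attack", "attack"): (4.0, 4.0), ("attack", "wait"): (-2.0, 0.0),
     ("wait", "attack"): (0.0, -2.0), ("wait", "wait"): (1.0, 1.0)}
ACTS = ("attack", "wait")

def pure_nash(m, acts):
    """All pure-strategy Nash equilibria: no player gains by deviating."""
    eq = []
    for a in acts:
        for b in acts:
            ok_row = all(m[(a, b)][0] >= m[(x, b)][0] for x in acts)
            ok_col = all(m[(a, b)][1] >= m[(a, y)][1] for y in acts)
            if ok_row and ok_col:
                eq.append((a, b))
    return eq

print(pure_nash(M, ACTS))  # [('attack', 'attack'), ('wait', 'wait')]
```

In the full global-games treatment, private signals select between these equilibria; the matrix form here captures only the complete-information coordination core.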
+ ), + actions=["reveal_type", "hide_type"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_ASI), + ), +} + +GAMES.update(BAYESIAN_GAMES) diff --git a/common/games_info/communication.py b/common/games_info/communication.py new file mode 100644 index 0000000000000000000000000000000000000000..b9c518c11d1dc6e168557e963066ef448a601586 --- /dev/null +++ b/common/games_info/communication.py @@ -0,0 +1,162 @@ +"""Communication and mediation games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.var.communication_constants import ( + CTPD_REWARD, CTPD_TEMPTATION, CTPD_PUNISHMENT, CTPD_SUCKER, + COMMIT_COST, + CE_FOLLOW_FOLLOW, CE_FOLLOW_DEVIATE, + CE_DEVIATE_FOLLOW, CE_DEVIATE_DEVIATE, + FP_MATCH_PAYOFF, FP_MISMATCH_PAYOFF, + MG_ACCEPT_ACCEPT, MG_ACCEPT_REJECT, + MG_REJECT_ACCEPT, MG_REJECT_REJECT, +) + +_ONE = int(bool(True)) +_ZERO_F = float() + +# -- Cheap Talk PD (message + action, messages are non-binding) -- +_CTPD_BASE: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(CTPD_REWARD), float(CTPD_REWARD)), + ("cooperate", "defect"): (float(CTPD_SUCKER), float(CTPD_TEMPTATION)), + ("defect", "cooperate"): (float(CTPD_TEMPTATION), float(CTPD_SUCKER)), + ("defect", "defect"): (float(CTPD_PUNISHMENT), float(CTPD_PUNISHMENT)), +} + + +def _cheap_talk_pd_payoff(pa: str, oa: str) -> tuple[float, float]: + """Message is cheap talk; payoff depends only on actual action.""" + actual_p = pa.rsplit("_", _ONE)[_ONE] + actual_o = oa.rsplit("_", _ONE)[_ONE] + return _CTPD_BASE[(actual_p, actual_o)] + + +_CTPD_ACTS = [ + "msg_coop_cooperate", "msg_coop_defect", + "msg_def_cooperate", "msg_def_defect", +] + + +# -- Binding Commitment (costly commitment mechanism) -- +_CC = float(CTPD_REWARD) +_CS = float(CTPD_SUCKER) +_CT = 
float(CTPD_TEMPTATION) +_CP = float(CTPD_PUNISHMENT) +_COST = float(COMMIT_COST) + +_BIND_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("commit_coop", "commit_coop"): (_CC - _COST, _CC - _COST), + ("commit_coop", "free_coop"): (_CC - _COST, _CC), + ("commit_coop", "free_defect"): (_CS - _COST, _CT), + ("free_coop", "commit_coop"): (_CC, _CC - _COST), + ("free_coop", "free_coop"): (_CC, _CC), + ("free_coop", "free_defect"): (_CS, _CT), + ("free_defect", "commit_coop"): (_CT, _CS - _COST), + ("free_defect", "free_coop"): (_CT, _CS), + ("free_defect", "free_defect"): (_CP, _CP), +} + + +# -- Correlated Equilibrium (follow external mediator or deviate) -- +_CE: dict[tuple[str, str], tuple[float, float]] = { + ("follow", "follow"): (float(CE_FOLLOW_FOLLOW), float(CE_FOLLOW_FOLLOW)), + ("follow", "deviate"): (float(CE_FOLLOW_DEVIATE), float(CE_DEVIATE_FOLLOW)), + ("deviate", "follow"): (float(CE_DEVIATE_FOLLOW), float(CE_FOLLOW_DEVIATE)), + ("deviate", "deviate"): (float(CE_DEVIATE_DEVIATE), float(CE_DEVIATE_DEVIATE)), +} + + +# -- Focal Point (multi-option coordination without communication) -- +_FP_MATCH = float(FP_MATCH_PAYOFF) +_FP_MISS = float(FP_MISMATCH_PAYOFF) +_FP_OPTIONS = ["choose_red", "choose_green", "choose_blue", "choose_yellow"] + + +def _focal_point_payoff(pa: str, oa: str) -> tuple[float, float]: + if pa == oa: + return (_FP_MATCH, _FP_MATCH) + return (_FP_MISS, _FP_MISS) + + +# -- Mediated Game (accept or reject third-party mediation) -- +_MED: dict[tuple[str, str], tuple[float, float]] = { + ("accept", "accept"): (float(MG_ACCEPT_ACCEPT), float(MG_ACCEPT_ACCEPT)), + ("accept", "reject"): (float(MG_ACCEPT_REJECT), float(MG_REJECT_ACCEPT)), + ("reject", "accept"): (float(MG_REJECT_ACCEPT), float(MG_ACCEPT_REJECT)), + ("reject", "reject"): (float(MG_REJECT_REJECT), float(MG_REJECT_REJECT)), +} + + +# -- Register -- +COMMUNICATION_GAMES: dict[str, GameConfig] = { + "cheap_talk_pd": GameConfig( + name="Cheap Talk Prisoner's Dilemma", + 
description=( + "A Prisoner's Dilemma where each player sends a non-binding " + "message before acting. Messages are cheap talk: costless and " + "unenforceable. Payoffs depend only on actual actions. Tests " + "whether non-binding communication improves cooperation." + ), + actions=_CTPD_ACTS, + game_type="cheap_talk_pd", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_cheap_talk_pd_payoff, + ), + "binding_commitment": GameConfig( + name="Binding Commitment Game", + description=( + "A Prisoner's Dilemma where players can pay a cost to make a " + "binding commitment to cooperate. The commitment is credible " + "but costly. Tests whether costly signaling through commitment " + "mechanisms changes equilibrium behavior." + ), + actions=["commit_coop", "free_coop", "free_defect"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_BIND_MATRIX), + ), + "correlated_equilibrium": GameConfig( + name="Correlated Equilibrium Game", + description=( + "An external mediator sends private recommendations to each " + "player. Following yields an efficient correlated outcome. " + "Deviating can be profitable if the other follows but mutual " + "deviation destroys coordination gains. Tests trust in " + "external coordination mechanisms." + ), + actions=["follow", "deviate"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_CE), + ), + "focal_point": GameConfig( + name="Focal Point Game", + description=( + "Players must coordinate on the same choice from four options " + "without communication. Only matching yields a positive payoff. " + "Tests Schelling focal point reasoning and the ability to " + "identify salient coordination targets." 
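The cheap-talk encoding above packs a message and an action into one label, and `rsplit("_", 1)` recovers the actual move, so the payoff ignores the message entirely. A standalone sketch with illustrative PD payoffs (T=5, R=3, P=1, S=0) in place of the `CTPD_*` constants:

```python
BASE = {("cooperate", "cooperate"): (3.0, 3.0), ("cooperate", "defect"): (0.0, 5.0),
        ("defect", "cooperate"): (5.0, 0.0), ("defect", "defect"): (1.0, 1.0)}

def cheap_talk_payoff(pa: str, oa: str) -> tuple[float, float]:
    # rsplit("_", 1) splits off only the final segment, so
    # "msg_coop_defect" -> "defect" regardless of the message prefix.
    return BASE[(pa.rsplit("_", 1)[1], oa.rsplit("_", 1)[1])]

# A cooperative message followed by defection pays exactly like defection:
assert cheap_talk_payoff("msg_coop_defect", "msg_coop_cooperate") == (5.0, 0.0)
assert cheap_talk_payoff("msg_def_defect", "msg_coop_cooperate") == (5.0, 0.0)
```

This is exactly what makes the talk "cheap": any cooperation the messages produce has to come from the agents' beliefs, not from the payoff function.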
+ ), + actions=_FP_OPTIONS, + game_type="focal_point", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_focal_point_payoff, + ), + "mediated_game": GameConfig( + name="Mediated Game", + description=( + "A dispute between two players where a mediator proposes a " + "fair resolution. Both accepting yields an efficient outcome. " + "Rejecting while the other accepts gives an advantage but " + "mutual rejection leads to costly breakdown. Tests willingness " + "to accept third-party dispute resolution." + ), + actions=["accept", "reject"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_MED), + ), +} + +GAMES.update(COMMUNICATION_GAMES) diff --git a/common/games_info/contracts.py b/common/games_info/contracts.py new file mode 100644 index 0000000000000000000000000000000000000000..4ac687dbb8b33c75cfa5ecc82d68233e731ec35d --- /dev/null +++ b/common/games_info/contracts.py @@ -0,0 +1,125 @@ +"""Principal-agent and contract theory games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig +from constant_definitions.game_constants import SINGLE_SHOT_ROUNDS +from constant_definitions.ext.dynamic_constants import ( + MH_BASE_OUTPUT, MH_EFFORT_BOOST, MH_EFFORT_COST, MH_MAX_BONUS, + SCR_HIGH_TYPE_VALUE, SCR_LOW_TYPE_VALUE, + SCR_PREMIUM_PRICE, SCR_BASIC_PRICE, + GE_MAX_WAGE, GE_MAX_EFFORT, + GE_EFFORT_COST_PER_UNIT, GE_PRODUCTIVITY_PER_EFFORT, +) + +_ONE = int(bool(True)) +_ZERO = int() + + +# -- Moral Hazard -- +def _moral_hazard_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Principal sets bonus; agent chooses effort. + + Principal: output - bonus if agent works. + Agent: bonus - effort_cost if working, base if shirking. 
+ """ + bonus = int(player_action.rsplit("_", _ONE)[_ONE]) + works = opponent_action == "work" + output = MH_BASE_OUTPUT + MH_EFFORT_BOOST if works else MH_BASE_OUTPUT + principal_pay = float(output - bonus) + agent_pay = float(bonus - MH_EFFORT_COST) if works else float(bonus) + return (principal_pay, agent_pay) + + +_MH_BONUS_ACTIONS = [f"bonus_{i}" for i in range(MH_MAX_BONUS + _ONE)] + + +# -- Screening -- +def _screening_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Principal offers contract menu; agent self-selects. + + Agent picks premium or basic contract based on private type. + """ + if player_action == "offer_premium": + price = SCR_PREMIUM_PRICE + else: + price = SCR_BASIC_PRICE + + if opponent_action == "choose_premium": + buyer_value = SCR_HIGH_TYPE_VALUE + seller_pay = float(SCR_PREMIUM_PRICE) + buyer_pay = float(buyer_value - SCR_PREMIUM_PRICE) + else: + buyer_value = SCR_LOW_TYPE_VALUE + seller_pay = float(SCR_BASIC_PRICE) + buyer_pay = float(buyer_value - SCR_BASIC_PRICE) + + return (seller_pay, buyer_pay) + + +# -- Gift Exchange -- +def _gift_exchange_payoff( + player_action: str, opponent_action: str, +) -> tuple[float, float]: + """Employer offers wage; worker chooses effort. + + Employer profit = productivity * effort - wage. + Worker payoff = wage - effort_cost * effort. + """ + wage = int(player_action.rsplit("_", _ONE)[_ONE]) + effort = int(opponent_action.rsplit("_", _ONE)[_ONE]) + employer_pay = float(GE_PRODUCTIVITY_PER_EFFORT * effort - wage) + worker_pay = float(wage - GE_EFFORT_COST_PER_UNIT * effort) + return (employer_pay, worker_pay) + + +_GE_WAGE_ACTIONS = [f"wage_{i}" for i in range(GE_MAX_WAGE + _ONE)] + + +# -- Register -- +CONTRACT_GAMES: dict[str, GameConfig] = { + "moral_hazard": GameConfig( + name="Moral Hazard (Principal-Agent)", + description=( + "A principal offers a bonus contract; an agent with " + "unobservable effort decides whether to work or shirk. 
" + "Tests optimal incentive design and the tradeoff between " + "motivation and rent extraction." + ), + actions=_MH_BONUS_ACTIONS, + game_type="moral_hazard", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_moral_hazard_payoff, + ), + "screening": GameConfig( + name="Screening Game", + description=( + "An uninformed principal offers a menu of contracts; " + "agents of different types self-select. Tests understanding " + "of incentive compatibility and separating mechanisms " + "as in Rothschild-Stiglitz insurance models." + ), + actions=["offer_premium", "offer_basic"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_screening_payoff, + ), + "gift_exchange": GameConfig( + name="Gift Exchange Game", + description=( + "An employer offers a wage; a worker chooses effort. " + "Nash prediction is minimal effort regardless of wage, " + "but reciprocity often leads to higher wages eliciting " + "higher effort. Tests fairness-driven behavior." + ), + actions=_GE_WAGE_ACTIONS, + game_type="gift_exchange", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_gift_exchange_payoff, + ), +} + +GAMES.update(CONTRACT_GAMES) diff --git a/common/games_info/network.py b/common/games_info/network.py new file mode 100644 index 0000000000000000000000000000000000000000..6c440180dabb00a97c9b4c393728cb0203c55a7b --- /dev/null +++ b/common/games_info/network.py @@ -0,0 +1,120 @@ +"""Network and security interaction games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.batch4.network_constants import ( + SG_DEFEND_SUCCESS, SG_ATTACK_FAIL, SG_DEFEND_FAIL, SG_ATTACK_SUCCESS, + LF_MUTUAL_CONNECT, LF_UNILATERAL_COST, LF_MUTUAL_ISOLATE, + TWP_CC, TWP_CD, TWP_DC, TWP_DD, + TWP_CP, TWP_PC, TWP_DP, TWP_PD, TWP_PP, + DG_EARLY_EARLY, DG_EARLY_LATE, DG_LATE_EARLY, DG_LATE_LATE, +) + + +# -- 
Security Game (defender allocates, attacker targets) -- +_SG: dict[tuple[str, str], tuple[float, float]] = { + ("target_a", "target_a"): (float(SG_DEFEND_SUCCESS), float(SG_ATTACK_FAIL)), + ("target_a", "target_b"): (float(SG_DEFEND_FAIL), float(SG_ATTACK_SUCCESS)), + ("target_b", "target_a"): (float(SG_DEFEND_FAIL), float(SG_ATTACK_SUCCESS)), + ("target_b", "target_b"): (float(SG_DEFEND_SUCCESS), float(SG_ATTACK_FAIL)), +} + + +# -- Link Formation (bilateral consent required) -- +_LF_CON = float(LF_MUTUAL_CONNECT) +_LF_UNI = float(LF_UNILATERAL_COST) +_LF_ISO = float(LF_MUTUAL_ISOLATE) + +_LF: dict[tuple[str, str], tuple[float, float]] = { + ("connect", "connect"): (_LF_CON, _LF_CON), + ("connect", "isolate"): (_LF_UNI, _LF_ISO), + ("isolate", "connect"): (_LF_ISO, _LF_UNI), + ("isolate", "isolate"): (_LF_ISO, _LF_ISO), +} + + +# -- Trust with Punishment (3x3: cooperate, defect, punish) -- +_TWP: dict[tuple[str, str], tuple[float, float]] = { + ("cooperate", "cooperate"): (float(TWP_CC), float(TWP_CC)), + ("cooperate", "defect"): (float(TWP_CD), float(TWP_DC)), + ("cooperate", "punish"): (float(TWP_CP), float(TWP_PC)), + ("defect", "cooperate"): (float(TWP_DC), float(TWP_CD)), + ("defect", "defect"): (float(TWP_DD), float(TWP_DD)), + ("defect", "punish"): (float(TWP_DP), float(TWP_PD)), + ("punish", "cooperate"): (float(TWP_PC), float(TWP_CP)), + ("punish", "defect"): (float(TWP_PD), float(TWP_DP)), + ("punish", "punish"): (float(TWP_PP), float(TWP_PP)), +} + + +# -- Dueling Game (fire timing) -- +_DG: dict[tuple[str, str], tuple[float, float]] = { + ("fire_early", "fire_early"): (float(DG_EARLY_EARLY), float(DG_EARLY_EARLY)), + ("fire_early", "fire_late"): (float(DG_EARLY_LATE), float(DG_LATE_EARLY)), + ("fire_late", "fire_early"): (float(DG_LATE_EARLY), float(DG_EARLY_LATE)), + ("fire_late", "fire_late"): (float(DG_LATE_LATE), float(DG_LATE_LATE)), +} + + +# -- Register -- +NETWORK_GAMES: dict[str, GameConfig] = { + "security_game": GameConfig( + name="Security 
Game", + description=( + "An attacker-defender game where the defender allocates protection " + "to one of two targets and the attacker simultaneously chooses " + "which target to attack. Matching the attacker's target means a " + "successful defense. Misallocation lets the attacker succeed. " + "Tests strategic resource allocation under adversarial uncertainty." + ), + actions=["target_a", "target_b"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_SG), + ), + "link_formation": GameConfig( + name="Link Formation Game", + description=( + "A network formation game where two players simultaneously decide " + "whether to form a connection. A link forms only when both agree. " + "Mutual connection yields network benefits. Unilateral connection " + "attempt is costly. Mutual isolation yields nothing. Tests " + "bilateral consent in network formation." + ), + actions=["connect", "isolate"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_LF), + ), + "trust_with_punishment": GameConfig( + name="Trust with Punishment Game", + description=( + "An extended trust game where players can cooperate or defect as " + "in the standard Prisoner's Dilemma plus a costly punishment " + "action. Punishing reduces the opponent's payoff but also costs " + "the punisher. Tests whether altruistic punishment enforces " + "cooperation even at personal cost." + ), + actions=["cooperate", "defect", "punish"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_TWP), + ), + "dueling_game": GameConfig( + name="Dueling Game", + description=( + "A timing game where two players simultaneously choose when to " + "fire: early for a safe but moderate payoff or late for higher " + "accuracy. Firing early against a late opponent is advantageous. " + "Mutual late firing yields better outcomes than mutual early. " + "Tests patience versus preemption under uncertainty." 
+ ), + actions=["fire_early", "fire_late"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_DG), + ), +} + +GAMES.update(NETWORK_GAMES) diff --git a/common/games_info/signaling.py b/common/games_info/signaling.py new file mode 100644 index 0000000000000000000000000000000000000000..5b37a50b8804dd1f5fd346d7d01afaf3de06fb7f --- /dev/null +++ b/common/games_info/signaling.py @@ -0,0 +1,142 @@ +"""Signaling and incomplete information games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.ext.signaling_constants import ( + BQ_TOUGH_BEER_PAYOFF, BQ_TOUGH_QUICHE_PAYOFF, + BQ_WEAK_BEER_PAYOFF, BQ_WEAK_QUICHE_PAYOFF, + BQ_CHALLENGE_COST, BQ_NO_CHALLENGE_BONUS, + SPENCE_HIGH_WAGE, SPENCE_LOW_WAGE, + SPENCE_EDU_COST_HIGH, SPENCE_EDU_COST_LOW, + CT_ALIGNED_MATCH, CT_ALIGNED_MISMATCH, CT_BIAS, + LEMON_GOOD_QUALITY_VALUE, LEMON_BAD_QUALITY_VALUE, + LEMON_GOOD_SELLER_COST, LEMON_BAD_SELLER_COST, LEMON_MAX_PRICE, + BP_GOOD_STATE_VALUE, BP_BAD_STATE_PENALTY, BP_SAFE_PAYOFF, +) + +_ONE = int(bool(True)) +_TWO = _ONE + _ONE + + +# -- Beer-Quiche (simplified as simultaneous signal-response) -- +_BQ_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("beer", "challenge"): (float(BQ_TOUGH_BEER_PAYOFF + BQ_CHALLENGE_COST), float(_TWO)), + ("beer", "back_down"): (float(BQ_TOUGH_BEER_PAYOFF + BQ_NO_CHALLENGE_BONUS), float(int())), + ("quiche", "challenge"): (float(BQ_WEAK_QUICHE_PAYOFF + BQ_CHALLENGE_COST), float(-_ONE)), + ("quiche", "back_down"): (float(BQ_WEAK_QUICHE_PAYOFF + BQ_NO_CHALLENGE_BONUS), float(int())), +} + + +# -- Spence Signaling (worker picks edu level, firm responds) -- +def _spence_payoff(player_action: str, opponent_action: str) -> tuple[float, float]: + """Worker chooses education; firm offers wage based on signal.""" + educated = 
player_action == "educate" + high_wage = opponent_action == "high_wage" + wage = SPENCE_HIGH_WAGE if high_wage else SPENCE_LOW_WAGE + cost = SPENCE_EDU_COST_HIGH if educated else int() + worker_pay = float(wage - cost) + firm_pay = float(SPENCE_HIGH_WAGE - wage) if educated else float(SPENCE_LOW_WAGE - wage) + return (worker_pay, firm_pay) + + +# -- Cheap Talk -- +_CT_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("signal_left", "act_left"): (float(CT_ALIGNED_MATCH), float(CT_ALIGNED_MATCH)), + ("signal_left", "act_right"): (float(CT_ALIGNED_MISMATCH), float(CT_ALIGNED_MISMATCH)), + ("signal_right", "act_left"): (float(CT_ALIGNED_MISMATCH + CT_BIAS), float(CT_ALIGNED_MISMATCH)), + ("signal_right", "act_right"): (float(CT_ALIGNED_MATCH + CT_BIAS), float(CT_ALIGNED_MATCH)), +} + + +# -- Lemon Market -- +def _lemon_payoff(player_action: str, opponent_action: str) -> tuple[float, float]: + """Seller sets price; buyer decides to buy or pass.""" + price = int(player_action.rsplit("_", _ONE)[_ONE]) + if opponent_action == "pass": + return (float(int()), float(int())) + avg_value = (LEMON_GOOD_QUALITY_VALUE + LEMON_BAD_QUALITY_VALUE) // _TWO + buyer_pay = float(avg_value - price) + avg_cost = (LEMON_GOOD_SELLER_COST + LEMON_BAD_SELLER_COST) // _TWO + seller_pay = float(price - avg_cost) + return (seller_pay, buyer_pay) + + +_LEMON_ACTIONS = [f"price_{i}" for i in range(LEMON_MAX_PRICE + _ONE)] + + +# -- Bayesian Persuasion -- +_BP_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("reveal", "act"): (float(BP_GOOD_STATE_VALUE), float(BP_GOOD_STATE_VALUE)), + ("reveal", "safe"): (float(BP_SAFE_PAYOFF), float(BP_SAFE_PAYOFF)), + ("conceal", "act"): (float(BP_BAD_STATE_PENALTY), float(BP_BAD_STATE_PENALTY)), + ("conceal", "safe"): (float(BP_SAFE_PAYOFF), float(BP_SAFE_PAYOFF)), +} + + +# -- Register -- +SIGNALING_GAMES: dict[str, GameConfig] = { + "beer_quiche": GameConfig( + name="Beer-Quiche Game", + description=( + "A signaling game: the sender chooses a 
meal (beer or quiche) " + "to signal their type; the receiver decides whether to challenge. " + "Tests reasoning about sequential equilibrium and belief refinement." + ), + actions=["beer", "quiche"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_BQ_MATRIX), + ), + "spence_signaling": GameConfig( + name="Spence Job Market Signaling", + description=( + "A worker chooses whether to acquire education as a signal of " + "ability; a firm responds with a wage offer. Tests understanding " + "of separating versus pooling equilibria in labor markets." + ), + actions=["educate", "no_educate"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_spence_payoff, + ), + "cheap_talk": GameConfig( + name="Cheap Talk", + description=( + "A sender observes a state and sends a costless message; " + "the receiver chooses an action. Interests are partially " + "aligned. Tests strategic communication and credibility." + ), + actions=["signal_left", "signal_right"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_CT_MATRIX), + ), + "lemon_market": GameConfig( + name="Lemon Market", + description=( + "A seller with private quality information sets a price; " + "the buyer decides whether to purchase. Adverse selection " + "can cause market unraveling where only low-quality goods trade." + ), + actions=_LEMON_ACTIONS, + game_type="lemon", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_lemon_payoff, + ), + "bayesian_persuasion": GameConfig( + name="Bayesian Persuasion", + description=( + "A sender designs an information structure (reveal or conceal " + "the state); a receiver takes an action based on the signal. " + "Tests strategic information disclosure and commitment to " + "information policies." 
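To see how `_spence_payoff` rewards education only when it moves the wage offer, here is a standalone restatement with illustrative stand-ins for the `SPENCE_*` constants (the real values live in `signaling_constants`):

```python
# Illustrative values; the real SPENCE_* constants are defined elsewhere.
SPENCE_HIGH_WAGE = 10
SPENCE_LOW_WAGE = 4
SPENCE_EDU_COST_HIGH = 3


def spence_payoff(player_action: str, opponent_action: str) -> tuple[float, float]:
    """Worker chooses education; firm offers a wage based on the signal."""
    educated = player_action == "educate"
    wage = SPENCE_HIGH_WAGE if opponent_action == "high_wage" else SPENCE_LOW_WAGE
    cost = SPENCE_EDU_COST_HIGH if educated else 0
    worker_pay = float(wage - cost)
    # As in the source, the firm earns the productivity implied by the
    # signal minus the wage it actually pays.
    productivity = SPENCE_HIGH_WAGE if educated else SPENCE_LOW_WAGE
    firm_pay = float(productivity - wage)
    return (worker_pay, firm_pay)


# Education nets the worker 7 when it secures the high wage, versus 4
# for staying uneducated at the low wage:
print(spence_payoff("educate", "high_wage"))    # -> (7.0, 0.0)
print(spence_payoff("no_educate", "low_wage"))  # -> (4.0, 0.0)
```

With these numbers the signal is worth acquiring (7 > 4), which is the separating-equilibrium condition the game is meant to probe.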
+ ), + actions=["reveal", "conceal"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_BP_MATRIX), + ), +} + +GAMES.update(SIGNALING_GAMES) diff --git a/common/games_market/__pycache__/advanced.cpython-311.pyc b/common/games_market/__pycache__/advanced.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bf11a48233928575f2e46d77da8af5b2dae6212d Binary files /dev/null and b/common/games_market/__pycache__/advanced.cpython-311.pyc differ diff --git a/common/games_market/__pycache__/classic.cpython-311.pyc b/common/games_market/__pycache__/classic.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..74e0c0ba872f1278bce51085e60a238af3033bdd Binary files /dev/null and b/common/games_market/__pycache__/classic.cpython-311.pyc differ diff --git a/common/games_market/__pycache__/contests.cpython-311.pyc b/common/games_market/__pycache__/contests.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..95418dfa62a1433d80fdee76399b0c1e12350e58 Binary files /dev/null and b/common/games_market/__pycache__/contests.cpython-311.pyc differ diff --git a/common/games_market/__pycache__/generated_v2.cpython-311.pyc b/common/games_market/__pycache__/generated_v2.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fdd05353d23b664739e3e0780d498f8cfe2e09e1 Binary files /dev/null and b/common/games_market/__pycache__/generated_v2.cpython-311.pyc differ diff --git a/common/games_market/__pycache__/oligopoly.cpython-311.pyc b/common/games_market/__pycache__/oligopoly.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5208fc57698c709d7e53713f6d268570a1c4e5ef Binary files /dev/null and b/common/games_market/__pycache__/oligopoly.cpython-311.pyc differ diff --git a/common/games_market/advanced.py b/common/games_market/advanced.py new file mode 100644 index 
0000000000000000000000000000000000000000..6c23564561e3e34e86ed53f4ab438fd87efcecb2 --- /dev/null +++ b/common/games_market/advanced.py @@ -0,0 +1,125 @@ +"""Advanced market mechanism games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.batch4.advanced_constants import ( + PRE_EARLY_EARLY, PRE_EARLY_LATE, PRE_LATE_EARLY, PRE_LATE_LATE, + PRE_OUT_PAYOFF, + WOG_LARGE_LARGE, WOG_LARGE_SMALL, WOG_LARGE_NONE, + WOG_SMALL_SMALL, WOG_SMALL_NONE, WOG_NO_GIFT, + PS_SAVE_PAYOFF, PS_SCORE_PAYOFF, PS_CENTER_BONUS, +) + +_ZERO_F = float() +_OUT_F = float(PRE_OUT_PAYOFF) + + +# -- Preemption Game (enter_early / enter_late / stay_out) -- +_PRE: dict[tuple[str, str], tuple[float, float]] = { + ("enter_early", "enter_early"): ( + float(PRE_EARLY_EARLY), float(PRE_EARLY_EARLY), + ), + ("enter_early", "enter_late"): ( + float(PRE_EARLY_LATE), float(PRE_LATE_EARLY), + ), + ("enter_early", "stay_out"): (float(PRE_EARLY_LATE), _OUT_F), + ("enter_late", "enter_early"): ( + float(PRE_LATE_EARLY), float(PRE_EARLY_LATE), + ), + ("enter_late", "enter_late"): ( + float(PRE_LATE_LATE), float(PRE_LATE_LATE), + ), + ("enter_late", "stay_out"): (float(PRE_LATE_LATE), _OUT_F), + ("stay_out", "enter_early"): (_OUT_F, float(PRE_EARLY_LATE)), + ("stay_out", "enter_late"): (_OUT_F, float(PRE_LATE_LATE)), + ("stay_out", "stay_out"): (_OUT_F, _OUT_F), +} + + +# -- War of Gifts (gift_large / gift_small / no_gift) -- +_WOG_LL = float(WOG_LARGE_LARGE) +_WOG_LS = float(WOG_LARGE_SMALL) +_WOG_LN = float(WOG_LARGE_NONE) +_WOG_SS = float(WOG_SMALL_SMALL) +_WOG_SN = float(WOG_SMALL_NONE) +_WOG_NG = float(WOG_NO_GIFT) +_WOG_SL = _ZERO_F # small loses to large + +_WOG: dict[tuple[str, str], tuple[float, float]] = { + ("gift_large", "gift_large"): (_WOG_LL, _WOG_LL), + ("gift_large", "gift_small"): (_WOG_LS, _WOG_SL), + 
("gift_large", "no_gift"): (_WOG_LN, _WOG_NG), + ("gift_small", "gift_large"): (_WOG_SL, _WOG_LS), + ("gift_small", "gift_small"): (_WOG_SS, _WOG_SS), + ("gift_small", "no_gift"): (_WOG_SN, _WOG_NG), + ("no_gift", "gift_large"): (_WOG_NG, _WOG_LN), + ("no_gift", "gift_small"): (_WOG_NG, _WOG_SN), + ("no_gift", "no_gift"): (_WOG_NG, _WOG_NG), +} + + +# -- Penalty Shootout (left / center / right, kicker vs keeper) -- +_PS_SAVE = float(PS_SAVE_PAYOFF) +_PS_SCORE = float(PS_SCORE_PAYOFF) +_PS_CENTER = float(PS_CENTER_BONUS) + + +def _penalty_payoff(pa: str, oa: str) -> tuple[float, float]: + """Kicker (player) vs keeper (opponent). Match means save.""" + if pa == oa: + return (_PS_SAVE, -_PS_SAVE) + if pa == "center": + score = _PS_SCORE + _PS_CENTER + else: + score = _PS_SCORE + return (score, -score) + + +# -- Register -- +ADVANCED_GAMES: dict[str, GameConfig] = { + "preemption_game": GameConfig( + name="Preemption Game", + description=( + "A timing game with first-mover advantage. Players choose to " + "enter a market early (risky if both enter) or late (safer but " + "second-mover disadvantage) or stay out entirely for a safe " + "payoff. Early entry against a late opponent captures the market. " + "Tests preemption incentives and entry deterrence." + ), + actions=["enter_early", "enter_late", "stay_out"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_PRE), + ), + "war_of_gifts": GameConfig( + name="War of Gifts", + description=( + "A competitive generosity game. Players choose to give a large " + "gift or small gift or no gift. The largest giver wins prestige " + "but at material cost. Mutual large gifts cancel prestige gains. " + "No gift is safe but earns no prestige. Tests competitive " + "signaling through costly generosity." 
+ ), + actions=["gift_large", "gift_small", "no_gift"], + game_type="matrix", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_matrix_payoff_fn(_WOG), + ), + "penalty_shootout": GameConfig( + name="Penalty Shootout", + description=( + "A zero-sum mismatch game modeling penalty kicks. The kicker " + "chooses left or center or right; the goalkeeper dives. Matching " + "means a save. Mismatching means a goal. Center kicks score a " + "bonus when the goalkeeper guesses wrong. Tests mixed-strategy " + "reasoning in adversarial settings." + ), + actions=["left", "center", "right"], + game_type="penalty_shootout", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_penalty_payoff, + ), +} + +GAMES.update(ADVANCED_GAMES) diff --git a/common/games_market/classic.py b/common/games_market/classic.py new file mode 100644 index 0000000000000000000000000000000000000000..77880c8eeafe64c990efa619ab7d31446eec1865 --- /dev/null +++ b/common/games_market/classic.py @@ -0,0 +1,164 @@ +"""Classic dilemma and extended strategic games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.var.classic_constants import ( + TD_MIN_CLAIM, TD_MAX_CLAIM, TD_BONUS, + DOLLAR_PRIZE, DOLLAR_MAX_BID, + UD_CHEAP_COST, UD_EXPENSIVE_COST, UD_CHEAP_VALUE, UD_EXPENSIVE_VALUE, + MINO_WIN_PAYOFF, MINO_TIE_PAYOFF, + RPSLS_WIN_PAYOFF, RPSLS_LOSE_PAYOFF, RPSLS_DRAW_PAYOFF, +) + +_ONE = int(bool(True)) +_TWO = _ONE + _ONE +_ZERO_F = float() + + +# -- Traveler's Dilemma -- +def _travelers_payoff(pa: str, oa: str) -> tuple[float, float]: + """Lower claim gets bonus; higher claim gets penalty.""" + claim_p = int(pa.rsplit("_", _ONE)[_ONE]) + claim_o = int(oa.rsplit("_", _ONE)[_ONE]) + if claim_p == claim_o: + return (float(claim_p), float(claim_o)) + if claim_p < claim_o: + return (float(claim_p + TD_BONUS), float(claim_p - TD_BONUS)) + 
return (float(claim_o - TD_BONUS), float(claim_o + TD_BONUS)) + + +_TD_ACTS = [f"claim_{i}" for i in range(TD_MIN_CLAIM, TD_MAX_CLAIM + _ONE)] + + +# -- Dollar Auction (escalation: both pay, highest wins) -- +def _dollar_auction_payoff(pa: str, oa: str) -> tuple[float, float]: + bid_p = int(pa.rsplit("_", _ONE)[_ONE]) + bid_o = int(oa.rsplit("_", _ONE)[_ONE]) + if bid_p > bid_o: + return (float(DOLLAR_PRIZE - bid_p), float(-bid_o)) + if bid_o > bid_p: + return (float(-bid_p), float(DOLLAR_PRIZE - bid_o)) + half = float(DOLLAR_PRIZE) / _TWO + return (half - float(bid_p), half - float(bid_o)) + + +_DA_ACTS = [f"bid_{i}" for i in range(DOLLAR_MAX_BID + _ONE)] + + +# -- Unscrupulous Diner's Dilemma (shared bill) -- +def _diner_payoff(pa: str, oa: str) -> tuple[float, float]: + """Each orders cheap or expensive; bill is split equally.""" + costs = {"order_cheap": UD_CHEAP_COST, "order_expensive": UD_EXPENSIVE_COST} + values = {"order_cheap": UD_CHEAP_VALUE, "order_expensive": UD_EXPENSIVE_VALUE} + total_bill = float(costs[pa] + costs[oa]) + each_pays = total_bill / _TWO + p_val = float(values[pa]) - each_pays + o_val = float(values[oa]) - each_pays + return (p_val, o_val) + + +# -- Minority Game (anti-coordination: minority side wins) -- +_MINO_ACTS = ["choose_a", "choose_b", "choose_c"] + + +def _minority_payoff(pa: str, oa: str) -> tuple[float, float]: + """With two players: matching = both lose; differing = both win.""" + if pa == oa: + return (float(MINO_TIE_PAYOFF), float(MINO_TIE_PAYOFF)) + return (float(MINO_WIN_PAYOFF), float(MINO_WIN_PAYOFF)) + + +# -- Rock-Paper-Scissors-Lizard-Spock -- +_RPSLS_W = float(RPSLS_WIN_PAYOFF) +_RPSLS_L = float(RPSLS_LOSE_PAYOFF) +_RPSLS_D = float(RPSLS_DRAW_PAYOFF) + +_RPSLS_BEATS = { + "rock": ["scissors", "lizard"], + "paper": ["rock", "spock"], + "scissors": ["paper", "lizard"], + "lizard": ["paper", "spock"], + "spock": ["rock", "scissors"], +} + + +def _rpsls_payoff(pa: str, oa: str) -> tuple[float, float]: + if pa == oa: + 
return (_RPSLS_D, _RPSLS_D) + if oa in _RPSLS_BEATS[pa]: + return (_RPSLS_W, _RPSLS_L) + return (_RPSLS_L, _RPSLS_W) + + +# -- Register -- +CLASSIC_GAMES: dict[str, GameConfig] = { + "travelers_dilemma": GameConfig( + name="Traveler's Dilemma", + description=( + "Two travelers submit claims. The lower claim sets the base " + "payout with a bonus for the lower claimant and a penalty for " + "the higher. Nash equilibrium is the minimum claim but " + "experimental subjects often claim high. Tests the rationality " + "paradox in iterative dominance reasoning." + ), + actions=_TD_ACTS, + game_type="travelers_dilemma", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_travelers_payoff, + ), + "dollar_auction": GameConfig( + name="Dollar Auction", + description=( + "An escalation game: both players bid and both pay their bids " + "but only the highest bidder wins the prize. Ties split the " + "prize. Models sunk cost escalation and commitment traps. " + "Tests resistance to escalation bias." + ), + actions=_DA_ACTS, + game_type="dollar_auction", + default_rounds=SINGLE_SHOT_ROUNDS, + payoff_fn=_dollar_auction_payoff, + ), + "unscrupulous_diner": GameConfig( + name="Unscrupulous Diner's Dilemma", + description=( + "Diners at a restaurant independently order cheap or expensive " + "meals and split the bill equally. Each prefers expensive food " + "but shared costs create a free-rider problem. A multiplayer " + "generalization of the Prisoner's Dilemma in social settings." + ), + actions=["order_cheap", "order_expensive"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_diner_payoff, + ), + "minority_game": GameConfig( + name="Minority Game", + description=( + "Players independently choose from three options. With two " + "players, matching choices yield a low tie payoff while " + "different choices yield a high payoff for both. Tests " + "anti-coordination and contrarian strategic reasoning." 
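The undercutting incentive in `_travelers_payoff` is easiest to see with numbers. A standalone restatement with an illustrative `TD_BONUS` (the real value comes from `classic_constants`):

```python
TD_BONUS = 2  # illustrative; the real constant lives in classic_constants


def travelers_payoff(pa: str, oa: str) -> tuple[float, float]:
    """Lower claim sets the payout; the lower claimant earns a bonus, the higher a penalty."""
    claim_p = int(pa.rsplit("_", 1)[1])
    claim_o = int(oa.rsplit("_", 1)[1])
    if claim_p == claim_o:
        return (float(claim_p), float(claim_o))
    if claim_p < claim_o:
        return (float(claim_p + TD_BONUS), float(claim_p - TD_BONUS))
    return (float(claim_o - TD_BONUS), float(claim_o + TD_BONUS))


# Undercutting a 100-claim by one yields 101 instead of 100, so iterated
# dominance drives claims all the way down to the minimum:
print(travelers_payoff("claim_99", "claim_100"))  # -> (101.0, 97.0)
print(travelers_payoff("claim_2", "claim_2"))     # -> (2.0, 2.0)
```

That unraveling is the "rationality paradox" the game description refers to: the unique Nash outcome is the minimum claim, yet mutual high claims pay far more.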
+ ), + actions=_MINO_ACTS, + game_type="minority", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_minority_payoff, + ), + "rpsls": GameConfig( + name="Rock-Paper-Scissors-Lizard-Spock", + description=( + "An extended zero-sum game with five actions. Each action " + "beats two others and loses to two others. The unique Nash " + "equilibrium is uniform randomization. Tests strategic " + "reasoning in larger zero-sum action spaces." + ), + actions=["rock", "paper", "scissors", "lizard", "spock"], + game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_rpsls_payoff, + ), +} + +GAMES.update(CLASSIC_GAMES) diff --git a/common/games_market/contests.py b/common/games_market/contests.py new file mode 100644 index 0000000000000000000000000000000000000000..d2ae23c53e9df9cf1839b915827cb1430a9d8758 --- /dev/null +++ b/common/games_market/contests.py @@ -0,0 +1,188 @@ +"""Contest, conflict, and fair division games for KantBench.""" +from __future__ import annotations + +from common.games import GAMES, GameConfig, _matrix_payoff_fn +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS +from constant_definitions.ext.conflict_constants import ( + BLOTTO_BATTLEFIELDS, BLOTTO_TOTAL_TROOPS, + WOA_PRIZE, WOA_COST_PER_ROUND, WOA_MAX_PERSISTENCE, + TULLOCK_PRIZE, TULLOCK_MAX_EFFORT, + INSP_VIOLATION_GAIN, INSP_FINE, INSP_INSPECTION_COST, + INSP_COMPLIANCE_PAYOFF, + RUB_SURPLUS, RUB_DISCOUNT_NUM, RUB_DISCOUNT_DEN, + DAC_ENDOWMENT, +) + +_ONE = int(bool(True)) +_TWO = _ONE + _ONE +_ZERO_F = float() + + +# -- Colonel Blotto (three battlefields, encoded as alloc_X_Y_Z) -- +def _blotto_payoff(pa: str, oa: str) -> tuple[float, float]: + """Each player allocates troops across battlefields. 
Most wins per field.""" + p_parts = pa.split("_")[_ONE:] + o_parts = oa.split("_")[_ONE:] + p_wins = int() + o_wins = int() + for pv, ov in zip(p_parts, o_parts): + pi, oi = int(pv), int(ov) + if pi > oi: + p_wins += _ONE + elif oi > pi: + o_wins += _ONE + return (float(p_wins), float(o_wins)) + + +def _generate_blotto_actions() -> list[str]: + """Generate all valid troop allocations across battlefields.""" + actions = [] + for a in range(BLOTTO_TOTAL_TROOPS + _ONE): + for b in range(BLOTTO_TOTAL_TROOPS - a + _ONE): + c = BLOTTO_TOTAL_TROOPS - a - b + actions.append(f"alloc_{a}_{b}_{c}") + return actions + + +_BLOTTO_ACTS = _generate_blotto_actions() + + +# -- War of Attrition -- +def _woa_payoff(pa: str, oa: str) -> tuple[float, float]: + p_pers = int(pa.rsplit("_", _ONE)[_ONE]) + o_pers = int(oa.rsplit("_", _ONE)[_ONE]) + if p_pers > o_pers: + return (float(WOA_PRIZE - p_pers * WOA_COST_PER_ROUND), + float(-o_pers * WOA_COST_PER_ROUND)) + if o_pers > p_pers: + return (float(-p_pers * WOA_COST_PER_ROUND), + float(WOA_PRIZE - o_pers * WOA_COST_PER_ROUND)) + half = float(WOA_PRIZE) / _TWO + cost = float(p_pers * WOA_COST_PER_ROUND) + return (half - cost, half - cost) + + +_WOA_ACTS = [f"persist_{i}" for i in range(WOA_MAX_PERSISTENCE + _ONE)] + + +# -- Tullock Contest -- +def _tullock_payoff(pa: str, oa: str) -> tuple[float, float]: + e_p = int(pa.rsplit("_", _ONE)[_ONE]) + e_o = int(oa.rsplit("_", _ONE)[_ONE]) + total = e_p + e_o + if total == int(): + half = float(TULLOCK_PRIZE) / _TWO + return (half, half) + p_prob = float(e_p) / float(total) + return (float(p_prob * TULLOCK_PRIZE - e_p), + float((_ONE - p_prob) * TULLOCK_PRIZE - e_o)) + + +_TULLOCK_ACTS = [f"effort_{i}" for i in range(TULLOCK_MAX_EFFORT + _ONE)] + + +# -- Inspection Game -- +_INSP_MATRIX: dict[tuple[str, str], tuple[float, float]] = { + ("violate", "inspect"): (float(-INSP_FINE), float(INSP_FINE - INSP_INSPECTION_COST)), + ("violate", "no_inspect"): (float(INSP_VIOLATION_GAIN), float(int())), + 
("comply", "inspect"): (float(INSP_COMPLIANCE_PAYOFF), float(-INSP_INSPECTION_COST)), + ("comply", "no_inspect"): (float(INSP_COMPLIANCE_PAYOFF), float(int())), +} + + +# -- Rubinstein Bargaining (modeled as demand with discount) -- +def _rubinstein_payoff(pa: str, oa: str) -> tuple[float, float]: + d_p = int(pa.rsplit("_", _ONE)[_ONE]) + d_o = int(oa.rsplit("_", _ONE)[_ONE]) + if d_p + d_o <= RUB_SURPLUS: + return (float(d_p), float(d_o)) + disc_p = float(d_p * RUB_DISCOUNT_NUM) / float(RUB_DISCOUNT_DEN) + disc_o = float(d_o * RUB_DISCOUNT_NUM) / float(RUB_DISCOUNT_DEN) + if d_p + d_o <= RUB_SURPLUS + _TWO: + return (disc_p, disc_o) + return (_ZERO_F, _ZERO_F) + + +_RUB_ACTS = [f"demand_{i}" for i in range(RUB_SURPLUS + _ONE)] + + +# -- Divide-and-Choose -- +def _dac_payoff(pa: str, oa: str) -> tuple[float, float]: + split = int(pa.rsplit("_", _ONE)[_ONE]) + choice = oa + left_piece = split + right_piece = DAC_ENDOWMENT - split + if choice == "choose_left": + return (float(right_piece), float(left_piece)) + return (float(left_piece), float(right_piece)) + + +_DAC_SPLIT_ACTS = [f"split_{i}" for i in range(DAC_ENDOWMENT + _ONE)] + +CONTEST_GAMES: dict[str, GameConfig] = { + "colonel_blotto": GameConfig( + name="Colonel Blotto", + description=( + "Two players allocate limited troops across multiple " + "battlefields. The player with more troops wins each field. " + "Tests multi-dimensional strategic resource allocation." + ), + actions=_BLOTTO_ACTS, game_type="blotto", + default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_blotto_payoff, + ), + "war_of_attrition": GameConfig( + name="War of Attrition", + description=( + "Both players choose how long to persist. The survivor wins " + "a prize but both pay costs for duration. Tests endurance " + "strategy and rent dissipation reasoning." 
+ ), + actions=_WOA_ACTS, game_type="war_of_attrition", + default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_woa_payoff, + ), + "tullock_contest": GameConfig( + name="Tullock Contest", + description=( + "Players invest effort to win a prize. Win probability is " + "proportional to relative effort. Models lobbying, rent-seeking, " + "and competitive R&D spending." + ), + actions=_TULLOCK_ACTS, game_type="tullock", + default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_tullock_payoff, + ), + "inspection_game": GameConfig( + name="Inspection Game", + description=( + "A potential violator chooses to comply or violate; an inspector " + "chooses whether to inspect. Mixed-strategy equilibrium models " + "compliance, auditing, and arms control verification." + ), + actions=["violate", "comply"], game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, + payoff_fn=_matrix_payoff_fn(_INSP_MATRIX), + ), + "rubinstein_bargaining": GameConfig( + name="Rubinstein Bargaining", + description=( + "Players make simultaneous demands over a surplus. Compatible " + "demands yield immediate payoff; excessive demands are " + "discounted. Models alternating-offers bargaining with " + "time preference." + ), + actions=_RUB_ACTS, game_type="rubinstein", + default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_rubinstein_payoff, + ), + "divide_and_choose": GameConfig( + name="Divide-and-Choose", + description=( + "The divider splits a resource into two portions; the " + "chooser takes their preferred portion. The optimal " + "strategy for the divider is an even split. Tests " + "envy-free fair division reasoning." 
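The Tullock payoff above uses the standard ratio contest success function, under which the symmetric equilibrium effort for two players is a quarter of the prize. A standalone restatement with an illustrative prize (the real `TULLOCK_*` constants live in `conflict_constants`):

```python
TULLOCK_PRIZE = 8  # illustrative; real value in conflict_constants


def tullock_payoff(pa: str, oa: str) -> tuple[float, float]:
    """Win probability proportional to relative effort; effort is sunk either way."""
    e_p = int(pa.rsplit("_", 1)[1])
    e_o = int(oa.rsplit("_", 1)[1])
    total = e_p + e_o
    if total == 0:
        return (TULLOCK_PRIZE / 2, TULLOCK_PRIZE / 2)
    p_prob = e_p / total
    return (p_prob * TULLOCK_PRIZE - e_p, (1 - p_prob) * TULLOCK_PRIZE - e_o)


# At the symmetric equilibrium effort PRIZE / 4 = 2, each side nets 2.0,
# and unilateral deviation upward earns strictly less:
print(tullock_payoff("effort_2", "effort_2"))            # -> (2.0, 2.0)
print(tullock_payoff("effort_3", "effort_2")[0] < 2.0)   # -> True
```

Half the prize is dissipated in equilibrium effort, the rent-seeking result the game is designed to test.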
+ ), + actions=_DAC_SPLIT_ACTS, game_type="divide_choose", + default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_dac_payoff, + ), +} + +GAMES.update(CONTEST_GAMES) diff --git a/common/games_market/generated_v2.py b/common/games_market/generated_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..828186b56b7736acf081758c7c9fb9a832c99635 --- /dev/null +++ b/common/games_market/generated_v2.py @@ -0,0 +1,125 @@ +"""Extended procedurally generated games for KantBench.""" +from __future__ import annotations + +import random as _rand + +from common.games import GAMES, GameConfig +from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS +from constant_definitions.var.generated_ext_constants import ( + RZS_SEED, RZS_MAX_PAYOFF, RZS_DEFAULT_ACTIONS, + RC_SEED, RC_MATCH_BONUS, RC_MISMATCH_MAX, RC_DEFAULT_ACTIONS, + PCHK_RESOURCE, PCHK_FIGHT_COST, +) + +_ONE = int(bool(True)) +_TWO = _ONE + _ONE + + +def _action_label(index: int) -> str: + return chr(ord("a") + index) + + +def generate_random_zero_sum( + num_actions: int = RZS_DEFAULT_ACTIONS, + max_payoff: int = RZS_MAX_PAYOFF, + seed: int = RZS_SEED, +) -> GameConfig: + """Generate a random NxN zero-sum game.""" + rng = _rand.Random(seed) + actions = [_action_label(i) for i in range(num_actions)] + matrix: dict[tuple[str, str], tuple[float, float]] = {} + for a in actions: + for b in actions: + val = float(rng.randint(-max_payoff, max_payoff)) + matrix[(a, b)] = (val, -val) + + def _payoff(pa: str, oa: str) -> tuple[float, float]: + return matrix[(pa, oa)] + + return GameConfig( + name=f"Random Zero-Sum {num_actions}x{num_actions} (seed={seed})", + description=( + f"A randomly generated {num_actions}x{num_actions} zero-sum " + f"game. Every outcome sums to zero. Tests minimax reasoning " + f"in adversarial strategic settings." 
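`generate_random_zero_sum` draws one value per cell and negates it for the column player, so every outcome sums to zero by construction. A quick standalone check mirroring that construction (small illustrative defaults; the real ones come from the `RZS_*` constants):

```python
import random


def random_zero_sum_matrix(num_actions: int = 3, max_payoff: int = 5, seed: int = 42):
    """Build an NxN zero-sum payoff dict the same way the generator above does."""
    rng = random.Random(seed)
    actions = [chr(ord("a") + i) for i in range(num_actions)]
    matrix = {}
    for a in actions:
        for b in actions:
            v = rng.randint(-max_payoff, max_payoff)  # randint bounds are inclusive
            matrix[(a, b)] = (float(v), float(-v))
    return matrix


matrix = random_zero_sum_matrix()
print(all(p + o == 0 for p, o in matrix.values()))  # -> True
print(len(matrix))                                   # -> 9
```

Seeding `random.Random` keeps the generated game reproducible across runs, which matters when the environment doubles as a reward oracle for training.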
+ ), + actions=actions, game_type="matrix", + default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_payoff, + ) + + +def generate_random_coordination( + num_actions: int = RC_DEFAULT_ACTIONS, + match_bonus: int = RC_MATCH_BONUS, + mismatch_max: int = RC_MISMATCH_MAX, + seed: int = RC_SEED, +) -> GameConfig: + """Generate a random NxN coordination game with diagonal bonus.""" + rng = _rand.Random(seed) + actions = [_action_label(i) for i in range(num_actions)] + matrix: dict[tuple[str, str], tuple[float, float]] = {} + for a in actions: + for b in actions: + if a == b: + val = float(match_bonus + rng.randint(int(), mismatch_max)) + matrix[(a, b)] = (val, val) + else: + val = float(rng.randint(int(), mismatch_max)) + matrix[(a, b)] = (val, val) + + def _payoff(pa: str, oa: str) -> tuple[float, float]: + return matrix[(pa, oa)] + + return GameConfig( + name=f"Random Coordination {num_actions}x{num_actions} (seed={seed})", + description=( + f"A randomly generated {num_actions}x{num_actions} coordination " + f"game. Matching actions receive a bonus payoff. Tests focal " + f"point identification in novel coordination structures." 
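The `generate_parameterized_chicken` factory in this file builds the textbook Hawk-Dove matrix from a resource value V and fight cost C. A standalone sketch of that construction with illustrative parameters (the defaults come from the `PCHK_*` constants):

```python
def chicken_matrix(resource: int, fight_cost: int) -> dict[tuple[str, str], tuple[float, float]]:
    """Textbook Hawk-Dove: a fight splits V - C, the lone hawk takes all of V."""
    half_v = resource / 2
    fight_pay = (resource - fight_cost) / 2
    return {
        ("hawk", "hawk"): (fight_pay, fight_pay),
        ("hawk", "dove"): (float(resource), 0.0),
        ("dove", "hawk"): (0.0, float(resource)),
        ("dove", "dove"): (half_v, half_v),
    }


# With V=4 and C=6, mutual aggression is strictly worse than backing down,
# the signature anti-coordination structure of Chicken:
m = chicken_matrix(4, 6)
print(m[("hawk", "hawk")])  # -> (-1.0, -1.0)
print(m[("dove", "dove")])  # -> (2.0, 2.0)
```

When C > V the game has two asymmetric pure equilibria (one hawk, one dove), so varying the parameters shifts how costly miscoordination is.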
+        ),
+        actions=actions, game_type="matrix",
+        default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_payoff,
+    )
+
+
+def generate_parameterized_chicken(
+    resource: int = PCHK_RESOURCE,
+    fight_cost: int = PCHK_FIGHT_COST,
+) -> GameConfig:
+    """Create a Hawk-Dove / Chicken game with custom parameters."""
+    half_v = float(resource) / _TWO
+    fight_pay = (float(resource) - float(fight_cost)) / _TWO
+    matrix: dict[tuple[str, str], tuple[float, float]] = {
+        ("hawk", "hawk"): (fight_pay, fight_pay),
+        ("hawk", "dove"): (float(resource), 0.0),
+        ("dove", "hawk"): (0.0, float(resource)),
+        ("dove", "dove"): (half_v, half_v),
+    }
+
+    def _payoff(pa: str, oa: str) -> tuple[float, float]:
+        return matrix[(pa, oa)]
+
+    return GameConfig(
+        name=f"Chicken(V={resource},C={fight_cost})",
+        description=(
+            f"A parameterized Chicken / Hawk-Dove game with resource value "
+            f"{resource} and fight cost {fight_cost}. Tests anti-coordination "
+            f"behavior under varied incentive parameters."
+        ),
+        actions=["hawk", "dove"], game_type="matrix",
+        default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_payoff,
+    )
+
+
+# -- Register default instances --
+_ZS = generate_random_zero_sum()
+_CO = generate_random_coordination()
+_CH = generate_parameterized_chicken()
+
+GENERATED_V2: dict[str, GameConfig] = {
+    "random_zero_sum_3x3": _ZS,
+    "random_coordination_3x3": _CO,
+    "parameterized_chicken": _CH,
+}
+
+GAMES.update(GENERATED_V2)
diff --git a/common/games_market/oligopoly.py b/common/games_market/oligopoly.py
new file mode 100644
index 0000000000000000000000000000000000000000..5a3d279a73d16253f22e455f09edcf25ac64256b
--- /dev/null
+++ b/common/games_market/oligopoly.py
@@ -0,0 +1,152 @@
+"""Market competition and bargaining games for KantBench."""
+from __future__ import annotations
+
+from common.games import GAMES, GameConfig, _matrix_payoff_fn
+from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS, SINGLE_SHOT_ROUNDS
+from constant_definitions.ext.market_constants import (
+    COURNOT_DEMAND_INTERCEPT, COURNOT_DEMAND_SLOPE, COURNOT_MARGINAL_COST,
+    COURNOT_MAX_QUANTITY,
+    BERTRAND_MAX_PRICE, BERTRAND_MARGINAL_COST, BERTRAND_MARKET_SIZE,
+    HOTELLING_LINE_LENGTH, HOTELLING_TRANSPORT_COST, HOTELLING_MARKET_VALUE,
+    ED_MONOPOLY_PROFIT, ED_DUOPOLY_PROFIT, ED_FIGHT_COST,
+    ED_ENTRANT_FIGHT_LOSS, ED_STAY_OUT_PAYOFF,
+    ND_SURPLUS, DA_BUYER_VALUE, DA_SELLER_COST, DA_MAX_PRICE,
+)
+
+_ONE = 1
+_TWO = 2
+_ZERO_F = 0.0
+
+
+def _cournot_payoff(pa: str, oa: str) -> tuple[float, float]:
+    q_p = int(pa.rsplit("_", _ONE)[_ONE])
+    q_o = int(oa.rsplit("_", _ONE)[_ONE])
+    total = q_p + q_o
+    price = COURNOT_DEMAND_INTERCEPT - COURNOT_DEMAND_SLOPE * total
+    return (float((price - COURNOT_MARGINAL_COST) * q_p),
+            float((price - COURNOT_MARGINAL_COST) * q_o))
+
+
+def _bertrand_payoff(pa: str, oa: str) -> tuple[float, float]:
+    p_p = int(pa.rsplit("_", _ONE)[_ONE])
+    p_o = int(oa.rsplit("_", _ONE)[_ONE])
+    if p_p < p_o:
+        demand = max(BERTRAND_MARKET_SIZE - p_p, 0)
+        return (float((p_p - BERTRAND_MARGINAL_COST) * demand), _ZERO_F)
+    if p_o < p_p:
+        demand = max(BERTRAND_MARKET_SIZE - p_o, 0)
+        return (_ZERO_F, float((p_o - BERTRAND_MARGINAL_COST) * demand))
+    demand = max(BERTRAND_MARKET_SIZE - p_p, 0)
+    half_profit = float((p_p - BERTRAND_MARGINAL_COST) * demand) / _TWO
+    return (half_profit, half_profit)
+
+
+def _hotelling_payoff(pa: str, oa: str) -> tuple[float, float]:
+    loc_p = int(pa.rsplit("_", _ONE)[_ONE])
+    loc_o = int(oa.rsplit("_", _ONE)[_ONE])
+    if loc_p == loc_o:
+        share = float(HOTELLING_MARKET_VALUE) / _TWO
+        return (share, share)
+    mid = (loc_p + loc_o) / _TWO
+    p_share = mid if loc_p < loc_o else float(HOTELLING_LINE_LENGTH) - mid
+    o_share = float(HOTELLING_LINE_LENGTH) - p_share
+    return (float(p_share * HOTELLING_TRANSPORT_COST),
+            float(o_share * HOTELLING_TRANSPORT_COST))
+
+
+_ED_MATRIX: dict[tuple[str, str], tuple[float, float]] = {
+    ("enter", "accommodate"): (float(ED_DUOPOLY_PROFIT), float(ED_DUOPOLY_PROFIT)),
+    ("enter", "fight"): (float(ED_ENTRANT_FIGHT_LOSS), float(ED_FIGHT_COST)),
+    ("stay_out", "accommodate"): (float(ED_STAY_OUT_PAYOFF), float(ED_MONOPOLY_PROFIT)),
+    ("stay_out", "fight"): (float(ED_STAY_OUT_PAYOFF), float(ED_MONOPOLY_PROFIT)),
+}
+
+
+def _nash_demand_payoff(pa: str, oa: str) -> tuple[float, float]:
+    d_p = int(pa.rsplit("_", _ONE)[_ONE])
+    d_o = int(oa.rsplit("_", _ONE)[_ONE])
+    if d_p + d_o <= ND_SURPLUS:
+        return (float(d_p), float(d_o))
+    return (_ZERO_F, _ZERO_F)
+
+
+def _double_auction_payoff(pa: str, oa: str) -> tuple[float, float]:
+    bid = int(pa.rsplit("_", _ONE)[_ONE])
+    ask = int(oa.rsplit("_", _ONE)[_ONE])
+    if bid >= ask:
+        price = (bid + ask) // _TWO
+        return (float(DA_BUYER_VALUE - price), float(price - DA_SELLER_COST))
+    return (_ZERO_F, _ZERO_F)
+
+
+_COURNOT_ACTS = [f"produce_{i}" for i in range(COURNOT_MAX_QUANTITY + _ONE)]
+_BERTRAND_ACTS = [f"price_{i}" for i in range(BERTRAND_MAX_PRICE + _ONE)]
+_HOTELLING_ACTS = [f"locate_{i}" for i in range(HOTELLING_LINE_LENGTH + _ONE)]
+_ND_ACTS = [f"demand_{i}" for i in range(ND_SURPLUS + _ONE)]
+_DA_ACTS = [f"bid_{i}" for i in range(DA_MAX_PRICE + _ONE)]
+
+OLIGOPOLY_GAMES: dict[str, GameConfig] = {
+    "cournot": GameConfig(
+        name="Cournot Duopoly",
+        description=(
+            "Two firms simultaneously choose production quantities. "
+            "Market price decreases with total output. Tests Nash "
+            "equilibrium reasoning in quantity competition."
+        ),
+        actions=_COURNOT_ACTS, game_type="cournot",
+        default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_cournot_payoff,
+    ),
+    "bertrand": GameConfig(
+        name="Bertrand Competition",
+        description=(
+            "Two firms simultaneously set prices. The lower-price firm "
+            "captures the market. The Bertrand paradox predicts pricing "
+            "at marginal cost even with only two competitors."
+        ),
+        actions=_BERTRAND_ACTS, game_type="bertrand",
+        default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_bertrand_payoff,
+    ),
+    "hotelling": GameConfig(
+        name="Hotelling Location Game",
+        description=(
+            "Two firms choose locations on a line. Consumers visit the "
+            "nearest firm. Tests the principle of minimum differentiation "
+            "and spatial competition dynamics."
+        ),
+        actions=_HOTELLING_ACTS, game_type="hotelling",
+        default_rounds=DEFAULT_NUM_ROUNDS, payoff_fn=_hotelling_payoff,
+    ),
+    "entry_deterrence": GameConfig(
+        name="Entry Deterrence",
+        description=(
+            "A potential entrant decides whether to enter a market; "
+            "the incumbent decides whether to fight or accommodate. "
+            "Tests credible commitment and limit pricing reasoning."
+        ),
+        actions=["enter", "stay_out"], game_type="matrix",
+        default_rounds=DEFAULT_NUM_ROUNDS,
+        payoff_fn=_matrix_payoff_fn(_ED_MATRIX),
+    ),
+    "nash_demand": GameConfig(
+        name="Nash Demand Game",
+        description=(
+            "Two players simultaneously demand shares of a surplus. "
+            "If demands are compatible (sum within surplus), both "
+            "receive their demand; otherwise both get nothing."
+        ),
+        actions=_ND_ACTS, game_type="nash_demand",
+        default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_nash_demand_payoff,
+    ),
+    "double_auction": GameConfig(
+        name="Double Auction",
+        description=(
+            "A buyer submits a bid and a seller submits an ask. Trade "
+            "occurs at the midpoint price if the bid meets or exceeds the "
+            "ask. Tests price discovery and competitive market behavior."
+        ),
+        actions=_DA_ACTS, game_type="double_auction",
+        default_rounds=SINGLE_SHOT_ROUNDS, payoff_fn=_double_auction_payoff,
+    ),
+}
+
+GAMES.update(OLIGOPOLY_GAMES)
diff --git a/common/games_meta/__pycache__/coalition_config.cpython-311.pyc b/common/games_meta/__pycache__/coalition_config.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..ce4bcf9a4b64570af10dee19368b5f6e7f688381
Binary files /dev/null and b/common/games_meta/__pycache__/coalition_config.cpython-311.pyc differ
diff --git a/common/games_meta/__pycache__/dynamic.cpython-311.pyc b/common/games_meta/__pycache__/dynamic.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..0e6687eea3c0717ee4a905aed0957af58d2bb6fa
Binary files /dev/null and b/common/games_meta/__pycache__/dynamic.cpython-311.pyc differ
diff --git a/common/games_meta/__pycache__/game_tags.cpython-311.pyc b/common/games_meta/__pycache__/game_tags.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e53f8e7a52602487f35f050b7bb0255e5f0b5e80
Binary files /dev/null and b/common/games_meta/__pycache__/game_tags.cpython-311.pyc differ
diff --git a/common/games_meta/__pycache__/nplayer_config.cpython-311.pyc b/common/games_meta/__pycache__/nplayer_config.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..25d4bf465b64bb0834368a159886e45b8c2edf09
Binary files /dev/null and b/common/games_meta/__pycache__/nplayer_config.cpython-311.pyc differ
diff --git a/common/games_meta/coalition_config.py b/common/games_meta/coalition_config.py
new file mode 100644
index 0000000000000000000000000000000000000000..480216773bda1e3e9b42bac1b4a5be097afd9415
--- /dev/null
+++ b/common/games_meta/coalition_config.py
@@ -0,0 +1,227 @@
+"""Coalition game configuration, payoff functions, and built-in game registry."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Callable
+
+from common.games_meta.nplayer_config import NPlayerGameConfig, NPLAYER_GAMES
+from constant_definitions.nplayer.coalition_constants import (
+    COALITION_DEFAULT_ROUNDS, COALITION_DEFAULT_PENALTY_NUMERATOR,
+    COALITION_DEFAULT_PENALTY_DENOMINATOR,
+    ENFORCEMENT_CHEAP_TALK, ENFORCEMENT_PENALTY, ENFORCEMENT_BINDING,
+    CARTEL_NUM_PLAYERS, CARTEL_COLLUDE_THRESHOLD,
+    CARTEL_COLLUDE_HIGH, CARTEL_COLLUDE_LOW,
+    CARTEL_COMPETE_HIGH, CARTEL_COMPETE_LOW,
+    ALLIANCE_NUM_PLAYERS, ALLIANCE_SUPPORT_POOL,
+    ALLIANCE_BETRAY_GAIN, ALLIANCE_NO_SUPPORT,
+    VOTING_NUM_PLAYERS, VOTING_WINNER_PAYOFF, VOTING_LOSER_PAYOFF,
+    OSTRACISM_NUM_PLAYERS, OSTRACISM_BONUS_POOL,
+    OSTRACISM_EXCLUDED_PAYOFF, OSTRACISM_BASE_PAYOFF,
+    OSTRACISM_MAJORITY_NUMERATOR, OSTRACISM_MAJORITY_DENOMINATOR,
+    TRADE_NUM_PLAYERS, TRADE_DIVERSE_PAYOFF,
+    TRADE_HOMOGENEOUS_PAYOFF, TRADE_MINORITY_BONUS,
+    RULE_NUM_PLAYERS, RULE_EQUAL_PAY, RULE_WINNER_HIGH, RULE_WINNER_LOW,
+    COMMONS_NUM_PLAYERS, COMMONS_SUSTAINABLE_THRESHOLD,
+    COMMONS_LOW_SUSTAINABLE, COMMONS_HIGH_SUSTAINABLE,
+    COMMONS_LOW_DEPLETED, COMMONS_HIGH_DEPLETED,
+)
+
+_ONE = 1
+_ZERO = 0
+_PEN_N = COALITION_DEFAULT_PENALTY_NUMERATOR
+_PEN_D = COALITION_DEFAULT_PENALTY_DENOMINATOR
+
+
+@dataclass(frozen=True)
+class CoalitionGameConfig:
+    """Immutable specification for a coalition-enabled N-player game."""
+
+    name: str
+    description: str
+    actions: list[str]
+    num_players: int
+    default_rounds: int
+    payoff_fn: Callable[[tuple[str, ...]], tuple[float, ...]]
+    enforcement: str
+    penalty_numerator: int
+    penalty_denominator: int
+    allow_side_payments: bool
+
+
+COALITION_GAMES: dict[str, CoalitionGameConfig] = {}
+
+
+def get_coalition_game(name: str) -> CoalitionGameConfig:
+    """Look up a coalition game by name. Raises KeyError if not found."""
+    return COALITION_GAMES[name]
+
+
+# ---------------------------------------------------------------------------
+# Payoff functions
+# ---------------------------------------------------------------------------
+
+def _cartel_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    colluders = sum(_ONE for a in actions if a == "collude")
+    holds = colluders >= CARTEL_COLLUDE_THRESHOLD
+    return tuple(
+        float(CARTEL_COLLUDE_HIGH if holds else CARTEL_COLLUDE_LOW) if a == "collude"
+        else float(CARTEL_COMPETE_HIGH if holds else CARTEL_COMPETE_LOW)
+        for a in actions
+    )
+
+
+def _alliance_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    supporters = sum(_ONE for a in actions if a == "support")
+    if supporters == _ZERO:
+        return tuple(float(ALLIANCE_NO_SUPPORT) for _ in actions)
+    return tuple(
+        float(ALLIANCE_SUPPORT_POOL) / supporters if a == "support"
+        else float(ALLIANCE_BETRAY_GAIN) for a in actions
+    )
+
+
+def _coalition_voting_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    vote_a = sum(_ONE for a in actions if a == "vote_A")
+    winning = "vote_A" if vote_a >= len(actions) - vote_a else "vote_B"
+    return tuple(
+        float(VOTING_WINNER_PAYOFF) if a == winning
+        else float(VOTING_LOSER_PAYOFF) for a in actions
+    )
+
+
+def _ostracism_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    n = len(actions)
+    majority = n * OSTRACISM_MAJORITY_NUMERATOR // OSTRACISM_MAJORITY_DENOMINATOR + _ONE
+    vote_counts: dict[str, int] = {}
+    for a in actions:
+        vote_counts[a] = vote_counts.get(a, _ZERO) + _ONE
+    excluded = -_ONE
+    for target, count in vote_counts.items():
+        if target != "exclude_none" and count >= majority:
+            excluded = int(target.rsplit("_", _ONE)[_ONE])
+            break
+    if excluded >= _ZERO:
+        non_excluded = n - _ONE
+        share = float(OSTRACISM_BONUS_POOL) / non_excluded if non_excluded > _ZERO else 0.0
+        return tuple(
+            float(OSTRACISM_EXCLUDED_PAYOFF) if i == excluded else share
+            for i in range(n)
+        )
+    return tuple(float(OSTRACISM_BASE_PAYOFF) for _ in range(n))
+
+
+_OSTRACISM_ACTIONS = [f"exclude_{i}" for i in range(OSTRACISM_NUM_PLAYERS)] + ["exclude_none"]
+
+
+def _resource_trading_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    n = len(actions)
+    count_a = sum(_ONE for a in actions if a == "produce_A")
+    count_b = n - count_a
+    if count_a == _ZERO or count_b == _ZERO:
+        return tuple(float(TRADE_HOMOGENEOUS_PAYOFF) for _ in actions)
+    payoffs: list[float] = []
+    for a in actions:
+        base = float(TRADE_DIVERSE_PAYOFF)
+        is_min = (a == "produce_A" and count_a < count_b) or (a == "produce_B" and count_b < count_a)
+        payoffs.append((base + float(TRADE_MINORITY_BONUS)) if is_min else base)
+    return tuple(payoffs)
+
+
+def _rule_voting_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    vote_counts: dict[str, int] = {}
+    for a in actions:
+        vote_counts[a] = vote_counts.get(a, _ZERO) + _ONE
+    winning, best = "rule_equal", _ZERO
+    for rule, count in sorted(vote_counts.items()):
+        if count > best:
+            best, winning = count, rule
+    if winning == "rule_equal":
+        return tuple(float(RULE_EQUAL_PAY) for _ in actions)
+    return tuple(
+        float(RULE_WINNER_HIGH) if a == winning else float(RULE_WINNER_LOW)
+        for a in actions
+    )
+
+
+def _commons_governance_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    high = sum(_ONE for a in actions if a == "extract_high")
+    ok = high <= COMMONS_SUSTAINABLE_THRESHOLD
+    return tuple(
+        float(
+            (COMMONS_HIGH_SUSTAINABLE if ok else COMMONS_HIGH_DEPLETED) if a == "extract_high"
+            else (COMMONS_LOW_SUSTAINABLE if ok else COMMONS_LOW_DEPLETED)
+        ) for a in actions
+    )
+
+
+# ---------------------------------------------------------------------------
+# Built-in coalition games
+# ---------------------------------------------------------------------------
+
+def _cfg(name: str, desc: str, actions: list[str], n: int,
+         fn: object, enf: str, side: bool = False) -> CoalitionGameConfig:
+    return CoalitionGameConfig(
+        name=name,
+        description=desc, actions=actions, num_players=n,
+        default_rounds=COALITION_DEFAULT_ROUNDS, payoff_fn=fn,  # type: ignore[arg-type]
+        enforcement=enf, penalty_numerator=_PEN_N, penalty_denominator=_PEN_D,
+        allow_side_payments=side,
+    )
+
+
+_BUILTIN_COALITION_GAMES: dict[str, CoalitionGameConfig] = {
+    "coalition_cartel": _cfg(
+        "Cartel",
+        "Players collude or compete. If enough collude the cartel holds. "
+        "Defectors who promised to collude are fined under penalty enforcement.",
+        ["collude", "compete"], CARTEL_NUM_PLAYERS, _cartel_payoff, ENFORCEMENT_PENALTY,
+    ),
+    "coalition_alliance": _cfg(
+        "Alliance Formation",
+        "Form non-binding alliances. Supporters split a shared pool; "
+        "betrayers take a fixed gain. Cheap-talk: no enforcement.",
+        ["support", "betray"], ALLIANCE_NUM_PLAYERS, _alliance_payoff, ENFORCEMENT_CHEAP_TALK,
+    ),
+    "coalition_voting": _cfg(
+        "Coalition Voting",
+        "Form voting blocs bound to vote together. Majority earns a winner payoff. "
+        "Binding enforcement overrides defectors to their agreed vote.",
+        ["vote_A", "vote_B"], VOTING_NUM_PLAYERS, _coalition_voting_payoff, ENFORCEMENT_BINDING,
+    ),
+    "coalition_ostracism": _cfg(
+        "Ostracism",
+        "Vote to exclude a player. Excluded gets zero; others split a bonus. "
+        "Penalty enforcement fines defectors who break exclusion agreements.",
+        _OSTRACISM_ACTIONS, OSTRACISM_NUM_PLAYERS, _ostracism_payoff, ENFORCEMENT_PENALTY,
+    ),
+    "coalition_resource_trading": _cfg(
+        "Resource Trading",
+        "Produce resource A or B. Diversity is rewarded; minority producers get a bonus. "
+        "Cheap-talk lets players agree on production but renegotiate freely.",
+        ["produce_A", "produce_B"], TRADE_NUM_PLAYERS, _resource_trading_payoff,
+        ENFORCEMENT_CHEAP_TALK, side=True,
+    ),
+    "coalition_rule_voting": _cfg(
+        "Rule Voting",
+        "Vote on payoff rule: equal split or winner-take-all. "
+        "Binding enforcement locks coalition members to their agreed vote.",
+        ["rule_equal", "rule_winner"], RULE_NUM_PLAYERS, _rule_voting_payoff, ENFORCEMENT_BINDING,
+    ),
+    "coalition_commons": _cfg(
+        "Commons Governance",
+        "Extract from a shared resource. Over-extraction degrades payoffs. "
+        "Penalty enforcement fines coalition members who exceed agreed limits.",
+        ["extract_low", "extract_high"], COMMONS_NUM_PLAYERS,
+        _commons_governance_payoff, ENFORCEMENT_PENALTY,
+    ),
+}
+
+COALITION_GAMES.update(_BUILTIN_COALITION_GAMES)
+
+# Dual registration as plain NPlayerGameConfig
+for _key, _c in _BUILTIN_COALITION_GAMES.items():
+    NPLAYER_GAMES[_key] = NPlayerGameConfig(
+        name=_c.name, description=_c.description, actions=_c.actions,
+        num_players=_c.num_players, default_rounds=_c.default_rounds,
+        payoff_fn=_c.payoff_fn,
+    )
diff --git a/common/games_meta/dynamic.py b/common/games_meta/dynamic.py
new file mode 100644
index 0000000000000000000000000000000000000000..0cf39a58872a3dc49c01e4145c23b8586b401ff7
--- /dev/null
+++ b/common/games_meta/dynamic.py
@@ -0,0 +1,204 @@
+"""Dynamic game creation API for building games at runtime."""
+
+from __future__ import annotations
+
+from typing import Callable
+
+from common.games import GameConfig, GAMES, _matrix_payoff_fn
+from constant_definitions.nplayer.dynamic_constants import (
+    MIN_ACTIONS,
+    MAX_ACTIONS,
+    DYNAMIC_DEFAULT_ROUNDS,
+    REGISTRY_PREFIX,
+)
+
+_ONE = 1
+_TWO = 2
+
+
+def _validate_actions(actions: list[str]) -> None:
+    """Raise ValueError if action list is invalid."""
+    if len(actions) < MIN_ACTIONS:
+        raise ValueError(
+            f"Need at least {MIN_ACTIONS} actions, got {len(actions)}"
+        )
+    if len(actions) > MAX_ACTIONS:
+        raise ValueError(
+            f"At most {MAX_ACTIONS} actions allowed, got {len(actions)}"
+        )
+    if len(actions) != len(set(actions)):
+        raise ValueError("Duplicate actions are not allowed")
+
+
+def _validate_matrix(
+    actions: list[str],
+    payoff_matrix: dict[tuple[str, str], tuple[float, float]],
+) -> None:
+    """Raise ValueError if the matrix is incomplete or has invalid keys."""
+    expected = {(a, b) for a in actions for b in actions}
+    actual = set(payoff_matrix.keys())
+    missing = expected - actual
+    if missing:
+        raise ValueError(f"Payoff matrix is missing entries: {missing}")
+    extra = actual - expected
+    if extra:
+        raise ValueError(f"Payoff matrix has unknown action pairs: {extra}")
+
+
+def create_matrix_game(
+    name: str,
+    actions: list[str],
+    payoff_matrix: dict[tuple[str, str], tuple[float, float]],
+    *,
+    description: str = "",
+    default_rounds: int = DYNAMIC_DEFAULT_ROUNDS,
+    register: bool = False,
+) -> GameConfig:
+    """Create a GameConfig backed by an explicit payoff matrix.
+
+    Parameters
+    ----------
+    name:
+        Display name for the game.
+    actions:
+        List of action strings available to both players.
+    payoff_matrix:
+        ``{(player_action, opponent_action): (player_pay, opponent_pay)}``.
+    description:
+        Human-readable description of the game rules.
+    default_rounds:
+        Number of rounds when the caller does not specify.
+    register:
+        If ``True``, add the game to the global ``GAMES`` registry using the
+        key ``dynamic_<name>``.
+
+    Returns
+    -------
+    GameConfig
+    """
+    _validate_actions(actions)
+    _validate_matrix(actions, payoff_matrix)
+    config = GameConfig(
+        name=name,
+        description=description or f"Dynamic matrix game: {name}",
+        actions=list(actions),
+        game_type="matrix",
+        default_rounds=default_rounds,
+        payoff_fn=_matrix_payoff_fn(dict(payoff_matrix)),
+    )
+    if register:
+        key = REGISTRY_PREFIX + name
+        GAMES[key] = config
+    return config
+
+
+def create_symmetric_game(
+    name: str,
+    actions: list[str],
+    payoffs: dict[tuple[str, str], float],
+    *,
+    description: str = "",
+    default_rounds: int = DYNAMIC_DEFAULT_ROUNDS,
+    register: bool = False,
+) -> GameConfig:
+    """Create a symmetric GameConfig from single-value payoffs.
+
+    In a symmetric game, ``payoff(A, B)`` for the row player equals
+    ``payoff(B, A)`` for the column player. You only specify the row-player
+    payoff for each cell and the full matrix is derived.
+
+    Parameters
+    ----------
+    name:
+        Display name.
+    actions:
+        List of action strings.
+    payoffs:
+        ``{(my_action, their_action): my_payoff}``.
+    description:
+        Human-readable description.
+    default_rounds:
+        Number of rounds.
+    register:
+        If ``True``, register as ``dynamic_<name>``.
+
+    Returns
+    -------
+    GameConfig
+    """
+    _validate_actions(actions)
+    expected = {(a, b) for a in actions for b in actions}
+    actual = set(payoffs.keys())
+    missing = expected - actual
+    if missing:
+        raise ValueError(f"Symmetric payoff table is missing entries: {missing}")
+
+    full_matrix: dict[tuple[str, str], tuple[float, float]] = {}
+    for a in actions:
+        for b in actions:
+            full_matrix[(a, b)] = (payoffs[(a, b)], payoffs[(b, a)])
+
+    return create_matrix_game(
+        name,
+        actions,
+        full_matrix,
+        description=description,
+        default_rounds=default_rounds,
+        register=register,
+    )
+
+
+def create_custom_game(
+    name: str,
+    actions: list[str],
+    payoff_fn: Callable[[str, str], tuple[float, float]],
+    *,
+    game_type: str = "matrix",
+    description: str = "",
+    default_rounds: int = DYNAMIC_DEFAULT_ROUNDS,
+    register: bool = False,
+) -> GameConfig:
+    """Create a GameConfig with an arbitrary payoff function.
+
+    Parameters
+    ----------
+    name:
+        Display name.
+    actions:
+        List of action strings.
+    payoff_fn:
+        ``(player_action, opponent_action) -> (player_pay, opponent_pay)``.
+    game_type:
+        Game type tag (default ``"matrix"``).
+    description:
+        Human-readable description.
+    default_rounds:
+        Number of rounds.
+    register:
+        If ``True``, register as ``dynamic_<name>``.
+
+    Returns
+    -------
+    GameConfig
+    """
+    _validate_actions(actions)
+    config = GameConfig(
+        name=name,
+        description=description or f"Dynamic custom game: {name}",
+        actions=list(actions),
+        game_type=game_type,
+        default_rounds=default_rounds,
+        payoff_fn=payoff_fn,
+    )
+    if register:
+        key = REGISTRY_PREFIX + name
+        GAMES[key] = config
+    return config
+
+
+def unregister_game(key: str) -> None:
+    """Remove a game from the global ``GAMES`` registry.
+
+    Raises ``KeyError`` if the key is not found.
+    """
+    del GAMES[key]
diff --git a/common/games_meta/game_tags.py b/common/games_meta/game_tags.py
new file mode 100644
index 0000000000000000000000000000000000000000..574966db0c140eb508c7ee8278414a7f060c4c27
--- /dev/null
+++ b/common/games_meta/game_tags.py
@@ -0,0 +1,190 @@
+"""Game tag registry -- maps every game key to game-theoretic property tags."""
+
+from __future__ import annotations
+
+from constant_definitions.batch4.tag_constants import (
+    # Communication
+    NO_COMMUNICATION, CHEAP_TALK, COSTLY_SIGNALING,
+    BINDING_COMMITMENT, MEDIATED,
+    # Information
+    COMPLETE_INFORMATION, INCOMPLETE_INFORMATION, ASYMMETRIC_INFORMATION,
+    # Structure
+    SIMULTANEOUS, SEQUENTIAL, REPEATED, SINGLE_SHOT,
+    # Payoff type
+    ZERO_SUM, SYMMETRIC_PAYOFF, ASYMMETRIC_PAYOFF,
+    COORDINATION, ANTI_COORDINATION,
+    # Domain
+    SOCIAL_DILEMMA, AUCTION, BARGAINING, VOTING,
+    MARKET_COMPETITION, EVOLUTIONARY, SECURITY, NETWORK,
+    # Action space
+    BINARY_CHOICE, SMALL_CHOICE, LARGE_CHOICE,
+    # Category grouping
+    CATEGORIES,
+)
+
+# ---------------------------------------------------------------------------
+# Game-to-tag mapping (one line per game, grouped by module)
+# ---------------------------------------------------------------------------
+
+GAME_TAGS: dict[str, frozenset[str]] = {
+    # ── Base games (server/games.py) ──
+    "prisoners_dilemma": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "stag_hunt": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "hawk_dove": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ANTI_COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "ultimatum": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "trust": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "public_goods": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+
+    # ── games_ext/matrix_games.py ──
+    "matching_pennies": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, BINARY_CHOICE}),
+    "rock_paper_scissors": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+    "battle_of_the_sexes": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, COORDINATION, BARGAINING, BINARY_CHOICE}),
+    "pure_coordination": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "deadlock": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "harmony": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_ext/sequential.py ──
+    "dictator": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "centipede": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "stackelberg": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+
+    # ── games_ext/auction.py ──
+    "first_price_auction": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, AUCTION, LARGE_CHOICE}),
+    "vickrey_auction": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, AUCTION, LARGE_CHOICE}),
+    "allpay_auction": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, AUCTION, LARGE_CHOICE}),
+
+    # ── games_ext/nplayer.py ──
+    "tragedy_of_commons": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "volunteer_dilemma": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "el_farol": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ANTI_COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_ext/generated.py ──
+    "random_symmetric_3x3": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, SMALL_CHOICE}),
+    "random_asymmetric_3x3": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ASYMMETRIC_PAYOFF, SOCIAL_DILEMMA, SMALL_CHOICE}),
+
+    # ── games_info/signaling.py ──
+    "beer_quiche": frozenset({COSTLY_SIGNALING, INCOMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, SECURITY, BINARY_CHOICE}),
+    "spence_signaling": frozenset({COSTLY_SIGNALING, ASYMMETRIC_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "cheap_talk": frozenset({CHEAP_TALK, INCOMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, COORDINATION, SOCIAL_DILEMMA, SMALL_CHOICE}),
+    "lemon_market": frozenset({NO_COMMUNICATION, ASYMMETRIC_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "bayesian_persuasion": frozenset({CHEAP_TALK, ASYMMETRIC_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, BARGAINING, BINARY_CHOICE}),
+
+    # ── games_info/contracts.py ──
+    "moral_hazard": frozenset({BINDING_COMMITMENT, ASYMMETRIC_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, SMALL_CHOICE}),
+    "screening": frozenset({NO_COMMUNICATION, ASYMMETRIC_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, SMALL_CHOICE}),
+    "gift_exchange": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+
+    # ── games_info/communication.py ──
+    "cheap_talk_pd": frozenset({CHEAP_TALK, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "binding_commitment": frozenset({BINDING_COMMITMENT, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "correlated_equilibrium": frozenset({MEDIATED, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "focal_point": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, COORDINATION, SOCIAL_DILEMMA, SMALL_CHOICE}),
+    "mediated_game": frozenset({MEDIATED, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_info/bayesian.py ──
+    "global_game": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "jury_voting": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, VOTING, BINARY_CHOICE}),
+    "information_cascade": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, SYMMETRIC_PAYOFF, NETWORK, BINARY_CHOICE}),
+    "adverse_selection_insurance": frozenset({NO_COMMUNICATION, ASYMMETRIC_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, SMALL_CHOICE}),
+
+    # ── games_info/network.py ──
+    "security_game": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+    "link_formation": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, NETWORK, BINARY_CHOICE}),
+    "trust_with_punishment": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "dueling_game": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+
+    # ── games_market/oligopoly.py ──
+    "cournot": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "bertrand": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "hotelling": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "entry_deterrence": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, BINARY_CHOICE}),
+    "nash_demand": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "double_auction": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, AUCTION, LARGE_CHOICE}),
+
+    # ── games_market/contests.py ──
+    "colonel_blotto": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+    "war_of_attrition": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "tullock_contest": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "inspection_game": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, BINARY_CHOICE}),
+    "rubinstein_bargaining": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, SYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "divide_and_choose": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, SYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+
+    # ── games_market/classic.py ──
+    "travelers_dilemma": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "dollar_auction": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SEQUENTIAL, SINGLE_SHOT, ASYMMETRIC_PAYOFF, AUCTION, LARGE_CHOICE}),
+    "unscrupulous_diner": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "minority_game": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ANTI_COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "rpsls": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+
+    # ── games_market/advanced.py ──
+    "preemption_game": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, MARKET_COMPETITION, SMALL_CHOICE}),
+    "war_of_gifts": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, SMALL_CHOICE}),
+    "penalty_shootout": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+
+    # ── games_market/generated_v2.py ──
+    "random_zero_sum_3x3": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ZERO_SUM, SECURITY, SMALL_CHOICE}),
+    "random_coordination_3x3": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, COORDINATION, SOCIAL_DILEMMA, SMALL_CHOICE}),
+    "parameterized_chicken": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ANTI_COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_coop/cooperative.py ──
+    "shapley_allocation": frozenset({BINDING_COMMITMENT, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "core_divide_dollar": frozenset({BINDING_COMMITMENT, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, BARGAINING, LARGE_CHOICE}),
+    "weighted_voting": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, VOTING, BINARY_CHOICE}),
+    "stable_matching": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, ASYMMETRIC_PAYOFF, NETWORK, SMALL_CHOICE}),
+    "median_voter": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, VOTING, LARGE_CHOICE}),
+    "approval_voting": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, VOTING, SMALL_CHOICE}),
+
+    # ── games_coop/pd_variants.py ──
+    "optional_pd": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, SMALL_CHOICE}),
+    "asymmetric_pd": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ASYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "donation_game": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "friend_or_foe": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "peace_war": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_coop/dynamic.py ──
+    "bank_run": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "global_stag_hunt": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, COORDINATION, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "beauty_contest": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, MARKET_COMPETITION, LARGE_CHOICE}),
+    "hawk_dove_bourgeois": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, ANTI_COORDINATION, EVOLUTIONARY, SMALL_CHOICE}),
+    "finitely_repeated_pd": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "markov_game": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_coop/infinite.py ──
+    "continuous_pd": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "discounted_pd": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+
+    # ── games_coop/stochastic.py ──
+    "stochastic_pd": frozenset({NO_COMMUNICATION, INCOMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, BINARY_CHOICE}),
+    "risk_dominance": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, COORDINATION, EVOLUTIONARY, BINARY_CHOICE}),
+    "threshold_public_goods": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, SINGLE_SHOT, SYMMETRIC_PAYOFF, SOCIAL_DILEMMA, LARGE_CHOICE}),
+    "evolutionary_pd": frozenset({NO_COMMUNICATION, COMPLETE_INFORMATION, SIMULTANEOUS, REPEATED, SYMMETRIC_PAYOFF, EVOLUTIONARY, BINARY_CHOICE}),
+}
+
+
+# ---------------------------------------------------------------------------
+# Query helpers
+# ---------------------------------------------------------------------------
+
+
+def get_games_by_tag(tag: str) -> list[str]:
+    """Return all game keys that have the given tag."""
+    return [key for key, tags in
GAME_TAGS.items() if tag in tags] + + +def get_games_by_tags(*tags: str) -> list[str]: + """Return game keys that have *all* of the specified tags.""" + tag_set = frozenset(tags) + return [key for key, gtags in GAME_TAGS.items() if tag_set <= gtags] + + +def list_tags() -> list[str]: + """Return every unique tag across all games, sorted.""" + all_tags: set[str] = set() + for tags in GAME_TAGS.values(): + all_tags |= tags + return sorted(all_tags) + + +def list_categories() -> dict[str, list[str]]: + """Return tag constants grouped by dimension name.""" + return dict(CATEGORIES) diff --git a/common/games_meta/nplayer_config.py b/common/games_meta/nplayer_config.py new file mode 100644 index 0000000000000000000000000000000000000000..2445be964b17e7e4dafabbc11f1cf7024896227e --- /dev/null +++ b/common/games_meta/nplayer_config.py @@ -0,0 +1,30 @@ +"""N-player game configuration dataclass and registry.""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import Callable + +from constant_definitions.nplayer.nplayer_constants import ( + NPLAYER_DEFAULT_ROUNDS, +) + + +@dataclass(frozen=True) +class NPlayerGameConfig: + """Immutable specification for an N-player game type.""" + + name: str + description: str + actions: list[str] + num_players: int + default_rounds: int + payoff_fn: Callable[[tuple[str, ...]], tuple[float, ...]] + + +NPLAYER_GAMES: dict[str, NPlayerGameConfig] = {} + + +def get_nplayer_game(name: str) -> NPlayerGameConfig: + """Look up an N-player game by name. 
+    Raises KeyError if not found."""
+    return NPLAYER_GAMES[name]
diff --git a/common/games_meta/nplayer_games.py b/common/games_meta/nplayer_games.py
new file mode 100644
index 0000000000000000000000000000000000000000..658968402dfc9b2bac2eb9f0f3fd884af3f42ab0
--- /dev/null
+++ b/common/games_meta/nplayer_games.py
@@ -0,0 +1,130 @@
+"""Built-in N-player game definitions."""
+
+from __future__ import annotations
+
+from common.games_meta.nplayer_config import NPlayerGameConfig, NPLAYER_GAMES
+from constant_definitions.nplayer.nplayer_constants import (
+    NPLAYER_DEFAULT_ROUNDS,
+    NPG_ENDOWMENT,
+    NPG_MULTIPLIER_NUMERATOR,
+    NPG_MULTIPLIER_DENOMINATOR,
+    NVD_BENEFIT,
+    NVD_COST,
+    NVD_NO_VOLUNTEER,
+    NEF_CAPACITY_FRACTION_NUMERATOR,
+    NEF_CAPACITY_FRACTION_DENOMINATOR,
+    NEF_ATTEND_REWARD,
+    NEF_CROWD_PENALTY,
+    NEF_STAY_HOME,
+)
+
+_ONE = 1
+_ZERO = 0
+
+
+# ---------------------------------------------------------------------------
+# Public Goods Game (N-player)
+# ---------------------------------------------------------------------------
+
+def _public_goods_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    """Each player contributes from an endowment. The pot is multiplied and split."""
+    n = len(actions)
+    contributions = []
+    for a in actions:
+        contributions.append(int(a.rsplit("_", _ONE)[_ONE]))
+    total = sum(contributions)
+    pool = total * NPG_MULTIPLIER_NUMERATOR / NPG_MULTIPLIER_DENOMINATOR
+    share = pool / n
+    payoffs = tuple(
+        float(NPG_ENDOWMENT - c + share) for c in contributions
+    )
+    return payoffs
+
+
+_PG_ACTIONS = [f"contribute_{i}" for i in range(NPG_ENDOWMENT + _ONE)]
+
+
+# ---------------------------------------------------------------------------
+# Volunteer's Dilemma (N-player)
+# ---------------------------------------------------------------------------
+
+def _volunteer_dilemma_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    """If at least one player volunteers, everyone benefits but volunteers pay a cost."""
+    any_volunteer = any(a == "volunteer" for a in actions)
+    payoffs: list[float] = []
+    for a in actions:
+        if not any_volunteer:
+            payoffs.append(float(NVD_NO_VOLUNTEER))
+        elif a == "volunteer":
+            payoffs.append(float(NVD_BENEFIT - NVD_COST))
+        else:
+            payoffs.append(float(NVD_BENEFIT))
+    return tuple(payoffs)
+
+
+# ---------------------------------------------------------------------------
+# El Farol Bar Problem (N-player)
+# ---------------------------------------------------------------------------
+
+def _el_farol_payoff(actions: tuple[str, ...]) -> tuple[float, ...]:
+    """Attend a bar that is fun only when not overcrowded."""
+    n = len(actions)
+    capacity = n * NEF_CAPACITY_FRACTION_NUMERATOR // NEF_CAPACITY_FRACTION_DENOMINATOR
+    attendees = sum(_ONE for a in actions if a == "attend")
+    crowded = attendees > capacity
+    payoffs: list[float] = []
+    for a in actions:
+        if a == "stay_home":
+            payoffs.append(float(NEF_STAY_HOME))
+        elif crowded:
+            payoffs.append(float(NEF_CROWD_PENALTY))
+        else:
+            payoffs.append(float(NEF_ATTEND_REWARD))
+    return tuple(payoffs)
+
+
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+
+_THREE = 3
+_FIVE = 5
+
+_BUILTIN_NPLAYER_GAMES: dict[str, NPlayerGameConfig] = {
+    "nplayer_public_goods": NPlayerGameConfig(
+        name="N-Player Public Goods",
+        description=(
+            "Each player contributes from an endowment. The total pot is "
+            "multiplied and split equally among all players."
+        ),
+        actions=_PG_ACTIONS,
+        num_players=_FIVE,
+        default_rounds=NPLAYER_DEFAULT_ROUNDS,
+        payoff_fn=_public_goods_payoff,
+    ),
+    "nplayer_volunteer_dilemma": NPlayerGameConfig(
+        name="N-Player Volunteer's Dilemma",
+        description=(
+            "Players choose to volunteer or abstain. If at least one "
+            "volunteers, everyone benefits but volunteers pay a cost. "
+            "If nobody volunteers, everyone gets nothing."
+        ),
+        actions=["volunteer", "abstain"],
+        num_players=_FIVE,
+        default_rounds=NPLAYER_DEFAULT_ROUNDS,
+        payoff_fn=_volunteer_dilemma_payoff,
+    ),
+    "nplayer_el_farol": NPlayerGameConfig(
+        name="N-Player El Farol Bar",
+        description=(
+            "Players decide whether to attend a bar. The bar is fun when "
+            "not crowded but unpleasant when too many people show up."
+        ),
+        actions=["attend", "stay_home"],
+        num_players=_FIVE,
+        default_rounds=NPLAYER_DEFAULT_ROUNDS,
+        payoff_fn=_el_farol_payoff,
+    ),
+}
+
+NPLAYER_GAMES.update(_BUILTIN_NPLAYER_GAMES)
diff --git a/common/strategies.py b/common/strategies.py
new file mode 100644
index 0000000000000000000000000000000000000000..0da29ff8ce140eadcf6cd3e6d935288d8f54472c
--- /dev/null
+++ b/common/strategies.py
@@ -0,0 +1,259 @@
+"""Opponent strategy module for KantBench."""
+from __future__ import annotations
+import random
+from typing import Callable, Protocol
+from constant_definitions.game_constants import (
+    DEFAULT_ZERO_FLOAT,
+    GENEROUS_TFT_COOPERATION_PROB, GENEROUS_TFT_DENOMINATOR,
+    ADAPTIVE_THRESHOLD_NUMERATOR, ADAPTIVE_THRESHOLD_DENOMINATOR,
+    MIXED_STRATEGY_COOPERATE_PROB_NUMERATOR, MIXED_STRATEGY_COOPERATE_PROB_DENOMINATOR,
+    ULTIMATUM_POT, ULTIMATUM_FAIR_OFFER, ULTIMATUM_LOW_OFFER, ULTIMATUM_ACCEPT_THRESHOLD,
+    TRUST_ENDOWMENT, TRUST_MULTIPLIER,
+    TRUST_FAIR_RETURN_NUMERATOR, TRUST_FAIR_RETURN_DENOMINATOR,
+    TRUST_GENEROUS_RETURN_NUMERATOR, TRUST_GENEROUS_RETURN_DENOMINATOR,
+    PG_ENDOWMENT, PG_FAIR_CONTRIBUTION_NUMERATOR, PG_FAIR_CONTRIBUTION_DENOMINATOR,
+    PG_FREE_RIDER_CONTRIBUTION,
+)
+
+_ONE = 1
+_ZERO = 0
+_TWO = 2
+
+
+class OpponentStrategy(Protocol):
+    """Interface every opponent strategy must satisfy."""
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str: ...
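Because `OpponentStrategy` is a `typing.Protocol`, any object with a matching `choose_action` method satisfies it, with no inheritance required. A standalone sketch (the class name and scenario are hypothetical; `history` records follow the convention used throughout this module, where `"player_action"` is the move made by this strategy's opponent, and `actions[0]`/`actions[1]` are assumed to be cooperate/defect as in `_MatrixBase`):

```python
# Hypothetical protocol-conforming opponent: defects only after two
# consecutive opponent defections, otherwise cooperates.
class ForgivingGrudge:
    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
        defect = actions[1]
        # Look at the opponent's last two recorded moves.
        recent = [r["player_action"] for r in history[-2:]]
        if len(recent) == 2 and all(a == defect for a in recent):
            return defect
        return actions[0]


opponent = ForgivingGrudge()
hist = [{"player_action": "defect"}, {"player_action": "defect"}]
print(opponent.choose_action("prisoners_dilemma", ["cooperate", "defect"], hist))
# -> defect
```

Since the protocol is structural, wiring such a class in only requires adding an instance to the `STRATEGIES` registry defined at the bottom of the module.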
+
+
+class _MatrixBase:
+    """Shared helpers for matrix-game strategies."""
+    @staticmethod
+    def _coop(a: list[str]) -> str: return a[_ZERO]
+    @staticmethod
+    def _defect(a: list[str]) -> str: return a[_ONE]
+    @staticmethod
+    def _mirror(a: list[str], opp: str) -> str: return opp if opp in a else a[_ZERO]
+    def _last_opp(self, h: list[dict]) -> str | None:
+        return h[-_ONE]["player_action"] if h else None
+
+
+class RandomStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        return random.choice(actions)
+
+
+class AlwaysCooperateStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        return self._coop(actions)
+
+
+class AlwaysDefectStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        return self._defect(actions)
+
+
+class TitForTatStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        last = self._last_opp(history)
+        return self._coop(actions) if last is None else self._mirror(actions, last)
+
+
+class TitForTwoTatsStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        d = self._defect(actions)
+        if len(history) >= _TWO:
+            if all(r["player_action"] == d for r in history[-_TWO:]):
+                return d
+        return self._coop(actions)
+
+
+class GrudgerStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        d = self._defect(actions)
+        if any(r["player_action"] == d for r in history):
+            return d
+        return self._coop(actions)
+
+
+class PavlovStrategy(_MatrixBase):
+    """Win-stay/lose-shift: cooperate first, then cooperate iff both sides matched last round."""
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        if not history:
+            return self._coop(actions)
+        my_last = history[-_ONE]["opponent_action"]  # this strategy's own last move
+        opp_last = history[-_ONE]["player_action"]   # the LLM player's last move
+        if my_last == opp_last:
+            return self._coop(actions)
+        return self._defect(actions)
+
+
+class SuspiciousTitForTatStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        last = self._last_opp(history)
+        return self._defect(actions) if last is None else self._mirror(actions, last)
+
+
+class GenerousTitForTatStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        last = self._last_opp(history)
+        if last is None:
+            return self._coop(actions)
+        if last == self._defect(actions):
+            # randint is inclusive on both ends, so cap the bound at
+            # DENOMINATOR - 1 to sample exactly DENOMINATOR outcomes
+            # (matching the pattern used by MixedStrategy below).
+            if random.randint(_ZERO, GENEROUS_TFT_DENOMINATOR - _ONE) < GENEROUS_TFT_COOPERATION_PROB:
+                return self._coop(actions)
+            return self._defect(actions)
+        return self._coop(actions)
+
+
+class AdaptiveStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        if not history:
+            return self._coop(actions)
+        c = self._coop(actions)
+        coop_count = sum(_ONE for r in history if r["player_action"] == c)
+        threshold = len(history) * ADAPTIVE_THRESHOLD_NUMERATOR / ADAPTIVE_THRESHOLD_DENOMINATOR
+        return c if coop_count > threshold else self._defect(actions)
+
+
+class MixedStrategy(_MatrixBase):
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        bound = MIXED_STRATEGY_COOPERATE_PROB_DENOMINATOR - _ONE
+        if random.randint(_ZERO, bound) < MIXED_STRATEGY_COOPERATE_PROB_NUMERATOR:
+            return self._coop(actions)
+        return self._defect(actions)
+
+
+# ---------------------------------------------------------------------------
+# Game-specific strategies
+# ---------------------------------------------------------------------------
+
+def _parse_amount(action: str) -> int:
+    return int(action.rsplit("_", _ONE)[_ONE])
+
+
+class UltimatumFairStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        offer_tag = f"offer_{ULTIMATUM_FAIR_OFFER}"
+        if offer_tag in actions:
+            return offer_tag
+        if "accept" in actions and history:
+            return "accept" if _parse_amount(history[-_ONE]["player_action"]) >= ULTIMATUM_ACCEPT_THRESHOLD else "reject"
+        return actions[_ZERO]
+
+
+class UltimatumLowStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        offer_tag = f"offer_{ULTIMATUM_LOW_OFFER}"
+        if offer_tag in actions:
+            return offer_tag
+        if "accept" in actions:
+            return "accept"
+        return actions[_ZERO]
+
+
+class TrustFairStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        inv = f"invest_{TRUST_ENDOWMENT}"
+        if inv in actions:
+            return inv
+        if history:
+            total = _parse_amount(history[-_ONE]["player_action"]) * TRUST_MULTIPLIER
+            ret = total * TRUST_FAIR_RETURN_NUMERATOR // TRUST_FAIR_RETURN_DENOMINATOR
+            return f"return_{ret}"
+        return actions[_ZERO]
+
+
+class TrustGenerousStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        inv = f"invest_{TRUST_ENDOWMENT}"
+        if inv in actions:
+            return inv
+        if history:
+            total = _parse_amount(history[-_ONE]["player_action"]) * TRUST_MULTIPLIER
+            ret = total * TRUST_GENEROUS_RETURN_NUMERATOR // TRUST_GENEROUS_RETURN_DENOMINATOR
+            return f"return_{ret}"
+        return actions[_ZERO]
+
+
+class PublicGoodsFairStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        amount = PG_ENDOWMENT * PG_FAIR_CONTRIBUTION_NUMERATOR // PG_FAIR_CONTRIBUTION_DENOMINATOR
+        tag = f"contribute_{amount}"
+        return tag if tag in actions else actions[_ZERO]
+
+
+class PublicGoodsFreeRiderStrategy:
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        tag = f"contribute_{PG_FREE_RIDER_CONTRIBUTION}"
+        return tag if tag in actions else actions[_ZERO]
+
+
+# ---------------------------------------------------------------------------
+# Agent wrapper
+# ---------------------------------------------------------------------------
+
+
+class AgentStrategy:
+    """Wraps a ``Callable[[GameObservation], GameAction]`` into the
+    :class:`OpponentStrategy` protocol.
+
+    Uses the limited information available in ``choose_action()`` — sufficient
+    for simple agents but not for full LLM agents (use ``opponent_fn`` on the
+    environment for that).
+    """
+
+    def __init__(
+        self,
+        fn: Callable,
+    ) -> None:
+        self._fn = fn
+
+    def choose_action(self, game_type: str, actions: list[str], history: list[dict]) -> str:
+        from env.models import GameObservation, GameAction, RoundResult
+        flipped = [
+            RoundResult(
+                round_number=i + _ONE,
+                player_action=r["opponent_action"],
+                opponent_action=r["player_action"],
+                player_payoff=DEFAULT_ZERO_FLOAT,
+                opponent_payoff=DEFAULT_ZERO_FLOAT,
+            )
+            for i, r in enumerate(history)
+        ]
+        obs = GameObservation(
+            available_actions=list(actions),
+            history=flipped,
+            game_name=game_type,
+            opponent_strategy="agent",
+        )
+        return self._fn(obs).action
+
+
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+
+STRATEGIES: dict[str, OpponentStrategy] = {
+    "random": RandomStrategy(),
+    "always_cooperate": AlwaysCooperateStrategy(),
+    "always_defect": AlwaysDefectStrategy(),
+    "tit_for_tat": TitForTatStrategy(),
+    "tit_for_two_tats": TitForTwoTatsStrategy(),
+    "grudger": GrudgerStrategy(),
+    "pavlov": PavlovStrategy(),
+    "suspicious_tit_for_tat": SuspiciousTitForTatStrategy(),
+    "generous_tit_for_tat": GenerousTitForTatStrategy(),
+    "adaptive": AdaptiveStrategy(),
+    "mixed": MixedStrategy(),
+    "ultimatum_fair": UltimatumFairStrategy(),
+    "ultimatum_low": UltimatumLowStrategy(),
+    "trust_fair": TrustFairStrategy(),
+    "trust_generous": TrustGenerousStrategy(),
+    "public_goods_fair": PublicGoodsFairStrategy(),
+    "public_goods_free_rider": PublicGoodsFreeRiderStrategy(),
+}
+
+
+def get_strategy(name: str) -> OpponentStrategy:
+    """Look up a strategy by name. Raises KeyError if not found."""
+    return STRATEGIES[name]
diff --git a/constant_definitions/__pycache__/auction_nplayer_constants.cpython-311.pyc b/constant_definitions/__pycache__/auction_nplayer_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..16451fac3f260cf9aded9d701b87a5820503fe01
Binary files /dev/null and b/constant_definitions/__pycache__/auction_nplayer_constants.cpython-311.pyc differ
diff --git a/constant_definitions/__pycache__/coordination_constants.cpython-311.pyc b/constant_definitions/__pycache__/coordination_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..6763dcd4bc31961309e244dbf961ef8f33bd5e93
Binary files /dev/null and b/constant_definitions/__pycache__/coordination_constants.cpython-311.pyc differ
diff --git a/constant_definitions/__pycache__/game_constants.cpython-311.pyc b/constant_definitions/__pycache__/game_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e312cb765cf9ba21be8f4844b677126cca8ab72f
Binary files /dev/null and b/constant_definitions/__pycache__/game_constants.cpython-311.pyc differ
diff --git a/constant_definitions/__pycache__/sequential_constants.cpython-311.pyc b/constant_definitions/__pycache__/sequential_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..af69e592972883788d33aa352f1cd96a0005ecf8
Binary files /dev/null and b/constant_definitions/__pycache__/sequential_constants.cpython-311.pyc differ
diff --git a/constant_definitions/__pycache__/zero_sum_constants.cpython-311.pyc b/constant_definitions/__pycache__/zero_sum_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..85aee5954270af376fd2273298265729c5309cfd
Binary files /dev/null and b/constant_definitions/__pycache__/zero_sum_constants.cpython-311.pyc differ
diff --git a/constant_definitions/auction_nplayer_constants.py b/constant_definitions/auction_nplayer_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..53eec8bfb89855e73bc7db88971a864d995dc2b7
--- /dev/null
+++ b/constant_definitions/auction_nplayer_constants.py
@@ -0,0 +1,33 @@
+# --- Auction parameters ---
+AUCTION_ITEM_VALUE = 10  # True value of the auctioned item
+AUCTION_MAX_BID = 15  # Maximum allowed bid
+AUCTION_BID_INCREMENT = 1  # Discrete bid step size
+
+# --- Opponent auction valuation ---
+AUCTION_OPP_VALUE_LOW = 6  # Opponent's low valuation scenario
+AUCTION_OPP_VALUE_HIGH = 10  # Opponent's high valuation scenario
+AUCTION_OPP_DEFAULT_BID = 5  # Default opponent bid for strategies
+
+# --- Tragedy of the Commons ---
+COMMONS_RESOURCE_CAPACITY = 20  # Sustainable extraction limit
+COMMONS_MAX_EXTRACTION = 10  # Max individual extraction
+COMMONS_REGEN_RATE_NUM = 1  # Regeneration numerator
+COMMONS_REGEN_RATE_DEN = 2  # Regeneration denominator
+COMMONS_DEPLETION_PENALTY = -2  # Payoff when resource is depleted
+
+# --- Volunteer's Dilemma ---
+VOLUNTEER_BENEFIT = 6  # Benefit to all if someone volunteers
+VOLUNTEER_COST = 2  # Cost to the volunteer
+VOLUNTEER_NO_VOL = 0  # Payoff if nobody volunteers
+
+# --- El Farol Bar Problem ---
+EL_FAROL_CAPACITY = 6  # Bar capacity threshold
+EL_FAROL_ATTEND_REWARD = 4  # Payoff for attending uncrowded bar
+EL_FAROL_CROWD_PENALTY = -1  # Payoff for attending crowded bar
+EL_FAROL_STAY_HOME = 2  # Payoff for staying home
+
+# --- Generated game defaults ---
+GENERATED_DEFAULT_ACTIONS = 3  # Default NxN matrix size
+GENERATED_PAYOFF_MIN = -5  # Minimum random payoff
+GENERATED_PAYOFF_MAX = 5  # Maximum random payoff
+GENERATED_SEED_DEFAULT = 42  # Default random seed
diff --git a/constant_definitions/batch4/__pycache__/advanced_constants.cpython-311.pyc b/constant_definitions/batch4/__pycache__/advanced_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..ad33069116df21b4373dd1c071c6e6334aef38eb
Binary files /dev/null and b/constant_definitions/batch4/__pycache__/advanced_constants.cpython-311.pyc differ
diff --git a/constant_definitions/batch4/__pycache__/bayesian_constants.cpython-311.pyc b/constant_definitions/batch4/__pycache__/bayesian_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..2fd8a3650e1da46ef3aa27f2213aec57f61705b0
Binary files /dev/null and b/constant_definitions/batch4/__pycache__/bayesian_constants.cpython-311.pyc differ
diff --git a/constant_definitions/batch4/__pycache__/network_constants.cpython-311.pyc b/constant_definitions/batch4/__pycache__/network_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..6c835a87faf4679240540d6d5c5982b134a53216
Binary files /dev/null and b/constant_definitions/batch4/__pycache__/network_constants.cpython-311.pyc differ
diff --git a/constant_definitions/batch4/__pycache__/stochastic_constants.cpython-311.pyc b/constant_definitions/batch4/__pycache__/stochastic_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d3cd90d5225ba605f1b3f863204c42844ec13cc7
Binary files /dev/null and b/constant_definitions/batch4/__pycache__/stochastic_constants.cpython-311.pyc differ
diff --git a/constant_definitions/batch4/__pycache__/tag_constants.cpython-311.pyc b/constant_definitions/batch4/__pycache__/tag_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..6bb488d76615026410c9af451de2695faf28dc0f
Binary files /dev/null and b/constant_definitions/batch4/__pycache__/tag_constants.cpython-311.pyc differ
diff --git a/constant_definitions/batch4/advanced_constants.py b/constant_definitions/batch4/advanced_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..0b0057e626163cd9533fc279ed27725185a5ce2c
--- /dev/null
+++ b/constant_definitions/batch4/advanced_constants.py
@@ -0,0 +1,19 @@
+# Preemption Game -- first-mover timing advantage
+PRE_EARLY_EARLY = 2
+PRE_EARLY_LATE = 6
+PRE_LATE_EARLY = 1
+PRE_LATE_LATE = 4
+PRE_OUT_PAYOFF = 3
+
+# War of Gifts -- competitive generosity
+WOG_LARGE_LARGE = 1
+WOG_LARGE_SMALL = 4
+WOG_LARGE_NONE = 2
+WOG_SMALL_SMALL = 2
+WOG_SMALL_NONE = 1
+WOG_NO_GIFT = 3
+
+# Penalty Shootout -- mismatch (kicker vs keeper) game
+PS_SAVE_PAYOFF = -1
+PS_SCORE_PAYOFF = 1
+PS_CENTER_BONUS = 1
diff --git a/constant_definitions/batch4/bayesian_constants.py b/constant_definitions/batch4/bayesian_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..274abd661959c9b9ef1020d866ee776c6b8c60cd
--- /dev/null
+++ b/constant_definitions/batch4/bayesian_constants.py
@@ -0,0 +1,22 @@
+# Global Game -- coordination under private signals
+GG_ATTACK_ATTACK = 5
+GG_ATTACK_WAIT = -2
+GG_WAIT_ATTACK = 0
+GG_WAIT_WAIT = 2
+
+# Jury Voting -- unanimous conviction required
+JV_CONVICT_CONVICT = 4
+JV_ACQUIT_ACQUIT = 1
+JV_SPLIT_VOTE = 0
+
+# Information Cascade -- herding vs independence
+IC_SIGNAL_SIGNAL = 4
+IC_SIGNAL_CROWD = 1
+IC_CROWD_SIGNAL = 3
+IC_CROWD_CROWD = 2
+
+# Adverse Selection -- reveal or hide private type
+ASI_REVEAL_REVEAL = 3
+ASI_REVEAL_HIDE = -1
+ASI_HIDE_REVEAL = 4
+ASI_HIDE_HIDE = 0
diff --git a/constant_definitions/batch4/network_constants.py b/constant_definitions/batch4/network_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..ce11f1e66f6acfcaed6ed0c11fbb5badc1bf39c2
--- /dev/null
+++ b/constant_definitions/batch4/network_constants.py
@@ -0,0 +1,27 @@
+# Security Game -- defender vs attacker resource allocation
+SG_DEFEND_SUCCESS = 3
+SG_ATTACK_FAIL = -1
+SG_DEFEND_FAIL = -2
+SG_ATTACK_SUCCESS = 4
+
+# Link Formation -- bilateral consent for network links
+LF_MUTUAL_CONNECT = 3
+LF_UNILATERAL_COST = -1
+LF_MUTUAL_ISOLATE = 0
+
+# Trust with Punishment (3x3: cooperate, defect, punish)
+TWP_CC = 3
+TWP_CD = 0
+TWP_DC = 5
+TWP_DD = 1
+TWP_CP = -1
+TWP_PC = 2
+TWP_DP = -2
+TWP_PD = 0
+TWP_PP = -1
+
+# Dueling Game -- timing under uncertainty
+DG_EARLY_EARLY = 1
+DG_EARLY_LATE = 3
+DG_LATE_EARLY = -1
+DG_LATE_LATE = 2
diff --git a/constant_definitions/batch4/stochastic_constants.py b/constant_definitions/batch4/stochastic_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..27655a1e13cb59d57e8a8147b525d1fb3d59e774
--- /dev/null
+++ b/constant_definitions/batch4/stochastic_constants.py
@@ -0,0 +1,23 @@
+# Stochastic PD -- expected payoffs under action noise
+SPD_CC = 3
+SPD_CD = 1
+SPD_DC = 4
+SPD_DD = 2
+
+# Risk Dominance -- payoff-dominant vs risk-dominant equilibria
+RD_PAYOFF_DOMINANT = 7
+RD_RISK_DOMINANT = 5
+RD_MISCOORDINATION = 0
+
+# Threshold Public Goods -- step-function provision
+TPG_ENDOWMENT = 5
+TPG_THRESHOLD = 6
+TPG_SUCCESS_BONUS = 4
+
+# Evolutionary PD -- long-run strategy expected payoffs
+EPD_COOP_COOP = 3
+EPD_COOP_DEFECT = 0
+EPD_DEFECT_COOP = 5
+EPD_DEFECT_DEFECT = 1
+EPD_TFT_DEFECT = 1
+EPD_DEFECT_TFT = 2
diff --git a/constant_definitions/batch4/tag_constants.py b/constant_definitions/batch4/tag_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..300db60da9ad7eb2251fabc23f05b3ba02fe14fd
--- /dev/null
+++ b/constant_definitions/batch4/tag_constants.py
@@ -0,0 +1,66 @@
+"""String constants for game-theoretic property tags."""
+
+# ── Communication ──
+NO_COMMUNICATION = "no_communication"
+CHEAP_TALK = "cheap_talk"
+COSTLY_SIGNALING = "costly_signaling"
+BINDING_COMMITMENT = "binding_commitment"
+MEDIATED = "mediated"
+
+# ── Information ──
+COMPLETE_INFORMATION = "complete_information"
+INCOMPLETE_INFORMATION = "incomplete_information"
+ASYMMETRIC_INFORMATION = "asymmetric_information"
+
+# ── Structure ──
+SIMULTANEOUS = "simultaneous"
+SEQUENTIAL = "sequential"
+REPEATED = "repeated"
+SINGLE_SHOT = "single_shot"
+
+# ── Payoff type ──
+ZERO_SUM = "zero_sum"
+SYMMETRIC_PAYOFF = "symmetric_payoff"
+ASYMMETRIC_PAYOFF = "asymmetric_payoff"
+COORDINATION = "coordination"
+ANTI_COORDINATION = "anti_coordination"
+
+# ── Domain ──
+SOCIAL_DILEMMA = "social_dilemma"
+AUCTION = "auction"
+BARGAINING = "bargaining"
+VOTING = "voting"
+MARKET_COMPETITION = "market_competition"
+EVOLUTIONARY = "evolutionary"
+SECURITY = "security"
+NETWORK = "network"
+
+# ── Action space ──
+BINARY_CHOICE = "binary_choice"
+SMALL_CHOICE = "small_choice"
+LARGE_CHOICE = "large_choice"
+
+# ── Grouped by dimension (for programmatic enumeration) ──
+CATEGORIES: dict[str, list[str]] = {
+    "communication": [
+        NO_COMMUNICATION, CHEAP_TALK, COSTLY_SIGNALING,
+        BINDING_COMMITMENT, MEDIATED,
+    ],
+    "information": [
+        COMPLETE_INFORMATION, INCOMPLETE_INFORMATION, ASYMMETRIC_INFORMATION,
+    ],
+    "structure": [
+        SIMULTANEOUS, SEQUENTIAL, REPEATED, SINGLE_SHOT,
+    ],
+    "payoff_type": [
+        ZERO_SUM, SYMMETRIC_PAYOFF, ASYMMETRIC_PAYOFF,
+        COORDINATION, ANTI_COORDINATION,
+    ],
+    "domain": [
+        SOCIAL_DILEMMA, AUCTION, BARGAINING, VOTING,
+        MARKET_COMPETITION, EVOLUTIONARY, SECURITY, NETWORK,
+    ],
+    "action_space": [
+        BINARY_CHOICE, SMALL_CHOICE, LARGE_CHOICE,
+    ],
+}
diff --git a/constant_definitions/coordination_constants.py b/constant_definitions/coordination_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..f146432de59d5d972af7324d74723f8d905fe15c
--- /dev/null
+++ b/constant_definitions/coordination_constants.py
@@ -0,0 +1,23 @@
+# --- Battle of the Sexes payoffs ---
+# Player 1 prefers opera, Player 2 prefers football
+BOS_PREFERRED_PAYOFF = 3  # Coordinating on your preferred option
+BOS_COMPROMISE_PAYOFF = 2  # Coordinating on the other's preferred option
+BOS_MISMATCH_PAYOFF = 0  # Failing to coordinate
+
+# --- Pure Coordination payoffs ---
+PC_MATCH_PAYOFF = 2  # Both choose same action
+PC_MISMATCH_PAYOFF = 0  # Choices differ
+
+# --- Deadlock payoffs (defection dominant for both) ---
+# Ordering: DC > DD > CC > CD
+DL_DC_PAYOFF = 4  # I defect, they cooperate
+DL_DD_PAYOFF = 3  # Both defect (NE)
+DL_CC_PAYOFF = 2  # Both cooperate
+DL_CD_PAYOFF = 1  # I cooperate, they defect
+
+# --- Harmony payoffs (cooperation dominant for both) ---
+# Ordering: CC > DC > CD > DD
+HM_CC_PAYOFF = 4  # Both cooperate (NE)
+HM_DC_PAYOFF = 3  # I defect, they cooperate
+HM_CD_PAYOFF = 2  # I cooperate, they defect
+HM_DD_PAYOFF = 1  # Both defect
diff --git a/constant_definitions/ext/__pycache__/conflict_constants.cpython-311.pyc b/constant_definitions/ext/__pycache__/conflict_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..559114392e43ef5d4d721f6c002207611ff92605
Binary files /dev/null and b/constant_definitions/ext/__pycache__/conflict_constants.cpython-311.pyc differ
diff --git a/constant_definitions/ext/__pycache__/cooperative_constants.cpython-311.pyc b/constant_definitions/ext/__pycache__/cooperative_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..6f5985e2ae8b470f55b6c2a488235b63aaf7ceb3
Binary files /dev/null and b/constant_definitions/ext/__pycache__/cooperative_constants.cpython-311.pyc differ
diff --git a/constant_definitions/ext/__pycache__/dynamic_constants.cpython-311.pyc b/constant_definitions/ext/__pycache__/dynamic_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..ddfee7215e5cda8b3d6b6633b71a5af63e2d3a55
Binary files /dev/null and b/constant_definitions/ext/__pycache__/dynamic_constants.cpython-311.pyc differ
diff --git a/constant_definitions/ext/__pycache__/market_constants.cpython-311.pyc b/constant_definitions/ext/__pycache__/market_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d056b0cdd876750646676fcee08e3712ae41abaa
Binary files /dev/null and b/constant_definitions/ext/__pycache__/market_constants.cpython-311.pyc differ
diff --git a/constant_definitions/ext/__pycache__/signaling_constants.cpython-311.pyc b/constant_definitions/ext/__pycache__/signaling_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..69d40e72be006a50b62db99390e5694617abd474
Binary files /dev/null and b/constant_definitions/ext/__pycache__/signaling_constants.cpython-311.pyc differ
diff --git a/constant_definitions/ext/conflict_constants.py b/constant_definitions/ext/conflict_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..2179f0b19b413387611898ed75a9348ccc2168b6
--- /dev/null
+++ b/constant_definitions/ext/conflict_constants.py
@@ -0,0 +1,27 @@
+# --- Colonel Blotto ---
+BLOTTO_BATTLEFIELDS = 3  # Number of battlefields
+BLOTTO_TOTAL_TROOPS = 6  # Total troops to allocate
+
+# --- War of Attrition ---
+WOA_PRIZE = 10  # Prize for the survivor
+WOA_COST_PER_ROUND = 1  # Cost per unit of persistence
+WOA_MAX_PERSISTENCE = 10  # Max persistence level
+
+# --- Tullock Contest ---
+TULLOCK_PRIZE = 10  # Prize value
+TULLOCK_MAX_EFFORT = 10  # Max effort level
+TULLOCK_EFFECTIVENESS = 1  # Effort effectiveness exponent
+
+# --- Inspection Game ---
+INSP_VIOLATION_GAIN = 4  # Gain from undetected violation
+INSP_FINE = 6  # Fine if caught
+INSP_INSPECTION_COST = 2  # Cost of inspecting
+INSP_COMPLIANCE_PAYOFF = 0  # Payoff for complying
+
+# --- Rubinstein Bargaining ---
+RUB_SURPLUS = 10  # Total surplus
+RUB_DISCOUNT_NUM = 9  # Discount factor numerator
+RUB_DISCOUNT_DEN = 10  # Discount factor denominator
+
+# --- Divide-and-Choose ---
+DAC_ENDOWMENT = 10  # Total to divide
diff --git a/constant_definitions/ext/cooperative_constants.py b/constant_definitions/ext/cooperative_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..ba386d4389113739fb9c102e90da752a1b24951f
--- /dev/null
+++ b/constant_definitions/ext/cooperative_constants.py
@@ -0,0 +1,33 @@
+# --- Shapley Value Allocation ---
+SHAPLEY_GRAND_COALITION_VALUE = 12  # v({all players})
+SHAPLEY_SINGLE_VALUE = 2  # v({single player})
+SHAPLEY_PAIR_VALUE = 8  # v({pair})
+SHAPLEY_MAX_CLAIM = 12  # Max individual claim
+
+# --- Core / Divide-the-Dollar ---
+CORE_POT = 10  # Amount to divide
+CORE_MAJORITY_THRESHOLD = 2  # Votes needed for majority
+
+# --- Weighted Voting ---
+WV_QUOTA = 6  # Votes needed to pass
+WV_PLAYER_WEIGHT = 3  # First player weight
+WV_OPPONENT_WEIGHT = 4  # Second player weight
+WV_PASS_BENEFIT = 5  # Benefit if proposal passes
+WV_FAIL_PAYOFF = 0  # Payoff if proposal fails
+WV_OPPOSITION_BONUS = 2  # Bonus for blocking
+
+# --- Stable Matching ---
+SM_NUM_OPTIONS = 3  # Number of partners to rank
+SM_TOP_MATCH_PAYOFF = 5  # Payoff for top choice match
+SM_MID_MATCH_PAYOFF = 3  # Payoff for middle choice
+SM_LOW_MATCH_PAYOFF = 1  # Payoff for last choice
+
+# --- Median Voter ---
+MV_POSITION_RANGE = 10  # Policy positions from zero to this
+MV_DISTANCE_COST = 1  # Payoff loss per unit distance
+
+# --- Approval Voting ---
+AV_NUM_CANDIDATES = 4  # Number of candidates
+AV_PREFERRED_WIN = 5  # Payoff if preferred wins
+AV_ACCEPTABLE_WIN = 2  # Payoff if acceptable wins
+AV_DISLIKED_WIN = -2  # Payoff if disliked wins
diff --git a/constant_definitions/ext/dynamic_constants.py b/constant_definitions/ext/dynamic_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..ff5f68835da3af70f583c245619fe1ae9306efa8
--- /dev/null
+++ b/constant_definitions/ext/dynamic_constants.py
@@ -0,0 +1,40 @@
+# --- Bank Run ---
+BR_PATIENCE_REWARD = 5  # Payoff for waiting when bank survives
+BR_EARLY_WITHDRAW = 3  # Payoff for early withdrawal
+BR_BANK_FAIL_PAYOFF = 1  # Payoff when bank collapses
+
+# ---
Global Stag Hunt --- +GSH_STAG_PAYOFF = 6 # Mutual stag payoff (higher than normal) +GSH_HARE_PAYOFF = 3 # Hare regardless payoff +GSH_STAG_ALONE_PAYOFF = 0 # Hunting stag alone + +# --- Beauty Contest / p-Guessing --- +BC_MAX_NUMBER = 10 # Range of numbers to choose from +BC_TARGET_FRACTION_NUM = 2 # p = two thirds +BC_TARGET_FRACTION_DEN = 3 +BC_WIN_PAYOFF = 5 # Winner payoff +BC_LOSE_PAYOFF = 0 # Loser payoff +BC_TIE_PAYOFF = 2 # Tie payoff + +# --- Hawk-Dove-Bourgeois --- +HDB_RESOURCE_VALUE = 6 # Value of contested resource +HDB_FIGHT_COST = 8 # Cost of mutual hawk fight +HDB_SHARE_DIVISOR = 2 # Split resource equally + +# --- Gift Exchange --- +GE_MAX_WAGE = 10 # Maximum wage employer can offer +GE_MAX_EFFORT = 10 # Maximum effort worker can exert +GE_EFFORT_COST_PER_UNIT = 1 # Marginal cost of effort +GE_PRODUCTIVITY_PER_EFFORT = 2 # Revenue per unit of effort + +# --- Moral Hazard --- +MH_BASE_OUTPUT = 3 # Output without effort +MH_EFFORT_BOOST = 5 # Additional output from effort +MH_EFFORT_COST = 2 # Cost to agent of exerting effort +MH_MAX_BONUS = 10 # Maximum bonus principal can offer + +# --- Screening --- +SCR_HIGH_TYPE_VALUE = 8 # High type's private value +SCR_LOW_TYPE_VALUE = 4 # Low type's private value +SCR_PREMIUM_PRICE = 6 # Premium contract price +SCR_BASIC_PRICE = 3 # Basic contract price diff --git a/constant_definitions/ext/market_constants.py b/constant_definitions/ext/market_constants.py new file mode 100644 index 0000000000000000000000000000000000000000..d8426e876a2ba3f7884bb7f443e6d27ff49f4167 --- /dev/null +++ b/constant_definitions/ext/market_constants.py @@ -0,0 +1,30 @@ +# --- Cournot Duopoly --- +COURNOT_DEMAND_INTERCEPT = 12 # a in P = a - b*Q +COURNOT_DEMAND_SLOPE = 1 # b in P = a - b*Q +COURNOT_MARGINAL_COST = 2 # Constant marginal cost +COURNOT_MAX_QUANTITY = 10 # Max production quantity + +# --- Bertrand Competition --- +BERTRAND_MAX_PRICE = 10 # Maximum price +BERTRAND_MARGINAL_COST = 3 # Production cost +BERTRAND_MARKET_SIZE 
= 12 # Total demand at zero price + +# --- Hotelling Location --- +HOTELLING_LINE_LENGTH = 10 # Length of the line +HOTELLING_TRANSPORT_COST = 1 # Per-unit transport cost +HOTELLING_MARKET_VALUE = 6 # Revenue per captured consumer + +# --- Entry Deterrence --- +ED_MONOPOLY_PROFIT = 10 # Incumbent profit if no entry +ED_DUOPOLY_PROFIT = 4 # Each firm profit if entry and accommodate +ED_FIGHT_COST = -2 # Incumbent cost of fighting +ED_ENTRANT_FIGHT_LOSS = -3 # Entrant loss if fought +ED_STAY_OUT_PAYOFF = 0 # Entrant payoff for staying out + +# --- Nash Demand Game --- +ND_SURPLUS = 10 # Total surplus to divide + +# --- Double Auction --- +DA_BUYER_VALUE = 8 # Buyer private valuation +DA_SELLER_COST = 3 # Seller private cost +DA_MAX_PRICE = 10 # Maximum price diff --git a/constant_definitions/ext/signaling_constants.py b/constant_definitions/ext/signaling_constants.py new file mode 100644 index 0000000000000000000000000000000000000000..6cf7db8001bfcef98b1e0da681838e286bad67ed --- /dev/null +++ b/constant_definitions/ext/signaling_constants.py @@ -0,0 +1,35 @@ +# --- Beer-Quiche Signaling Game --- +BQ_TOUGH_BEER_PAYOFF = 3 # Tough type prefers beer +BQ_TOUGH_QUICHE_PAYOFF = 1 # Tough type dislikes quiche +BQ_WEAK_BEER_PAYOFF = 1 # Weak type dislikes beer +BQ_WEAK_QUICHE_PAYOFF = 3 # Weak type prefers quiche +BQ_CHALLENGE_COST = -2 # Cost of being challenged +BQ_NO_CHALLENGE_BONUS = 2 # Bonus for not being challenged +BQ_CHALLENGE_TOUGH_PAYOFF = -1 # Challenger loses vs tough +BQ_CHALLENGE_WEAK_PAYOFF = 2 # Challenger wins vs weak + +# --- Spence Job Market Signaling --- +SPENCE_HIGH_ABILITY = 4 # High-type productivity +SPENCE_LOW_ABILITY = 2 # Low-type productivity +SPENCE_EDU_COST_HIGH = 1 # Education cost for high type +SPENCE_EDU_COST_LOW = 3 # Education cost for low type +SPENCE_HIGH_WAGE = 4 # Wage offered to educated workers +SPENCE_LOW_WAGE = 2 # Wage offered to uneducated workers + +# --- Cheap Talk --- +CT_ALIGNED_MATCH = 3 # Both benefit from correct action 
+CT_ALIGNED_MISMATCH = 0  # Misaligned outcomes
+CT_BIAS = 1  # Sender's preferred deviation
+
+# --- Lemon Market ---
+LEMON_GOOD_QUALITY_VALUE = 8  # Buyer value for good car
+LEMON_BAD_QUALITY_VALUE = 3  # Buyer value for lemon
+LEMON_GOOD_SELLER_COST = 6  # Seller cost for good car
+LEMON_BAD_SELLER_COST = 2  # Seller cost for lemon
+LEMON_MAX_PRICE = 10  # Maximum price in market
+
+# --- Bayesian Persuasion ---
+BP_GOOD_STATE_VALUE = 5  # Value of action in good state
+BP_BAD_STATE_PENALTY = -3  # Penalty for action in bad state
+BP_SAFE_PAYOFF = 0  # Safe action payoff
+BP_REVEAL_COST = 0  # Cost of revealing (zero for sender)
diff --git a/constant_definitions/game_constants.py b/constant_definitions/game_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..8acf4ced66eb151a47fec2efb06ec72a2dd2615e
--- /dev/null
+++ b/constant_definitions/game_constants.py
@@ -0,0 +1,93 @@
+# Pydantic / model defaults
+DEFAULT_ZERO_FLOAT = float()
+DEFAULT_ZERO_INT = int()
+DEFAULT_FALSE = False
+DEFAULT_NONE = None
+MIN_STEP_COUNT = int()
+
+# Episode configuration
+DEFAULT_NUM_ROUNDS = 10
+SINGLE_SHOT_ROUNDS = 1
+
+# --- Prisoner's Dilemma payoffs ---
+PD_CC_PAYOFF = 3  # Both cooperate
+PD_CD_PAYOFF = 0  # I cooperate, they defect
+PD_DC_PAYOFF = 5  # I defect, they cooperate
+PD_DD_PAYOFF = 1  # Both defect
+
+# --- Stag Hunt payoffs ---
+SH_SS_PAYOFF = 4  # Both hunt stag
+SH_SH_PAYOFF = 0  # I hunt stag, they hunt hare
+SH_HS_PAYOFF = 3  # I hunt hare, they hunt stag
+SH_HH_PAYOFF = 2  # Both hunt hare
+
+# --- Hawk-Dove payoffs ---
+HD_HH_PAYOFF = -1  # Both hawk (conflict)
+HD_HD_PAYOFF = 3  # I hawk, they dove
+HD_DH_PAYOFF = 1  # I dove, they hawk
+HD_DD_PAYOFF = 2  # Both dove
+
+# --- Ultimatum Game ---
+ULTIMATUM_POT = 10
+
+# --- Trust Game ---
+TRUST_MULTIPLIER = 3
+TRUST_ENDOWMENT = 10
+
+# --- Public Goods Game ---
+PG_MULTIPLIER_NUMERATOR = 3
+PG_MULTIPLIER_DENOMINATOR = 2
+PG_ENDOWMENT = 20
+PG_DEFAULT_NUM_PLAYERS = 4
+
+# --- Strategy parameters ---
+GENEROUS_TFT_COOPERATION_PROB = 9  # out of 10 (90%)
+GENEROUS_TFT_DENOMINATOR = 10
+ADAPTIVE_THRESHOLD_NUMERATOR = 1
+ADAPTIVE_THRESHOLD_DENOMINATOR = 2
+MIXED_STRATEGY_COOPERATE_PROB_NUMERATOR = 7
+MIXED_STRATEGY_COOPERATE_PROB_DENOMINATOR = 10
+
+# Ultimatum strategy defaults
+ULTIMATUM_FAIR_OFFER = 5
+ULTIMATUM_LOW_OFFER = 3
+ULTIMATUM_HIGH_OFFER = 7
+ULTIMATUM_ACCEPT_THRESHOLD = 3
+
+# Trust strategy defaults
+TRUST_FAIR_RETURN_NUMERATOR = 1
+TRUST_FAIR_RETURN_DENOMINATOR = 3
+TRUST_GENEROUS_RETURN_NUMERATOR = 1
+TRUST_GENEROUS_RETURN_DENOMINATOR = 2
+
+# Public goods strategy defaults
+PG_FAIR_CONTRIBUTION_NUMERATOR = 1
+PG_FAIR_CONTRIBUTION_DENOMINATOR = 2
+PG_FREE_RIDER_CONTRIBUTION = 2
+
+# Port
+SERVER_PORT = 8000
+
+# Max concurrent environments
+MAX_CONCURRENT_ENVS = 1
+
+# --- Evaluation module constants ---
+EVAL_ZERO = 0
+EVAL_ONE = 1
+EVAL_TWO = 2
+EVAL_THREE = 3
+EVAL_FOUR = 4
+EVAL_DEFAULT_EPISODES = 3
+EVAL_HUNDRED = 100
+EVAL_INDENT_SPACES = 4
+EVAL_PERFECT_SCORE = 1
+EVAL_ZERO_FLOAT = 0.0
+EVAL_ONE_FLOAT = 1.0
+EVAL_HALF = 0.5
+EVAL_NEGATIVE_ONE = -1
+
+# --- External benchmark constants ---
+EVAL_EIGHT = 8
+EVAL_TEN = 10
+EVAL_EIGHTY = 80
+EVAL_FIVE_TWELVE = 512
diff --git a/constant_definitions/nplayer/__init__.py b/constant_definitions/nplayer/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/constant_definitions/nplayer/__pycache__/__init__.cpython-311.pyc b/constant_definitions/nplayer/__pycache__/__init__.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e8ad980c372675ca8333c0722b7cb7bfacb2fecd
Binary files /dev/null and b/constant_definitions/nplayer/__pycache__/__init__.cpython-311.pyc differ
diff --git a/constant_definitions/nplayer/__pycache__/coalition_constants.cpython-311.pyc b/constant_definitions/nplayer/__pycache__/coalition_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..aaf58a5cadf51c687dbe94b5d5be002c9cc825c5
Binary files /dev/null and b/constant_definitions/nplayer/__pycache__/coalition_constants.cpython-311.pyc differ
diff --git a/constant_definitions/nplayer/__pycache__/dynamic_constants.cpython-311.pyc b/constant_definitions/nplayer/__pycache__/dynamic_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..adf2a85409223f8bb103ab4afba51b0fe829b049
Binary files /dev/null and b/constant_definitions/nplayer/__pycache__/dynamic_constants.cpython-311.pyc differ
diff --git a/constant_definitions/nplayer/__pycache__/nplayer_constants.cpython-311.pyc b/constant_definitions/nplayer/__pycache__/nplayer_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..90bba7656209882d29b3e1325ea4fe31f2bd038e
Binary files /dev/null and b/constant_definitions/nplayer/__pycache__/nplayer_constants.cpython-311.pyc differ
diff --git a/constant_definitions/nplayer/coalition_constants.py b/constant_definitions/nplayer/coalition_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..293de710fb5afa7436568532fd927bb088d1ca79
--- /dev/null
+++ b/constant_definitions/nplayer/coalition_constants.py
@@ -0,0 +1,65 @@
+# --- Coalition phase parameters ---
+COALITION_PHASE_NEGOTIATE = "negotiate"
+COALITION_PHASE_ACTION = "action"
+
+# --- Enforcement modes ---
+ENFORCEMENT_CHEAP_TALK = "cheap_talk"
+ENFORCEMENT_PENALTY = "penalty"
+ENFORCEMENT_BINDING = "binding"
+
+# --- Default penalty fraction (penalty = payoff * NUM / DEN) ---
+COALITION_DEFAULT_PENALTY_NUMERATOR = 1
+COALITION_DEFAULT_PENALTY_DENOMINATOR = 2
+
+# --- Default side payment ---
+COALITION_DEFAULT_SIDE_PAYMENT = 0
+
+# --- Default rounds for coalition games ---
+COALITION_DEFAULT_ROUNDS = 10
+
+# --- Cartel Game (penalty enforcement) ---
+CARTEL_NUM_PLAYERS = 4
+CARTEL_COLLUDE_THRESHOLD = 3
+CARTEL_COLLUDE_HIGH = 6
+CARTEL_COLLUDE_LOW = 2
+CARTEL_COMPETE_HIGH = 10
+CARTEL_COMPETE_LOW = 4
+
+# --- Alliance Formation (cheap talk) ---
+ALLIANCE_NUM_PLAYERS = 4
+ALLIANCE_SUPPORT_POOL = 20
+ALLIANCE_BETRAY_GAIN = 8
+ALLIANCE_NO_SUPPORT = 2
+
+# --- Coalition Voting (binding) ---
+VOTING_NUM_PLAYERS = 5
+VOTING_WINNER_PAYOFF = 6
+VOTING_LOSER_PAYOFF = 2
+
+# --- Ostracism (penalty) ---
+OSTRACISM_NUM_PLAYERS = 5
+OSTRACISM_BONUS_POOL = 20
+OSTRACISM_EXCLUDED_PAYOFF = 0
+OSTRACISM_BASE_PAYOFF = 3
+OSTRACISM_MAJORITY_NUMERATOR = 1
+OSTRACISM_MAJORITY_DENOMINATOR = 2
+
+# --- Resource Trading (cheap talk) ---
+TRADE_NUM_PLAYERS = 4
+TRADE_DIVERSE_PAYOFF = 6
+TRADE_HOMOGENEOUS_PAYOFF = 2
+TRADE_MINORITY_BONUS = 2
+
+# --- Rule Voting (binding) ---
+RULE_NUM_PLAYERS = 4
+RULE_EQUAL_PAY = 5
+RULE_WINNER_HIGH = 8
+RULE_WINNER_LOW = 2
+
+# --- Commons Governance (penalty) ---
+COMMONS_NUM_PLAYERS = 5
+COMMONS_SUSTAINABLE_THRESHOLD = 2
+COMMONS_LOW_SUSTAINABLE = 4
+COMMONS_HIGH_SUSTAINABLE = 7
+COMMONS_LOW_DEPLETED = 1
+COMMONS_HIGH_DEPLETED = 2
diff --git a/constant_definitions/nplayer/dynamic_constants.py b/constant_definitions/nplayer/dynamic_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..7575fd0f75e764f2d0a82b67f17c2676a7339187
--- /dev/null
+++ b/constant_definitions/nplayer/dynamic_constants.py
@@ -0,0 +1,5 @@
+# --- Dynamic game creation validation ---
+MIN_ACTIONS = 2
+MAX_ACTIONS = 20
+DYNAMIC_DEFAULT_ROUNDS = 10
+REGISTRY_PREFIX = "dynamic_"
diff --git a/constant_definitions/nplayer/governance_constants.py b/constant_definitions/nplayer/governance_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..97897f07c90cb024c044f77b80aa030ab4aa47d4
--- /dev/null
+++ b/constant_definitions/nplayer/governance_constants.py
@@ -0,0 +1,57 @@
+# --- Governance proposal types ---
+GOVERNANCE_PROPOSAL_PARAMETER = "parameter"
+GOVERNANCE_PROPOSAL_MECHANIC = "mechanic"
+GOVERNANCE_PROPOSAL_CUSTOM = "custom"
+
+# --- Mechanic names (applied in this fixed order) ---
+MECHANIC_TAXATION = "taxation"
+MECHANIC_REDISTRIBUTION = "redistribution"
+MECHANIC_INSURANCE = "insurance"
+MECHANIC_QUOTA = "quota"
+MECHANIC_SUBSIDY = "subsidy"
+MECHANIC_VETO = "veto"
+
+MECHANIC_ORDER = [
+    MECHANIC_TAXATION,
+    MECHANIC_REDISTRIBUTION,
+    MECHANIC_INSURANCE,
+    MECHANIC_QUOTA,
+    MECHANIC_SUBSIDY,
+    MECHANIC_VETO,
+]
+
+# --- Redistribution modes ---
+REDISTRIBUTION_EQUAL = "equal"
+REDISTRIBUTION_PROPORTIONAL = "proportional"
+
+# --- Voting thresholds ---
+GOVERNANCE_MAJORITY_NUMERATOR = 1
+GOVERNANCE_MAJORITY_DENOMINATOR = 2
+
+# --- Per-round limits ---
+GOVERNANCE_MAX_PROPOSALS_PER_ROUND = 3
+
+# --- Default mechanic parameters ---
+GOVERNANCE_DEFAULT_TAX_RATE_NUMERATOR = 1
+GOVERNANCE_DEFAULT_TAX_RATE_DENOMINATOR = 10
+
+GOVERNANCE_DEFAULT_REDISTRIBUTION_MODE = REDISTRIBUTION_EQUAL
+GOVERNANCE_DEFAULT_REDISTRIBUTION_DAMPING_NUMERATOR = 1
+GOVERNANCE_DEFAULT_REDISTRIBUTION_DAMPING_DENOMINATOR = 2
+
+GOVERNANCE_DEFAULT_INSURANCE_CONTRIBUTION_NUMERATOR = 1
+GOVERNANCE_DEFAULT_INSURANCE_CONTRIBUTION_DENOMINATOR = 10
+GOVERNANCE_DEFAULT_INSURANCE_THRESHOLD_NUMERATOR = 1
+GOVERNANCE_DEFAULT_INSURANCE_THRESHOLD_DENOMINATOR = 2
+
+GOVERNANCE_DEFAULT_QUOTA_MAX = 8
+
+GOVERNANCE_DEFAULT_SUBSIDY_FLOOR = 2
+GOVERNANCE_DEFAULT_SUBSIDY_FUND_RATE_NUMERATOR = 1
+GOVERNANCE_DEFAULT_SUBSIDY_FUND_RATE_DENOMINATOR = 5
+
+GOVERNANCE_DEFAULT_VETO_PLAYER = 0
+
+# --- Custom modifier safety ---
+GOVERNANCE_CUSTOM_DELTA_CLAMP_NUMERATOR = 1
+GOVERNANCE_CUSTOM_DELTA_CLAMP_DENOMINATOR = 2
diff --git a/constant_definitions/nplayer/nplayer_constants.py b/constant_definitions/nplayer/nplayer_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..fca025f080f5db5ef3cafadbf08fa3ee82b43f95
--- /dev/null
+++ b/constant_definitions/nplayer/nplayer_constants.py
@@ -0,0 +1,21 @@
+# --- N-player environment parameters ---
+MIN_PLAYERS = 2
+MAX_PLAYERS = 10
+NPLAYER_DEFAULT_ROUNDS = 10
+
+# --- Public Goods (N-player) ---
+NPG_ENDOWMENT = 20
+NPG_MULTIPLIER_NUMERATOR = 3
+NPG_MULTIPLIER_DENOMINATOR = 2
+
+# --- Volunteer's Dilemma (N-player) ---
+NVD_BENEFIT = 6
+NVD_COST = 2
+NVD_NO_VOLUNTEER = 0
+
+# --- El Farol Bar (N-player) ---
+NEF_CAPACITY_FRACTION_NUMERATOR = 3
+NEF_CAPACITY_FRACTION_DENOMINATOR = 5
+NEF_ATTEND_REWARD = 4
+NEF_CROWD_PENALTY = -1
+NEF_STAY_HOME = 2
diff --git a/constant_definitions/sequential_constants.py b/constant_definitions/sequential_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..d0eb2c679050de29c30ae83261afe10014b8026a
--- /dev/null
+++ b/constant_definitions/sequential_constants.py
@@ -0,0 +1,17 @@
+# --- Dictator Game ---
+DICTATOR_ENDOWMENT = 10  # Amount the dictator allocates
+
+# --- Centipede Game ---
+CENTIPEDE_INITIAL_POT = 4  # Starting pot size
+CENTIPEDE_GROWTH_MULTIPLIER = 2  # Pot multiplier each pass
+CENTIPEDE_MAX_STAGES = 6  # Maximum number of stages
+CENTIPEDE_LARGE_SHARE_NUMERATOR = 3  # Large share = pot * 3/4
+CENTIPEDE_LARGE_SHARE_DENOMINATOR = 4
+CENTIPEDE_SMALL_SHARE_NUMERATOR = 1  # Small share = pot * 1/4
+CENTIPEDE_SMALL_SHARE_DENOMINATOR = 4
+
+# --- Stackelberg Competition ---
+STACKELBERG_DEMAND_INTERCEPT = 12  # a in P = a - b*Q
+STACKELBERG_DEMAND_SLOPE = 1  # b in P = a - b*Q
+STACKELBERG_MARGINAL_COST = 2  # Constant marginal cost c
+STACKELBERG_MAX_QUANTITY = 10  # Max production quantity
diff --git a/constant_definitions/train/__init__.py b/constant_definitions/train/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..86b7d5c2f693e34cef57b1cd8ea56b4e7177fd27
--- /dev/null
+++ b/constant_definitions/train/__init__.py
@@ -0,0 +1 @@
+"""Training-related constants for GRPO and DPO pipelines."""
diff --git a/constant_definitions/train/__pycache__/__init__.cpython-311.pyc b/constant_definitions/train/__pycache__/__init__.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..579f71708e99e50fd10151355bc14fd6c5ae218e
Binary files /dev/null and b/constant_definitions/train/__pycache__/__init__.cpython-311.pyc differ
diff --git a/constant_definitions/train/__pycache__/agent_constants.cpython-311.pyc b/constant_definitions/train/__pycache__/agent_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..84caad50278931dcbd34eb205c7d026dccbcda0d
Binary files /dev/null and b/constant_definitions/train/__pycache__/agent_constants.cpython-311.pyc differ
diff --git a/constant_definitions/train/__pycache__/grpo_constants.cpython-311.pyc b/constant_definitions/train/__pycache__/grpo_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d8b25fb7814f2448a17d531aa90456a5358008ff
Binary files /dev/null and b/constant_definitions/train/__pycache__/grpo_constants.cpython-311.pyc differ
diff --git a/constant_definitions/train/agent_constants.py b/constant_definitions/train/agent_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..c047b5f4730595d252f2ec198f5b97f9d01fc5e4
--- /dev/null
+++ b/constant_definitions/train/agent_constants.py
@@ -0,0 +1,35 @@
+"""Constants for the LLM agent prompt builder and action parser."""
+
+# Maximum tokens for generated action response
+MAX_ACTION_TOKENS = 64
+
+# Temperature for training-time generation (numerator / denominator)
+TRAIN_TEMPERATURE_NUMERATOR = 7
+TRAIN_TEMPERATURE_DENOMINATOR = 10
+
+# Temperature for evaluation-time generation (greedy)
+EVAL_TEMPERATURE_NUMERATOR = 0
+EVAL_TEMPERATURE_DENOMINATOR = 1
+
+# Top-p sampling parameter (numerator / denominator)
+TOP_P_NUMERATOR = 95
+TOP_P_DENOMINATOR = 100
+
+# Maximum history rounds shown in prompt (to limit context length)
+MAX_PROMPT_HISTORY_ROUNDS = 10
+
+# Section delimiters for structured prompt
+PROMPT_SECTION_GAME = "GAME"
+PROMPT_SECTION_HISTORY = "HISTORY"
+PROMPT_SECTION_SCORES = "SCORES"
+PROMPT_SECTION_ACTIONS = "AVAILABLE ACTIONS"
+PROMPT_SECTION_INSTRUCTION = "INSTRUCTION"
+
+# Default system prompt (no opponent strategy name -- prevents shortcutting)
+SYSTEM_PROMPT = (
+    "You are playing a game-theory game. Analyse the situation and choose "
+    "the best action. Respond with ONLY the action name, nothing else."
+)
+
+# Sentinel returned when LLM output cannot be parsed
+PARSE_FAILURE_SENTINEL = "__PARSE_FAILURE__"
diff --git a/constant_definitions/train/dpo_constants.py b/constant_definitions/train/dpo_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..b32b75ed151d3f572aa62ce63f48cc7ebe3906b8
--- /dev/null
+++ b/constant_definitions/train/dpo_constants.py
@@ -0,0 +1,39 @@
+"""Constants for Direct Preference Optimisation (DPO) training."""
+
+# DPO beta parameter (KL penalty coefficient), as numerator / denominator
+DPO_BETA_NUMERATOR = 1
+DPO_BETA_DENOMINATOR = 10
+
+# Learning rate as numerator / denominator (5e-6)
+DPO_LR_NUMERATOR = 5
+DPO_LR_DENOMINATOR = 1_000_000
+
+# Batch size (preference pairs per step)
+DPO_BATCH_SIZE = 4
+
+# Training epochs
+DPO_NUM_EPOCHS = 1
+
+# Number of trajectories to collect per (game, strategy) pair
+DPO_TRAJECTORIES_PER_PAIR = 5
+
+# Quantile threshold for chosen/rejected selection (top / bottom quartile)
+DPO_TOP_QUANTILE_NUMERATOR = 1
+DPO_TOP_QUANTILE_DENOMINATOR = 4
+
+DPO_BOTTOM_QUANTILE_NUMERATOR = 1
+DPO_BOTTOM_QUANTILE_DENOMINATOR = 4
+
+# Minimum reward margin between chosen and rejected (numerator / denominator)
+DPO_MIN_REWARD_MARGIN_NUMERATOR = 2
+DPO_MIN_REWARD_MARGIN_DENOMINATOR = 10
+
+# Maximum sequence length for DPO (tokens)
+DPO_MAX_LENGTH = 512
+
+# Gradient accumulation steps
+DPO_GRADIENT_ACCUMULATION_STEPS = 4
+
+# Warmup ratio (numerator / denominator)
+DPO_WARMUP_RATIO_NUMERATOR = 5
+DPO_WARMUP_RATIO_DENOMINATOR = 100
diff --git a/constant_definitions/train/grpo_constants.py b/constant_definitions/train/grpo_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..7385f3d3cf7502be9b215aaf1f22062278bae133
--- /dev/null
+++ b/constant_definitions/train/grpo_constants.py
@@ -0,0 +1,44 @@
+"""Constants for Group Relative Policy Optimisation (GRPO) training."""
+
+# Learning rate as numerator / denominator (1e-5)
+GRPO_LR_NUMERATOR = 1
+GRPO_LR_DENOMINATOR = 100_000
+
+# Batch size (number of prompts per optimisation step)
+GRPO_BATCH_SIZE = 8
+
+# Number of completions generated per prompt for GRPO grouping
+GRPO_NUM_GENERATIONS = 4
+
+# Training epochs over the collected dataset
+GRPO_NUM_EPOCHS = 3
+
+# Maximum completion length per round (tokens)
+GRPO_MAX_COMPLETION_LENGTH = 64
+
+# Gradient accumulation steps
+GRPO_GRADIENT_ACCUMULATION_STEPS = 4
+
+# Warmup ratio (numerator / denominator)
+GRPO_WARMUP_RATIO_NUMERATOR = 3
+GRPO_WARMUP_RATIO_DENOMINATOR = 100
+
+# Weight decay (numerator / denominator)
+GRPO_WEIGHT_DECAY_NUMERATOR = 1
+GRPO_WEIGHT_DECAY_DENOMINATOR = 100
+
+# Per-step shaping reward coefficient alpha (numerator / denominator)
+GRPO_SHAPING_ALPHA_NUMERATOR = 1
+GRPO_SHAPING_ALPHA_DENOMINATOR = 10
+
+# Checkpoint interval (steps)
+GRPO_CHECKPOINT_EVERY = 500
+
+# Curriculum: number of base games to start with
+GRPO_CURRICULUM_INITIAL_GAMES = 6
+
+# Curriculum: games added per expansion step
+GRPO_CURRICULUM_EXPANSION_STEP = 8
+
+# Logging interval (steps)
+GRPO_LOG_EVERY = 10
diff --git a/constant_definitions/train/models/__init__.py b/constant_definitions/train/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..8dec822a8228d768f0c8731cc852a956e9bbbb3d
--- /dev/null
+++ b/constant_definitions/train/models/__init__.py
@@ -0,0 +1 @@
+"""Model identity constants for training and evaluation."""
diff --git a/constant_definitions/train/models/anthropic_constants.py b/constant_definitions/train/models/anthropic_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..4a56a631850f5f5a3174ee89abbac7696a7ffd10
--- /dev/null
+++ b/constant_definitions/train/models/anthropic_constants.py
@@ -0,0 +1,11 @@
+"""Anthropic API model identifiers for baseline evaluation."""
+
+# ---------------------------------------------------------------------------
+# Anthropic models -- baseline evaluation only
+# ---------------------------------------------------------------------------
+
+CLAUDE_OPUS = "claude-opus-4-6"
+CLAUDE_SONNET = "claude-sonnet-4-6"
+CLAUDE_HAIKU = "claude-haiku-4-5-20251001"
+
+ANTHROPIC_MODELS = (CLAUDE_OPUS, CLAUDE_SONNET, CLAUDE_HAIKU)
diff --git a/constant_definitions/train/models/local_constants.py b/constant_definitions/train/models/local_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..9af47c7a6aefae696e786b9392bd6962309e196c
--- /dev/null
+++ b/constant_definitions/train/models/local_constants.py
@@ -0,0 +1,47 @@
+"""Open-weight model identifiers for local training and inference."""
+
+# ---------------------------------------------------------------------------
+# Meta Llama
+# ---------------------------------------------------------------------------
+
+LLAMA_3_2_1B = "meta-llama/Llama-3.2-1B-Instruct"
+LLAMA_3_1_8B = "meta-llama/Llama-3.1-8B-Instruct"
+
+# ---------------------------------------------------------------------------
+# Alibaba Qwen (Apache 2.0)
+# ---------------------------------------------------------------------------
+
+QWEN_3_5_9B = "Qwen/Qwen3.5-9B"
+QWEN_3_5_27B = "Qwen/Qwen3.5-27B"
+
+# ---------------------------------------------------------------------------
+# Google Gemma
+# ---------------------------------------------------------------------------
+
+GEMMA_3_27B = "google/gemma-3-27b-it"
+
+# ---------------------------------------------------------------------------
+# Microsoft Phi
+# ---------------------------------------------------------------------------
+
+PHI_4_REASONING = "microsoft/Phi-4-reasoning"
+
+# ---------------------------------------------------------------------------
+# Mistral (Apache 2.0)
+# ---------------------------------------------------------------------------
+
+MISTRAL_SMALL_3_24B = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
+
+# ---------------------------------------------------------------------------
+# All open-weight models (any can be a training target)
+# ---------------------------------------------------------------------------
+
+LOCAL_MODELS = (
+    LLAMA_3_2_1B,
+    LLAMA_3_1_8B,
+    QWEN_3_5_9B,
+    QWEN_3_5_27B,
+    GEMMA_3_27B,
+    PHI_4_REASONING,
+    MISTRAL_SMALL_3_24B,
+)
diff --git a/constant_definitions/train/models/model_constants.py b/constant_definitions/train/models/model_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..c8edcc02ba0000044658073997d121278589661b
--- /dev/null
+++ b/constant_definitions/train/models/model_constants.py
@@ -0,0 +1,64 @@
+"""Model registry -- aggregates all provider-specific model constants."""
+
+from constant_definitions.train.models.local_constants import (
+    GEMMA_3_27B,
+    LLAMA_3_1_8B,
+    LLAMA_3_2_1B,
+    LOCAL_MODELS,
+    MISTRAL_SMALL_3_24B,
+    PHI_4_REASONING,
+    QWEN_3_5_9B,
+    QWEN_3_5_27B,
+)
+from constant_definitions.train.models.openai_constants import (
+    GPT_5_4,
+    GPT_OSS_20B,
+    OPENAI_API_MODELS,
+    OPENAI_LOCAL_MODELS,
+    OPENAI_MODELS,
+)
+from constant_definitions.train.models.anthropic_constants import (
+    ANTHROPIC_MODELS,
+    CLAUDE_HAIKU,
+    CLAUDE_OPUS,
+    CLAUDE_SONNET,
+)
+
+# ---------------------------------------------------------------------------
+# Short-name registry
+# ---------------------------------------------------------------------------
+
+# Maps human-readable short names to full model identifiers.
+# Used by experiment scripts to select models by name.
+MODELS = {
+    # Open-weight -- Meta
+    "llama3.2-1b": LLAMA_3_2_1B,
+    "llama3.1-8b": LLAMA_3_1_8B,
+    # Open-weight -- Qwen
+    "qwen3.5-9b": QWEN_3_5_9B,
+    "qwen3.5-27b": QWEN_3_5_27B,
+    # Open-weight -- Google
+    "gemma3-27b": GEMMA_3_27B,
+    # Open-weight -- Microsoft
+    "phi4-reasoning": PHI_4_REASONING,
+    # Open-weight -- Mistral
+    "mistral-small-24b": MISTRAL_SMALL_3_24B,
+    # Open-weight -- OpenAI
+    "gpt-oss-20b": GPT_OSS_20B,
+    # API -- OpenAI
+    "gpt-5.4": GPT_5_4,
+    # API -- Anthropic
+    "claude-opus": CLAUDE_OPUS,
+    "claude-sonnet": CLAUDE_SONNET,
+    "claude-haiku": CLAUDE_HAIKU,
+}
+
+# ---------------------------------------------------------------------------
+# Groupings
+# ---------------------------------------------------------------------------
+
+# All open-weight models that can be run and trained locally
+ALL_LOCAL_MODELS = LOCAL_MODELS + OPENAI_LOCAL_MODELS
+
+# Models evaluated via API only (no local weights)
+API_MODELS = OPENAI_API_MODELS + ANTHROPIC_MODELS
diff --git a/constant_definitions/train/models/openai_constants.py b/constant_definitions/train/models/openai_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..f7851de2f6d9e11a3e4448bbce9d4360fef66ceb
--- /dev/null
+++ b/constant_definitions/train/models/openai_constants.py
@@ -0,0 +1,22 @@
+"""OpenAI model identifiers for evaluation."""
+
+# ---------------------------------------------------------------------------
+# OpenAI API models
+# ---------------------------------------------------------------------------
+
+GPT_5_4 = "gpt-5.4"
+
+# ---------------------------------------------------------------------------
+# OpenAI open-weight models (Apache 2.0)
+# ---------------------------------------------------------------------------
+
+GPT_OSS_20B = "openai/gpt-oss-20b"
+
+# API-only models
+OPENAI_API_MODELS = (GPT_5_4,)
+
+# Open-weight models run locally
+OPENAI_LOCAL_MODELS = (GPT_OSS_20B,)
+
+# All OpenAI models
+OPENAI_MODELS = OPENAI_API_MODELS + OPENAI_LOCAL_MODELS
diff --git a/constant_definitions/train/split_constants.py b/constant_definitions/train/split_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..6e0c3181e68bfeee8dcc22ee924ba4c93248d6bb
--- /dev/null
+++ b/constant_definitions/train/split_constants.py
@@ -0,0 +1,14 @@
+"""Constants for deterministic train/eval game split."""
+
+# Seed for reproducible splitting
+SPLIT_SEED = 42
+
+# Fraction of games allocated to training (remainder goes to eval).
+# Expressed as numerator / denominator to avoid float literals.
+TRAIN_FRACTION_NUMERATOR = 78
+TRAIN_FRACTION_DENOMINATOR = 100
+
+# Minimum fraction of each domain tag that must appear in eval split.
+# Ensures every domain has representation in the held-out set.
+MIN_EVAL_TAG_FRACTION_NUMERATOR = 20
+MIN_EVAL_TAG_FRACTION_DENOMINATOR = 100
diff --git a/constant_definitions/var/__pycache__/classic_constants.cpython-311.pyc b/constant_definitions/var/__pycache__/classic_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..4b9f4c7665a416ef905e2d0c41d368d20426bef2
Binary files /dev/null and b/constant_definitions/var/__pycache__/classic_constants.cpython-311.pyc differ
diff --git a/constant_definitions/var/__pycache__/communication_constants.cpython-311.pyc b/constant_definitions/var/__pycache__/communication_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..a9db8a7f2aef58de77fa31eb58b4627f7836b5b0
Binary files /dev/null and b/constant_definitions/var/__pycache__/communication_constants.cpython-311.pyc differ
diff --git a/constant_definitions/var/__pycache__/generated_ext_constants.cpython-311.pyc b/constant_definitions/var/__pycache__/generated_ext_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..240b596d80cd70f92f30ac3bd55cfbe9416195a3
Binary files /dev/null and b/constant_definitions/var/__pycache__/generated_ext_constants.cpython-311.pyc differ
diff --git a/constant_definitions/var/__pycache__/infinite_constants.cpython-311.pyc b/constant_definitions/var/__pycache__/infinite_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..f978b6314e9dde8cbfc42473cf9a89514444035f
Binary files /dev/null and b/constant_definitions/var/__pycache__/infinite_constants.cpython-311.pyc differ
diff --git a/constant_definitions/var/__pycache__/pd_variant_constants.cpython-311.pyc b/constant_definitions/var/__pycache__/pd_variant_constants.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..53a412de4f395b8aa155839f777f4a2dc28f4676
Binary files /dev/null and b/constant_definitions/var/__pycache__/pd_variant_constants.cpython-311.pyc differ
diff --git a/constant_definitions/var/classic_constants.py b/constant_definitions/var/classic_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..04b011560c7f11dc7fa21a188dac711bf693fec2
--- /dev/null
+++ b/constant_definitions/var/classic_constants.py
@@ -0,0 +1,23 @@
+# Traveler's Dilemma
+TD_MIN_CLAIM = 2
+TD_MAX_CLAIM = 10
+TD_BONUS = 2
+
+# Dollar Auction
+DOLLAR_PRIZE = 10
+DOLLAR_MAX_BID = 10
+
+# Unscrupulous Diner's Dilemma
+UD_CHEAP_COST = 3
+UD_EXPENSIVE_COST = 8
+UD_CHEAP_VALUE = 5
+UD_EXPENSIVE_VALUE = 9
+
+# Minority Game
+MINO_WIN_PAYOFF = 5
+MINO_TIE_PAYOFF = 1
+
+# Rock-Paper-Scissors-Lizard-Spock (extended zero-sum)
+RPSLS_WIN_PAYOFF = 1
+RPSLS_LOSE_PAYOFF = -1
+RPSLS_DRAW_PAYOFF = 0
diff --git a/constant_definitions/var/communication_constants.py b/constant_definitions/var/communication_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..60042e99851ed504b5e9361767dc5e2a3795a7fb
--- /dev/null
+++ b/constant_definitions/var/communication_constants.py
@@ -0,0 +1,24 @@
+# Cheap Talk PD -- standard PD payoffs (messages are non-binding)
+CTPD_REWARD = 3
+CTPD_TEMPTATION = 5
+CTPD_PUNISHMENT = 1
+CTPD_SUCKER = 0
+
+# Binding Commitment -- cost of making a binding promise
+COMMIT_COST = 1
+
+# Correlated Equilibrium (traffic light / mediator)
+CE_FOLLOW_FOLLOW = 4
+CE_FOLLOW_DEVIATE = 2
+CE_DEVIATE_FOLLOW = 5
+CE_DEVIATE_DEVIATE = 1
+
+# Focal Point (multi-option coordination without communication)
+FP_MATCH_PAYOFF = 5
+FP_MISMATCH_PAYOFF = 0
+
+# Mediated Game (accept/reject third-party mediation)
+MG_ACCEPT_ACCEPT = 4
+MG_ACCEPT_REJECT = 2
+MG_REJECT_ACCEPT = 5
+MG_REJECT_REJECT = 0
diff --git a/constant_definitions/var/generated_ext_constants.py b/constant_definitions/var/generated_ext_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..53737550c1b8fce511bc96a191f43f378c546a29
--- /dev/null
+++ b/constant_definitions/var/generated_ext_constants.py
@@ -0,0 +1,14 @@
+# Random Zero-Sum game -- constrained to zero-sum payoffs
+RZS_SEED = 42
+RZS_MAX_PAYOFF = 5
+RZS_DEFAULT_ACTIONS = 3
+
+# Random Coordination game -- diagonal bonus for matching
+RC_SEED = 99
+RC_MATCH_BONUS = 5
+RC_MISMATCH_MAX = 2
+RC_DEFAULT_ACTIONS = 3
+
+# Parameterized Chicken (Hawk-Dove with custom parameters)
+PCHK_RESOURCE = 6
+PCHK_FIGHT_COST = 8
diff --git a/constant_definitions/var/infinite_constants.py b/constant_definitions/var/infinite_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..dde082a08bef9738979b0e2b43546488212a0240
--- /dev/null
+++ b/constant_definitions/var/infinite_constants.py
@@ -0,0 +1,13 @@
+# Continuous PD -- variable contribution levels
+# Payoff = opponent_level * BENEFIT / DEN - own_level * COST / DEN
+CPD_BENEFIT_NUMERATOR = 3
+CPD_COST_NUMERATOR = 2
+CPD_DENOMINATOR = 5
+CPD_MAX_LEVEL = 10
+
+# Discounted PD -- high-stakes PD with long horizon
+DPD_TEMPTATION = 6
+DPD_REWARD = 4
+DPD_PUNISHMENT = 1
+DPD_SUCKER = 0
+DPD_DEFAULT_ROUNDS = 50
diff --git a/constant_definitions/var/pd_variant_constants.py b/constant_definitions/var/pd_variant_constants.py
new file mode 100644
index
0000000000000000000000000000000000000000..c546ca1e497bcc058f56770278f1754233cf650c --- /dev/null +++ b/constant_definitions/var/pd_variant_constants.py @@ -0,0 +1,26 @@ +# Optional PD -- exit gives a safe payoff between CC and DD +OPD_EXIT_PAYOFF = 2 + +# Asymmetric PD -- first player has alibi advantage +APD_A_TEMPTATION = 5 +APD_A_REWARD = 3 +APD_A_PUNISHMENT = 2 +APD_A_SUCKER = 1 +APD_B_TEMPTATION = 5 +APD_B_REWARD = 3 +APD_B_PUNISHMENT = 1 +APD_B_SUCKER = 0 + +# Donation Game -- pay cost c to give benefit b to opponent +DONATION_BENEFIT = 5 +DONATION_COST = 2 + +# Friend or Foe (game show) -- both defect yields zero unlike PD +FOF_SHARE_PAYOFF = 1 +FOF_STEAL_WIN_PAYOFF = 2 + +# Peace-War Game (arms race framing of PD) +PW_DISARM_DISARM = 4 +PW_DISARM_ARM = -1 +PW_ARM_DISARM = 6 +PW_ARM_ARM = 0 diff --git a/constant_definitions/zero_sum_constants.py b/constant_definitions/zero_sum_constants.py new file mode 100644 index 0000000000000000000000000000000000000000..567418f687ad197f78e79f7c67c681502c769a9f --- /dev/null +++ b/constant_definitions/zero_sum_constants.py @@ -0,0 +1,11 @@ +# --- Matching Pennies payoffs --- +MP_MATCH_PAYOFF = 1 # Matcher wins when both choose same +MP_MISMATCH_PAYOFF = -1 # Matcher loses when choices differ + +# --- Rock-Paper-Scissors payoffs --- +RPS_WIN_PAYOFF = 1 +RPS_LOSE_PAYOFF = -1 +RPS_DRAW_PAYOFF = 0 + +# --- Rock-Paper-Scissors action count --- +RPS_NUM_ACTIONS = 3 diff --git a/env/Dockerfile b/env/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..9dcf8125d55c89ee8630f74e6a66a973ff544727 --- /dev/null +++ b/env/Dockerfile @@ -0,0 +1,27 @@ +FROM openenv-base AS builder + +WORKDIR /app/env + +COPY pyproject.toml . +ARG BUILD_MODE=production + +RUN --mount=type=cache,target=/root/.cache/uv \ + uv sync --frozen --no-dev + +COPY . . 
+
+FROM python:3.11-slim AS runtime
+
+WORKDIR /app/env
+
+COPY --from=builder /app/env /app/env
+
+ENV PATH="/app/env/.venv/bin:$PATH" \
+    PYTHONPATH=/app/env
+
+EXPOSE 8000
+
+HEALTHCHECK --interval=30s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
+
+CMD ["uvicorn", "env.app:app", "--host", "0.0.0.0", "--port", "8000"]
diff --git a/env/__init__.py b/env/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..7875a3ec4c4c3eba60c29366a2447f958b74729e
--- /dev/null
+++ b/env/__init__.py
@@ -0,0 +1 @@
+"""OpenEnv server and client integration for Kant."""
diff --git a/env/__pycache__/__init__.cpython-311.pyc b/env/__pycache__/__init__.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e827f48d91d187059c45b4e587a0a6b9e03c3c8c
Binary files /dev/null and b/env/__pycache__/__init__.cpython-311.pyc differ
diff --git a/env/__pycache__/environment.cpython-311.pyc b/env/__pycache__/environment.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..739267152949d1acb9193d6952090dd4c8008997
Binary files /dev/null and b/env/__pycache__/environment.cpython-311.pyc differ
diff --git a/env/__pycache__/models.cpython-311.pyc b/env/__pycache__/models.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..5d63ec9a1572566ea4cbcf288847ca05cca1b946
Binary files /dev/null and b/env/__pycache__/models.cpython-311.pyc differ
diff --git a/env/app.py b/env/app.py
new file mode 100644
index 0000000000000000000000000000000000000000..79a3877a94f4b73dedcaf2b70548b5162d9b776a
--- /dev/null
+++ b/env/app.py
@@ -0,0 +1,13 @@
+"""FastAPI application factory for Kant."""
+from openenv.core.env_server.http_server import create_app
+from env.models import GameAction, GameObservation
+from env.environment import KantEnvironment
+from constant_definitions.game_constants import MAX_CONCURRENT_ENVS
+
+app = create_app(
+    KantEnvironment,
+    GameAction,
+    GameObservation,
+    env_name="kant",
+    max_concurrent_envs=MAX_CONCURRENT_ENVS,
+)
diff --git a/env/client.py b/env/client.py
new file mode 100644
index 0000000000000000000000000000000000000000..1b6f9e646134ada1e67226551e20f4b280216c02
--- /dev/null
+++ b/env/client.py
@@ -0,0 +1,36 @@
+"""Kant client for the OpenEnv framework."""
+from __future__ import annotations
+
+from typing import Any, Optional
+
+from openenv.core.env_client import EnvClient
+from env.models import GameAction, GameObservation, GameState
+from constant_definitions.game_constants import SERVER_PORT
+
+
+class KantEnv(EnvClient):
+    """Gymnasium-style client for the Kant environment.
+
+    Wraps the generic EnvClient WebSocket connection with typed helpers
+    for the game-theory action and observation schemas.
+
+    Usage::
+
+        url = f"ws://localhost:{SERVER_PORT}"
+        async with KantEnv(base_url=url) as env:
+            obs = await env.reset(game="prisoners_dilemma", strategy="tit_for_tat")
+            while not obs.done:
+                obs = await env.step(GameAction(action="cooperate"))
+    """
+
+    async def reset(self, **kwargs: Any) -> GameObservation:
+        raw = await super().reset(**kwargs)
+        return GameObservation.model_validate(raw)
+
+    async def step(self, action: GameAction, **kwargs: Any) -> GameObservation:
+        raw = await super().step(action.model_dump(), **kwargs)
+        return GameObservation.model_validate(raw)
+
+    async def get_state(self) -> GameState:
+        raw = await super().state()
+        return GameState.model_validate(raw)
diff --git a/env/environment.py b/env/environment.py
new file mode 100644
index 0000000000000000000000000000000000000000..fdab7fe45b6e69994ef3ba2a5faf9eb52f2a4370
--- /dev/null
+++ b/env/environment.py
@@ -0,0 +1,225 @@
+"""Core KantBench environment implementing the OpenEnv Environment interface."""
+from __future__ import annotations
+
+import uuid
+from typing import Any, Callable, Optional
+
+from openenv.core.env_server.interfaces import Environment
+from env.models import GameAction, GameObservation, GameState, RoundResult
+from common.games import GameConfig, get_game, GAMES
+from common.strategies import get_strategy, STRATEGIES, OpponentStrategy
+from constant_definitions.game_constants import DEFAULT_NUM_ROUNDS
+
+_ONE = int(bool(True))
+_ZERO_F = float()
+
+
+class KantEnvironment(Environment[GameObservation, GameAction, GameState]):
+    """Game-theory environment hosting multiple classic games.
+
+    The agent plays against a built-in opponent strategy or another agent
+    function. The opponent's move is computed automatically inside ``step()``
+    via the selected strategy or the provided ``opponent_fn``.
+    """
+
+    SUPPORTS_CONCURRENT_SESSIONS = True
+
+    def __init__(self) -> None:
+        super().__init__()
+        self._game: Optional[GameConfig] = None
+        self._strategy: Optional[OpponentStrategy] = None
+        self._strategy_name: str = ""
+        self._opponent_fn: Optional[Callable[[GameObservation], GameAction]] = None
+        self._state: GameState = GameState()
+
+    # ------------------------------------------------------------------
+    # OpenEnv interface
+    # ------------------------------------------------------------------
+
+    def reset(
+        self,
+        seed: Optional[int] = None,
+        episode_id: Optional[str] = None,
+        **kwargs: Any,
+    ) -> GameObservation:
+        game_name: str = kwargs.get("game", "prisoners_dilemma")
+        strategy_name: str = kwargs.get("strategy", "tit_for_tat")
+        num_rounds: Optional[int] = kwargs.get("num_rounds")
+        opponent_fn: Optional[Callable[[GameObservation], GameAction]] = kwargs.get(
+            "opponent_fn",
+        )
+
+        self._game = get_game(game_name)
+        self._opponent_fn = opponent_fn
+        if opponent_fn is not None:
+            self._strategy = None
+            self._strategy_name = "agent"
+        else:
+            self._strategy = get_strategy(strategy_name)
+            self._strategy_name = strategy_name
+
+        rounds = num_rounds if num_rounds is not None else self._game.default_rounds
+
+        self._state = GameState(
+            episode_id=episode_id or str(uuid.uuid4()),
+            game_name=game_name,
+            opponent_strategy=strategy_name,
+            total_rounds=rounds,
+        )
+
+        return self._build_observation()
+
+    def step(
+        self,
+        action: GameAction,
+        **kwargs: Any,
+    ) -> GameObservation:
+        if self._game is None:
+            raise RuntimeError("Call reset() before step().")
+        if self._state.is_done:
+            raise RuntimeError("Episode already finished. Call reset().")
+        if action.action not in self._game.actions:
+            raise ValueError(
+                f"Invalid action '{action.action}'. "
+                f"Choose from: {self._game.actions}"
+            )
+
+        player_action = action.action
+        opponent_action = self._auto_play_opponent(player_action)
+
+        p_pay, o_pay = self._game.payoff_fn(player_action, opponent_action)
+
+        new_round = len(self._state.history) + _ONE
+        result = RoundResult(
+            round_number=new_round,
+            player_action=player_action,
+            opponent_action=opponent_action,
+            player_payoff=p_pay,
+            opponent_payoff=o_pay,
+        )
+
+        history = list(self._state.history) + [result]
+        p_score = self._state.player_score + p_pay
+        o_score = self._state.opponent_score + o_pay
+        done = new_round >= self._state.total_rounds
+
+        self._state = GameState(
+            episode_id=self._state.episode_id,
+            step_count=self._state.step_count + _ONE,
+            game_name=self._state.game_name,
+            opponent_strategy=self._state.opponent_strategy,
+            current_round=new_round,
+            total_rounds=self._state.total_rounds,
+            player_score=p_score,
+            opponent_score=o_score,
+            history=history,
+            is_done=done,
+        )
+
+        return self._build_observation(reward=p_pay, last_round=result, done=done)
+
+    @property
+    def state(self) -> GameState:
+        return self._state
+
+    # ------------------------------------------------------------------
+    # Internal helpers
+    # ------------------------------------------------------------------
+
+    def _auto_play_opponent(self, player_action: str) -> str:
+        assert self._game is not None
+
+        if self._opponent_fn is not None:
+            opp_obs = self._build_opponent_observation()
+            opp_action = self._opponent_fn(opp_obs)
+            opp_actions = self._opponent_actions()
+            if opp_action.action not in opp_actions:
+                raise ValueError(
+                    f"Opponent returned invalid action '{opp_action.action}'. "
+                    f"Choose from: {opp_actions}"
+                )
+            return opp_action.action
+
+        assert self._strategy is not None
+        hist = [
+            {
+                "player_action": r.player_action,
+                "opponent_action": r.opponent_action,
+            }
+            for r in self._state.history
+        ]
+        opp_actions = self._opponent_actions()
+        return self._strategy.choose_action(
+            self._game.game_type, opp_actions, hist,
+        )
+
+    def _opponent_actions(self) -> list[str]:
+        assert self._game is not None
+        gt = self._game.game_type
+        if gt == "ultimatum":
+            return ["accept", "reject"]
+        if gt == "trust":
+            return _trust_return_actions()
+        # matrix, public_goods, auction, commons, dictator, centipede,
+        # stackelberg, and all generated games share action space
+        return list(self._game.actions)
+
+    def _build_opponent_observation(self) -> GameObservation:
+        """Build a GameObservation from the opponent's perspective.
+
+        Swaps player/opponent in history, scores, and payoffs so the opponent
+        agent sees itself as the "player".
+        """
+        assert self._game is not None
+        flipped_history = [
+            RoundResult(
+                round_number=r.round_number,
+                player_action=r.opponent_action,
+                opponent_action=r.player_action,
+                player_payoff=r.opponent_payoff,
+                opponent_payoff=r.player_payoff,
+            )
+            for r in self._state.history
+        ]
+        opp_actions = self._opponent_actions()
+        return GameObservation(
+            done=False,
+            reward=_ZERO_F,
+            game_name=self._state.game_name,
+            game_description=self._game.description,
+            available_actions=opp_actions,
+            current_round=self._state.current_round,
+            total_rounds=self._state.total_rounds,
+            history=flipped_history,
+            player_score=self._state.opponent_score,
+            opponent_score=self._state.player_score,
+            opponent_strategy="agent",
+        )
+
+    def _build_observation(
+        self,
+        reward: float = _ZERO_F,
+        last_round: Optional[RoundResult] = None,
+        done: bool = False,
+    ) -> GameObservation:
+        assert self._game is not None
+        return GameObservation(
+            done=done,
+            reward=reward,
+            game_name=self._state.game_name,
+            game_description=self._game.description,
+            available_actions=list(self._game.actions),
+            current_round=self._state.current_round,
+            total_rounds=self._state.total_rounds,
+            history=list(self._state.history),
+            player_score=self._state.player_score,
+            opponent_score=self._state.opponent_score,
+            opponent_strategy=self._strategy_name,
+            last_round=last_round,
+        )
+
+
+def _trust_return_actions() -> list[str]:
+    from constant_definitions.game_constants import TRUST_ENDOWMENT, TRUST_MULTIPLIER
+    cap = TRUST_ENDOWMENT * TRUST_MULTIPLIER
+    return [f"return_{i}" for i in range(cap + _ONE)]
diff --git a/env/models.py b/env/models.py
new file mode 100644
index 0000000000000000000000000000000000000000..c1a22fa3d6518cf0309931fc49e44bd57eb5dcad
--- /dev/null
+++ b/env/models.py
@@ -0,0 +1,55 @@
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+from constant_definitions.game_constants import (
+    DEFAULT_FALSE,
+    DEFAULT_NONE,
+    DEFAULT_ZERO_FLOAT,
+    DEFAULT_ZERO_INT,
+    MIN_STEP_COUNT,
+)
+
+
+class RoundResult(BaseModel):
+    round_number: int = Field(..., description="Round number (one-indexed)")
+    player_action: str = Field(..., description="Action taken by the agent")
+    opponent_action: str = Field(..., description="Action taken by the opponent")
+    player_payoff: float = Field(..., description="Payoff received by the agent")
+    opponent_payoff: float = Field(..., description="Payoff received by the opponent")
+
+
+class GameAction(BaseModel):
+    action: str = Field(..., description="The action to take this round")
+    metadata: dict = Field(default_factory=dict)
+
+
+class GameObservation(BaseModel):
+    done: bool = Field(default=DEFAULT_FALSE, description="Whether the episode is over")
+    reward: float = Field(default=DEFAULT_ZERO_FLOAT, description="Reward for this step")
+    game_name: str = Field(default="", description="Name of the current game")
+    game_description: str = Field(default="", description="Description of the game rules")
+    available_actions: list[str] = Field(default_factory=list, description="Valid actions")
+    current_round: int = Field(default=DEFAULT_ZERO_INT, description="Current round number")
+    total_rounds: int = Field(default=DEFAULT_ZERO_INT, description="Total rounds in episode")
+    history: list[RoundResult] = Field(default_factory=list, description="Round history")
+    player_score: float = Field(default=DEFAULT_ZERO_FLOAT, description="Cumulative agent score")
+    opponent_score: float = Field(default=DEFAULT_ZERO_FLOAT, description="Cumulative opponent score")
+    opponent_strategy: str = Field(default="", description="Name of opponent strategy")
+    last_round: Optional[RoundResult] = Field(default=DEFAULT_NONE, description="Most recent round")
+    metadata: dict = Field(default_factory=dict)
+
+
+class GameState(BaseModel):
+    episode_id: Optional[str] = Field(default=DEFAULT_NONE, description="Episode identifier")
+    step_count: int = Field(default=DEFAULT_ZERO_INT, ge=MIN_STEP_COUNT, description="Steps taken")
+    game_name: str = Field(default="", description="Current game name")
+    opponent_strategy: str = Field(default="", description="Current opponent strategy")
+    current_round: int = Field(default=DEFAULT_ZERO_INT, description="Current round")
+    total_rounds: int = Field(default=DEFAULT_ZERO_INT, description="Total rounds")
+    player_score: float = Field(default=DEFAULT_ZERO_FLOAT, description="Agent cumulative score")
+    opponent_score: float = Field(default=DEFAULT_ZERO_FLOAT, description="Opponent cumulative score")
+    history: list[RoundResult] = Field(default_factory=list, description="Round history")
+    is_done: bool = Field(default=DEFAULT_FALSE, description="Whether episode has ended")
diff --git a/env/nplayer/__init__.py b/env/nplayer/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/env/nplayer/coalition/__init__.py b/env/nplayer/coalition/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/env/nplayer/coalition/environment.py b/env/nplayer/coalition/environment.py
new file mode 100644
index 0000000000000000000000000000000000000000..34ec89ab88daf33a6d6c73a8841f6c62b604d112
--- /dev/null
+++ b/env/nplayer/coalition/environment.py
@@ -0,0 +1,275 @@
+"""Coalition formation environment wrapping NPlayerEnvironment."""
+from __future__ import annotations
+from typing import Callable, Optional
+from common.games_meta.coalition_config import CoalitionGameConfig, get_coalition_game
+from constant_definitions.nplayer.coalition_constants import (
+    COALITION_PHASE_NEGOTIATE, COALITION_PHASE_ACTION, ENFORCEMENT_BINDING,
+)
+from env.nplayer.coalition.models import (
+    ActiveCoalition, CoalitionAction, CoalitionObservation,
+    CoalitionProposal, CoalitionResponse, CoalitionRoundResult,
+)
+from env.nplayer.coalition.payoffs import compute_coalition_payoffs
+from env.nplayer.coalition.strategies import (
+    CoalitionStrategy, CoalitionRandomStrategy, get_coalition_strategy,
+)
+from env.nplayer.environment import NPlayerEnvironment
+from env.nplayer.governance.engine import GovernanceEngine
+from env.nplayer.governance.models import GovernanceVote
+from env.nplayer.models import NPlayerAction, NPlayerObservation
+
+_ONE = int(bool(True))
+_ZERO = int()
+_ZERO_F = float()
+
+
+class CoalitionEnvironment:
+    """Coalition layer over NPlayerEnvironment with meta-governance."""
+
+    def __init__(self) -> None:
+        self._inner = NPlayerEnvironment()
+        self._config: Optional[CoalitionGameConfig] = None
+        self._strategies: list[CoalitionStrategy] = []
+        self._active_coalitions: list[ActiveCoalition] = []
+        self._phase: str = ""
+        self._coalition_history: list[CoalitionRoundResult] = []
+        self._pending_proposals: list[CoalitionProposal] = []
+        self._opponent_actions: list[str] = []
+        self._score_adjustments: list[float] = []
+        self._last_inner_obs: Optional[NPlayerObservation] = None
+        self._round_proposals: list[CoalitionProposal] = []
+        self._round_responses: list[CoalitionResponse] = []
+        self._active_players: set[int] = set()
+        self._governance = GovernanceEngine()
+
+    @property
+    def active_players(self) -> set[int]: return set(self._active_players)
+    @property
+    def phase(self) -> str: return self._phase
+    @property
+    def inner(self) -> NPlayerEnvironment: return self._inner
+    @property
+    def governance(self) -> GovernanceEngine: return self._governance
+
+    def reset(
+        self, game: str, *, coalition_strategies: Optional[list[str]] = None,
+        num_rounds: Optional[int] = None, episode_id: Optional[str] = None,
+    ) -> CoalitionObservation:
+        self._config = get_coalition_game(game)
+        n, num_opp = self._config.num_players, self._config.num_players - _ONE
+        if coalition_strategies is None:
+            self._strategies = [CoalitionRandomStrategy() for _ in range(num_opp)]
+        else:
+            names = list(coalition_strategies)
+            while len(names) < num_opp:
+                names.append(names[-_ONE])
+            self._strategies = [get_coalition_strategy(s) for s in names]
+        self._opponent_actions = [""] * num_opp
+        fns = [self._make_fn(i) for i in range(num_opp)]
+        self._last_inner_obs = self._inner.reset(
+            game, num_rounds=num_rounds, opponent_fns=fns, episode_id=episode_id,
+        )
+        self._active_coalitions, self._coalition_history = [], []
+        self._score_adjustments = [_ZERO_F] * n
+        self._active_players = set(range(n))
+        self._governance.reset(self._config)
+        self._pending_proposals = self._gen_opponent_proposals()
+        self._phase = COALITION_PHASE_NEGOTIATE
+        return self._build_obs()
+
+    def negotiate_step(self, action: CoalitionAction) -> CoalitionObservation:
+        if self._phase != COALITION_PHASE_NEGOTIATE:
+            raise RuntimeError("Not in negotiate phase. Call reset() or complete action phase first.")
+        assert self._config is not None
+        all_proposals = list(self._pending_proposals)
+        all_responses: list[CoalitionResponse] = list(action.responses)
+        new_coalitions: list[ActiveCoalition] = []
+        p_zero_resp = {r.proposal_index: r.accepted for r in action.responses}
+        for idx, prop in enumerate(self._pending_proposals):
+            if self._proposal_accepted(prop, p_zero_resp, idx):
+                new_coalitions.append(ActiveCoalition(
+                    members=list(prop.members), agreed_action=prop.agreed_action,
+                    side_payment=prop.side_payment))
+        for pi, prop in enumerate(action.proposals):
+            all_proposals.append(prop)
+            if self._primary_proposal_accepted(prop):
+                new_coalitions.append(ActiveCoalition(
+                    members=list(prop.members), agreed_action=prop.agreed_action,
+                    side_payment=prop.side_payment))
+        self._active_coalitions = new_coalitions
+        self._round_proposals, self._round_responses = all_proposals, all_responses
+        self._apply_proposal_targets(all_proposals, new_coalitions)
+        self._run_governance(action)
+        self._phase = COALITION_PHASE_ACTION
+        return self._build_obs()
+
+    def action_step(self, action: NPlayerAction) -> CoalitionObservation:
+        if self._phase != COALITION_PHASE_ACTION:
+            raise RuntimeError("Not in action phase. Call negotiate_step() first.")
+        assert self._config is not None
+        n, enforcement = self._config.num_players, self._governance.rules.enforcement
+        p_zero_action = action.action
+        if enforcement == ENFORCEMENT_BINDING:
+            for c in self._active_coalitions:
+                if _ZERO in c.members:
+                    p_zero_action = c.agreed_action
+                    break
+        for i, strat in enumerate(self._strategies):
+            pidx = i + _ONE
+            if pidx not in self._active_players:
+                self._opponent_actions[i] = self._config.actions[_ZERO]
+                continue
+            chosen = strat.choose_action(self._build_obs_for(pidx))
+            if enforcement == ENFORCEMENT_BINDING:
+                for c in self._active_coalitions:
+                    if pidx in c.members:
+                        chosen = c.agreed_action
+                        break
+            self._opponent_actions[i] = chosen
+        inner_obs = self._inner.step(NPlayerAction(action=p_zero_action))
+        self._last_inner_obs = inner_obs
+        base_t = tuple(inner_obs.last_round.payoffs)
+        rules = self._governance.rules
+        adjusted, defectors, penalties, side_pmts = compute_coalition_payoffs(
+            base_t, tuple(inner_obs.last_round.actions), self._active_coalitions,
+            rules.enforcement, rules.penalty_numerator, rules.penalty_denominator)
+        adj_list = self._governance.apply(list(adjusted), self._active_players)
+        for i in range(n):
+            if i not in self._active_players:
+                adj_list[i] = _ZERO_F
+        adjusted = tuple(adj_list)
+        for i in range(n):
+            self._score_adjustments[i] += adjusted[i] - base_t[i]
+        self._coalition_history.append(CoalitionRoundResult(
+            round_number=len(self._coalition_history) + _ONE,
+            proposals=list(self._round_proposals), responses=list(self._round_responses),
+            active_coalitions=list(self._active_coalitions),
+            defectors=defectors, penalties=penalties, side_payments=side_pmts))
+        if inner_obs.done:
+            self._phase = ""
+            return self._build_obs(reward_override=adjusted[_ZERO])
+        self._active_coalitions = []
+        self._pending_proposals = self._gen_opponent_proposals()
+        self._phase = COALITION_PHASE_NEGOTIATE
+        return self._build_obs(reward_override=adjusted[_ZERO])
+
+    def remove_player(self, player_index: int) -> None:
+        """Deactivate a player. Negotiate phase only."""
+        if self._phase != COALITION_PHASE_NEGOTIATE:
+            raise RuntimeError("Can only remove players during negotiate phase.")
+        assert self._config is not None
+        if player_index < _ZERO or player_index >= self._config.num_players:
+            raise ValueError(f"Player index out of range: {player_index}")
+        if player_index not in self._active_players:
+            raise ValueError(f"Player {player_index} is already inactive.")
+        self._active_players.discard(player_index)
+
+    def add_player(self, player_index: int, strategy: Optional[str] = None) -> None:
+        """Reactivate a previously removed player. Negotiate phase only."""
+        if self._phase != COALITION_PHASE_NEGOTIATE:
+            raise RuntimeError("Can only add players during negotiate phase.")
+        assert self._config is not None
+        if player_index < _ZERO or player_index >= self._config.num_players:
+            raise ValueError(f"Player index out of range: {player_index}")
+        if player_index in self._active_players:
+            raise ValueError(f"Player {player_index} is already active.")
+        self._active_players.add(player_index)
+        if strategy is not None and player_index > _ZERO:
+            opp_idx = player_index - _ONE
+            if opp_idx < len(self._strategies):
+                self._strategies[opp_idx] = get_coalition_strategy(strategy)
+
+    def _run_governance(self, action: CoalitionAction) -> None:
+        assert self._config is not None
+        gov_proposals = list(action.governance_proposals)
+        for i, strat in enumerate(self._strategies):
+            pidx = i + _ONE
+            if pidx in self._active_players and hasattr(strat, "propose_governance"):
+                gov_proposals.extend(strat.propose_governance(pidx))
+        self._governance.submit_proposals(gov_proposals, self._active_players)
+        pending = self._governance.pending_proposals
+        votes: list[GovernanceVote] = list(action.governance_votes)
+        for i, strat in enumerate(self._strategies):
+            pidx = i + _ONE
+            if pidx in self._active_players and hasattr(strat, "vote_on_governance"):
+                votes.extend(strat.vote_on_governance(pidx, pending))
+        self._governance.tally_votes(votes, self._active_players)
+
+    def _apply_proposal_targets(
+        self, all_proposals: list[CoalitionProposal], accepted: list[ActiveCoalition],
+    ) -> None:
+        accepted_members = [tuple(c.members) for c in accepted]
+        for prop in all_proposals:
+            if tuple(prop.members) not in accepted_members:
+                continue
+            if prop.exclude_target is not None and prop.exclude_target in self._active_players:
+                self._active_players.discard(prop.exclude_target)
+            if prop.include_target is not None and prop.include_target not in self._active_players:
+                self._active_players.add(prop.include_target)
+
+    def _make_fn(self, idx: int) -> Callable[[NPlayerObservation], NPlayerAction]:
+        def fn(obs: NPlayerObservation) -> NPlayerAction:
+            return NPlayerAction(action=self._opponent_actions[idx])
+        return fn
+
+    def _gen_opponent_proposals(self) -> list[CoalitionProposal]:
+        proposals: list[CoalitionProposal] = []
+        for i, strat in enumerate(self._strategies):
+            pidx = i + _ONE
+            if pidx in self._active_players:
+                proposals.extend(strat.negotiate(self._build_obs_for(pidx)).proposals)
+        return proposals
+
+    def _proposal_accepted(
+        self, prop: CoalitionProposal, p_zero_resp: dict[int, bool], idx: int,
+    ) -> bool:
+        for member in prop.members:
+            if member != prop.proposer and member == _ZERO and not p_zero_resp.get(idx, False):
+                return False
+        return True
+
+    def _primary_proposal_accepted(self, prop: CoalitionProposal) -> bool:
+        assert self._config is not None
+        for member in prop.members:
+            if member == prop.proposer or member == _ZERO:
+                continue
+            opp_idx = member - _ONE
+            if opp_idx < len(self._strategies):
+                if not self._strategies[opp_idx].respond_to_proposal(self._build_obs_for(member), prop):
+                    return False
+            else:
+                return False
+        return True
+
+    def _build_obs(self, reward_override: Optional[float] = None) -> CoalitionObservation:
+        assert self._last_inner_obs is not None and self._config is not None
+        base = self._last_inner_obs
+        adj_scores = [s + a for s, a in zip(base.scores, self._score_adjustments)]
+        reward = reward_override if reward_override is not None else base.reward
+        rules = self._governance.rules
+        return CoalitionObservation(
+            base=base.model_copy(update={"reward": reward}), phase=self._phase,
+            active_coalitions=list(self._active_coalitions),
+            pending_proposals=list(self._pending_proposals),
+            coalition_history=list(self._coalition_history),
+            enforcement=rules.enforcement, adjusted_scores=adj_scores,
+            active_players=sorted(self._active_players),
+            current_rules=rules.model_copy(deep=True),
+            pending_governance=self._governance.pending_proposals,
+            governance_history=list(rules.governance_history))
+
+    def _build_obs_for(self, player_index: int) -> CoalitionObservation:
+        assert self._config is not None
+        inner_obs = self._inner._build_observation(player_index)
+        adj_scores = [s + a for s, a in zip(inner_obs.scores, self._score_adjustments)]
+        rules = self._governance.rules
+        return CoalitionObservation(
+            base=inner_obs, phase=self._phase,
+            active_coalitions=list(self._active_coalitions),
+            pending_proposals=list(self._pending_proposals),
+            coalition_history=list(self._coalition_history),
+            enforcement=rules.enforcement, adjusted_scores=adj_scores,
+            active_players=sorted(self._active_players),
+            current_rules=rules.model_copy(deep=True),
+            pending_governance=self._governance.pending_proposals,
+            governance_history=list(rules.governance_history))
diff --git a/env/nplayer/coalition/models.py b/env/nplayer/coalition/models.py
new file mode 100644
index 0000000000000000000000000000000000000000..95e01209531a5e91be40b6f28ee15e78b7e1502b
--- /dev/null
+++ b/env/nplayer/coalition/models.py
@@ -0,0 +1,108 @@
+"""Data models for the coalition formation layer."""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+from constant_definitions.game_constants import (
+    DEFAULT_FALSE,
+    DEFAULT_NONE,
+    DEFAULT_ZERO_FLOAT,
+    DEFAULT_ZERO_INT,
+)
+from constant_definitions.nplayer.coalition_constants import (
+    COALITION_PHASE_NEGOTIATE,
+    ENFORCEMENT_CHEAP_TALK,
+    COALITION_DEFAULT_SIDE_PAYMENT,
+)
+from env.nplayer.governance.models import (
+    GovernanceProposal, GovernanceResult, GovernanceVote, RuntimeRules,
+)
+from env.nplayer.models import NPlayerObservation, NPlayerRoundResult
+
+
+class CoalitionProposal(BaseModel):
+    proposer: int = Field(..., description="Player index of the proposer")
+    members: list[int] = Field(..., description="Player indices in the coalition (including proposer)")
+    agreed_action: str = Field(..., description="Action members agree to take")
+    side_payment: float = Field(
+        default=float(COALITION_DEFAULT_SIDE_PAYMENT),
+        description="Payment from proposer to each other member",
+    )
+    exclude_target: Optional[int] = Field(
+        default=DEFAULT_NONE,
+        description="If set, coalition votes to remove this player on acceptance",
+    )
+    include_target: Optional[int] = Field(
+        default=DEFAULT_NONE,
+        description="If set, coalition votes to reactivate this player on acceptance",
+    )
+
+
+class CoalitionResponse(BaseModel):
+    responder: int = Field(..., description="Player index of the responder")
+    proposal_index: int = Field(..., description="Index into the proposals list")
+    accepted: bool = Field(..., description="Whether the responder accepts")
+
+
+class ActiveCoalition(BaseModel):
+    members: list[int] = Field(..., description="Player indices in the coalition")
+    agreed_action: str = Field(..., description="Action members agreed to take")
+    side_payment: float = Field(
+        default=float(COALITION_DEFAULT_SIDE_PAYMENT),
+        description="Payment from proposer to each other member",
+    )
+
+
+class CoalitionRoundResult(BaseModel):
+    round_number: int = Field(..., description="Round number (one-indexed)")
+    proposals: list[CoalitionProposal] = Field(default_factory=list)
+    responses: list[CoalitionResponse] = Field(default_factory=list)
+    active_coalitions: list[ActiveCoalition] = Field(default_factory=list)
+    defectors: list[int] = Field(default_factory=list, description="Player indices who defected")
+    penalties: list[float] = Field(default_factory=list, description="Penalty per player")
+    side_payments: list[float] = Field(default_factory=list, description="Net side payment per player")
+
+
+class CoalitionObservation(BaseModel):
+    base: NPlayerObservation = Field(
+        default_factory=NPlayerObservation,
+        description="Underlying N-player observation",
+    )
+    phase: str = Field(default=COALITION_PHASE_NEGOTIATE, description="Current phase")
+    active_coalitions: list[ActiveCoalition] = Field(default_factory=list)
+    pending_proposals: list[CoalitionProposal] = Field(
+        default_factory=list,
+        description="Proposals from opponents awaiting player response",
+    )
+    coalition_history: list[CoalitionRoundResult] = Field(default_factory=list)
+    enforcement: str = Field(default=ENFORCEMENT_CHEAP_TALK)
+    adjusted_scores: list[float] = Field(
+        default_factory=list,
+        description="Scores after coalition payoff adjustments",
+    )
+    active_players: list[int] = Field(
+        default_factory=list,
+        description="Indices of players currently active in the game",
+    )
+    current_rules: Optional[RuntimeRules] = Field(
+        default=DEFAULT_NONE,
+        description="Current governance runtime rules",
+    )
+    pending_governance: list[GovernanceProposal] = Field(
+        default_factory=list,
+        description="Governance proposals pending vote",
+    )
+    governance_history: list[GovernanceResult] = Field(
+        default_factory=list,
+        description="History of governance rounds",
+    )
+
+
+class CoalitionAction(BaseModel):
+    proposals: list[CoalitionProposal] = Field(default_factory=list)
+    responses: list[CoalitionResponse] = Field(default_factory=list)
+    governance_proposals: list[GovernanceProposal] = Field(default_factory=list)
+    governance_votes:
list[GovernanceVote] = Field(default_factory=list) diff --git a/env/nplayer/coalition/payoffs.py b/env/nplayer/coalition/payoffs.py new file mode 100644 index 0000000000000000000000000000000000000000..65e37bc910c403ddb667310fccee19bda95bf2ee --- /dev/null +++ b/env/nplayer/coalition/payoffs.py @@ -0,0 +1,75 @@ +"""Pure functions for computing coalition-adjusted payoffs.""" + +from __future__ import annotations + +from constant_definitions.nplayer.coalition_constants import ( + ENFORCEMENT_CHEAP_TALK, + ENFORCEMENT_PENALTY, + ENFORCEMENT_BINDING, +) +from env.nplayer.coalition.models import ActiveCoalition + +_ONE = int(bool(True)) +_ZERO = int() +_ZERO_F = float() + + +def compute_coalition_payoffs( + base_payoffs: tuple[float, ...], + actions: tuple[str, ...], + active_coalitions: list[ActiveCoalition], + enforcement: str, + penalty_numerator: int, + penalty_denominator: int, +) -> tuple[tuple[float, ...], list[int], list[float], list[float]]: + """Compute payoffs adjusted for coalition agreements. + + Returns + ------- + adjusted_payoffs : tuple[float, ...] + Payoffs after penalties and side payments. + defectors : list[int] + Player indices who broke a coalition agreement. + penalties : list[float] + Penalty amount per player (zero for non-defectors). + side_payments : list[float] + Net side-payment transfer per player. 
+ """ + n = len(base_payoffs) + adjusted = list(base_payoffs) + penalties = [_ZERO_F] * n + side_pmts = [_ZERO_F] * n + defectors: list[int] = [] + + # Identify defectors: coalition members who did not play the agreed action + for coalition in active_coalitions: + for member in coalition.members: + if member < n and actions[member] != coalition.agreed_action: + if member not in defectors: + defectors.append(member) + + # Apply enforcement + if enforcement == ENFORCEMENT_PENALTY: + for d in defectors: + penalty = base_payoffs[d] * penalty_numerator / penalty_denominator + penalties[d] = penalty + adjusted[d] = base_payoffs[d] - penalty + + # Under cheap_talk, no payoff modification. + # Under binding, actions were already overridden so defectors list should + # be empty unless something external bypassed the override. + + # Apply side payments + for coalition in active_coalitions: + if coalition.side_payment > _ZERO_F: + proposer = coalition.members[_ZERO] + other_members = coalition.members[_ONE:] + total_paid = coalition.side_payment * len(other_members) + side_pmts[proposer] -= total_paid + adjusted[proposer] -= total_paid + for m in other_members: + if m < n: + side_pmts[m] += coalition.side_payment + adjusted[m] += coalition.side_payment + + return tuple(adjusted), defectors, penalties, side_pmts diff --git a/env/nplayer/coalition/strategies.py b/env/nplayer/coalition/strategies.py new file mode 100644 index 0000000000000000000000000000000000000000..dad3f4744221b64e88c23150f7dbae16b703e3a6 --- /dev/null +++ b/env/nplayer/coalition/strategies.py @@ -0,0 +1,146 @@ +"""Coalition-aware opponent strategies.""" + +from __future__ import annotations + +import random +from typing import Protocol + +from env.nplayer.coalition.models import ( + CoalitionAction, + CoalitionObservation, + CoalitionProposal, +) + +_ONE = int(bool(True)) +_ZERO = int() + + +class CoalitionStrategy(Protocol): + """Interface for coalition opponent strategies.""" + + def negotiate(self, 
observation: CoalitionObservation) -> CoalitionAction: ... + + def respond_to_proposal( + self, observation: CoalitionObservation, proposal: CoalitionProposal, + ) -> bool: ... + + def choose_action(self, observation: CoalitionObservation) -> str: ... + + +# --------------------------------------------------------------------------- +# Built-in strategies +# --------------------------------------------------------------------------- + + +class CoalitionRandomStrategy: + """Random accept/reject; random action choice.""" + + def negotiate(self, observation: CoalitionObservation) -> CoalitionAction: + return CoalitionAction() + + def respond_to_proposal( + self, observation: CoalitionObservation, proposal: CoalitionProposal, + ) -> bool: + return random.choice([True, False]) + + def choose_action(self, observation: CoalitionObservation) -> str: + return random.choice(observation.base.available_actions) + + +class CoalitionLoyalStrategy: + """Accepts all proposals and always honours the agreed action.""" + + def negotiate(self, observation: CoalitionObservation) -> CoalitionAction: + return CoalitionAction() + + def respond_to_proposal( + self, observation: CoalitionObservation, proposal: CoalitionProposal, + ) -> bool: + return True + + def choose_action(self, observation: CoalitionObservation) -> str: + for coalition in observation.active_coalitions: + if observation.base.player_index in coalition.members: + if coalition.agreed_action in observation.base.available_actions: + return coalition.agreed_action + return observation.base.available_actions[_ZERO] + + +class CoalitionBetrayerStrategy: + """Accepts proposals but deliberately defects.""" + + def negotiate(self, observation: CoalitionObservation) -> CoalitionAction: + return CoalitionAction() + + def respond_to_proposal( + self, observation: CoalitionObservation, proposal: CoalitionProposal, + ) -> bool: + return True + + def choose_action(self, observation: CoalitionObservation) -> str: + for coalition in 
observation.active_coalitions: + if observation.base.player_index in coalition.members: + agreed = coalition.agreed_action + alternatives = [ + a for a in observation.base.available_actions + if a != agreed + ] + if alternatives: + return alternatives[_ZERO] + return observation.base.available_actions[_ZERO] + + +class CoalitionConditionalStrategy: + """Honours agreements if others honoured theirs last round.""" + + def negotiate(self, observation: CoalitionObservation) -> CoalitionAction: + return CoalitionAction() + + def respond_to_proposal( + self, observation: CoalitionObservation, proposal: CoalitionProposal, + ) -> bool: + return True + + def choose_action(self, observation: CoalitionObservation) -> str: + # Check if anyone defected last round + if observation.coalition_history: + last = observation.coalition_history[-_ONE] + my_idx = observation.base.player_index + others_defected = any( + d != my_idx for d in last.defectors + ) + if others_defected: + # Defect: pick a non-agreed action + for coalition in observation.active_coalitions: + if my_idx in coalition.members: + alternatives = [ + a for a in observation.base.available_actions + if a != coalition.agreed_action + ] + if alternatives: + return alternatives[_ZERO] + return observation.base.available_actions[_ZERO] + + # Honour the agreement + for coalition in observation.active_coalitions: + if observation.base.player_index in coalition.members: + if coalition.agreed_action in observation.base.available_actions: + return coalition.agreed_action + return observation.base.available_actions[_ZERO] + + +# --------------------------------------------------------------------------- +# Registry +# --------------------------------------------------------------------------- + +COALITION_STRATEGIES: dict[str, CoalitionStrategy] = { + "coalition_random": CoalitionRandomStrategy(), + "coalition_loyal": CoalitionLoyalStrategy(), + "coalition_betrayer": CoalitionBetrayerStrategy(), + "coalition_conditional": 
CoalitionConditionalStrategy(), +} + + +def get_coalition_strategy(name: str) -> CoalitionStrategy: + """Look up a coalition strategy by name. Raises KeyError if not found.""" + return COALITION_STRATEGIES[name] diff --git a/env/nplayer/environment.py b/env/nplayer/environment.py new file mode 100644 index 0000000000000000000000000000000000000000..6617072feb7a514a47f69faaaa9d85a326f09c39 --- /dev/null +++ b/env/nplayer/environment.py @@ -0,0 +1,207 @@ +"""N-player game environment.""" + +from __future__ import annotations + +import uuid +from typing import Any, Callable, Optional + +from common.games_meta.nplayer_config import NPlayerGameConfig, get_nplayer_game +from env.nplayer.models import ( + NPlayerAction, + NPlayerGameState, + NPlayerObservation, + NPlayerRoundResult, +) +from env.nplayer.strategies import get_nplayer_strategy, NPlayerStrategy + +_ONE = int(bool(True)) +_ZERO = int() +_ZERO_F = float() + + +class NPlayerEnvironment: + """Game-theory environment for N-player games. + + Player zero is the primary agent controlled via ``step()``. + Players one through N-minus-one are auto-played by strategies or + caller-provided functions (``opponent_fns``). + """ + + def __init__(self) -> None: + self._game: Optional[NPlayerGameConfig] = None + self._strategies: list[Optional[NPlayerStrategy]] = [] + self._opponent_fns: list[Optional[Callable[[NPlayerObservation], NPlayerAction]]] = [] + self._state: NPlayerGameState = NPlayerGameState() + + # ------------------------------------------------------------------ + # Public API + # ------------------------------------------------------------------ + + def reset( + self, + game: str, + *, + num_rounds: Optional[int] = None, + opponent_strategies: Optional[list[str]] = None, + opponent_fns: Optional[list[Optional[Callable[[NPlayerObservation], NPlayerAction]]]] = None, + episode_id: Optional[str] = None, + ) -> NPlayerObservation: + """Start a new episode. 
+ + Parameters + ---------- + game: + Key in ``NPLAYER_GAMES``. + num_rounds: + Override the default round count. + opponent_strategies: + Strategy names for players one through N-minus-one. If shorter + than needed, the last entry is repeated. Defaults to all + ``"random"``. + opponent_fns: + Callable opponents for players one through N-minus-one. ``None`` + entries fall back to the corresponding strategy. + episode_id: + Optional identifier for the episode. + """ + self._game = get_nplayer_game(game) + n = self._game.num_players + num_opponents = n - _ONE + + # Resolve strategies + if opponent_strategies is None: + strat_names = ["random"] * num_opponents + else: + strat_names = list(opponent_strategies) + while len(strat_names) < num_opponents: + strat_names.append(strat_names[-_ONE]) + self._strategies = [get_nplayer_strategy(s) for s in strat_names] + + # Resolve opponent fns + if opponent_fns is None: + self._opponent_fns = [None] * num_opponents + else: + fns: list[Optional[Callable]] = list(opponent_fns) + while len(fns) < num_opponents: + fns.append(None) + self._opponent_fns = fns + + rounds = num_rounds if num_rounds is not None else self._game.default_rounds + + self._state = NPlayerGameState( + episode_id=episode_id or str(uuid.uuid4()), + game_name=game, + total_rounds=rounds, + num_players=n, + scores=[_ZERO_F] * n, + ) + + return self._build_observation(_ZERO) + + def step(self, action: NPlayerAction) -> NPlayerObservation: + """Execute one round. + + The caller supplies the action for player zero. Opponents are + auto-played. + """ + if self._game is None: + raise RuntimeError("Call reset() before step().") + if self._state.is_done: + raise RuntimeError("Episode already finished. Call reset().") + if action.action not in self._game.actions: + raise ValueError( + f"Invalid action '{action.action}'. 
" + f"Choose from: {self._game.actions}" + ) + + # Collect all actions: player zero first, then opponents + all_actions: list[str] = [action.action] + for idx in range(len(self._strategies)): + player_idx = idx + _ONE + opp_action = self._get_opponent_action(idx, player_idx) + all_actions.append(opp_action) + + actions_tuple = tuple(all_actions) + payoffs_tuple = self._game.payoff_fn(actions_tuple) + + new_round = len(self._state.history) + _ONE + result = NPlayerRoundResult( + round_number=new_round, + actions=list(all_actions), + payoffs=list(payoffs_tuple), + ) + + history = list(self._state.history) + [result] + new_scores = [ + s + p for s, p in zip(self._state.scores, payoffs_tuple) + ] + done = new_round >= self._state.total_rounds + + self._state = NPlayerGameState( + episode_id=self._state.episode_id, + step_count=self._state.step_count + _ONE, + game_name=self._state.game_name, + current_round=new_round, + total_rounds=self._state.total_rounds, + num_players=self._state.num_players, + scores=new_scores, + history=history, + is_done=done, + ) + + return self._build_observation( + _ZERO, + reward=payoffs_tuple[_ZERO], + last_round=result, + done=done, + ) + + @property + def state(self) -> NPlayerGameState: + return self._state + + # ------------------------------------------------------------------ + # Internal helpers + # ------------------------------------------------------------------ + + def _get_opponent_action(self, opp_idx: int, player_idx: int) -> str: + """Get the action for opponent at opp_idx (player player_idx).""" + assert self._game is not None + fn = self._opponent_fns[opp_idx] + if fn is not None: + obs = self._build_observation(player_idx) + opp_action = fn(obs) + if opp_action.action not in self._game.actions: + raise ValueError( + f"Opponent {player_idx} returned invalid action " + f"'{opp_action.action}'. 
Choose from: {self._game.actions}" + ) + return opp_action.action + + strategy = self._strategies[opp_idx] + assert strategy is not None + obs = self._build_observation(player_idx) + return strategy.choose_action(obs) + + def _build_observation( + self, + player_index: int, + reward: float = _ZERO_F, + last_round: Optional[NPlayerRoundResult] = None, + done: bool = False, + ) -> NPlayerObservation: + assert self._game is not None + return NPlayerObservation( + done=done, + reward=reward, + game_name=self._state.game_name, + game_description=self._game.description, + available_actions=list(self._game.actions), + current_round=self._state.current_round, + total_rounds=self._state.total_rounds, + history=list(self._state.history), + scores=list(self._state.scores), + num_players=self._state.num_players, + player_index=player_index, + last_round=last_round, + ) diff --git a/env/nplayer/governance/__init__.py b/env/nplayer/governance/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..285c0b2d34ff8c994d541d55280192f6ac811db4 --- /dev/null +++ b/env/nplayer/governance/__init__.py @@ -0,0 +1 @@ +"""Meta-governance layer for coalition games.""" diff --git a/env/nplayer/governance/engine.py b/env/nplayer/governance/engine.py new file mode 100644 index 0000000000000000000000000000000000000000..2bad0450bbbc4243d1f4d6610497d3f3b5d3c14d --- /dev/null +++ b/env/nplayer/governance/engine.py @@ -0,0 +1,218 @@ +"""GovernanceEngine — manages mutable RuntimeRules over a frozen config.""" + +from __future__ import annotations + +from typing import Callable, Optional + +from common.games_meta.coalition_config import CoalitionGameConfig +from constant_definitions.nplayer.governance_constants import ( + GOVERNANCE_PROPOSAL_PARAMETER, + GOVERNANCE_PROPOSAL_MECHANIC, + GOVERNANCE_PROPOSAL_CUSTOM, + GOVERNANCE_MAJORITY_NUMERATOR, + GOVERNANCE_MAJORITY_DENOMINATOR, + GOVERNANCE_MAX_PROPOSALS_PER_ROUND, + GOVERNANCE_CUSTOM_DELTA_CLAMP_NUMERATOR, + 
GOVERNANCE_CUSTOM_DELTA_CLAMP_DENOMINATOR, + MECHANIC_ORDER, +) +from env.nplayer.governance.mechanics import apply_mechanics +from env.nplayer.governance.models import ( + GovernanceProposal, + GovernanceResult, + GovernanceVote, + MechanicConfig, + RuntimeRules, +) + +_ZERO = int() +_ONE = int(bool(True)) +_ZERO_F = float() + +_PARAMETER_FIELDS = {"enforcement", "penalty_numerator", "penalty_denominator", "allow_side_payments"} + + +class GovernanceEngine: + """Manages governance proposals, voting, and payoff modification.""" + + def __init__(self) -> None: + self._rules: RuntimeRules = RuntimeRules() + self._pending: list[GovernanceProposal] = [] + self._custom_modifiers: dict[str, Callable[[list[float], set[int]], list[float]]] = {} + + @property + def rules(self) -> RuntimeRules: + return self._rules + + @property + def pending_proposals(self) -> list[GovernanceProposal]: + return list(self._pending) + + def reset(self, config: CoalitionGameConfig) -> None: + """Initialize RuntimeRules from a frozen config.""" + self._rules = RuntimeRules( + enforcement=config.enforcement, + penalty_numerator=config.penalty_numerator, + penalty_denominator=config.penalty_denominator, + allow_side_payments=config.allow_side_payments, + mechanics={name: False for name in MECHANIC_ORDER}, + mechanic_config=MechanicConfig(), + custom_modifier_keys=[], + governance_history=[], + ) + self._pending = [] + self._custom_modifiers = {} + + def submit_proposals( + self, proposals: list[GovernanceProposal], active_players: set[int], + ) -> list[GovernanceProposal]: + """Validate and queue proposals. 
Returns accepted (queued) proposals.""" + accepted: list[GovernanceProposal] = [] + for prop in proposals: + if len(self._pending) >= GOVERNANCE_MAX_PROPOSALS_PER_ROUND: + break + if prop.proposer not in active_players: + continue + if not self._validate_proposal(prop): + continue + self._pending.append(prop) + accepted.append(prop) + return accepted + + def tally_votes( + self, votes: list[GovernanceVote], active_players: set[int], + ) -> GovernanceResult: + """Count votes, apply majority-approved changes, return result.""" + n_active = len(active_players) + threshold = n_active * GOVERNANCE_MAJORITY_NUMERATOR // GOVERNANCE_MAJORITY_DENOMINATOR + _ONE + # Build vote counts per proposal + approve_counts: dict[int, int] = {} + reject_counts: dict[int, int] = {} + for v in votes: + if v.voter not in active_players: + continue + if v.approve: + approve_counts[v.proposal_index] = approve_counts.get(v.proposal_index, _ZERO) + _ONE + else: + reject_counts[v.proposal_index] = reject_counts.get(v.proposal_index, _ZERO) + _ONE + adopted: list[int] = [] + rejected: list[int] = [] + for idx in range(len(self._pending)): + if approve_counts.get(idx, _ZERO) >= threshold: + adopted.append(idx) + self._apply_proposal(self._pending[idx]) + else: + rejected.append(idx) + result = GovernanceResult( + proposals=list(self._pending), + votes=list(votes), + adopted=adopted, + rejected=rejected, + rules_snapshot=self._rules.model_copy(deep=True), + ) + self._rules.governance_history.append(result) + self._pending = [] + return result + + def apply( + self, payoffs: list[float], active_players: set[int], + ) -> list[float]: + """Run enabled mechanics + custom modifiers on payoffs.""" + result = apply_mechanics(payoffs, self._rules, active_players) + result = self._apply_custom_modifiers(result, active_players) + return result + + def register_custom_modifier( + self, key: str, fn: Callable[[list[float], set[int]], list[float]], + ) -> None: + """Register a custom modifier callable by 
key.""" + self._custom_modifiers[key] = fn + + def unregister_custom_modifier(self, key: str) -> None: + """Remove a custom modifier. Also deactivates it.""" + self._custom_modifiers.pop(key, None) + if key in self._rules.custom_modifier_keys: + self._rules.custom_modifier_keys.remove(key) + + # ------------------------------------------------------------------ + # Internal + # ------------------------------------------------------------------ + + def _validate_proposal(self, prop: GovernanceProposal) -> bool: + if prop.proposal_type == GOVERNANCE_PROPOSAL_PARAMETER: + return prop.parameter_name in _PARAMETER_FIELDS and prop.parameter_value is not None + if prop.proposal_type == GOVERNANCE_PROPOSAL_MECHANIC: + return prop.mechanic_name in MECHANIC_ORDER and prop.mechanic_active is not None + if prop.proposal_type == GOVERNANCE_PROPOSAL_CUSTOM: + return prop.custom_modifier_key is not None and prop.custom_modifier_active is not None + return False + + def _apply_proposal(self, prop: GovernanceProposal) -> None: + if prop.proposal_type == GOVERNANCE_PROPOSAL_PARAMETER: + self._apply_parameter(prop) + elif prop.proposal_type == GOVERNANCE_PROPOSAL_MECHANIC: + self._apply_mechanic(prop) + elif prop.proposal_type == GOVERNANCE_PROPOSAL_CUSTOM: + self._apply_custom(prop) + + def _apply_parameter(self, prop: GovernanceProposal) -> None: + name = prop.parameter_name + val = prop.parameter_value + if name == "enforcement" and isinstance(val, str): + self._rules.enforcement = val + elif name == "penalty_numerator" and isinstance(val, int): + self._rules.penalty_numerator = val + elif name == "penalty_denominator" and isinstance(val, int): + self._rules.penalty_denominator = val + elif name == "allow_side_payments" and isinstance(val, bool): + self._rules.allow_side_payments = val + + def _apply_mechanic(self, prop: GovernanceProposal) -> None: + if prop.mechanic_name is not None and prop.mechanic_active is not None: + self._rules.mechanics[prop.mechanic_name] = 
prop.mechanic_active + if prop.mechanic_params: + cfg = self._rules.mechanic_config + update = {} + for k, v in prop.mechanic_params.items(): + if hasattr(cfg, k): + update[k] = v + if update: + self._rules.mechanic_config = cfg.model_copy(update=update) + + def _apply_custom(self, prop: GovernanceProposal) -> None: + key = prop.custom_modifier_key + if key is None: + return + if prop.custom_modifier_active: + if key not in self._rules.custom_modifier_keys: + self._rules.custom_modifier_keys.append(key) + else: + if key in self._rules.custom_modifier_keys: + self._rules.custom_modifier_keys.remove(key) + + def _apply_custom_modifiers( + self, payoffs: list[float], active_players: set[int], + ) -> list[float]: + """Run custom modifiers with delta clamping for safety.""" + clamp = GOVERNANCE_CUSTOM_DELTA_CLAMP_NUMERATOR / GOVERNANCE_CUSTOM_DELTA_CLAMP_DENOMINATOR + result = list(payoffs) + for key in self._rules.custom_modifier_keys: + fn = self._custom_modifiers.get(key) + if fn is None: + continue + try: + modified = fn(list(result), set(active_players)) + except Exception: + continue + # Delta-clamp: no single payoff may change by more than clamp * abs(original) + for i in range(len(result)): + delta = modified[i] - result[i] + max_delta = abs(result[i]) * clamp + if max_delta < clamp: + max_delta = clamp + if delta > max_delta: + modified[i] = result[i] + max_delta + elif delta < -max_delta: + modified[i] = result[i] - max_delta + result = modified + return result diff --git a/env/nplayer/governance/mechanics.py b/env/nplayer/governance/mechanics.py new file mode 100644 index 0000000000000000000000000000000000000000..07a455d2de0921ab168597a5f92703d0d99935d5 --- /dev/null +++ b/env/nplayer/governance/mechanics.py @@ -0,0 +1,170 @@ +"""Pure payoff modifier functions for governance mechanics.""" + +from __future__ import annotations + +from constant_definitions.nplayer.governance_constants import ( + MECHANIC_TAXATION, + MECHANIC_REDISTRIBUTION, + 
MECHANIC_INSURANCE, + MECHANIC_QUOTA, + MECHANIC_SUBSIDY, + MECHANIC_VETO, + MECHANIC_ORDER, + REDISTRIBUTION_PROPORTIONAL, +) +from env.nplayer.governance.models import MechanicConfig, RuntimeRules + +_ZERO = int() +_ONE = int(bool(True)) +_ZERO_F = float() + + +def _apply_taxation( + payoffs: list[float], active: set[int], cfg: MechanicConfig, +) -> list[float]: + """Active players pay tax_rate * payoff into a pool, distributed equally.""" + n_active = len(active) + if n_active == _ZERO: + return payoffs + rate = cfg.tax_rate_numerator / cfg.tax_rate_denominator + pool = _ZERO_F + for i in active: + pool += payoffs[i] * rate + share = pool / n_active + result = list(payoffs) + for i in active: + result[i] = result[i] - payoffs[i] * rate + share + return result + + +def _apply_redistribution( + payoffs: list[float], active: set[int], cfg: MechanicConfig, +) -> list[float]: + """Equal mode: everyone gets mean. Proportional: dampen toward mean.""" + n_active = len(active) + if n_active == _ZERO: + return payoffs + total = sum(payoffs[i] for i in active) + mean = total / n_active + result = list(payoffs) + if cfg.redistribution_mode == REDISTRIBUTION_PROPORTIONAL: + damping = cfg.damping_numerator / cfg.damping_denominator + for i in active: + result[i] = result[i] + damping * (mean - result[i]) + else: + for i in active: + result[i] = mean + return result + + +def _apply_insurance( + payoffs: list[float], active: set[int], cfg: MechanicConfig, +) -> list[float]: + """All contribute a fraction; below-threshold players receive payout.""" + n_active = len(active) + if n_active == _ZERO: + return payoffs + contrib_rate = cfg.insurance_contribution_numerator / cfg.insurance_contribution_denominator + pool = _ZERO_F + result = list(payoffs) + for i in active: + contrib = result[i] * contrib_rate + pool += contrib + result[i] -= contrib + mean_pre = sum(payoffs[i] for i in active) / n_active + threshold = mean_pre * cfg.insurance_threshold_numerator / 
cfg.insurance_threshold_denominator
+    claimants = [i for i in active if payoffs[i] < threshold]
+    if claimants:
+        payout = pool / len(claimants)
+        for i in claimants:
+            result[i] += payout
+    return result
+
+
+def _apply_quota(
+    payoffs: list[float], active: set[int], cfg: MechanicConfig,
+) -> list[float]:
+    """Cap individual payoff at maximum; excess redistributed to below-cap."""
+    cap = cfg.quota_max
+    result = list(payoffs)
+    excess = _ZERO_F
+    below_cap = []
+    for i in active:
+        if result[i] > cap:
+            excess += result[i] - cap
+            result[i] = cap
+        else:
+            below_cap.append(i)
+    if below_cap and excess > _ZERO_F:
+        share = excess / len(below_cap)
+        for i in below_cap:
+            result[i] += share
+    return result
+
+
+def _apply_subsidy(
+    payoffs: list[float], active: set[int], cfg: MechanicConfig,
+) -> list[float]:
+    """Floor on payoffs, funded by fraction from above-floor players."""
+    floor_val = cfg.subsidy_floor
+    fund_rate = cfg.subsidy_fund_rate_numerator / cfg.subsidy_fund_rate_denominator
+    result = list(payoffs)
+    # Collect funds from above-floor players
+    pool = _ZERO_F
+    for i in active:
+        if result[i] > floor_val:
+            contrib = (result[i] - floor_val) * fund_rate
+            pool += contrib
+            result[i] -= contrib
+    # Distribute to below-floor players
+    below = [i for i in active if payoffs[i] < floor_val]
+    if below and pool > _ZERO_F:
+        need_total = sum(floor_val - payoffs[i] for i in below)
+        for i in below:
+            need = floor_val - payoffs[i]
+            if need_total > _ZERO_F:
+                result[i] += min(need, pool * need / need_total)
+    return result
+
+
+def _apply_veto(
+    payoffs: list[float], active: set[int], cfg: MechanicConfig,
+) -> list[float]:
+    """Designated player triggers equalization if their payoff falls below mean."""
+    vp = cfg.veto_player
+    if vp not in active:
+        return payoffs
+    n_active = len(active)
+    if n_active == _ZERO:
+        return payoffs
+    total = sum(payoffs[i] for i in active)
+    mean = total / n_active
+    if payoffs[vp] < mean:
+        result = list(payoffs)
+        for i in active:
+            result[i] = mean
+        return result
+    return payoffs
+
+
+_MECHANIC_FNS = {
+    MECHANIC_TAXATION: _apply_taxation,
+    MECHANIC_REDISTRIBUTION: _apply_redistribution,
+    MECHANIC_INSURANCE: _apply_insurance,
+    MECHANIC_QUOTA: _apply_quota,
+    MECHANIC_SUBSIDY: _apply_subsidy,
+    MECHANIC_VETO: _apply_veto,
+}
+
+
+def apply_mechanics(
+    payoffs: list[float], rules: RuntimeRules, active_players: set[int],
+) -> list[float]:
+    """Run all enabled mechanics in fixed order."""
+    result = list(payoffs)
+    for name in MECHANIC_ORDER:
+        if rules.mechanics.get(name, False):
+            fn = _MECHANIC_FNS.get(name)
+            if fn is not None:
+                result = fn(result, active_players, rules.mechanic_config)
+    return result
diff --git a/env/nplayer/governance/models.py b/env/nplayer/governance/models.py
new file mode 100644
index 0000000000000000000000000000000000000000..afd72f46f3b7db42f24e7aaac7a869a43de388d8
--- /dev/null
+++ b/env/nplayer/governance/models.py
@@ -0,0 +1,174 @@
+"""Data models for the meta-governance system."""
+
+from __future__ import annotations
+
+from typing import Any, Optional
+
+from pydantic import BaseModel, Field
+
+from constant_definitions.game_constants import DEFAULT_NONE
+from constant_definitions.nplayer.coalition_constants import (
+    ENFORCEMENT_CHEAP_TALK,
+    COALITION_DEFAULT_PENALTY_NUMERATOR,
+    COALITION_DEFAULT_PENALTY_DENOMINATOR,
+)
+from constant_definitions.nplayer.governance_constants import (
+    GOVERNANCE_PROPOSAL_PARAMETER,
+    GOVERNANCE_DEFAULT_TAX_RATE_NUMERATOR,
+    GOVERNANCE_DEFAULT_TAX_RATE_DENOMINATOR,
+    GOVERNANCE_DEFAULT_REDISTRIBUTION_MODE,
+    GOVERNANCE_DEFAULT_REDISTRIBUTION_DAMPING_NUMERATOR,
+    GOVERNANCE_DEFAULT_REDISTRIBUTION_DAMPING_DENOMINATOR,
+    GOVERNANCE_DEFAULT_INSURANCE_CONTRIBUTION_NUMERATOR,
+    GOVERNANCE_DEFAULT_INSURANCE_CONTRIBUTION_DENOMINATOR,
+    GOVERNANCE_DEFAULT_INSURANCE_THRESHOLD_NUMERATOR,
+    GOVERNANCE_DEFAULT_INSURANCE_THRESHOLD_DENOMINATOR,
+    GOVERNANCE_DEFAULT_QUOTA_MAX,
+    GOVERNANCE_DEFAULT_SUBSIDY_FLOOR,
+    GOVERNANCE_DEFAULT_SUBSIDY_FUND_RATE_NUMERATOR,
+    GOVERNANCE_DEFAULT_SUBSIDY_FUND_RATE_DENOMINATOR,
+    GOVERNANCE_DEFAULT_VETO_PLAYER,
+)
+
+_ZERO = int()
+_ONE = int(bool(True))
+_ZERO_F = float()
+
+
+class MechanicConfig(BaseModel):
+    """Per-mechanic parameter bundle."""
+
+    # taxation
+    tax_rate_numerator: int = Field(default=GOVERNANCE_DEFAULT_TAX_RATE_NUMERATOR)
+    tax_rate_denominator: int = Field(default=GOVERNANCE_DEFAULT_TAX_RATE_DENOMINATOR)
+
+    # redistribution
+    redistribution_mode: str = Field(default=GOVERNANCE_DEFAULT_REDISTRIBUTION_MODE)
+    damping_numerator: int = Field(default=GOVERNANCE_DEFAULT_REDISTRIBUTION_DAMPING_NUMERATOR)
+    damping_denominator: int = Field(default=GOVERNANCE_DEFAULT_REDISTRIBUTION_DAMPING_DENOMINATOR)
+
+    # insurance
+    insurance_contribution_numerator: int = Field(
+        default=GOVERNANCE_DEFAULT_INSURANCE_CONTRIBUTION_NUMERATOR,
+    )
+    insurance_contribution_denominator: int = Field(
+        default=GOVERNANCE_DEFAULT_INSURANCE_CONTRIBUTION_DENOMINATOR,
+    )
+    insurance_threshold_numerator: int = Field(
+        default=GOVERNANCE_DEFAULT_INSURANCE_THRESHOLD_NUMERATOR,
+    )
+    insurance_threshold_denominator: int = Field(
+        default=GOVERNANCE_DEFAULT_INSURANCE_THRESHOLD_DENOMINATOR,
+    )
+
+    # quota
+    quota_max: float = Field(default=float(GOVERNANCE_DEFAULT_QUOTA_MAX))
+
+    # subsidy
+    subsidy_floor: float = Field(default=float(GOVERNANCE_DEFAULT_SUBSIDY_FLOOR))
+    subsidy_fund_rate_numerator: int = Field(
+        default=GOVERNANCE_DEFAULT_SUBSIDY_FUND_RATE_NUMERATOR,
+    )
+    subsidy_fund_rate_denominator: int = Field(
+        default=GOVERNANCE_DEFAULT_SUBSIDY_FUND_RATE_DENOMINATOR,
+    )
+
+    # veto
+    veto_player: int = Field(default=GOVERNANCE_DEFAULT_VETO_PLAYER)
+
+
+class RuntimeRules(BaseModel):
+    """Mutable overlay on top of frozen CoalitionGameConfig."""
+
+    enforcement: str = Field(default=ENFORCEMENT_CHEAP_TALK)
+    penalty_numerator: int = Field(default=COALITION_DEFAULT_PENALTY_NUMERATOR)
+    penalty_denominator: int = Field(default=COALITION_DEFAULT_PENALTY_DENOMINATOR)
+    allow_side_payments: bool = Field(default=False)
+
+    mechanics: dict[str, bool] = Field(
+        default_factory=dict,
+        description="Mechanic name -> active flag",
+    )
+    mechanic_config: MechanicConfig = Field(default_factory=MechanicConfig)
+
+    custom_modifier_keys: list[str] = Field(
+        default_factory=list,
+        description="Keys of active custom modifiers",
+    )
+
+    governance_history: list[GovernanceResult] = Field(default_factory=list)
+
+
+class GovernanceProposal(BaseModel):
+    """A governance change proposed by a player."""
+
+    proposer: int = Field(..., description="Player index of the proposer")
+    proposal_type: str = Field(
+        default=GOVERNANCE_PROPOSAL_PARAMETER,
+        description="One of: parameter, mechanic, custom",
+    )
+
+    # parameter changes
+    parameter_name: Optional[str] = Field(
+        default=DEFAULT_NONE,
+        description="Name of the parameter to change (enforcement, penalty_numerator, etc.)",
+    )
+    parameter_value: Optional[Any] = Field(
+        default=DEFAULT_NONE,
+        description="New value for the parameter",
+    )
+
+    # mechanic toggles
+    mechanic_name: Optional[str] = Field(
+        default=DEFAULT_NONE,
+        description="Mechanic to activate/deactivate",
+    )
+    mechanic_active: Optional[bool] = Field(
+        default=DEFAULT_NONE,
+        description="True to activate, False to deactivate",
+    )
+    mechanic_params: Optional[dict[str, Any]] = Field(
+        default=DEFAULT_NONE,
+        description="Optional parameter overrides for the mechanic",
+    )
+
+    # custom modifiers
+    custom_modifier_key: Optional[str] = Field(
+        default=DEFAULT_NONE,
+        description="Key of the custom modifier to activate/deactivate",
+    )
+    custom_modifier_active: Optional[bool] = Field(
+        default=DEFAULT_NONE,
+        description="True to activate, False to deactivate",
+    )
+
+
+class GovernanceVote(BaseModel):
+    """A player's vote on a governance proposal."""
+
+    voter: int = Field(..., description="Player index of the voter")
+    proposal_index: int = Field(..., description="Index into the proposals list")
+    approve: bool = Field(..., description="Whether the voter approves")
+
+
+class GovernanceResult(BaseModel):
+    """Record of governance activity for one round."""
+
+    proposals: list[GovernanceProposal] = Field(default_factory=list)
+    votes: list[GovernanceVote] = Field(default_factory=list)
+    adopted: list[int] = Field(
+        default_factory=list,
+        description="Indices of adopted proposals",
+    )
+    rejected: list[int] = Field(
+        default_factory=list,
+        description="Indices of rejected proposals",
+    )
+    rules_snapshot: Optional[RuntimeRules] = Field(
+        default=DEFAULT_NONE,
+        description="Rules state after this round of governance",
+    )
+
+
+# Allow RuntimeRules to reference GovernanceResult (forward ref)
+RuntimeRules.model_rebuild()
diff --git a/env/nplayer/governance/strategies.py b/env/nplayer/governance/strategies.py
new file mode 100644
index 0000000000000000000000000000000000000000..dcf6949bc6632d7eb7f23a02137d29645bdaf59b
--- /dev/null
+++ b/env/nplayer/governance/strategies.py
@@ -0,0 +1,100 @@
+"""Governance-aware opponent strategies."""
+
+from __future__ import annotations
+
+import random
+from typing import Protocol
+
+from env.nplayer.governance.models import GovernanceProposal, GovernanceVote
+
+_ZERO = int()
+_ONE = int(bool(True))
+
+
+class GovernanceStrategy(Protocol):
+    """Interface for governance opponent behaviour."""
+
+    def propose_governance(self, player_index: int) -> list[GovernanceProposal]: ...
+
+    def vote_on_governance(
+        self, player_index: int, proposals: list[GovernanceProposal],
+    ) -> list[GovernanceVote]: ...
+
+
+# ---------------------------------------------------------------------------
+# Built-in strategies
+# ---------------------------------------------------------------------------
+
+
+class GovernancePassiveStrategy:
+    """No proposals, no votes."""
+
+    def propose_governance(self, player_index: int) -> list[GovernanceProposal]:
+        return []
+
+    def vote_on_governance(
+        self, player_index: int, proposals: list[GovernanceProposal],
+    ) -> list[GovernanceVote]:
+        return []
+
+
+class GovernanceRandomStrategy:
+    """No proposals, random votes."""
+
+    def propose_governance(self, player_index: int) -> list[GovernanceProposal]:
+        return []
+
+    def vote_on_governance(
+        self, player_index: int, proposals: list[GovernanceProposal],
+    ) -> list[GovernanceVote]:
+        return [
+            GovernanceVote(voter=player_index, proposal_index=idx, approve=random.choice([True, False]))
+            for idx in range(len(proposals))
+        ]
+
+
+class GovernanceConservativeStrategy:
+    """No proposals, rejects all."""
+
+    def propose_governance(self, player_index: int) -> list[GovernanceProposal]:
+        return []
+
+    def vote_on_governance(
+        self, player_index: int, proposals: list[GovernanceProposal],
+    ) -> list[GovernanceVote]:
+        return [
+            GovernanceVote(voter=player_index, proposal_index=idx, approve=False)
+            for idx in range(len(proposals))
+        ]
+
+
+class GovernanceProgressiveStrategy:
+    """No proposals, approves all."""
+
+    def propose_governance(self, player_index: int) -> list[GovernanceProposal]:
+        return []
+
+    def vote_on_governance(
+        self, player_index: int, proposals: list[GovernanceProposal],
+    ) -> list[GovernanceVote]:
+        return [
+            GovernanceVote(voter=player_index, proposal_index=idx, approve=True)
+            for idx in range(len(proposals))
+        ]
+
+
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+
+GOVERNANCE_STRATEGIES: dict[str, GovernanceStrategy] = {
+    "governance_passive": GovernancePassiveStrategy(),
+    "governance_random": GovernanceRandomStrategy(),
+    "governance_conservative": GovernanceConservativeStrategy(),
+    "governance_progressive": GovernanceProgressiveStrategy(),
+}
+
+
+def get_governance_strategy(name: str) -> GovernanceStrategy:
+    """Look up a governance strategy by name. Raises KeyError if not found."""
+    return GOVERNANCE_STRATEGIES[name]
diff --git a/env/nplayer/models.py b/env/nplayer/models.py
new file mode 100644
index 0000000000000000000000000000000000000000..fcda48cf130694717787e08ff1afefcc01b1fb19
--- /dev/null
+++ b/env/nplayer/models.py
@@ -0,0 +1,54 @@
+"""Data models for the N-player environment."""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+from constant_definitions.game_constants import (
+    DEFAULT_FALSE,
+    DEFAULT_NONE,
+    DEFAULT_ZERO_FLOAT,
+    DEFAULT_ZERO_INT,
+    MIN_STEP_COUNT,
+)
+
+
+class NPlayerRoundResult(BaseModel):
+    round_number: int = Field(..., description="Round number (one-indexed)")
+    actions: list[str] = Field(..., description="Actions taken by all players")
+    payoffs: list[float] = Field(..., description="Payoffs received by all players")
+
+
+class NPlayerAction(BaseModel):
+    action: str = Field(..., description="The action to take this round")
+    metadata: dict = Field(default_factory=dict)
+
+
+class NPlayerObservation(BaseModel):
+    done: bool = Field(default=DEFAULT_FALSE, description="Whether the episode is over")
+    reward: float = Field(default=DEFAULT_ZERO_FLOAT, description="Reward for this step")
+    game_name: str = Field(default="", description="Name of the current game")
+    game_description: str = Field(default="", description="Description of the game rules")
+    available_actions: list[str] = Field(default_factory=list, description="Valid actions")
+    current_round: int = Field(default=DEFAULT_ZERO_INT, description="Current round number")
+    total_rounds: int = Field(default=DEFAULT_ZERO_INT, description="Total rounds in episode")
+    history: list[NPlayerRoundResult] = Field(default_factory=list, description="Round history")
+    scores: list[float] = Field(default_factory=list, description="Cumulative scores for all players")
+    num_players: int = Field(default=DEFAULT_ZERO_INT, description="Number of players")
+    player_index: int = Field(default=DEFAULT_ZERO_INT, description="This player's index")
+    last_round: Optional[NPlayerRoundResult] = Field(default=DEFAULT_NONE, description="Most recent round")
+    metadata: dict = Field(default_factory=dict)
+
+
+class NPlayerGameState(BaseModel):
+    episode_id: Optional[str] = Field(default=DEFAULT_NONE, description="Episode identifier")
+    step_count: int = Field(default=DEFAULT_ZERO_INT, ge=MIN_STEP_COUNT, description="Steps taken")
+    game_name: str = Field(default="", description="Current game name")
+    current_round: int = Field(default=DEFAULT_ZERO_INT, description="Current round")
+    total_rounds: int = Field(default=DEFAULT_ZERO_INT, description="Total rounds")
+    num_players: int = Field(default=DEFAULT_ZERO_INT, description="Number of players")
+    scores: list[float] = Field(default_factory=list, description="Cumulative scores for all players")
+    history: list[NPlayerRoundResult] = Field(default_factory=list, description="Round history")
+    is_done: bool = Field(default=DEFAULT_FALSE, description="Whether episode has ended")
diff --git a/env/nplayer/strategies.py b/env/nplayer/strategies.py
new file mode 100644
index 0000000000000000000000000000000000000000..7cf85d2ee7043c8222fa48c524e7b9c3dd2f73a6
--- /dev/null
+++ b/env/nplayer/strategies.py
@@ -0,0 +1,96 @@
+"""N-player opponent strategies."""
+
+from __future__ import annotations
+
+import random
+from typing import Protocol
+
+from env.nplayer.models import NPlayerObservation
+from constant_definitions.game_constants import (
+    ADAPTIVE_THRESHOLD_NUMERATOR,
+    ADAPTIVE_THRESHOLD_DENOMINATOR,
+)
+
+_ONE = int(bool(True))
+_ZERO = int()
+
+
+class NPlayerStrategy(Protocol):
+    """Interface for N-player opponent strategies."""
+
+    def choose_action(
+        self, observation: NPlayerObservation,
+    ) -> str: ...
+
+
+class NPlayerRandomStrategy:
+    def choose_action(self, observation: NPlayerObservation) -> str:
+        return random.choice(observation.available_actions)
+
+
+class NPlayerAlwaysCooperateStrategy:
+    def choose_action(self, observation: NPlayerObservation) -> str:
+        return observation.available_actions[_ZERO]
+
+
+class NPlayerAlwaysDefectStrategy:
+    def choose_action(self, observation: NPlayerObservation) -> str:
+        return observation.available_actions[_ONE]
+
+
+class NPlayerTitForTatStrategy:
+    """Cooperate first. Then mirror the majority action of other players."""
+
+    def choose_action(self, observation: NPlayerObservation) -> str:
+        actions = observation.available_actions
+        coop = actions[_ZERO]
+        defect = actions[_ONE]
+        if not observation.history:
+            return coop
+        last = observation.history[-_ONE]
+        my_idx = observation.player_index
+        other_actions = [
+            a for i, a in enumerate(last.actions) if i != my_idx
+        ]
+        defect_count = sum(_ONE for a in other_actions if a == defect)
+        coop_count = len(other_actions) - defect_count
+        return coop if coop_count >= defect_count else defect
+
+
+class NPlayerAdaptiveStrategy:
+    """Cooperate first. Then cooperate if majority of others cooperated overall."""
+
+    def choose_action(self, observation: NPlayerObservation) -> str:
+        actions = observation.available_actions
+        coop = actions[_ZERO]
+        if not observation.history:
+            return coop
+        my_idx = observation.player_index
+        total_other = _ZERO
+        coop_total = _ZERO
+        for rnd in observation.history:
+            for i, a in enumerate(rnd.actions):
+                if i != my_idx:
+                    total_other += _ONE
+                    if a == coop:
+                        coop_total += _ONE
+        threshold = total_other * ADAPTIVE_THRESHOLD_NUMERATOR / ADAPTIVE_THRESHOLD_DENOMINATOR
+        return coop if coop_total > threshold else actions[_ONE]
+
+
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+
+NPLAYER_STRATEGIES: dict[str, NPlayerStrategy] = {
+    "random": NPlayerRandomStrategy(),
+    "always_cooperate": NPlayerAlwaysCooperateStrategy(),
+    "always_defect": NPlayerAlwaysDefectStrategy(),
+    "tit_for_tat": NPlayerTitForTatStrategy(),
+    "adaptive": NPlayerAdaptiveStrategy(),
+}
+
+
+def get_nplayer_strategy(name: str) -> NPlayerStrategy:
+    """Look up an N-player strategy by name. Raises KeyError if not found."""
+    return NPLAYER_STRATEGIES[name]
diff --git a/server/KantBench_environment.py b/server/KantBench_environment.py
index 808e285e2bdf6ce7af097f2518feac8d206e36f8..048dcee9346861e50b07ccead30a5722c3b1999e 100644
--- a/server/KantBench_environment.py
+++ b/server/KantBench_environment.py
@@ -1,289 +1,78 @@
-"""KantBench: a game theory RL environment for OpenEnv.
+"""KantBench environment adapter for the HF Space.
 
-Each episode is one repeated game (e.g. Prisoner's Dilemma) against a
-fixed strategy opponent. The agent chooses a move each round; the
-environment computes payoffs and returns a structured observation.
-
-Supported games: Prisoner's Dilemma, Stag Hunt, Hawk-Dove,
-    Battle of Sexes, Chicken, Matching Pennies,
-    Rock-Paper-Scissors.
-
-Opponent strategies: random, always_first, always_last, tit_for_tat,
-    grim_trigger, pavlov.
+Thin wrapper that delegates to the real KantEnvironment (90+ games,
+17 strategies) instead of a standalone reimplementation.
 """
 
 from __future__ import annotations
 
-import random
 from typing import Any
-from uuid import uuid4
 
 from openenv.core.env_server.interfaces import Environment
 from openenv.core.env_server.types import State
 
 from models import KantBenchAction, KantBenchObservation
-
-
-# ---------------------------------------------------------------------------
-# Game definitions (self-contained payoff matrices)
-# ---------------------------------------------------------------------------
-
-def _matrix(m: dict[tuple[str, str], tuple[float, float]]):
-    """Return a payoff function from a matrix dict."""
-    def fn(a: str, b: str) -> tuple[float, float]:
-        return m[(a, b)]
-    return fn
-
-
-GAMES: dict[str, dict[str, Any]] = {
-    "prisoners_dilemma": {
-        "name": "Prisoner's Dilemma",
-        "description": (
-            "Two players choose to cooperate or defect simultaneously. "
-            "Mutual cooperation is best collectively; defection is individually tempting."
-        ),
-        "actions": ["cooperate", "defect"],
-        "rounds": 10,
-        "payoff_fn": _matrix({
-            ("cooperate", "cooperate"): (3.0, 3.0),
-            ("cooperate", "defect"): (0.0, 5.0),
-            ("defect", "cooperate"): (5.0, 0.0),
-            ("defect", "defect"): (1.0, 1.0),
-        }),
-    },
-    "stag_hunt": {
-        "name": "Stag Hunt",
-        "description": (
-            "Two hunters choose to hunt stag (requires coordination) or hare "
-            "(safe alone). Mutual cooperation yields the best outcome."
-        ),
-        "actions": ["stag", "hare"],
-        "rounds": 10,
-        "payoff_fn": _matrix({
-            ("stag", "stag"): (4.0, 4.0),
-            ("stag", "hare"): (0.0, 2.0),
-            ("hare", "stag"): (2.0, 0.0),
-            ("hare", "hare"): (2.0, 2.0),
-        }),
-    },
-    "hawk_dove": {
-        "name": "Hawk-Dove",
-        "description": (
-            "Two players compete over a resource. Hawk is aggressive; Dove is passive. "
-            "Two hawks fight and both lose; two doves share."
-        ),
-        "actions": ["hawk", "dove"],
-        "rounds": 10,
-        "payoff_fn": _matrix({
-            ("hawk", "hawk"): (-1.0, -1.0),
-            ("hawk", "dove"): (4.0, 0.0),
-            ("dove", "hawk"): (0.0, 4.0),
-            ("dove", "dove"): (2.0, 2.0),
-        }),
-    },
-    "battle_of_sexes": {
-        "name": "Battle of the Sexes",
-        "description": (
-            "Two players want to coordinate but prefer different options. "
-            "Player 1 prefers opera; Player 2 prefers football. "
-            "Both prefer to be together over going alone."
-        ),
-        "actions": ["opera", "football"],
-        "rounds": 10,
-        "payoff_fn": _matrix({
-            ("opera", "opera"): (3.0, 1.0),
-            ("opera", "football"): (0.0, 0.0),
-            ("football", "opera"): (0.0, 0.0),
-            ("football", "football"): (1.0, 3.0),
-        }),
-    },
-    "chicken": {
-        "name": "Chicken (Snowdrift)",
-        "description": (
-            "Two drivers head toward each other. Swerving is safe but cowardly; "
-            "going straight is bold but catastrophic if both do it."
-        ),
-        "actions": ["straight", "swerve"],
-        "rounds": 10,
-        "payoff_fn": _matrix({
-            ("straight", "straight"): (-10.0, -10.0),
-            ("straight", "swerve"): (5.0, -1.0),
-            ("swerve", "straight"): (-1.0, 5.0),
-            ("swerve", "swerve"): (0.0, 0.0),
-        }),
-    },
-    "matching_pennies": {
-        "name": "Matching Pennies",
-        "description": (
-            "Player 1 wins if both show the same side; Player 2 wins if they differ. "
-            "Pure zero-sum game with no stable pure-strategy Nash equilibrium."
-        ),
-        "actions": ["heads", "tails"],
-        "rounds": 20,
-        "payoff_fn": _matrix({
-            ("heads", "heads"): (1.0, -1.0),
-            ("heads", "tails"): (-1.0, 1.0),
-            ("tails", "heads"): (-1.0, 1.0),
-            ("tails", "tails"): (1.0, -1.0),
-        }),
-    },
-    "rock_paper_scissors": {
-        "name": "Rock-Paper-Scissors",
-        "description": (
-            "Classic zero-sum game. Rock beats Scissors, Scissors beats Paper, "
-            "Paper beats Rock. Ties yield 0."
-        ),
-        "actions": ["rock", "paper", "scissors"],
-        "rounds": 20,
-        "payoff_fn": _matrix({
-            ("rock", "rock"): (0.0, 0.0),
-            ("rock", "paper"): (-1.0, 1.0),
-            ("rock", "scissors"): (1.0, -1.0),
-            ("paper", "rock"): (1.0, -1.0),
-            ("paper", "paper"): (0.0, 0.0),
-            ("paper", "scissors"): (-1.0, 1.0),
-            ("scissors", "rock"): (-1.0, 1.0),
-            ("scissors", "paper"): (1.0, -1.0),
-            ("scissors", "scissors"): (0.0, 0.0),
-        }),
-    },
-}
-
-STRATEGIES = ["random", "always_first", "always_last", "tit_for_tat", "grim_trigger", "pavlov"]
+from env.environment import KantEnvironment
+from env.models import GameAction
 
 
-def _opponent_move(strategy: str, actions: list[str], history: list[dict]) -> str:
-    """Compute opponent's move given strategy and history."""
-    if strategy == "random":
-        return random.choice(actions)
-    if strategy == "always_first":
-        return actions[0]
-    if strategy == "always_last":
-        return actions[-1]
-    if not history:
-        return actions[0]  # default opening for reactive strategies
-    last_agent_move = history[-1]["your_move"]
-    last_opp_move = history[-1]["opponent_move"]
-    if strategy == "tit_for_tat":
-        return last_agent_move if last_agent_move in actions else actions[0]
-    if strategy == "grim_trigger":
-        # Defect forever once agent defects; cooperate otherwise
-        ever_defected = any(r["your_move"] == actions[-1] for r in history)
-        return actions[-1] if ever_defected else actions[0]
-    if strategy == "pavlov":
-        # Repeat own last move if it paid well (i.e. opponent cooperated), else switch
-        if last_opp_move == actions[0]:
-            return last_opp_move  # mirror cooperation
-        return actions[0] if last_opp_move != actions[0] else actions[-1]
-    return random.choice(actions)
-
-
-# ---------------------------------------------------------------------------
-# Environment
-# ---------------------------------------------------------------------------
-
 class KantbenchEnvironment(Environment):
-    """Game theory environment for benchmarking LLM strategic reasoning.
-
-    Each episode is a repeated 2-player game against one of six opponent
-    strategies. The agent submits a move each round and receives the payoff
-    result as a structured observation.
-
-    Example::
-
-        env = KantBenchEnvironment()
-        obs = env.reset()
-        # obs.game_name == "Prisoner's Dilemma"
-        # obs.available_moves == ["cooperate", "defect"]
+    """Game theory environment exposing 90+ games via the OpenEnv interface.
-
-        obs = env.step(KantBenchAction(move="cooperate"))
-        # obs.your_payoff, obs.opponent_move, obs.cumulative_score, ...
+
+    Wraps the real KantEnvironment and translates between the Space's
+    model types (KantBenchAction/Observation) and the internal types.
     """
 
     SUPPORTS_CONCURRENT_SESSIONS: bool = True
 
-    def __init__(self):
-        self._state = State(episode_id=str(uuid4()), step_count=0)
-        self._game_key: str = "prisoners_dilemma"
-        self._strategy: str = "random"
-        self._history: list[dict] = []
-        self._cumulative_score: float = 0.0
+    def __init__(self) -> None:
+        self._env = KantEnvironment()
 
-    def reset(self, **kwargs) -> KantBenchObservation:
-        """Start a new episode with a randomly chosen game and opponent strategy."""
-        self._game_key = random.choice(list(GAMES.keys()))
-        self._strategy = random.choice(STRATEGIES)
-        self._history = []
-        self._cumulative_score = 0.0
-        self._state = State(episode_id=str(uuid4()), step_count=0)
+    def reset(self, **kwargs: Any) -> KantBenchObservation:
+        obs = self._env.reset(**kwargs)
+        return _to_space_obs(obs)
 
-        game = GAMES[self._game_key]
-        return KantBenchObservation(
-            game_name=game["name"],
-            game_description=game["description"],
-            available_moves=game["actions"],
-            your_move="",
-            opponent_move="",
-            your_payoff=0.0,
-            opponent_payoff=0.0,
-            cumulative_score=0.0,
-            round_number=0,
-            max_rounds=game["rounds"],
-            opponent_strategy=self._strategy,
-            history=[],
-            done=False,
-            reward=0.0,
-            message=(
-                f"New episode: {game['name']} vs {self._strategy}. "
-                f"Choose one of: {game['actions']}"
-            ),
-        )
-
-    def step(self, action: KantBenchAction, **kwargs) -> KantBenchObservation:  # type: ignore[override]
-        """Play one round of the current game."""
-        game = GAMES[self._game_key]
-        actions = game["actions"]
-        max_rounds = game["rounds"]
-
-        # Validate move
-        move = action.move.lower().strip()
-        if move not in actions:
-            closest = actions[0]
-            move = closest
+    def step(self, action: KantBenchAction, **kwargs: Any) -> KantBenchObservation:
+        internal_action = GameAction(action=action.move)
+        obs = self._env.step(internal_action, **kwargs)
+        return _to_space_obs(obs)
 
-        opp_move = _opponent_move(self._strategy, actions, self._history)
-        your_pay, opp_pay = game["payoff_fn"](move, opp_move)
+    @property
+    def state(self) -> State:
+        s = self._env.state
+        return State(
+            episode_id=s.episode_id or "",
+            step_count=s.step_count,
+        )
 
-        self._state.step_count += 1
-        self._cumulative_score += your_pay
-        round_record = {
-            "round": self._state.step_count,
-            "your_move": move,
-            "opponent_move": opp_move,
-            "your_payoff": your_pay,
-            "opponent_payoff": opp_pay,
+
+def _to_space_obs(obs) -> KantBenchObservation:
+    """Convert internal GameObservation to Space-facing KantBenchObservation."""
+    last = obs.last_round
+    history = [
+        {
+            "round": r.round_number,
+            "your_move": r.player_action,
+            "opponent_move": r.opponent_action,
+            "your_payoff": r.player_payoff,
+            "opponent_payoff": r.opponent_payoff,
         }
-        self._history.append(round_record)
-
-        done = self._state.step_count >= max_rounds
-
-        return KantBenchObservation(
-            game_name=game["name"],
-            game_description=game["description"],
-            available_moves=actions,
-            your_move=move,
-            opponent_move=opp_move,
-            your_payoff=your_pay,
-            opponent_payoff=opp_pay,
-            cumulative_score=self._cumulative_score,
-            round_number=self._state.step_count,
-            max_rounds=max_rounds,
-            opponent_strategy=self._strategy,
-            history=self._history,
-            done=done,
-            reward=your_pay,
-            message="Game over — call reset() to start a new episode." if done else "",
-        )
-
-    @property
-    def state(self) -> State:
-        return self._state
+        for r in obs.history
+    ]
+    return KantBenchObservation(
+        game_name=obs.game_name,
+        game_description=obs.game_description,
+        available_moves=list(obs.available_actions),
+        your_move=last.player_action if last else "",
+        opponent_move=last.opponent_action if last else "",
+        your_payoff=last.player_payoff if last else 0.0,
+        opponent_payoff=last.opponent_payoff if last else 0.0,
+        cumulative_score=obs.player_score,
+        round_number=obs.current_round,
+        max_rounds=obs.total_rounds,
+        opponent_strategy=obs.opponent_strategy,
+        history=history,
+        done=obs.done,
+        reward=obs.reward,
+        message="Game over — call reset() to start a new episode." if obs.done else "",
+    )
diff --git a/server/__pycache__/KantBench_environment.cpython-311.pyc b/server/__pycache__/KantBench_environment.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..90dc650438af58cdd9a5c7da398b0c5608c48cfc
Binary files /dev/null and b/server/__pycache__/KantBench_environment.cpython-311.pyc differ
diff --git a/server/__pycache__/__init__.cpython-311.pyc b/server/__pycache__/__init__.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..f3f1128b992d5f49bd2d9fbbcfc88413ac9e8450
Binary files /dev/null and b/server/__pycache__/__init__.cpython-311.pyc differ
diff --git a/server/__pycache__/app.cpython-311.pyc b/server/__pycache__/app.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..f7c60d6bc1cfbd78e69cb8cfa54b2eff9709a689
Binary files /dev/null and b/server/__pycache__/app.cpython-311.pyc differ
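The governance mechanics in the diff are applied in a fixed pipeline: `apply_mechanics` threads the payoff list through each enabled mechanic in `MECHANIC_ORDER`, so each mechanic sees the previous mechanic's output. A minimal standalone sketch of that idea, with hypothetical stand-in functions (`apply_tax`, `apply_quota`) and made-up defaults in place of the repo's `MechanicConfig`-driven `_apply_taxation`/`_apply_quota`:

```python
def apply_tax(payoffs, rate=0.25):
    # Collect a flat fraction from every player, then split the pool evenly.
    pool = sum(p * rate for p in payoffs)
    share = pool / len(payoffs)
    return [p * (1 - rate) + share for p in payoffs]


def apply_quota(payoffs, cap=6.0):
    # Cap each payoff; redistribute the excess evenly among below-cap players
    # (excess is dropped if nobody is below the cap, as in the repo's version).
    excess = sum(p - cap for p in payoffs if p > cap)
    below = [i for i, p in enumerate(payoffs) if p <= cap]
    result = [min(p, cap) for p in payoffs]
    if below and excess > 0:
        for i in below:
            result[i] += excess / len(below)
    return result


# Fixed pipeline order, analogous to MECHANIC_ORDER.
MECHANIC_ORDER = [apply_tax, apply_quota]


def apply_mechanics(payoffs, enabled):
    result = list(payoffs)
    for fn in MECHANIC_ORDER:
        if fn in enabled:
            result = fn(result)  # each mechanic consumes the previous output
    return result


print(apply_mechanics([10.0, 2.0, 0.0], {apply_tax, apply_quota}))
```

Because each stage consumes the previous stage's output, enabling the same mechanics in a different order would generally yield different payoffs; pinning the order in a module-level constant keeps results reproducible across episodes.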
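Likewise, the `NPlayerAdaptiveStrategy` decision rule (cooperate when the other players' overall cooperation rate exceeds a threshold) can be sketched in isolation. The helper below is hypothetical: it takes plain lists rather than an `NPlayerObservation`, and hardcodes a 1/2 threshold where the environment reads `ADAPTIVE_THRESHOLD_NUMERATOR` / `ADAPTIVE_THRESHOLD_DENOMINATOR`:

```python
def adaptive_choice(history, my_idx, actions, num=1, den=2):
    """Cooperate first; thereafter, cooperate only if the others'
    historical cooperation count exceeds total_other * num / den."""
    coop = actions[0]
    if not history:
        return coop  # open by cooperating, like the repo's strategy
    total_other = 0
    coop_count = 0
    for round_actions in history:
        for i, a in enumerate(round_actions):
            if i != my_idx:  # only count the other players' moves
                total_other += 1
                if a == coop:
                    coop_count += 1
    return coop if coop_count > total_other * num / den else actions[1]


# Others defected 3 times out of 4 across two rounds, so we defect too.
history = [["C", "C", "D"], ["C", "D", "D"]]
print(adaptive_choice(history, my_idx=0, actions=["C", "D"]))
```

Note the asymmetry with `NPlayerTitForTatStrategy`: tit-for-tat reacts only to the most recent round, while the adaptive rule aggregates over the entire history, so a single defection dilutes rather than flips its choice.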