Space: sergiopaniego/textarena2



sergiopaniego (HF Staff) committed about 1 month ago

Commit 4e38d2f (verified) · 1 parent: 2928bc6

Upload folder using huggingface_hub


Files changed (13)

Dockerfile +84 -0
README.md +198 -5
__init__.py +26 -0
client.py +117 -0
models.py +57 -0
openenv.yaml +7 -0
pyproject.toml +52 -0
rewards.py +129 -0
server/__init__.py +11 -0
server/app.py +90 -0
server/environment.py +320 -0
server/run_local.sh +7 -0
uv.lock +0 -0

Dockerfile ADDED

	@@ -0,0 +1,84 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local src/core)
+# - Standalone environments (with openenv-core from pip)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=textarena
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv-core is already in the pyproject.toml dependencies
+# For standalone builds, openenv-core will be installed from pip via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install system libraries required by TextArena (cv2 needs libGL, glib)
+# Also install git for building from git repos
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libgl1 \
+    libglib2.0-0 \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
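The `HEALTHCHECK` above probes `GET /health` with `curl -f` every 30 s, with a 3 s timeout. `TextArenaEnv.from_docker_image()` already waits for readiness, but when the container is started by hand (`docker run -p 8000:8000 textarena-env:latest`) a client-side wait loop along the same lines can help. A minimal standard-library sketch; `wait_for_health` is a hypothetical helper, not part of OpenEnv, and it assumes the server answers `/health` with HTTP 200 once ready:

```python
import time
import urllib.request


def wait_for_health(base_url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll ``{base_url}/health`` until it returns HTTP 200 or ``timeout`` elapses.

    Hypothetical helper mirroring the Dockerfile HEALTHCHECK from the client side.
    Returns True once the server is healthy, False if the deadline passes.
    """
    url = base_url.rstrip("/") + "/health"
    # Ignore proxy environment variables: health checks usually target a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Per-request timeout of 3 s, matching --timeout=3s in the HEALTHCHECK.
            with opener.open(url, timeout=3) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (refused connection, 4xx/5xx) and timeouts are all OSErrors:
            # the server is not ready yet, so retry after a short pause.
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_health("http://localhost:8000", timeout=60)` blocks until the container reports healthy or a minute has passed.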

README.md CHANGED

@@ -1,10 +1,203 @@
 ---
-title: Textarena2
-emoji: 🦀
-colorFrom: red
-colorTo: blue
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: TextArena Environment Server
+emoji: 🎮
+colorFrom: yellow
+colorTo: indigo
 sdk: docker
 pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
 ---
+# TextArena Environment
+An OpenEnv environment that wraps TextArena text games behind the standard `reset()`/`step()` API. The server runs `Wordle-v0` by default; set the `TEXTARENA_ENV_ID` environment variable to serve a different game.
+## Quick Start
+The simplest way to use the TextArena environment is through the `TextArenaEnv` client:
+```python
+from textarena_env import TextArenaAction, TextArenaEnv
+# Create the environment from a Docker image (starts the container and connects)
+env = TextArenaEnv.from_docker_image("textarena-env:latest")
+try:
+    result = env.reset()
+    print(f"Prompt: {result.observation.prompt}")
+    # Wordle-v0 (the default game) expects guesses wrapped in square brackets
+    for guess in ["[crane]", "[slate]", "[moist]"]:
+        result = env.step(TextArenaAction(message=guess))
+        print(f"Sent: {guess}")
+        for msg in result.observation.messages:
+            print(f"  → {msg.content}")
+        print(f"  → Reward: {result.reward}")
+        if result.done:
+            break
+finally:
+    # Always clean up (stops the container)
+    env.close()
+```
+That's it! The `TextArenaEnv.from_docker_image()` method handles:
+- Starting the Docker container
+- Waiting for the server to be ready
+- Connecting to the environment
+- Container cleanup when you call `close()`
+## Building the Docker Image
+Before using the environment, you need to build the Docker image:
+```bash
+# From the environment directory (where the Dockerfile is located)
+docker build -t textarena-env:latest .
+```
+## Deploying to Hugging Face Spaces
+You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
+```bash
+# From the environment directory (where openenv.yaml is located)
+openenv push
+# Or specify options
+openenv push --namespace my-org --private
+```
+The `openenv push` command will:
+1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
+2. Prepare a custom build for a Hugging Face Docker Space (enables the web interface)
+3. Upload to Hugging Face (ensuring you're logged in)
+### Prerequisites
+- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
+### Options
+- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
+- `--repo-id`, `-r`: Repository ID in the format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
+- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
+- `--private`: Deploy the space as private (default: public)
+### Examples
+```bash
+# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
+openenv push
+# Push to a specific repository
+openenv push --repo-id my-org/my-env
+# Push with a custom base image
+openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
+# Push as a private space
+openenv push --private
+# Combine options
+openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
+```
+After deployment, your space will be available at:
+`https://huggingface.co/spaces/<repo-id>`
+The deployed space includes:
+- **Web Interface** at `/web` - Interactive UI for exploring the environment
+- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
+- **Health Check** at `/health` - Container health monitoring
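+## Environment Details
+### Action
+**TextArenaAction**: Contains a single field
+- `message` (str) - The text submitted by the current player (for Wordle, a guess such as `[crane]`)
+### Observation
+**TextArenaObservation**: The game state as seen by the current player
+- `prompt` (str) - Game instructions, taken from TextArena `PROMPT` messages when present
+- `messages` (list[TextArenaMessage]) - Messages visible to the player, each with `sender_id`, `content`, and `category`
+- `current_player_id` (int) - ID of the player whose turn it is
+- `legal_players` (list[int]) - Non-negative player IDs from the game's role mapping
+- `info` (dict) - Step info from the TextArena state, plus `reward_signals` when a reward provider is active
+- `reward` (float) - Reward for the last step (0.0 right after `reset()`)
+- `done` (bool) - Whether the game has ended
+- `metadata` (dict) - Additional info such as `env_id`, `turn`, and the raw messages
+### Reward
+The step reward is extracted from the underlying TextArena game. For `Wordle-v0`, `rewards.py` also attaches auxiliary signals under `info["reward_signals"]` (and `metadata["reward_signals"]`):
+- `wordle.greens` / `wordle.yellows` - Number of green / yellow letters in the latest feedback, divided by 5
+- `wordle.repetitions` - `1.0` the first time a guess is played, then `0.0`, `-1.0`, ... for each repeat
+- `wordle.correct` - The game reward for the step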
+## Advanced Usage
+### Connecting to an Existing Server
+If you already have a TextArena environment server running, you can connect directly:
+```python
+from textarena_env import TextArenaAction, TextArenaEnv
+# Connect to existing server
+textarenaenv = TextArenaEnv(base_url="<ENV_HTTP_URL_HERE>")
+# Use as normal
+result = textarenaenv.reset()
+result = textarenaenv.step(TextArenaAction(message="Hello!"))
+```
+Note: When connecting to an existing server, `textarenaenv.close()` will NOT stop the server.
+## Development & Testing
+### Direct Environment Testing
+Test the environment logic directly without starting the HTTP server:
+```bash
+# From the environment directory, after installing the package (see "Running Locally")
+python -c "from textarena_env.server.environment import TextArenaEnvironment; print(TextArenaEnvironment().reset().prompt)"
+```
+This constructs the default `Wordle-v0` environment (importing TextArena and downloading the NLTK data it needs) and prints the prompt returned by `reset()`.
+### Running Locally
+Run the server locally for development:
+```bash
+# Install dependencies
+uv venv && source .venv/bin/activate
+uv pip install -e .
+# Start the server (use python -m to ensure venv Python is used)
+python -m uvicorn server.app:app --reload
+```
+## Project Structure
+```
+textarena/
+├── __init__.py            # Module exports
+├── README.md              # This file
+├── Dockerfile             # Container image definition
+├── openenv.yaml           # OpenEnv manifest
+├── pyproject.toml         # Project metadata and dependencies
+├── uv.lock                # Locked dependencies (generated)
+├── client.py              # TextArenaEnv client implementation
+├── models.py              # Action, Observation, and State models
+├── rewards.py             # Auxiliary reward providers (Wordle)
+└── server/
+    ├── __init__.py        # Server module exports
+    ├── environment.py     # Core environment logic (TextArenaEnvironment)
+    ├── app.py             # FastAPI application
+    └── run_local.sh       # Script for running the server locally
+```
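Besides the game reward, `rewards.py` (added in this commit) attaches auxiliary Wordle signals to each step. The arithmetic behind them is easy to check in isolation. A standalone sketch, assuming the green/yellow counts have already been parsed from the feedback; `wordle_signals` is a hypothetical helper that mirrors `_WordleRewardProvider.compute` and is not part of the package API:

```python
from typing import Dict


def wordle_signals(
    green: int,
    yellow: int,
    guess: str,
    history: Dict[str, int],
    game_reward: float = 0.0,
) -> Dict[str, float]:
    """Hypothetical mirror of _WordleRewardProvider.compute in rewards.py.

    ``green``/``yellow`` are marker counts already parsed from the feedback,
    ``guess`` is the normalized guess (e.g. "[crane]"), and ``history`` counts
    earlier plays of each guess; it is updated in place.
    """
    # The "[dunno]" sentinel means "no usable guess" and is never recorded.
    normalized = guess if guess and guess != "[dunno]" else ""
    previous = history.get(normalized, 0) if normalized else 0
    if normalized:
        history[normalized] = previous + 1
    return {
        "wordle.greens": green / 5.0,
        "wordle.yellows": yellow / 5.0,
        # 1.0 for a new guess, then 0.0, -1.0, ... for each repeat of it.
        "wordle.repetitions": 1.0 - previous,
        "wordle.correct": float(game_reward),
    }


history: Dict[str, int] = {}
for guess, (g, y) in [("[crane]", (1, 2)), ("[crane]", (1, 2)), ("[slate]", (3, 0))]:
    print(guess, wordle_signals(g, y, guess, history))
```

Because the repetition signal keeps decreasing without a floor, a policy that loops on the same guess is penalized more on every repeat.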

__init__.py ADDED

	@@ -0,0 +1,26 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""TextArena environment integration for OpenEnv."""
+from .client import TextArenaEnv
+from .models import (
+    TextArenaAction,
+    TextArenaMessage,
+    TextArenaObservation,
+    TextArenaState,
+)
+from .rewards import RewardProvider, build_reward_providers
+__all__ = [
+    "TextArenaEnv",
+    "TextArenaAction",
+    "TextArenaObservation",
+    "TextArenaState",
+    "TextArenaMessage",
+    "RewardProvider",
+    "build_reward_providers",
+]

client.py ADDED

	@@ -0,0 +1,117 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+TextArena Environment HTTP Client.
+This module provides the client for connecting to a TextArena Environment server
+over HTTP.
+"""
+from __future__ import annotations
+from typing import Any, Dict
+from openenv.core.client_types import StepResult
+from openenv.core.env_client import EnvClient
+from .models import (
+    TextArenaAction,
+    TextArenaMessage,
+    TextArenaObservation,
+    TextArenaState,
+)
+class TextArenaEnv(EnvClient[TextArenaAction, TextArenaObservation, TextArenaState]):
+    """
+    HTTP client for the TextArena Environment.
+    This client connects to a TextArenaEnvironment HTTP server and provides
+    methods to interact with it: reset(), step(), and state access.
+    Example:
+        >>> # Connect to a running server
+        >>> client = TextArenaEnv(base_url="http://localhost:8000")
+        >>> result = client.reset()
+        >>> print(result.observation.prompt)
+        >>>
+        >>> # Submit a move (Wordle-v0 expects a bracketed guess)
+        >>> result = client.step(TextArenaAction(message="[crane]"))
+        >>> print(result.observation.messages)
+        >>> print(result.reward)
+    Example with Docker:
+        >>> # Automatically start container and connect
+        >>> client = TextArenaEnv.from_docker_image("textarena-env:latest")
+        >>> result = client.reset()
+        >>> result = client.step(TextArenaAction(message="Test"))
+    """
+    def _step_payload(self, action: TextArenaAction) -> Dict:
+        """
+        Convert TextArenaAction to JSON payload for step request.
+        Args:
+            action: TextArenaAction instance
+        Returns:
+            Dictionary representation suitable for JSON encoding
+        """
+        return {
+            "message": action.message,
+        }
+    def _parse_result(self, payload: Dict) -> StepResult[TextArenaObservation]:
+        """
+        Parse server response into StepResult[TextArenaObservation].
+        Args:
+            payload: JSON response from server
+        Returns:
+            StepResult with TextArenaObservation
+        """
+        obs_data = payload.get("observation", {})
+        messages_payload = obs_data.get("messages", [])
+        messages = [
+            TextArenaMessage(
+                sender_id=item.get("sender_id", -1),
+                content=item.get("content", ""),
+                category=item.get("category", "MESSAGE"),
+            )
+            for item in messages_payload
+            if isinstance(item, dict)
+        ]
+        observation = TextArenaObservation(
+            prompt=obs_data.get("prompt", ""),
+            messages=messages,
+            current_player_id=obs_data.get("current_player_id", 0),
+            legal_players=obs_data.get("legal_players", []),
+            info=obs_data.get("info", {}),
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+            metadata=obs_data.get("metadata", {}),
+        )
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict[str, Any]) -> TextArenaState:
+        return TextArenaState(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+            env_id=payload.get("env_id", "unknown"),
+            num_players=payload.get("num_players", 1),
+            max_turns=payload.get("max_turns"),
+            turn=payload.get("turn", 0),
+            last_reward=payload.get("last_reward", 0.0),
+            last_info=payload.get("last_info", {}),
+            raw_state=payload.get("raw_state", {}),
+        )
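`_parse_result` tolerates partial payloads: missing keys fall back to defaults, non-dict entries in `messages` are skipped, and `reward`/`done` are read from the top level of the payload rather than from the observation. A standalone sketch of that mapping, using plain dataclasses as hypothetical stand-ins for the pydantic models (so it runs without `openenv-core` installed); the payload shape is taken from the client code above:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Message:  # hypothetical stand-in for TextArenaMessage
    sender_id: int
    content: str
    category: str


@dataclass
class ParsedStep:  # hypothetical stand-in for StepResult[TextArenaObservation]
    prompt: str
    messages: List[Message]
    current_player_id: int
    reward: Optional[float]
    done: bool
    info: Dict[str, Any] = field(default_factory=dict)


def parse_step(payload: Dict[str, Any]) -> ParsedStep:
    """Apply the same defaults as TextArenaEnv._parse_result."""
    obs = payload.get("observation", {})
    messages = [
        Message(
            sender_id=item.get("sender_id", -1),
            content=item.get("content", ""),
            category=item.get("category", "MESSAGE"),
        )
        for item in obs.get("messages", [])
        if isinstance(item, dict)  # malformed entries are dropped silently
    ]
    return ParsedStep(
        prompt=obs.get("prompt", ""),
        messages=messages,
        current_player_id=obs.get("current_player_id", 0),
        # reward and done live at the top level of the payload
        reward=payload.get("reward"),
        done=payload.get("done", False),
        info=obs.get("info", {}),
    )


step = parse_step({
    "observation": {"prompt": "Guess the word", "messages": [{"content": "Feedback: ..."}, "junk"]},
    "reward": 0.0,
})
print(step.messages, step.done)
```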

models.py ADDED

	@@ -0,0 +1,57 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the TextArena Environment.
+The TextArena environment wraps TextArena text games behind the OpenEnv API.
+"""
+from __future__ import annotations
+from typing import Any, Dict, List, Optional
+from pydantic import BaseModel, Field
+from openenv.core.env_server.types import Action, Observation, State
+class TextArenaMessage(BaseModel):
+    """Single message observed by a player."""
+    sender_id: int
+    content: str
+    category: str
+class TextArenaAction(Action):
+    """Action issued by the agent for TextArena games."""
+    message: str
+class TextArenaObservation(Observation):
+    """Observation returned from any TextArena game."""
+    prompt: str
+    messages: List[TextArenaMessage] = Field(default_factory=list)
+    current_player_id: int = 0
+    legal_players: List[int] = Field(default_factory=list)
+    info: Dict[str, Any] = Field(default_factory=dict)
+class TextArenaState(State):
+    """Structured state snapshot for the server."""
+    env_id: str
+    num_players: int
+    max_turns: Optional[int] = None
+    turn: int = 0
+    last_reward: float = 0.0
+    last_info: Dict[str, Any] = Field(default_factory=dict)
+    raw_state: Dict[str, Any] = Field(default_factory=dict)

openenv.yaml ADDED

	@@ -0,0 +1,7 @@

+spec_version: 1
+name: textarena
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
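The manifest's `app` field uses the same `module:attribute` import-string form that uvicorn accepts (`server.app:app` is the `app` object in `server/app.py`, matching the Dockerfile's `uvicorn server.app:app`). A minimal sketch that reads this flat manifest and splits the import string; `read_manifest` and `split_import_string` are hypothetical helpers that deliberately handle only flat `key: value` lines, whereas a real loader would use a YAML parser:

```python
from typing import Dict, Tuple, Union

MANIFEST = """\
spec_version: 1
name: textarena
type: space
runtime: fastapi
app: server.app:app
port: 8000
"""


def read_manifest(text: str) -> Dict[str, Union[int, str]]:
    """Parse flat ``key: value`` lines; integer-looking values become ints."""
    data: Dict[str, Union[int, str]] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Split on the first colon only, so "server.app:app" stays intact.
        key, _, value = line.partition(":")
        value = value.strip()
        data[key.strip()] = int(value) if value.isdigit() else value
    return data


def split_import_string(spec: str) -> Tuple[str, str]:
    """Split 'module.path:attribute' into its module and attribute parts."""
    module, sep, attr = spec.partition(":")
    if not sep or not module or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    return module, attr


manifest = read_manifest(MANIFEST)
print(manifest["port"], split_import_string(str(manifest["app"])))
```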

pyproject.toml ADDED

	@@ -0,0 +1,52 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-textarena"
+version = "0.1.0"
+description = "TextArena environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv dependencies (required for server functionality)
+    "openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@main",
+    "fastapi>=0.115.0",
+    "pydantic>=2.0.0",
+    "uvicorn>=0.24.0",
+    "requests>=2.31.0",
+    # Environment-specific dependencies
+    # Add all dependencies needed for your environment here
+    # Examples:
+    # "numpy>=1.19.0",
+    # "torch>=2.0.0",
+    # "gymnasium>=0.29.0",
+    # "openspiel>=1.0.0",
+    # "smolagents>=1.22.0,<2",
+    "textarena>=0.6.1",
+    "nltk>=3.9.2",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m textarena_env.server.app
+server = "textarena_env.server.app:main"
+[tool.setuptools]
+# Explicitly list packages - "textarena_env" maps to current dir
+packages = ["textarena_env", "textarena_env.server"]
+package-dir = {"textarena_env" = ".", "textarena_env.server" = "server"}
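The `[tool.setuptools]` table installs the repository root as the `textarena_env` package and `server/` as `textarena_env.server`, which is why the code in this commit imports `textarena_env.models` rather than `textarena.models` (`textarena` is the upstream game library). A small sketch of how such a `package-dir` mapping turns a dotted module name into a source path; `module_to_path` is a hypothetical helper that only handles this simple prefix mapping, not setuptools' full resolution logic:

```python
from pathlib import PurePosixPath
from typing import Dict

PACKAGE_DIR: Dict[str, str] = {"textarena_env": ".", "textarena_env.server": "server"}


def module_to_path(module: str, package_dir: Dict[str, str]) -> PurePosixPath:
    """Resolve a dotted module name using the longest matching package-dir prefix."""
    parts = module.split(".")
    # Try the longest prefix first, e.g. "textarena_env.server" before "textarena_env".
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in package_dir:
            base = PurePosixPath(package_dir[prefix])
            rest = parts[i:]
            if not rest:
                # The module is the package itself.
                return base / "__init__.py"
            return base.joinpath(*rest[:-1], rest[-1] + ".py")
    raise KeyError(f"{module!r} is not covered by package-dir")


for mod in ["textarena_env.models", "textarena_env.server.app"]:
    print(mod, "->", module_to_path(mod, PACKAGE_DIR))
```

A name such as `textarena.server.app` is not covered by the mapping at all, so an entry point written that way cannot resolve to this project's `server/app.py`.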

rewards.py ADDED

	@@ -0,0 +1,129 @@

+"""Reward provider utilities for TextArena environments."""
+from __future__ import annotations
+import re
+from typing import Dict, List, Protocol, Tuple
+try:
+    from textarena_env.models import TextArenaAction, TextArenaObservation
+except ImportError:
+    from models import TextArenaAction, TextArenaObservation
+class RewardProvider(Protocol):
+    """Interface for computing auxiliary reward signals."""
+    def reset(self) -> None:
+        """Clear any internal state before a new episode."""
+    def compute(self, *, action: TextArenaAction, observation: TextArenaObservation) -> Dict[str, float]:
+        """Return a mapping of reward names to float values for the step."""
+def build_reward_providers(env_id: str) -> List[RewardProvider]:
+    """Instantiate reward providers appropriate for the given environment."""
+    providers: List[RewardProvider] = []
+    if env_id == "Wordle-v0":
+        providers.append(_WordleRewardProvider())
+    return providers
+_WORDLE_GUESS_PATTERN = re.compile(r"\[[A-Za-z]{5}\]")
+def extract_guess(text: str) -> str:
+    """Normalize a Wordle guess string from arbitrary text."""
+    match = _WORDLE_GUESS_PATTERN.search(text)
+    if match:
+        return match.group(0).lower()
+    cleaned = re.sub(r"[^a-z]", "", text.lower())
+    if len(cleaned) >= 5:
+        return f"[{cleaned[:5]}]"
+    return "[dunno]"
+def extract_wordle_feedback(observation: TextArenaObservation) -> str:
+    """Pull the latest feedback text from a Wordle observation."""
+    for message in reversed(observation.messages):
+        content = message.content.strip()
+        if "Feedback:" in content:
+            return content.split("Feedback:", 1)[-1].strip()
+    return ""
+def extract_feedback_counts(feedback: str) -> Tuple[int, int]:
+    """Return counts of green (G) and yellow (Y) markers from feedback."""
+    if not feedback:
+        return (0, 0)
+    lines = [line.strip() for line in feedback.split("\n") if line.strip()]
+    if len(lines) < 2:
+        return (0, 0)
+    for line in reversed(lines):
+        normalized = line.replace(" ", "")
+        if normalized and all(c in "GYX" for c in normalized):
+            green = normalized.count("G")
+            yellow = normalized.count("Y")
+            return (green, yellow)
+    return (0, 0)
+class _WordleRewardProvider:
+    """Reward provider that mirrors the GRPO Wordle heuristics."""
+    SIGNAL_MAP = {
+        "greens": "wordle.greens",
+        "yellows": "wordle.yellows",
+        "repetitions": "wordle.repetitions",
+        "correct": "wordle.correct",
+    }
+    def __init__(self) -> None:
+        self._guess_history: Dict[str, int] = {}
+    def reset(self) -> None:
+        self._guess_history.clear()
+    def compute(self, *, action: TextArenaAction, observation: TextArenaObservation) -> Dict[str, float]:
+        guess = extract_guess(action.message)
+        feedback = extract_wordle_feedback(observation)
+        normalized_guess = guess if guess and guess != "[dunno]" else ""
+        previous_occurrences = self._guess_history.get(normalized_guess, 0) if normalized_guess else 0
+        green_score = 0.0
+        yellow_score = 0.0
+        if feedback:
+            green_count, yellow_count = extract_feedback_counts(feedback)
+            green_score = green_count / 5.0
+            yellow_score = yellow_count / 5.0
+        repetition_score = 1.0 - previous_occurrences
+        correct_score = float(observation.reward or 0.0)
+        if normalized_guess:
+            self._guess_history[normalized_guess] = previous_occurrences + 1
+        return {
+            self.SIGNAL_MAP["greens"]: float(green_score),
+            self.SIGNAL_MAP["yellows"]: float(yellow_score),
+            self.SIGNAL_MAP["repetitions"]: float(repetition_score),
+            self.SIGNAL_MAP["correct"]: float(correct_score),
+        }
+__all__ = [
+    "RewardProvider",
+    "build_reward_providers",
+    "extract_feedback_counts",
+    "extract_guess",
+    "extract_wordle_feedback",
+]
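To see how the parsing heuristics behave, here is a standalone copy of `extract_guess` and `extract_feedback_counts` from `rewards.py` (copied so it runs without the package installed), with inputs chosen to exercise each branch. The sample feedback text (a letter row followed by a `G`/`Y`/`X` marker row) is an assumption about the shape of Wordle feedback, not captured game output:

```python
import re
from typing import Tuple

_WORDLE_GUESS_PATTERN = re.compile(r"\[[A-Za-z]{5}\]")


def extract_guess(text: str) -> str:
    match = _WORDLE_GUESS_PATTERN.search(text)
    if match:
        return match.group(0).lower()  # bracketed guess, e.g. "[crane]"
    cleaned = re.sub(r"[^a-z]", "", text.lower())
    if len(cleaned) >= 5:
        return f"[{cleaned[:5]}]"  # fall back to the first five letters
    return "[dunno]"  # sentinel: no usable guess


def extract_feedback_counts(feedback: str) -> Tuple[int, int]:
    if not feedback:
        return (0, 0)
    lines = [line.strip() for line in feedback.split("\n") if line.strip()]
    if len(lines) < 2:
        return (0, 0)
    # Scan from the bottom for a line made only of G/Y/X markers.
    for line in reversed(lines):
        normalized = line.replace(" ", "")
        if normalized and all(c in "GYX" for c in normalized):
            return (normalized.count("G"), normalized.count("Y"))
    return (0, 0)


for text in ["My guess is [CRANE].", "slate!", "hi"]:
    print(f"{text!r} -> {extract_guess(text)}")
print(extract_feedback_counts("C R A N E\nG X Y X X"))  # (1, 1)
```

Note that a feedback string containing only a marker row (for example `"GGGGG"`) yields `(0, 0)`, because the function requires at least two non-empty lines.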

server/__init__.py ADDED

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""TextArena environment server components."""
+from .environment import TextArenaEnvironment
+__all__ = ["TextArenaEnvironment"]

server/app.py ADDED

	@@ -0,0 +1,90 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""FastAPI application entrypoint for the TextArena environment."""
+from __future__ import annotations
+import os
+from openenv.core.env_server.http_server import create_app
+try:
+    # When running as installed package
+    from textarena_env.models import TextArenaAction, TextArenaObservation
+    from textarena_env.server.environment import TextArenaEnvironment
+except ImportError:
+    # When running uvicorn directly from textarena_env/
+    from models import TextArenaAction, TextArenaObservation
+    from .environment import TextArenaEnvironment
+def _parse_env_kwargs(prefix: str = "TEXTARENA_KW_") -> dict[str, str]:
+    """Collect arbitrary environment kwargs from the process environment."""
+    env_kwargs: dict[str, str] = {}
+    for key, value in os.environ.items():
+        if key.startswith(prefix):
+            env_key = key[len(prefix) :].lower()
+            env_kwargs[env_key] = value
+    return env_kwargs
+env_id = os.getenv("TEXTARENA_ENV_ID", "Wordle-v0")
+num_players = int(os.getenv("TEXTARENA_NUM_PLAYERS", "1"))
+max_turns_env = os.getenv("TEXTARENA_MAX_TURNS")
+max_turns = int(max_turns_env) if max_turns_env is not None else None
+download_nltk = os.getenv("TEXTARENA_DOWNLOAD_NLTK", "1") in {"1", "true", "True"}
+extra_kwargs = _parse_env_kwargs()
+# Factory function to create TextArenaEnvironment instances
+def create_textarena_environment():
+    """Factory function that creates TextArenaEnvironment with config."""
+    return TextArenaEnvironment(
+        env_id=env_id,
+        num_players=num_players,
+        max_turns=max_turns,
+        download_nltk=download_nltk,
+        env_kwargs=extra_kwargs,
+    )
+# Create the FastAPI app
+# Pass the factory function instead of an instance for WebSocket session support
+app = create_app(
+    create_textarena_environment,
+    TextArenaAction,
+    TextArenaObservation,
+    env_name="textarena_env",
+)
+def main(host: str = "0.0.0.0", port: int = 8000):
+    """
+    Entry point for direct execution via uv run or python -m.
+    This function enables running the server without Docker:
+        uv run --project . server
+        python -m textarena_env.server.app
+    Args:
+        host: Host address to bind to (default: "0.0.0.0")
+        port: Port number to listen on (default: 8000)
+    For production deployments, consider using uvicorn directly with
+    multiple workers:
+        uvicorn textarena_env.server.app:app --workers 4
+    """
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+if __name__ == "__main__":
+    main()
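The server is configured entirely through environment variables: `TEXTARENA_ENV_ID`, `TEXTARENA_NUM_PLAYERS`, `TEXTARENA_MAX_TURNS`, `TEXTARENA_DOWNLOAD_NLTK`, plus any `TEXTARENA_KW_*` variable, whose suffix is lower-cased and forwarded to `ta.make()` as a keyword argument. Those forwarded values stay strings, so a game that expects an integer keyword receives, say, the string `"6"`. A sketch of the same parsing against an explicit mapping instead of `os.environ`, so it can be tested; `parse_config` is a hypothetical helper and `TEXTARENA_KW_EXAMPLE_OPTION` is a made-up key:

```python
from typing import Any, Dict, Mapping, Optional


def parse_config(environ: Mapping[str, str], prefix: str = "TEXTARENA_KW_") -> Dict[str, Any]:
    """Mirror the module-level configuration logic of server/app.py."""
    max_turns_raw: Optional[str] = environ.get("TEXTARENA_MAX_TURNS")
    return {
        "env_id": environ.get("TEXTARENA_ENV_ID", "Wordle-v0"),
        "num_players": int(environ.get("TEXTARENA_NUM_PLAYERS", "1")),
        "max_turns": int(max_turns_raw) if max_turns_raw is not None else None,
        # Only these exact spellings enable the NLTK download.
        "download_nltk": environ.get("TEXTARENA_DOWNLOAD_NLTK", "1") in {"1", "true", "True"},
        # TEXTARENA_KW_FOO=bar becomes {"foo": "bar"}; values are not converted.
        "env_kwargs": {
            key[len(prefix):].lower(): value
            for key, value in environ.items()
            if key.startswith(prefix)
        },
    }


cfg = parse_config({"TEXTARENA_MAX_TURNS": "6", "TEXTARENA_KW_EXAMPLE_OPTION": "true"})
print(cfg)
```

One consequence worth knowing: `TEXTARENA_DOWNLOAD_NLTK=yes` or `=TRUE` disables the download, because only `1`, `true`, and `True` count as enabled.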

server/environment.py ADDED

	@@ -0,0 +1,320 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Server implementation for the generic TextArena environment."""
+from __future__ import annotations
+import sys
+from typing import Any, Dict, Iterable, List, Optional
+from uuid import uuid4
+import nltk
+from openenv.core.env_server.interfaces import Environment
+try:
+    # When running as installed package
+    from textarena_env.models import (
+        TextArenaAction,
+        TextArenaMessage,
+        TextArenaObservation,
+        TextArenaState,
+    )
+    from textarena_env.rewards import RewardProvider, build_reward_providers
+except ImportError:
+    # When running uvicorn directly from textarena_env/
+    from models import (
+        TextArenaAction,
+        TextArenaMessage,
+        TextArenaObservation,
+        TextArenaState,
+    )
+    from rewards import RewardProvider, build_reward_providers
+_TEXTARENA_MODULE: Any | None = None
+_TEXTARENA_IMPORT_ERROR: Exception | None = None
+def _import_textarena() -> Any:
+    """Import ``textarena`` lazily and cache the module reference."""
+    global _TEXTARENA_MODULE, _TEXTARENA_IMPORT_ERROR
+    if _TEXTARENA_MODULE is not None:
+        return _TEXTARENA_MODULE
+    if _TEXTARENA_IMPORT_ERROR is not None:
+        raise _TEXTARENA_IMPORT_ERROR
+    if sys.version_info < (3, 10):
+        _TEXTARENA_IMPORT_ERROR = RuntimeError(
+            "TextArena environments require Python 3.10 or newer; "
+            f"current interpreter is {sys.version_info.major}.{sys.version_info.minor}"
+        )
+        raise _TEXTARENA_IMPORT_ERROR
+    try:
+        import textarena as ta  # type: ignore[import]
+    except Exception as exc:  # pragma: no cover - surfaced to caller
+        _TEXTARENA_IMPORT_ERROR = exc
+        raise
+    _TEXTARENA_MODULE = ta
+    return ta
+class TextArenaEnvironment(Environment):
+    """Wrap any TextArena game behind the OpenEnv ``Environment`` API."""
+    def __init__(
+        self,
+        env_id: str = "Wordle-v0",
+        *,
+        num_players: int = 1,
+        max_turns: Optional[int] = None,
+        download_nltk: bool = True,
+        env_kwargs: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        super().__init__()
+        ta = _import_textarena()
+        if download_nltk:
+            nltk.download("words", quiet=True)
+            nltk.download("averaged_perceptron_tagger_eng", quiet=True)
+        self.env_id = env_id
+        self.num_players = num_players
+        self.max_turns = max_turns
+        self._env_kwargs = env_kwargs or {}
+        self._ta_env = ta.make(env_id=env_id, **self._env_kwargs)
+        self._state = TextArenaState(
+            env_id=env_id,
+            num_players=num_players,
+            max_turns=max_turns,
+        )
+        self._reward_providers: List[RewardProvider] = build_reward_providers(env_id)
+        self._last_reward_signals: Dict[str, float] = {}
+    # ------------------------------------------------------------------
+    # Environment interface
+    # ------------------------------------------------------------------
+    def reset(self) -> TextArenaObservation:
+        # TextArena observation wrappers (LLMObservationWrapper, etc.) accumulate
+        # observations in self.full_observations across resets. Since we can't modify TextArena,
+        # we need to manually clear this state to prevent history accumulation.
+        env = self._ta_env
+        while hasattr(env, "env"):
+            if hasattr(env, "full_observations"):
+                env.full_observations = {}
+            env = env.env
+        # Also check the final unwrapped env
+        if hasattr(env, "full_observations"):
+            env.full_observations = {}
+        self._ta_env.reset(num_players=self.num_players)
+        for provider in self._reward_providers:
+            provider.reset()
+        self._state.episode_id = str(uuid4())
+        self._state.step_count = 0
+        self._state.turn = 0
+        self._state.last_reward = 0.0
+        self._state.last_info = {}
+        self._state.raw_state = self._snapshot_state()
+        self._last_reward_signals = {}
+        observation = self._build_observation()
+        observation.reward = 0.0
+        observation.done = False
+        return observation
+    def step(self, action: TextArenaAction) -> TextArenaObservation:  # type: ignore[override]
+        if not isinstance(action, TextArenaAction):
+            raise TypeError(f"Expected TextArenaAction, received {type(action)!r}")
+        done, info = self._ta_env.step(action.message)
+        self._state.step_count += 1
+        self._state.turn = getattr(self._ta_env.state, "turn", self._state.turn + 1)
+        self._state.last_info = info or {}
+        observation = self._build_observation()
+        observation.done = done
+        reward = self._extract_reward()
+        observation.reward = reward
+        self._state.last_reward = reward
+        reward_signals = self._compute_reward_signals(action=action, observation=observation)
+        if reward_signals:
+            observation.info.setdefault("reward_signals", {}).update(reward_signals)
+            observation.metadata.setdefault("reward_signals", {}).update(reward_signals)
+        self._last_reward_signals = reward_signals
+        if reward_signals:
+            self._state.last_info = {
+                **(self._state.last_info or {}),
+                "reward_signals": reward_signals,
+            }
+        self._state.raw_state = self._snapshot_state()
+        return observation
+    @property
+    def state(self) -> TextArenaState:
+        return self._state
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+    def _build_observation(self) -> TextArenaObservation:
+        player_id, messages = self._ta_env.get_observation()
+        ta_messages = self._convert_messages(messages)
+        # Extract prompt from the appropriate messages.
+        # TextArena PROMPT type messages contain the game instructions added during reset.
+        # As a fallback for environments that don't use typed messages, use only the first
+        # message if we're at turn 0 (fresh reset).
+        prompt_lines = [msg.content for msg in ta_messages if msg.category == "PROMPT"]
+        if not prompt_lines:
+            # Fallback: use the first message only if at turn 0 (just after reset)
+            # DO NOT use all messages as this causes history accumulation
+            current_turn = getattr(self._ta_env.state, "turn", 0)
+            if current_turn == 0 and ta_messages:
+                prompt_lines = [ta_messages[0].content]
+            else:
+                # Use env_id as final fallback to avoid including game history
+                prompt_lines = [self.env_id]
+        prompt = "\n".join(prompt_lines).strip()
+        info: Dict[str, Any] = {}
+        info.update(getattr(self._ta_env.state, "step_info", {}))
+        observation = TextArenaObservation(
+            prompt=prompt,
+            messages=ta_messages,
+            current_player_id=player_id,
+            legal_players=self._legal_players(),
+            info=info,
+            metadata={
+                "env_id": self.env_id,
+                "turn": getattr(self._ta_env.state, "turn", 0),
+                "raw_messages": [
+                    {
+                        "sender_id": msg.sender_id,
+                        "content": msg.content,
+                        "category": msg.category,
+                    }
+                    for msg in ta_messages
+                ],
+            },
+        )
+        return observation
+    def _legal_players(self) -> List[int]:
+        role_mapping = getattr(self._ta_env.state, "role_mapping", {}) or {}
+        players = [pid for pid in role_mapping.keys() if isinstance(pid, int) and pid >= 0]
+        return sorted(players)
+    def _convert_messages(self, messages: Iterable[Any]) -> List[TextArenaMessage]:
+        converted: List[TextArenaMessage] = []
+        buffered_sender: int | None = None
+        buffered_category: str | None = None
+        buffered_content: List[str] = []
+        def flush_buffer() -> None:
+            nonlocal buffered_content, buffered_sender, buffered_category
+            if not buffered_content:
+                return
+            converted.append(
+                TextArenaMessage(
+                    sender_id=buffered_sender if buffered_sender is not None else -1,
+                    content="".join(buffered_content),
+                    category=buffered_category or "MESSAGE",
+                )
+            )
+            buffered_content = []
+            buffered_category = None
+            buffered_sender = None
+        for entry in messages:
+            if isinstance(entry, tuple) and len(entry) == 3:
+                sender, content, category = entry
+            elif isinstance(entry, tuple) and len(entry) == 2:
+                sender, content = entry
+                category = "MESSAGE"
+            else:
+                sender, content, category = -1, str(entry), "MESSAGE"
+            category_name = getattr(category, "name", str(category))
+            sender_id = int(sender) if isinstance(sender, (int, float)) else -1
+            text = str(content)
+            if buffered_content and buffered_category == category_name and buffered_sender == sender_id:
+                buffered_content.append(text)
+            else:
+                flush_buffer()
+                buffered_sender = sender_id
+                buffered_category = category_name
+                buffered_content = [text]
+        flush_buffer()
+        return converted
+    def _extract_reward(self) -> float:
+        rewards = getattr(self._ta_env.state, "rewards", None)
+        if isinstance(rewards, dict):
+            # Use current player reward if available, otherwise default to player 0.
+            player_id = getattr(self._ta_env.state, "current_player_id", 0)
+            if player_id in rewards:
+                return float(rewards[player_id])
+            if 0 in rewards:
+                return float(rewards[0])
+        return 0.0
+    def _snapshot_state(self) -> Dict[str, Any]:
+        state = self._ta_env.state
+        snapshot: Dict[str, Any] = {
+            "turn": getattr(state, "turn", 0),
+            "game_state": getattr(state, "game_state", {}),
+            "logs": list(getattr(state, "logs", [])),
+            "rewards": getattr(state, "rewards", None),
+            "done": getattr(state, "done", False),
+            "role_mapping": getattr(state, "role_mapping", {}),
+            "game_info": getattr(state, "game_info", {}),
+            "step_info": getattr(state, "step_info", {}),
+        }
+        if self._last_reward_signals:
+            snapshot["reward_signals"] = dict(self._last_reward_signals)
+        return snapshot
+    def _compute_reward_signals(
+        self, *, action: TextArenaAction, observation: TextArenaObservation
+    ) -> Dict[str, float]:
+        if not self._reward_providers:
+            return {}
+        aggregated: Dict[str, float] = {}
+        for provider in self._reward_providers:
+            try:
+                result = provider.compute(action=action, observation=observation)
+            except Exception:  # pragma: no cover - defensive
+                continue
+            for key, value in result.items():
+                aggregated[key] = float(value)
+        return aggregated
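The `_convert_messages` helper above coalesces consecutive TextArena message chunks that share both a sender id and a category, so a prompt delivered in several pieces becomes a single `TextArenaMessage`. What follows is a simplified standalone sketch of that merge step, not the environment's own code. It swaps `TextArenaMessage` for plain `(sender_id, category, content)` tuples, and the name `coalesce_messages` is hypothetical. One behavior differs from the original: an empty category name is kept as `""` rather than mapped to `"MESSAGE"`.

```python
from typing import Any, Iterable, List, Tuple

# Hypothetical stand-in for TextArenaMessage: (sender_id, category, content).
Message = Tuple[int, str, str]


def coalesce_messages(entries: Iterable[Any]) -> List[Message]:
    """Merge consecutive chunks that share the same sender id and category."""
    merged: List[Message] = []
    for entry in entries:
        # Normalise the three entry shapes that _convert_messages accepts.
        if isinstance(entry, tuple) and len(entry) == 3:
            sender, content, category = entry
        elif isinstance(entry, tuple) and len(entry) == 2:
            sender, content = entry
            category = "MESSAGE"
        else:
            sender, content, category = -1, str(entry), "MESSAGE"
        # Enum categories contribute their .name; anything else is stringified.
        category_name = getattr(category, "name", str(category))
        sender_id = int(sender) if isinstance(sender, (int, float)) else -1
        text = str(content)
        if merged and merged[-1][0] == sender_id and merged[-1][1] == category_name:
            # Same sender and category as the previous chunk: append with no separator.
            merged[-1] = (sender_id, category_name, merged[-1][2] + text)
        else:
            merged.append((sender_id, category_name, text))
    return merged


chunks = [
    (0, "You are playing a word game. ", "PROMPT"),
    (0, "Guess a 5-letter word.", "PROMPT"),
    (0, "[crane]"),            # 2-tuple: category defaults to "MESSAGE"
    (-1, "Feedback: ", "MESSAGE"),
    "c r a n e",              # bare value: sender -1, category "MESSAGE"
]
for message in coalesce_messages(chunks):
    print(message)
# (0, 'PROMPT', 'You are playing a word game. Guess a 5-letter word.')
# (0, 'MESSAGE', '[crane]')
# (-1, 'MESSAGE', 'Feedback: c r a n e')
```

Chunks are concatenated without a separator, matching the `"".join(buffered_content)` call in the original. The two `PROMPT` pieces therefore only read correctly if the game already includes its own whitespace.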
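`_compute_reward_signals` merges the dictionaries returned by each configured reward provider. Any provider that raises is silently skipped, and when two providers emit the same key, the later one wins. Below is a minimal sketch of that aggregation rule. It assumes providers expose only the keyword-only `compute(action=..., observation=...)` call seen above. The `RewardProvider` protocol, `aggregate_reward_signals`, and the toy provider classes are hypothetical illustrations, not the contents of `rewards.py`.

```python
from typing import Any, Dict, Iterable, Mapping, Protocol


class RewardProvider(Protocol):
    # Hypothetical protocol matching the provider.compute(...) call in the environment.
    def compute(self, *, action: Any, observation: Any) -> Mapping[str, float]: ...


def aggregate_reward_signals(
    providers: Iterable[RewardProvider], *, action: Any, observation: Any
) -> Dict[str, float]:
    aggregated: Dict[str, float] = {}
    for provider in providers:
        try:
            result = provider.compute(action=action, observation=observation)
        except Exception:
            # A failing provider is skipped so it cannot abort the whole step.
            continue
        for key, value in result.items():
            # Later providers overwrite earlier ones when keys collide.
            aggregated[key] = float(value)
    return aggregated


# Toy providers, for illustration only.
class BracketFormat:
    def compute(self, *, action: Any, observation: Any) -> Dict[str, float]:
        text = str(action)
        return {"format_ok": 1.0 if text.startswith("[") and text.endswith("]") else 0.0}


class Broken:
    def compute(self, *, action: Any, observation: Any) -> Dict[str, float]:
        raise RuntimeError("provider bug")


class HalfCredit:
    def compute(self, *, action: Any, observation: Any) -> Dict[str, float]:
        return {"format_ok": 0.5}


print(aggregate_reward_signals([BracketFormat(), Broken()], action="[crane]", observation=None))
# {'format_ok': 1.0}
print(aggregate_reward_signals([BracketFormat(), HalfCredit()], action="[crane]", observation=None))
# {'format_ok': 0.5}
```

Swallowing exceptions keeps a single buggy provider from breaking `step()`. The trade-off is that its signal disappears silently, so provider key names should be kept unique if every signal needs to survive aggregation.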

server/run_local.sh ADDED

	@@ -0,0 +1,7 @@

+export TEXTARENA_ENV_ID="Wordle-v0"
+export TEXTARENA_NUM_PLAYERS=1
+# Run the server
+exec uvicorn envs.textarena_env.server.app:app --host 0.0.0.0 --port 8001

uv.lock ADDED

The diff for this file is too large to render.