---
title: TemporalBenchEnv MCQ Server
emoji: 🥁
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# TemporalBenchEnv

OpenEnv environment for **multi-step multiple-choice** time-series reasoning. Each episode samples nine questions from pre-built JSON banks (per-dataset files or merged JSONL in `TSQuestion` shape). Rewards combine per-step correctness and an episode bonus (see `env/reward.py`).

## Question bank layout

Point the server at a directory containing `PSML_questions.json`, `freshretailnet_questions.json`, `MIMIC_questions.json`, and `causal_chambers_questions.json` (each file is a JSON array of `TSQuestion` records), or set **`TEMPORALBENCH_QUESTION_BANK_DIR`** to that path. If unset, the server uses `tests/fixtures/banks` when present (for local smoke runs).

Each record must include at least: `question_id`, `dataset`, `task_type` (`T1U` | `T3` | `T2_MCQ`), `prompt`, `options` (length ≥ 2), `answer`, plus optional `family`, `capability_tags`, `difficulty`, `metadata`.

## Quick Start

Use the typed client (`TemporalBenchEnvClient`; alias `TemporalbenchenvEnv`):

```python
from client import TemporalBenchAction, TemporalBenchEnvClient

# Create the client before the try block so close() only runs on a live client
env = TemporalBenchEnvClient.from_docker_image("temporalbenchenv-env:latest")
try:
    out = env.reset()
    while not out.done:
        q = out.observation
        # Agent picks q.options[i] or equivalent label string
        out = env.step(TemporalBenchAction(answer=q.options[0]))
finally:
    env.close()
```

`TemporalBenchEnvClient.from_docker_image()` handles:

- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call `close()`

## Building the Docker Image

Before using the environment, build the Docker image:

```bash
# From project root (Docker repository names must be lowercase)
docker build -t temporalbenchenv-env:latest -f server/Dockerfile .
```

## Deploying to Hugging Face Spaces

You can deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:

```bash
# From the environment directory (where openenv.yaml is located)
openenv push

# Or specify options
openenv push --namespace my-org --private
```

The `openenv push` command will:

1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
2. Prepare a custom build for Hugging Face Docker space (enables web interface)
3. Upload to Hugging Face (ensuring you're logged in)

### Prerequisites

- Authenticate with Hugging Face: The command will prompt for login if not already authenticated

### Options

- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- `--private`: Deploy the space as private (default: public)

### Examples

```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push

# Push to a specific repository
openenv push --repo-id my-org/my-env

# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest

# Push as a private space
openenv push --private

# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```

After deployment, your space will be available at: `https://huggingface.co/spaces/`

The deployed space includes:

- **Web Interface** at `/web` - Interactive UI for exploring the environment
- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- **Health Check** at `/health` - Container health monitoring
- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions

## Environment Details

### Action (`TemporalBenchAction`)

- `answer` (str): MCQ label (must match ground truth after optional
  normalization)
- `confidence`, `reasoning`: optional

### Observation (`TemporalBenchObservation`)

- `question`, `options`, `task_type`, `dataset`, `history`, `accuracy_so_far`
- `step_idx`, `steps_remaining`, `max_steps`, `done`, `reward`, `metadata`

### Reward

- Per step: `alpha * correctness` (correctness 0 or 1).
- On the final step, adds episode bonus: `lambda_ep * (total_correct / N) * coverage_multiplier`, where `coverage_multiplier` is 1.0 if every dataset in the episode has at least one correct answer, else 0.8.

## Advanced Usage

### Connecting to an Existing Server

If you already have a TemporalBenchEnv server running, connect with:

```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    r = env.reset()
    r = env.step(TemporalBenchAction(answer=r.observation.options[0]))
```

Note: `close()` does not stop a remote server you attached to with `base_url=...`.

### Using the Context Manager

The client supports context manager usage for automatic connection management:

```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    result = env.reset()
    while not result.done:
        ans = result.observation.options[0]
        result = env.step(TemporalBenchAction(answer=ans))
```

The client uses WebSocket connections for:

- **Lower latency**: No HTTP connection overhead per request
- **Persistent session**: Server maintains your environment state
- **Efficient for episodes**: Better for many sequential steps

### Concurrent WebSocket Sessions

The server uses **factory mode** (`create_app(_env_factory, ...)`) so each WebSocket session gets a fresh `TemporalBenchEnvironment`. Tune `max_concurrent_envs` in `server/app.py` as needed.
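
### Reward Calculation Sketch

The reward rule listed under Reward can be illustrated with a small standalone sketch. This is not the code in `env/reward.py`; the names `StepRecord`, `step_reward`, and `episode_bonus` are hypothetical, the defaults `alpha=1.0` and `lambda_ep=1.0` are placeholders, and `N` is assumed to be the number of questions answered in the episode.

```python
from dataclasses import dataclass


@dataclass
class StepRecord:
    """Hypothetical per-step record, for illustration only."""
    dataset: str   # dataset the question came from
    correct: bool  # whether the submitted answer matched the ground truth


def step_reward(correct: bool, alpha: float = 1.0) -> float:
    """Per-step reward: alpha * correctness, with correctness 0 or 1."""
    return alpha * (1.0 if correct else 0.0)


def episode_bonus(steps: list, lambda_ep: float = 1.0) -> float:
    """Bonus added on the final step (assumes N = number of steps taken)."""
    n = len(steps)
    if n == 0:
        return 0.0
    total_correct = sum(1 for s in steps if s.correct)
    datasets_seen = {s.dataset for s in steps}
    datasets_with_hit = {s.dataset for s in steps if s.correct}
    # 1.0 only if every dataset in the episode has at least one correct answer
    coverage_multiplier = 1.0 if datasets_with_hit == datasets_seen else 0.8
    return lambda_ep * (total_correct / n) * coverage_multiplier
```

For example, an episode with one correct `PSML` answer and one incorrect `MIMIC` answer has `total_correct / N = 0.5` and misses coverage on `MIMIC`, so the bonus under these placeholder defaults is `0.5 * 0.8 = 0.4`.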

## Development & Testing

### Direct environment testing

```bash
uv sync --extra dev
uv run pytest tests/
```

### Running Locally

Run the server locally for development:

```bash
uvicorn server.app:app --reload
```

## Project Structure

```
TemporalBenchEnv/
├── .dockerignore          # Docker build exclusions
├── __init__.py            # Module exports
├── README.md              # This file
├── openenv.yaml           # OpenEnv manifest
├── pyproject.toml         # Project metadata and dependencies
├── uv.lock                # Locked dependencies (generated)
├── client.py              # TemporalBenchEnvClient (alias TemporalbenchenvEnv)
├── models.py              # Action / observation / state re-exports
├── env/                   # Environment, sampler, grading, rewards
├── data/                  # TSQuestion schema + JSON/JSONL loaders
└── server/
    ├── __init__.py        # Server module exports
    ├── app.py             # FastAPI application (HTTP + WebSocket endpoints)
    └── Dockerfile         # Container image definition
```
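
### Checking Question Bank Records

Before pointing the server at a new bank, it can help to check records against the minimum fields listed under Question bank layout. The sketch below is a standalone stand-in, not the loader in `data/`; `validate_record`, `REQUIRED_FIELDS`, and `TASK_TYPES` are hypothetical names, and the check covers only the requirements stated in this README.

```python
# Required keys and task types as listed in "Question bank layout"
REQUIRED_FIELDS = ("question_id", "dataset", "task_type", "prompt", "options", "answer")
TASK_TYPES = {"T1U", "T3", "T2_MCQ"}


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        # Skip value checks when required keys are absent
        return problems
    if record["task_type"] not in TASK_TYPES:
        problems.append(f"unknown task_type: {record['task_type']!r}")
    options = record["options"]
    if not isinstance(options, list) or len(options) < 2:
        problems.append("options must be a list with at least 2 entries")
    return problems
```

A per-dataset file could then be checked with `json.load` followed by `validate_record` on each element of the array. The sketch deliberately does not test whether `answer` appears in `options`, since the README only says answers are matched after optional normalization.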