---
title: TemporalBenchEnv MCQ Server
emoji: 🥁
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# TemporalBenchEnv

OpenEnv environment for **multi-step multiple-choice** time-series reasoning. Each episode samples nine questions from pre-built JSON banks (per-dataset files or merged JSONL in `TSQuestion` shape). Rewards combine per-step correctness and an episode bonus (see `env/reward.py`).
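A minimal sketch of the episode shape, assuming a uniform, seeded draw without replacement from an in-memory list of records. `sample_episode` and `EPISODE_LENGTH` are hypothetical names; the actual sampler in `env/` may stratify by dataset or task type.

```python
from __future__ import annotations

import random
from typing import Any

EPISODE_LENGTH = 9  # nine questions per episode, as described above


def sample_episode(bank: list[dict[str, Any]], seed: int | None = None,
                   k: int = EPISODE_LENGTH) -> list[dict[str, Any]]:
    """Hypothetical uniform sampler: draw k distinct questions from the bank.

    This is NOT the env/ sampler; it only illustrates the episode shape.
    """
    if len(bank) < k:
        raise ValueError(f"bank has {len(bank)} questions, need at least {k}")
    rng = random.Random(seed)  # seeded RNG so the same seed gives the same episode
    return rng.sample(bank, k)


if __name__ == "__main__":
    toy_bank = [{"question_id": f"q{i}", "dataset": "PSML"} for i in range(20)]
    episode = sample_episode(toy_bank, seed=0)
    print([q["question_id"] for q in episode])
```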

## Question bank layout

Point the server at a directory containing `PSML_questions.json`, `freshretailnet_questions.json`, `MIMIC_questions.json`, and `causal_chambers_questions.json` (each file is a JSON array of `TSQuestion` records), or set **`TEMPORALBENCH_QUESTION_BANK_DIR`** to that path. If unset, the server uses `tests/fixtures/banks` when present (for local smoke runs).
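The lookup order described above could be sketched as follows. `resolve_bank_dir`, `load_banks`, and the choice to skip missing files are assumptions made for illustration; the server's own loaders in `data/` also accept merged JSONL and may handle missing files differently.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any

# File names listed in this README; each holds a JSON array of TSQuestion records.
BANK_FILES = (
    "PSML_questions.json",
    "freshretailnet_questions.json",
    "MIMIC_questions.json",
    "causal_chambers_questions.json",
)
FIXTURE_DIR = Path("tests/fixtures/banks")  # fallback for local smoke runs


def resolve_bank_dir() -> Path | None:
    """Illustrative lookup order: environment variable first, then test fixtures."""
    env_dir = os.environ.get("TEMPORALBENCH_QUESTION_BANK_DIR")
    if env_dir:
        return Path(env_dir)
    return FIXTURE_DIR if FIXTURE_DIR.is_dir() else None


def load_banks(bank_dir: Path) -> list[dict[str, Any]]:
    """Concatenate every bank file found in bank_dir (missing files are skipped)."""
    records: list[dict[str, Any]] = []
    for name in BANK_FILES:
        path = bank_dir / name
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        if not isinstance(data, list):
            raise ValueError(f"{path} must contain a JSON array")
        records.extend(data)
    return records
```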

Each record must include at least: `question_id`, `dataset`, `task_type` (`T1U` | `T3` | `T2_MCQ`), `prompt`, `options` (length ≥ 2), `answer`, plus optional `family`, `capability_tags`, `difficulty`, `metadata`.
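A quick pre-flight check for a hand-written bank might look like this. `validate_record` is a hypothetical helper that enforces only the constraints listed above; the `TSQuestion` schema in `data/` is the authority and may check more. The example record's values are made up.

```python
from __future__ import annotations

from typing import Any

REQUIRED_FIELDS = ("question_id", "dataset", "task_type", "prompt", "options", "answer")
TASK_TYPES = {"T1U", "T3", "T2_MCQ"}


def validate_record(rec: dict[str, Any]) -> list[str]:
    """Return a list of problems with a TSQuestion-like dict (empty list = valid)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in rec]
    if "task_type" in rec and rec["task_type"] not in TASK_TYPES:
        problems.append(f"unknown task_type: {rec['task_type']!r}")
    opts = rec.get("options")
    if opts is not None and (not isinstance(opts, list) or len(opts) < 2):
        problems.append("options must be a list with at least 2 entries")
    return problems


# Hypothetical record; optional fields (family, capability_tags, ...) are omitted.
example = {
    "question_id": "psml-0001",
    "dataset": "PSML",
    "task_type": "T3",
    "prompt": "How does the load series change over the window?",
    "options": ["Increasing", "Decreasing", "Flat"],
    "answer": "Increasing",  # assumption: answer stored as option text
}

if __name__ == "__main__":
    print(validate_record(example))
```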

## Quick Start

Use the typed client (`TemporalBenchEnvClient`; alias `TemporalbenchenvEnv`):

```python
from client import TemporalBenchAction, TemporalBenchEnvClient

# Create the client before `try` so `env` is always defined when `finally` runs.
env = TemporalBenchEnvClient.from_docker_image("temporalbenchenv-env:latest")
try:
    out = env.reset()
    while not out.done:
        q = out.observation
        # A real agent chooses among q.options (or an equivalent label string);
        # this placeholder always submits the first option.
        out = env.step(TemporalBenchAction(answer=q.options[0]))
finally:
    env.close()
```

`TemporalBenchEnvClient.from_docker_image()` handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call `close()`

## Building the Docker Image

Build the Docker image before using the environment. Docker repository names must be lowercase, so use a lowercase tag:

```bash
# From project root
docker build -t temporalbenchenv-env:latest -f server/Dockerfile .
```

## Deploying to Hugging Face Spaces

Deploy your OpenEnv environment to Hugging Face Spaces with the `openenv push` command:

```bash
# From the environment directory (where openenv.yaml is located)
openenv push

# Or specify options
openenv push --namespace my-org --private
```

The `openenv push` command will:
1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
2. Prepare a custom build for a Hugging Face Docker Space (enables the web interface)
3. Upload to Hugging Face (ensuring you're logged in)

### Prerequisites

- Authenticate with Hugging Face. If you are not already logged in, the command prompts you to log in.

### Options

- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- `--repo-id`, `-r`: Repository ID in the format `username/repo-name` (defaults to `username/env-name` from `openenv.yaml`)
- `--base-image`, `-b`: Base Docker image to use (overrides the Dockerfile's `FROM` image)
- `--private`: Deploy the space as private (default: public)

### Examples

```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push

# Push to a specific repository
openenv push --repo-id my-org/my-env

# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest

# Push as a private space
openenv push --private

# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```

After deployment, your space will be available at:
`https://huggingface.co/spaces/<repo-id>`

The deployed space includes:
- **Web Interface** at `/web` - Interactive UI for exploring the environment
- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- **Health Check** at `/health` - Container health monitoring
- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
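To script against a running server (for example one started with `uvicorn` as shown under Running Locally), you can poll the health endpoint before connecting. This sketch uses only the standard library and assumes nothing about `/health` beyond an HTTP 200 when the server is ready; `is_healthy` and `wait_until_healthy` are hypothetical helpers.

```python
from __future__ import annotations

import time
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts all subclass OSError
        return False


def wait_until_healthy(base_url: str, deadline_s: float = 60.0,
                       interval_s: float = 1.0, request_timeout_s: float = 2.0) -> bool:
    """Poll /health until it succeeds or deadline_s seconds have elapsed."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if is_healthy(base_url, timeout=request_timeout_s):
            return True
        time.sleep(interval_s)
    return False
```

For a local server, `wait_until_healthy("http://localhost:8000")` returns `True` once the app is up.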

## Environment Details

### Action (`TemporalBenchAction`)
- `answer` (str) — MCQ label (must match ground truth after optional normalization)
- `confidence`, `reasoning` — optional
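The README does not define the "optional normalization" applied to `answer`. One plausible reading, shown here purely as an assumption, is whitespace trimming plus case folding; `normalize_label` and `is_correct` are hypothetical, so check the grading code in `env/` before relying on this behavior.

```python
from __future__ import annotations


def normalize_label(label: str) -> str:
    """Hypothetical normalization: trim, collapse internal whitespace, casefold."""
    return " ".join(label.split()).casefold()


def is_correct(answer: str, ground_truth: str) -> int:
    """Return 1 if the normalized strings match, else 0 (the 'correctness' term)."""
    return int(normalize_label(answer) == normalize_label(ground_truth))
```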

### Observation (`TemporalBenchObservation`)
- `question`, `options`, `task_type`, `dataset`, `history`, `accuracy_so_far`
- `step_idx`, `steps_remaining`, `max_steps`, `done`, `reward`, `metadata`
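If an LLM is the agent, a common pattern is to render the observation as text and map its reply back to an option string. This sketch treats the observation as a plain dict with the fields listed above (adapt it to attribute access for the typed `TemporalBenchObservation`); `format_prompt`, `parse_choice`, and the A/B/C lettering are illustrative choices, and `history` is ignored because its structure is not documented here.

```python
from __future__ import annotations

import string
from typing import Any, Mapping


def format_prompt(obs: Mapping[str, Any]) -> str:
    """Render observation fields into a plain-text prompt with lettered options."""
    lines = [
        f"Dataset: {obs['dataset']} | Task: {obs['task_type']} | "
        f"Steps remaining: {obs['steps_remaining']} of {obs['max_steps']}",
        "",
        obs["question"],
        "",
    ]
    for letter, option in zip(string.ascii_uppercase, obs["options"]):
        lines.append(f"{letter}. {option}")
    lines.append("")
    lines.append("Reply with a single letter.")
    return "\n".join(lines)


def parse_choice(reply: str, options: list[str]) -> str:
    """Map a letter reply (e.g. 'b' or ' B. ') back to the option string."""
    letter = reply.strip()[:1].upper()
    idx = string.ascii_uppercase.find(letter)
    # find("") returns 0, so an empty reply must be rejected explicitly.
    if letter == "" or idx < 0 or idx >= len(options):
        raise ValueError(f"cannot map reply {reply!r} to one of {len(options)} options")
    return options[idx]
```

The returned option string can then be passed as `TemporalBenchAction(answer=...)`, matching how the Quick Start submits `q.options[0]`.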

### Reward
- Per step: `alpha * correctness` (correctness 0 or 1).
- On the final step, an episode bonus is added: `lambda_ep * (total_correct / N) * coverage_multiplier`, where `N` is the number of questions in the episode and `coverage_multiplier` is 1.0 if every dataset in the episode has at least one correct answer, else 0.8.
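The reward rule can be worked through in code. This sketch takes `N` to be the number of steps in the episode, defaults `alpha` and `lambda_ep` to 1.0 only for illustration (the real coefficients live in `env/reward.py`), and uses a hypothetical `episode_rewards` helper.

```python
from __future__ import annotations


def episode_rewards(correct: list[int], datasets: list[str],
                    alpha: float = 1.0, lambda_ep: float = 1.0) -> list[float]:
    """Per-step rewards following the rule above; the bonus lands on the final step."""
    if len(correct) != len(datasets) or not correct:
        raise ValueError("need one dataset label per step and at least one step")
    rewards = [alpha * c for c in correct]  # alpha * correctness (0 or 1)
    n = len(correct)
    solved = {d for d, c in zip(datasets, correct) if c}
    # Full coverage only if every dataset seen in the episode has a correct answer.
    coverage = 1.0 if solved == set(datasets) else 0.8
    rewards[-1] += lambda_ep * (sum(correct) / n) * coverage
    return rewards
```

For instance, with `alpha = lambda_ep = 1.0`, an episode that answers 4 of 9 questions correctly with at least one correct answer per dataset earns 4 from the per-step terms plus a bonus of 4/9 ≈ 0.444; the same score with one dataset left unsolved earns a bonus of only 0.8 × 4/9 ≈ 0.356.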

## Advanced Usage

### Connecting to an Existing Server

If you already have a TemporalBenchEnv server running, connect with:

```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    r = env.reset()
    r = env.step(TemporalBenchAction(answer=r.observation.options[0]))
```

Note: `close()` does not stop a remote server you attached to with `base_url=...`.

### Using the Context Manager

The client supports context manager usage for automatic connection management:

```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    result = env.reset()
    while not result.done:
        ans = result.observation.options[0]
        result = env.step(TemporalBenchAction(answer=ans))
```

The client uses WebSocket connections for:
- **Lower latency**: No HTTP connection overhead per request
- **Persistent session**: Server maintains your environment state
- **Efficient for episodes**: Better for many sequential steps

### Concurrent WebSocket Sessions

The server uses **factory mode** (`create_app(_env_factory, ...)`) so each WebSocket session gets a fresh `TemporalBenchEnvironment`. Tune `max_concurrent_envs` in `server/app.py` as needed.
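Conceptually, factory mode means the server calls a factory once per session instead of sharing one environment. The toy registry below illustrates that idea with a hypothetical `SessionRegistry` class; it is not how OpenEnv's `create_app` is implemented, and the real server's behavior when `max_concurrent_envs` is reached may differ from the `RuntimeError` used here.

```python
from __future__ import annotations

import threading
from typing import Callable, Dict, Generic, TypeVar

E = TypeVar("E")


class SessionRegistry(Generic[E]):
    """Toy model of factory mode: one fresh environment per session ID."""

    def __init__(self, factory: Callable[[], E], max_concurrent_envs: int) -> None:
        self._factory = factory
        self._max = max_concurrent_envs
        self._envs: Dict[str, E] = {}
        self._lock = threading.Lock()  # sessions may open/close from many threads

    def open(self, session_id: str) -> E:
        with self._lock:
            if session_id in self._envs:
                return self._envs[session_id]
            if len(self._envs) >= self._max:
                raise RuntimeError("max_concurrent_envs reached")
            env = self._factory()  # fresh instance: no state shared across sessions
            self._envs[session_id] = env
            return env

    def close(self, session_id: str) -> None:
        with self._lock:
            self._envs.pop(session_id, None)
```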

## Development & Testing

### Direct environment testing

```bash
uv sync --extra dev
uv run pytest tests/
```

### Running Locally

Run the server locally for development:

```bash
uvicorn server.app:app --reload
```

## Project Structure

```
TemporalBenchEnv/
├── .dockerignore          # Docker build exclusions
├── __init__.py            # Module exports
├── README.md              # This file
├── openenv.yaml           # OpenEnv manifest
├── pyproject.toml         # Project metadata and dependencies
├── uv.lock                # Locked dependencies (generated)
├── client.py              # TemporalBenchEnvClient (alias TemporalbenchenvEnv)
├── models.py              # Action / observation / state re-exports
├── env/                   # Environment, sampler, grading, rewards
├── data/                  # TSQuestion schema + JSON/JSONL loaders
├── tests/                 # Pytest suite; fixture banks in tests/fixtures/banks
└── server/
    ├── __init__.py        # Server module exports
    ├── app.py             # FastAPI application (HTTP + WebSocket endpoints)
    └── Dockerfile         # Container image definition
```