---
title: TemporalBenchEnv MCQ Server
emoji: πŸ₯
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# TemporalBenchEnv
OpenEnv environment for **multi-step multiple-choice** time-series reasoning. Each episode samples nine questions from pre-built JSON banks (per-dataset files or merged JSONL in `TSQuestion` shape). Rewards combine per-step correctness and an episode bonus (see `env/reward.py`).
## Question bank layout
Point the server at a directory containing `PSML_questions.json`, `freshretailnet_questions.json`, `MIMIC_questions.json`, and `causal_chambers_questions.json` (each file is a JSON array of `TSQuestion` records), or set **`TEMPORALBENCH_QUESTION_BANK_DIR`** to that path. If unset, the server uses `tests/fixtures/banks` when present (for local smoke runs).
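The lookup order described above can be sketched as follows. This is a minimal illustration, not the server's actual code: `resolve_bank_dir` is a hypothetical helper, and raising `FileNotFoundError` when neither location exists is an assumption about a case the text does not cover.

```python
import os
from pathlib import Path

ENV_VAR = "TEMPORALBENCH_QUESTION_BANK_DIR"
FALLBACK_DIR = Path("tests/fixtures/banks")  # fallback named in this README


def resolve_bank_dir(root: Path = Path(".")) -> Path:
    """Return the question bank directory (hypothetical helper).

    Prefers the environment variable; otherwise falls back to the fixture
    directory under `root` when that directory exists.
    """
    configured = os.environ.get(ENV_VAR)
    if configured:
        return Path(configured)
    fallback = root / FALLBACK_DIR
    if fallback.is_dir():
        return fallback
    # Assumption: the README does not say what happens in this case.
    raise FileNotFoundError(
        f"Set {ENV_VAR} or create {fallback} with the question bank files"
    )
```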
Each record must include at least: `question_id`, `dataset`, `task_type` (`T1U` | `T3` | `T2_MCQ`), `prompt`, `options` (length ≥ 2), `answer`, plus optional `family`, `capability_tags`, `difficulty`, `metadata`.
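To check a bank file before pointing the server at it, a small validator along these lines may help. It is a sketch built only from the field list above: `validate_record` is a hypothetical helper rather than the loader in `data/`, and every value in `example` is made up for illustration.

```python
REQUIRED_FIELDS = ("question_id", "dataset", "task_type", "prompt", "options", "answer")
TASK_TYPES = {"T1U", "T3", "T2_MCQ"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if record["task_type"] not in TASK_TYPES:
        problems.append(f"unknown task_type: {record['task_type']!r}")
    options = record["options"]
    if not isinstance(options, list) or len(options) < 2:
        problems.append("options must be a list with at least 2 entries")
    return problems


# Illustrative record; all values are invented for this example.
example = {
    "question_id": "example-0001",
    "dataset": "PSML",
    "task_type": "T2_MCQ",
    "prompt": "Which option best describes the trend in the series?",
    "options": ["A", "B", "C", "D"],
    "answer": "B",
}

print(validate_record(example))
```

An empty list means the record has every required field, a known `task_type`, and at least two options.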
## Quick Start
Use the typed client (`TemporalBenchEnvClient`; alias `TemporalbenchenvEnv`):
```python
from client import TemporalBenchAction, TemporalBenchEnvClient

# Create the client before `try` so `env` is always defined when `finally` runs.
env = TemporalBenchEnvClient.from_docker_image("TemporalBenchEnv-env:latest")
try:
    out = env.reset()
    while not out.done:
        q = out.observation
        # Agent picks q.options[i] or an equivalent label string
        out = env.step(TemporalBenchAction(answer=q.options[0]))
finally:
    # Stops and removes the container started by from_docker_image()
    env.close()
```
`TemporalBenchEnvClient.from_docker_image()` handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call `close()`
## Building the Docker Image
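Build the Docker image before using the environment. Docker rejects repository names that contain uppercase letters, so if the build fails on the mixed-case tag below, use an all-lowercase tag and pass that same tag to `from_docker_image()`: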
```bash
# From project root
docker build -t TemporalBenchEnv-env:latest -f server/Dockerfile .
```
## Deploying to Hugging Face Spaces
Deploy the environment to Hugging Face Spaces with the `openenv push` command:
```bash
# From the environment directory (where openenv.yaml is located)
openenv push
# Or specify options
openenv push --namespace my-org --private
```
The `openenv push` command will:
1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
2. Prepare a custom build for Hugging Face Docker space (enables web interface)
3. Upload to Hugging Face (ensuring you're logged in)
### Prerequisites
- Authenticate with Hugging Face: the command prompts for login if you are not already authenticated
### Options
- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- `--private`: Deploy the space as private (default: public)
### Examples
```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push
# Push to a specific repository
openenv push --repo-id my-org/my-env
# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
# Push as a private space
openenv push --private
# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```
After deployment, your space will be available at:
`https://huggingface.co/spaces/<repo-id>`
The deployed space includes:
- **Web Interface** at `/web` - Interactive UI for exploring the environment
- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- **Health Check** at `/health` - Container health monitoring
- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
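Before opening a session, a client can poll the `/health` endpoint listed above until the container is ready. The sketch below uses only the Python standard library and checks nothing beyond the HTTP status code, since the response body format is not documented here. `wait_for_health` and its default timings are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll `<base_url>/health` until it answers HTTP 200 or the timeout expires."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For a local server this would be called as `wait_for_health("http://localhost:8000")`.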
## Environment Details
### Action (`TemporalBenchAction`)
- `answer` (str): MCQ label (must match ground truth after optional normalization)
- `confidence`, `reasoning`: optional
### Observation (`TemporalBenchObservation`)
- `question`, `options`, `task_type`, `dataset`, `history`, `accuracy_so_far`
- `step_idx`, `steps_remaining`, `max_steps`, `done`, `reward`, `metadata`
### Reward
- Per step: `alpha * correctness` (correctness 0 or 1).
- On the final step, adds episode bonus: `lambda_ep * (total_correct / N) * coverage_multiplier` (1.0 if every dataset in the episode has at least one correct answer, else 0.8).
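The two rules above can be worked through in a few lines. This is a sketch of the formula as stated here, not a copy of `env/reward.py`: `step_rewards` is a hypothetical function, and the `alpha` and `lambda_ep` values in the usage lines are chosen for illustration only.

```python
def step_rewards(correct, datasets, alpha, lambda_ep):
    """Return the reward for each step of one episode.

    correct  : list of bools, one per question in the episode
    datasets : list of dataset names, aligned with `correct`
    """
    n = len(correct)
    if n == 0 or n != len(datasets):
        raise ValueError("correct and datasets must be non-empty and equal length")
    # Per-step term: alpha * correctness (0 or 1)
    rewards = [alpha * float(c) for c in correct]
    # Coverage: every dataset in the episode has at least one correct answer
    solved = {d for d, c in zip(datasets, correct) if c}
    coverage_multiplier = 1.0 if solved == set(datasets) else 0.8
    # Episode bonus is added to the final step's reward
    rewards[-1] += lambda_ep * (sum(correct) / n) * coverage_multiplier
    return rewards


# Illustrative values only: alpha=1.0, lambda_ep=2.0
print(step_rewards([True, False, True, True], ["A", "A", "B", "B"], 1.0, 2.0))
# [1.0, 0.0, 1.0, 2.5]
```

In the example, three of four answers are correct and both datasets are covered, so the bonus is `2.0 * 0.75 * 1.0 = 1.5`, added to the last step's `1.0`.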
## Advanced Usage
### Connecting to an Existing Server
If you already have a TemporalBenchEnv server running, connect with:
```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    r = env.reset()
    r = env.step(TemporalBenchAction(answer=r.observation.options[0]))
```
Note: `close()` does not stop a remote server you attached to with `base_url=...`.
### Using the Context Manager
The client supports context manager usage for automatic connection management:
```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    result = env.reset()
    while not result.done:
        ans = result.observation.options[0]
        result = env.step(TemporalBenchAction(answer=ans))
```
The client uses WebSocket connections for:
- **Lower latency**: No HTTP connection overhead per request
- **Persistent session**: Server maintains your environment state
- **Efficient for episodes**: Better for many sequential steps
### Concurrent WebSocket Sessions
The server uses **factory mode** (`create_app(_env_factory, ...)`) so each WebSocket session gets a fresh `TemporalBenchEnvironment`. Tune `max_concurrent_envs` in `server/app.py` as needed.
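The idea behind factory mode can be illustrated without the server. The sketch below is not OpenEnv's implementation: `DummyEnv` stands in for `TemporalBenchEnvironment`, `SessionPool` is a hypothetical registry, and raising an error once the limit is reached is an assumption about behavior this README does not specify.

```python
from typing import Callable, Dict


class DummyEnv:
    """Stand-in for TemporalBenchEnvironment; holds per-session state only."""

    def __init__(self) -> None:
        self.step_idx = 0


class SessionPool:
    """Hypothetical registry that builds one fresh environment per session."""

    def __init__(self, env_factory: Callable[[], DummyEnv], max_concurrent_envs: int) -> None:
        self._factory = env_factory
        self._max = max_concurrent_envs
        self._envs: Dict[str, DummyEnv] = {}

    def open(self, session_id: str) -> DummyEnv:
        if len(self._envs) >= self._max:
            # Assumption: the real server's behavior at the limit may differ.
            raise RuntimeError("max_concurrent_envs reached")
        env = self._factory()  # fresh instance, no state shared across sessions
        self._envs[session_id] = env
        return env

    def close(self, session_id: str) -> None:
        self._envs.pop(session_id, None)


pool = SessionPool(DummyEnv, max_concurrent_envs=2)
a = pool.open("session-a")
b = pool.open("session-b")
a.step_idx += 1  # advancing one session leaves the other untouched
print(a.step_idx, b.step_idx)
```

Passing a factory rather than a single environment instance is what keeps one session's episode progress from leaking into another's.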
## Development & Testing
### Direct Environment Testing
```bash
uv sync --extra dev
uv run pytest tests/
```
### Running Locally
Run the server locally for development:
```bash
uvicorn server.app:app --reload
```
## Project Structure
```
TemporalBenchEnv/
├── .dockerignore      # Docker build exclusions
├── __init__.py        # Module exports
├── README.md          # This file
├── openenv.yaml       # OpenEnv manifest
├── pyproject.toml     # Project metadata and dependencies
├── uv.lock            # Locked dependencies (generated)
├── client.py          # TemporalBenchEnvClient (alias TemporalbenchenvEnv)
├── models.py          # Action / observation / state re-exports
├── env/               # Environment, sampler, grading, rewards
├── data/              # TSQuestion schema + JSON/JSONL loaders
└── server/
    ├── __init__.py    # Server module exports
    ├── app.py         # FastAPI application (HTTP + WebSocket endpoints)
    └── Dockerfile     # Container image definition
```